STEPHEN LOVETT

ABSTRACT ALGEBRA
STRUCTURES AND APPLICATIONS

Abstract Algebra: Structures and Applications helps you understand the abstraction of modern algebra. It
emphasizes the more general concept of an algebraic structure while simultaneously covering applications.
• Definition of structure
• Motivation
• Examples
• General properties
• Important objects
• Description
• Subobjects
• Morphisms
• Subclasses
• Quotient objects
• Action structures
• Applications
The text uses the general concept of an algebraic structure as a unifying principle and introduces other
algebraic structures besides the three standard ones (groups, rings, and fields). Examples, exercises,
investigative projects, and entire sections illustrate how abstract algebra is applied to areas of science and
other branches of mathematics.
Features
• Emphasizes the general concept of an algebraic structure as a unifying principle instead of just focusing
on groups, rings, and fields
• Describes the application of algebra in numerous fields, such as cryptography and geometry
• Includes brief introductions to other branches of algebra that encourage you to investigate further
• Provides standard exercises as well as project ideas that challenge you to write investigative or
expository mathematical papers
• Contains many examples that illustrate useful strategies for solving the exercises
STEPHEN LOVETT
ABSTRACT ALGEBRA
STRUCTURES AND APPLICATIONS
STEPHEN LOVETT
Wheaton College
Wheaton, Illinois, USA
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2016 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and
information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and
publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission
to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any
future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic,
mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or
retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact
the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides
licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment
has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation
without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
Contents
Preface ix
1 Set Theory 1
1.1 Sets and Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 The Cartesian Product; Operations; Relations . . . . . . . . . . . . . . . . . . . . . 14
1.3 Equivalence Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.4 Partial Orders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.5 Projects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2 Number Theory 43
2.1 Basic Properties of Integers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.2 Modular Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
2.3 Mathematical Induction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
2.4 Projects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3 Groups 69
3.1 Symmetries of the Regular n-gon . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
3.2 Introduction to Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
3.3 Properties of Group Elements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
3.4 Symmetric Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
3.5 Subgroups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
3.6 Lattice of Subgroups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
3.7 Group Homomorphisms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
3.8 Group Presentations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
3.9 Groups in Geometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
3.10 Diffie-Hellman Public Key . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
3.11 Semigroups and Monoids . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
3.12 Projects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
5 Rings 207
5.1 Introduction to Rings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
5.2 Rings Generated by Elements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
5.3 Matrix Rings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
5.4 Ring Homomorphisms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
5.5 Ideals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
5.6 Quotient Rings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
5.7 Maximal Ideals and Prime Ideals . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
5.8 Projects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
13 Categories 675
13.1 Introduction to Categories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 675
13.2 Functors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 682
A Appendices 689
A.1 The Algebra of Complex Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . 689
A.2 Lists of Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 691
Bibliography 699
Index 703
Preface
Organizing Principles
Algebraic Structure. Many abstract algebra textbooks focus on three specific algebraic struc-
tures: groups, rings, and fields. These particular structures have indeed played important roles
throughout mathematics and arguably deserve considerable attention. However, this book empha-
sizes the general concept of an algebraic structure as a unifying principle. Therefore, we present the
core topics of structures following a consistent order and deliberately introduce the reader to other
algebraic structures besides these standard three.
When studying a given algebraic structure, we follow this outline of topics:
• Examples—What are some examples that demonstrate the scope and restrictions of a struc-
ture’s definition?
• General Properties—What can we prove about all objects with a given structure?
• Important Objects—Are there some objects in this structure that are singularly important?
• Subobjects—What can be said generally about the internal structure of a given object?
• Subclasses—What are some interesting subclasses of the structure that we can obtain by
imposing additional conditions?
• Quotient Objects—Under what conditions do equivalence classes behave nicely with respect
to the structure?
For convenience in the rest of the text, we will often refer to this list simply as “the Outline.” With
a given structure, some of these topics may be brief and others lead to considerable investigation.
Consequently, we do not give equal relative weight to these topics when studying various algebraic
structures.
Algebraists may dislike the expression “algebraic structure” as it is not a well-defined mathe-
matical term. Nonetheless, we use this term loosely until, in Chapter 13, we finally make the idea
of algebraic structure rigorous by introducing categories.
Applications. The second guiding principle of this book is application of algebra. Examples,
exercises, investigative projects, and whole sections illustrate how abstract algebra is applied to
other branches of mathematics or to areas of science. In addition, this textbook offers a few sections
whose titles begin with “A Brief Introduction to...” These sections are just the trailhead for a whole
branch of algebra and are intended to whet the student’s appetite for further investigations and
study.
A Note to Instructors
Though covering groups, rings, and fields in detail, this textbook emphasizes the more general
concept of an algebraic structure while simultaneously keeping an eye on applications. The style
deliberately acknowledges the process of discovery, making this book suited for self-study.
This book is designed so that full coverage of all the sections will fill a two-semester sequence, with
the semester split occurring between Chapters 7 and 8. However, it can be used for a one-semester
introductory course in abstract algebra with many possible variations.
There are a variety of pathways to work through this textbook. Some colleges require a robust
discrete mathematics background or transition course before abstract algebra. In this case, Chap-
ters 1 and 2, which cover some basic set theory and a few motivating number theory concepts, might
serve as a review or could be skipped entirely. Some application sections or topic sections are not
essential for the development of later theory.
Each section was written with the intent to fill a one-hour lecture. Occasionally, some subsections
carry the label (Optional). These optional subsections are not needed for further theory outside
that section but offer additional perspective. In the dependency chart below, sections in rectangles
represent core theory and build on each other within the boxes. Sections in ellipses are application
or “brief introduction” sections and can generally be done in any order within the ellipse.
[Dependency chart covering sections 1.1–1.4, 2.1–2.3, 3.1–3.8, 3.9–3.11, 4.1–4.5, 5.1–5.7, 6.6–6.7, 7.1–7.2, 7.5–7.7, 7.3–7.4, 8.1–8.5, 10.1–10.7, 10.8–10.11, 11.1–11.8, 12.1–12.5, 12.6, and 13.1–13.2.]
A Note to Students
From a student’s perspective, one of the biggest challenges to modern algebra is its abstraction.
A student taking a first course in modern algebra quickly discovers that most of the exercises are
proofs. Calculus, linear algebra, and differential equations can be taught from many perspectives but
often a preponderance of exercises simply require the student to carefully follow a certain prescribed
algorithm. In algebra, a student does not typically learn many algorithms to solve a specific range
of problems. Instead, he or she is expected to prove new results using the theorems presented in
the text. By doing exercises, the student becomes an active participant in the development of the
field. This common aspect of algebraic textbooks is very valuable because it trains the student in
the methods of mathematical investigation. In this textbook, however, for many exercises (though
not all) the student will find a similar example in the section that will illustrate a useful strategy.
The text includes many properties of the objects we study. However, this does not mean that
everything that is interesting or even useful for some further result is proved or even mentioned in the
text. If every interesting fact were proved in the text, this book would swell to unwieldy proportions
and regularly digress from a coherent presentation. Consequently, to get a full experience of the
material, the reader is encouraged to peruse the exercises in order to absorb many consequences of
the theory.
Whether a specific command is listed or whether a library of commands is given, the reader should visit the
CAS's help page for specific examples on how to use that given command or to see what functions
are in the library.
Investigative Projects
Another feature of this book is the investigative projects. In addition to regular exercises, at the
end of most chapters, there is a list of ideas for investigative projects. The idea of assigning projects
stems from a pedagogical experiment to challenge all students to write investigative or expository
mathematical papers in undergraduate classes. As a paper, the projects should be (1) Clear: Use
proper prose, follow the structure of a paper and provide proper references; (2) Correct: Proofs
and calculations must be accurate; (3) Complete: Address all the stated questions, as well as those one
should naturally ask in the course of the investigation; (4) Creative: Show evidence of creative
problem-solving or question-asking skills.
These project ideas stand as guidelines. A reader who tackles one is encouraged to add his or her
own investigations. While some questions associated to a project idea are precise and lead to well-
defined answers, other questions are left vague on purpose, might not have clear-cut solutions, or lead
to open-ended problems. Some questions require proofs while others may benefit from technology:
calculator work, a computer program, or a computer algebra system.
The ideas in some projects are known and have been developed in articles, books, or online
resources. Consequently, if the investigative project is an assignment, then the student should
generally not consult online resources besides the ones allowed by the project description. Otherwise,
the project ideas may offer topics for further reading.
Habits of Notation
This book uses without explanation the logic quantifiers ∀, to be read as “for all,” ∃, to be read as
“there exists,” and ∃!, to be read as “there exists a unique.” We also regularly use =⇒ for logical
implication and ⇐⇒ for logical equivalence. More precisely, if P(x, y, . . .) is a predicate with some
variables and Q(x, y, . . .) is another predicate using the same variables, then

P(x, y, . . .) =⇒ Q(x, y, . . .)   means   ∀x∀y . . . (P(x, y, . . .) −→ Q(x, y, . . .))

and

P(x, y, . . .) ⇐⇒ Q(x, y, . . .)   means   ∀x∀y . . . (P(x, y, . . .) ←→ Q(x, y, . . .)).
As another habit of language, this textbook is careful to always and only use the expression
"Assume [hypothesis]" at the beginning of a proof by contradiction. In this way, the reader knows ahead
of time that whenever she sees this expression, the hypothesis will eventually lead to a contradiction.
Acknowledgments
I must first thank the mathematics majors at Wheaton College (IL) who served for many years as the
test environment for many topics, exercises, and projects. I am indebted to Wheaton College (IL)
for the funding provided through the Aldeen Grant that contributed to portions of this textbook.
I especially thank the students who offered specific feedback on the draft versions of this book,
in particular Kelly McBride, Roland Hesse, and David Garringer. Joel Stapleton, Caleb DeMoss,
Daniel Bradley, and Jeffrey Burge deserve special gratitude for working on the solutions manual
to the textbook. I also must thank Justin Brown for test running the book and offering valuable
feedback. I also thank Africa Nazarene University for hosting my sabbatical, during which I wrote
a major portion of this textbook. Finally, I must absolutely thank the publishing and editing team
at Taylor & Francis for their work in making this project become a reality.
1. Set Theory

1.1 Sets and Functions
In mathematics, the concept of a set makes precise the notion of a collection of things. As broad as
this concept appears, it is foundational for modern mathematics.
1.1.1 – Sets
Definition 1.1.1
(1) A set is a collection of objects for which there is a clear rule to determine whether an
object is included or excluded.
(2) An object in a set is called an element of that set. We write x ∈ A to mean "the
element x is an element of the set A." We write x ∉ A if x is not an element of A.
In contrast, the people listed as "Friends" or "Contacts" on someone's preferred social networking
site do form a set. As another nonexample of a set, consider the collection of all chairs. Whether
this is a set is debatable. Indeed, by some artistic or functional failure, a piece of furniture may not
be comfortable enough to sit on. Furthermore, should we consider a rock beside a hiking trail as a
chair if we happen to sit on it?
Some discussion in logic is appropriate here. Set theory based on this idea of a “clear rule” is
called naive set theory [33]. The idea of a clear rule in set theory is as precise as Boolean logic,
which calls a proposition any statement for which there is a clear rule to decide whether it is true or
false. However, like Russell’s Paradox in logic (e.g., consider the truth value of the statement “This
sentence is false.”), naive set theory ultimately can lead to contradictions. For example, if S is the
set of all sets that do not contain themselves, does S contain itself? The Zermelo-Fraenkel axioms
of set theory, denoted ZF, offer more technical foundations and avoid these contradictions. (See [47]
for a presentation of set theory with ZF. See [25] for a philosophical discussion of ZF axioms.)
The most widely utilized form of set theory adds one axiom to the standard ZF, the so-called
Axiom of Choice, and the resulting set of axioms is denoted by ZFC. Occasionally, certain theorems
emphasize when their proofs directly utilize the Axiom of Choice. The reason for this is primarily
historical. In the context of ZF, the Axiom of Choice implies many statements that seem downright
obvious and others that feel counterintuitive. Consequently, there is a habit in mathematical
literature to make clear when a certain result (and all results that use it as a hypothesis) rely on
the Axiom of Choice.
A thorough treatment of axiomatic set theory would detract from an introduction to abstract
algebra. Naive set theory will suffice for our purposes. Whenever we need a technical aspect of set
theory, we provide appropriate references. The interested reader is encouraged to consult [21, 39, 62]
for a deeper treatment of set theory.
Some sets occur so frequently in mathematics that they have their own standard notation, among them
N (the natural numbers), Z (the integers), Q (the rational numbers), R (the real numbers), and C
(the complex numbers). Here are a few remarks on notation:
• Sometimes we use modifiers to the above sets. For example, R+ denotes the set of nonnegative
reals and R<0 denotes the set of (strictly) negative reals.
• A modifier we use consistently in this book is N∗ , Z∗ , etc. to stand for the given number set
excluding 0. In particular, N∗ denotes the set of positive integers.
– [a, b] denotes the closed interval of real numbers between a and b inclusive.
– [a, b) is the interval of reals between a and b, including a but not b.
– [a, ∞) is the interval of all real numbers greater than or equal to a.
– Other self-explanatory combinations are possible such as (a, b); (a, b]; (a, ∞); (−∞, b];
and (−∞, b).
There are two common notations for defining sets. Both of them explicitly provide the clear rule
as to whether an object is in or out. However, in either case, the brace { marks the beginning
of the defining rule and } marks the end.
1.1. SETS AND FUNCTIONS 3
(1) List the elements. For example, writing S = {1, 3, 7} means that the set S is comprised of the
three integers 1, 3, and 7. It is important to note that in this notation, order does not matter
and we do not list numbers more than once. We only care about whether a certain object is in
or not; we don’t care about order or repetitions. (It is important to note that what we write in
the list is merely a signifier that points to the actual object. Hence, the symbol 1 is pointing
to the mathematical object of “one.” Similarly, I may write F = {AL, CL, SL} as a set of
three elements that describes my family where the symbols AL, CL, and SL are pointers to
the actual objects in the set, namely my daughter, my wife, and myself.)
(2) Specify the defining rule. For example,

{x | x is a rational number and x² < 2}

means the set of all x such that x is a rational number and x² < 2. Since we already
have a set label for the rational numbers, we will usually rewrite this more concisely as

{x ∈ Q | x² < 2}

and read it as, "the set of rational numbers x such that x² < 2." An alternate notation for
this construction is {x ∈ Q : x² < 2}.
Two sets A and B are considered equal when x ∈ A ⇐⇒ x ∈ B, or in other words, they have
exactly the same elements. We write A = B to denote set equality.
Definition 1.1.2
A set A is called a subset of a set S if x ∈ A =⇒ x ∈ S. In other words, every element of
A is an element of S. We write A ⊆ S.
The symbol ⊆ should remind the reader of the symbol ≤ on the real numbers. This similarity
of notation might inspire us to assume that A ⊂ B would, like the strict inequality symbol <, mean
A ⊆ B and A ≠ B. Unfortunately, by a fluke of historical inconsistency in notation, some authors
do use the ⊂ symbol to mean a strict subset, while others use it synonymously with ⊆. To remove
confusion, we use the symbol A ⊊ B to mean A ⊆ B and A ≠ B. The symbol A ⊈ B means that A
is not a subset of B.
Example 1.1.3. Let C⁰([2, 5]) denote the set of continuous real-valued functions on the interval
[2, 5] and let C¹([2, 5]) denote the set of differentiable functions whose derivative is continuous on
the interval [2, 5]. The statement that

C¹([2, 5]) ⊆ C⁰([2, 5])

follows from the nontrivial result in analysis that if a function is differentiable over a closed interval,
it is continuous over that interval.
There are a few basic operations on subsets of a given set S. In the following list, we define
operations on subsets A and B of S and provide corresponding Venn diagrams, in which the shaded
portion illustrates the result of the operation. [Venn diagrams not shown.]
• The union of A and B is A ∪ B = {x ∈ S | x ∈ A or x ∈ B}.
• The intersection of A and B is A ∩ B = {x ∈ S | x ∈ A and x ∈ B}.
• The set difference of A and B is A − B = {x ∈ A | x ∉ B}.
• The complement of A is Ā = {x ∈ S | x ∉ A}.
Example 1.1.4. Let U = {1, 2, . . . , 10} and consider the subsets A = {1, 3, 6, 7}, B = {1, 5, 6, 8, 9},
and C = {2, 3, 4, 5, 6}. We calculate the following examples of operations.
(1) A ∩ B = {1, 6}.
(2) B − (A ∪ C) = B − {1, 2, 3, 4, 5, 6, 7} = {8, 9}.
(3) The complement of B ∪ C = {1, 2, 3, 4, 5, 6, 8, 9} is {7, 10}.
(4) (A ∩ B) ∩ C = {1, 6} ∩ C = {6}.
(5) A ∩ (B ∩ C) = A ∩ {5, 6} = {6}.
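These computations are easy to replay with Python's built-in set type. The following sketch is our own (the variable names are made up, not from the text) and mirrors items (1)–(5):

```python
# Verify the set computations of Example 1.1.4 with Python's built-in sets.
U = set(range(1, 11))          # the universal set U = {1, 2, ..., 10}
A = {1, 3, 6, 7}
B = {1, 5, 6, 8, 9}
C = {2, 3, 4, 5, 6}

inter_AB = A & B               # (1) A ∩ B
diff = B - (A | C)             # (2) B − (A ∪ C)
comp_BC = U - (B | C)          # (3) the complement of B ∪ C in U
left = (A & B) & C             # (4)
right = A & (B & C)            # (5) equals (4): intersection is associative
```

Items (4) and (5) agreeing on this example is an instance of the associativity law discussed later in the section.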
Example 1.1.5. Let A = {x ∈ Z | 4 divides x} and let B = {x ∈ Z | 6 divides x}. The set A ∩ B
consists of all integers that are divisible by 4 and by 6. As we will be reminded in Section 2.1.4, an
integer is divisible by 4 and 6 if and only if it is divisible by the least common multiple of 4 and 6,
written lcm(4, 6) = 12. Hence,

A ∩ B = {x ∈ Z | 12 divides x}.

On the other hand, A ∪ B consists of all integers that are divisible by 4 or divisible by 6.
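The divisibility fact behind Example 1.1.5 can be spot-checked numerically; this is our own sketch using the standard-library `math.lcm` (available in Python 3.9+):

```python
import math

# An integer is divisible by both 4 and 6 exactly when it is divisible by
# lcm(4, 6) = 12.  We spot-check this over a range of integers.
L = math.lcm(4, 6)
ok = all((x % 4 == 0 and x % 6 == 0) == (x % L == 0)
         for x in range(-120, 121))
```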
Set operations offer concise ways to describe many common properties of sets. The following
definition illustrates this.
Definition 1.1.6
Let A and B be subsets of a set S. We say that A and B are disjoint if A ∩ B = ∅.
The set difference provides a standard notation for all elements of a set except for a few specified
elements. For example, if S is the set of all real numbers except 1 and 5, we write this succinctly as
S = R − {1, 5}.
Proposition 1.1.7
For sets A and B, A = B if and only if A ⊆ B and B ⊆ A.
There are many properties of how set operations relate to each other. We will study these
properties in detail in Section 10.1, but we will not need them all for our purposes prior to then.
However, we present two comments about properties of set operations.
The associativity law for union (respectively intersection) states that for subsets A, B, C ⊆ S,
the following holds:

(A ∪ B) ∪ C = A ∪ (B ∪ C)   (respectively, (A ∩ B) ∩ C = A ∩ (B ∩ C)).

More generally, one can form the union of an infinite family of sets. For example, we show that

⋃_{n=1}^{∞} [1/n, 1] = (0, 1].

Since all fractions 1/n where n is a positive integer are positive, 0 is not an element of any interval
[1/n, 1]. We conclude that the left-hand side is a subset of the set on the right-hand side. However,
for any positive real number ε with 0 < ε ≤ 1, setting n = ⌈1/ε⌉ (i.e., n is the least integer greater
than or equal to 1/ε), we have n ≥ 1/ε, so

ε ∈ [1/n, 1].

Thus, the set on the right-hand side is a subset of the set on the left-hand side, so the two sets are
equal.
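As a sanity check of our own (not from the text), exact rational arithmetic with the standard library confirms the containment argument on sample points of (0, 1]:

```python
import math
from fractions import Fraction

# Every ε with 0 < ε ≤ 1 lies in the interval [1/n, 1] for n = ⌈1/ε⌉, while 0
# lies in no interval [1/n, 1] since each left endpoint 1/n is positive.
# Fractions give exact arithmetic, avoiding floating-point boundary issues.
checked = []
for k in range(1, 201):
    eps = Fraction(k, 200)          # exact sample points in (0, 1]
    n = math.ceil(1 / eps)          # least integer n with n ≥ 1/ε
    checked.append(Fraction(1, n) <= eps <= 1)
```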
As a second comment, we illustrate a basic abstract proof concerning relations between set
operations with the following proposition.
Proposition 1.1.8
Let A and B be subsets of a context set S. Then A − B = A ∩ B̄.

Proof. For any x ∈ S, we have x ∈ A − B if and only if x ∈ A and x ∉ B, which holds if and only
if x ∈ A and x ∈ B̄, that is, x ∈ A ∩ B̄. We conclude that A − B = A ∩ B̄.
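The identity can also be verified exhaustively on a small set; the following is our own sketch, with `S - B` standing in for the complement B̄:

```python
from itertools import combinations

# Exhaustively check A − B = A ∩ B̄ for all pairs of subsets of a small S.
S = {1, 2, 3, 4}
subsets = [set(c) for r in range(len(S) + 1) for c in combinations(S, r)]
holds = all(A - B == A & (S - B)   # S - B is the complement of B in S
            for A in subsets for B in subsets)
```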
Definition 1.1.9
If S is a set, the power set of S, denoted by P(S), is the set of all subsets of S.
Note that A ∈ P(S) is equivalent to writing A ⊆ S. Furthermore, taking the power set of a set
is one way to obtain a new set from a previous one.
Example 1.1.10. Let S = {1, 2, 3}. Then the power set of S is
P(S) = {∅, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}} .
Note that P(S) has eight elements.
The terminology of “power” set comes from the following proposition.
Proposition 1.1.11
Let n be a nonnegative integer. If S is a set with n elements, then P(S) has 2n elements.
Proof. First consider the case n = 0. This means that S = ∅. Then P(S) = {∅} and hence contains
1 = 20 element. Thus, the proposition holds for n = 0.
Write S as S = {s₁, s₂, . . . , sₙ} for some positive integer n. For every subset A ⊆ S, we can ask n
independent questions: whether s₁ ∈ A, whether s₂ ∈ A, and so forth for all elements of S. Each
question has two possible answers, each of which is independent of the others. Thus, there are

2 × 2 × · · · × 2 (n times) = 2ⁿ

subsets of S.
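The counting argument translates directly into a short enumeration. This sketch of ours builds the power set with `itertools.combinations` and checks |P(S)| = 2ⁿ for small n:

```python
from itertools import combinations

def power_set(S):
    """All subsets of S (as frozensets), grouped by size r = 0, 1, ..., |S|."""
    items = list(S)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

# |P(S)| = 2^n, mirroring the n independent yes/no questions in the proof.
sizes = [len(power_set(set(range(n)))) for n in range(7)]
```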
Proposition 1.1.12
Let n be a nonnegative integer and let S be a set with n elements. The number of subsets
with k elements is
(n choose k) = n! / (k! (n − k)!).
To count the number of subsets of size k, we must count how many ways we can select (unordered)
k elements from S. This unordered selection corresponds uniquely to a subset. If we select k elements
in order without repetition, we have
n × (n − 1) × · · · × (n − k + 1) = n! / (n − k)!
choices of how to do this. On the other hand, with a given set of k elements, there are k! ways of
ordering them. Hence,
(n choose k) · k! = n! / (n − k)!
and the result follows.
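The same count can be checked by brute force; here is a sketch of ours comparing direct enumeration against the factorial formula:

```python
import math
from itertools import combinations

# Proposition 1.1.12: an n-element set has n!/(k!(n-k)!) subsets of size k.
n = 6
counts = [len(list(combinations(range(n), k))) for k in range(n + 1)]
formula = [math.factorial(n) // (math.factorial(k) * math.factorial(n - k))
           for k in range(n + 1)]
```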
1.1.3 – Functions
Definition 1.1.13
Let A and B be two sets. A function f from A to B, written f : A → B, is an association
that to each element a ∈ A, associates exactly one element b ∈ B. We write b = f (a) if b
is the associate of a via f . The set A is called the domain of f and the set B is called the
codomain.
Functions are ubiquitous in and outside of mathematics. Functions model the mental habit of
uniquely associating various quantities or options to objects in a set. For example, if V is the set
of motor vehicles registered in the United States, then the concept of mileage (at a given date) is
a function f : V → N. At a given point in time a car has a unique number associated to it that
describes mileage.
As another example, if P is the set of people (living or who have passed away), the concept of
biological mother can be represented as a function m : P → P such that m(a) = b means that the
person b is a’s biological mother. The concept of brother is not a function because some people may
have more than one brother.
For a function f : A → B, it is not uncommon to say that f maps the element a to b. Also, the
function is sometimes called a mapping, or more briefly, map from A to B.
Starting in precalculus, we study functions of the form f : I → R, where I is some interval of
real numbers. Sequences of real numbers are also functions f : N → R, though by a historical habit,
we often write terms of a sequence as fn instead of f (n).
Definition 1.1.14
Let A, B, and C be sets, and let f : A → B and g : B → C be two functions. The
composition of g with f is the function g ◦ f : A → C defined by (g ◦ f)(x) = g(f(x)) for all x ∈ A.
Function composition arises in a significant manner in linear algebra when considering the com-
position of two linear transformations. Let S : Rm → Rn and T : Rn → Rp be linear transformations
with A and B their respective associated matrices with respect to the standard bases. It is not hard
to show that the composition T ◦ S : Rm → Rp is a linear transformation. Furthermore, the matrix
multiplication is defined as it is precisely so that the matrix product BA is the associated matrix to
T ◦ S with respect to the standard basis.
The following proposition establishes an identity about iterated composition of functions. Though
simple, it undergirds desired algebraic properties for many later situations.
Proposition 1.1.15
Let f : A → B, g : B → C, and h : C → D be functions. Then
h ◦ (g ◦ f ) = (h ◦ g) ◦ f.
For all x ∈ A, we have (h ◦ (g ◦ f))(x) = h((g ◦ f)(x)) = h(g(f(x))) = (h ◦ g)(f(x)) = ((h ◦ g) ◦ f)(x).
Since the functions are equal on all elements of A, the functions are equal.
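Associativity of composition can be spot-checked pointwise; the sample functions below are our own choices, not from the text:

```python
# Spot-check Proposition 1.1.15: h ∘ (g ∘ f) = (h ∘ g) ∘ f pointwise.
def compose(p, q):
    """Return the composition p ∘ q, i.e., x ↦ p(q(x))."""
    return lambda x: p(q(x))

f = lambda x: x + 1        # sample functions on the integers
g = lambda x: 2 * x
h = lambda x: x * x

lhs = compose(h, compose(g, f))
rhs = compose(compose(h, g), f)
agree = all(lhs(x) == rhs(x) for x in range(-20, 21))
```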
Definition 1.1.16
We say that a function f : A → B is
(1) injective (one-to-one) if f (a1 ) = f (a2 ) =⇒ a1 = a2 .
(2) surjective (onto) if for all b ∈ B, there exists a ∈ A such that f (a) = b.
(3) bijective (one-to-one and onto), if it is both.
The contrapositive of the definition offers an alternative way to understand injectivity, namely
that a₁ ≠ a₂ =⇒ f(a₁) ≠ f(a₂).
Example 1.1.17. As an example, we determine whether the following functions are injective or
surjective.
(1) Consider f : R − {0} → R with f(x) = 1 + 1/x. For injectivity, we solve

f(x₁) = f(x₂) =⇒ 1 + 1/x₁ = 1 + 1/x₂ =⇒ 1/x₁ = 1/x₂ =⇒ x₂ = x₁,

where the last step follows from multiplying both sides by x₁x₂. To prove surjectivity, given any
real y, we need to solve f(x) = y for x. We have

f(x) = y =⇒ 1 + 1/x = y =⇒ 1/x = y − 1 =⇒ x(y − 1) = 1.
This last equation has no solutions in x if y = 1. Hence, f is not surjective.
(2) Consider g : R → R with g(x) = x² + 2x. For injectivity, we compute

g(x₁) = g(x₂) =⇒ x₁² + 2x₁ = x₂² + 2x₂ =⇒ x₁² − x₂² + 2x₁ − 2x₂ = 0
=⇒ (x₁ − x₂)(x₁ + x₂) + 2(x₁ − x₂) = 0 =⇒ (x₁ − x₂)(x₁ + x₂ + 2) = 0
=⇒ x₁ = x₂ or x₁ = −x₂ − 2.
Since g(x₁) = g(x₂) does not imply x₁ = x₂, the function is not injective. Of course, to
disprove a universal statement, it suffices to find a counterexample. Remarking that g(1) =
3 = g(−3) shows that g is not injective. The function g is surjective if for all y there exists x
with g(x) = y. However,

x² + 2x = y =⇒ (x + 1)² − 1 = y =⇒ (x + 1)² = y + 1.

If y < −1, this last equation has no real solutions in x, so g is not surjective.
(3) Consider h : R → R with h(x) = x³. Given any real y, by the definition of infinite limits,
there exist x₁ and x₂ such that h(x₁) < y and h(x₂) > y. By the Intermediate Value Theorem,
there exists a real number x₀ with x₁ < x₀ < x₂ and h(x₀) = y. This proves surjectivity.
Since h is both injective and surjective, it is bijective.
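For experimentation, injectivity can be tested on a finite sample. This sketch of ours can refute injectivity by exhibiting a collision, though passing such a test proves nothing for an infinite domain:

```python
# Finite-sample injectivity test: a repeated value refutes injectivity.
def is_injective_on(f, domain):
    values = [f(a) for a in domain]
    return len(values) == len(set(values))

g = lambda x: x * x + 2 * x            # the function of Example 1.1.17(2)
collision = (g(1) == g(-3) == 3)       # the counterexample from the text
inj = is_injective_on(g, range(-5, 6))
```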
The last case above gives an example of a bijection. When a function f : A → B is a bijection,
then for every b ∈ B, there is an a ∈ A that is uniquely associated to it via the relationship b = f (a).
Hence, the association from B to A is also a function.
Definition 1.1.18
Let f : A → B be a bijective function between two sets. The inverse function, denoted
f −1 , is the function f −1 : B → A such that f −1 (b) = a if and only if f (a) = b.
The bijection f(x) = x³ in Example 1.1.17(3) has the inverse function f⁻¹ : R → R with
f⁻¹(x) = ∛x. It is important to keep in contrast the function g(x) = x². The function g is a
function but not a bijection R → R. When we restrict the domain and codomain to h : R≥0 → R≥0
with h(x) = x², then h is a bijection with inverse function h⁻¹ : R≥0 → R≥0 given by h⁻¹(x) = √x.
It is not uncommon to consider how a function behaves on a subset of elements in the domain.
If f : A → B is a function and S ⊆ A, we regularly use the following shorthand notation:
f (S) ≝ {f (s) | s ∈ S} = {b ∈ B | ∃s ∈ S, f (s) = b}.    (1.3)
This is the image set of S under f . We also define the restriction of f to S, denoted by f |S , as
the function f |S : S → B such that f |S (x) = f (x) for all x ∈ S.
In the special case when S = A, the subset f (A) of B is called the range of f . Note that a
function is surjective if and only if its range is equal to its codomain.
Similarly, if T ⊆ B, we define

f⁻¹(T ) ≝ {a ∈ A | f (a) ∈ T }.    (1.4)
This set is called the pre-image of T by f . It is essential to note that using this latter notation does
not presume that f is bijective; the definition in (1.4) is a matter of notation. If T = {b}, consisting
of a single element b ∈ B, then f −1 ({b}) is called the fiber of b.
For example, if f : R → R is defined by f (x) = sin x, then the fiber of 2 is f⁻¹({2}) = ∅, and
the fiber of 1/2 is

f⁻¹({1/2}) = {π/6 + 2πk | k ∈ Z} ∪ {5π/6 + 2πk | k ∈ Z}.
Remark 1.1.19. There are two common notations, Fun(A, B) and B A , for the set of all functions
from A to B. Exercise 1.1.31 provides a justification for the latter notation. 4
1.1.4 – Cardinality
We conclude this section with a brief discussion on a concept of size for sets. Interestingly enough,
properties of functions between sets are the key.
Definition 1.1.20
We say that two sets A and B have the same cardinality if there exists a bijection f : A → B.
We write |A| = |B|. If there does not exist a bijection between A and B, then we write
|A| ≠ |B|. If there exists an injection f : A → B, then we write |A| ≤ |B|. If |A| ≤ |B| and
|A| ≠ |B|, then we write |A| < |B|.
Definition 1.1.21
A set A is called finite if there exists a bijection from A to {1, 2, . . . , n}, where n is a positive
integer. In this case, we write in shorthand |A| = n. If a set A is not finite, it is called
infinite.
Definition 1.1.22
An infinite set A is called countable if there exists a bijection f : A → N∗ . In this case,
we denote its cardinality by ℵ0 and write |A| = ℵ0 . If a set is not countable, it is called
uncountable.
The definition of the term “countable” models the mental process of listing all the elements
in the set A and labeling them as “first” (1), “second” (2), “third” (3), and so on. The function
f : N → N∗ defined by f (n) = n + 1 sets up a bijection between N and N∗, so N is countable.
Showing that other sets are countable often requires a little more creativity.
Proposition 1.1.23
The set of integers Z is countable.
Proof. Consider the function f : Z → N defined by f (n) = |2n + 1/2| − 1/2. It is not hard to show that

f (n) = m ⇐⇒ n = m/2 if m is even, and n = −(m + 1)/2 if m is odd,

so f is a bijection and Z is countable.
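An integer-arithmetic version of such a pairing (even outputs for n ≥ 0, odd outputs for n < 0, matching the piecewise inverse in the proof; the function names are ours) can be checked by machine:

```python
def z_to_n(n):
    """Send n >= 0 to the even number 2n and n < 0 to the odd number -2n - 1."""
    return 2 * n if n >= 0 else -2 * n - 1

def n_to_z(m):
    """Inverse: m/2 for even m and -(m + 1)/2 for odd m, as in the proof."""
    return m // 2 if m % 2 == 0 else -(m + 1) // 2

print([z_to_n(n) for n in range(-3, 4)])  # [5, 3, 1, 0, 2, 4, 6]
```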
The symbols =, ≤, and < are obviously reminiscent of inequalities over the integers. However,
we need to be careful not to push the analogy too far and assume that something is true
simply by analogy. For example, the fact that |A| ≤ |B| and |B| ≤ |A| implies that |A| = |B| is the
Schröder–Bernstein Theorem [21, 13.10]. Consider also the Trichotomy Law, which states that for
any two sets A and B, exactly one of the following statements is true:

|A| < |B|,    |A| = |B|,    |B| < |A|.

The Trichotomy Law is equivalent to the Axiom of Choice. (See [55, p. 9].)
We mention without proof a few interesting results about cardinalities of sets. Proposition 1.1.23
shows that |N| = |Z|. It is not difficult to show that |N>0 | = |Q>0 | and from there conclude that
Q is countable. On the other hand, |N| < |R|. The proof follows by showing that R is in bijection
with P(N) and then applying Cantor's Theorem.
As stated in the preface, this book emphasizes the notion of an algebraic structure. We point
out that sets along with functions between sets provide a first example of an algebraic structure.
We might consider the structure of sets as the simplest possible algebraic structure. Later alge-
braic structures involve sets with additional properties, operations, and relations on them. These
structures will come with their own interesting applications and properties.
The reader may notice that we have already covered some of the topics of interest outlined in the
paragraph on Organizing Principles in the preface. The remaining sections of this chapter illustrate
some interesting and essential topics in the context of the algebraic structure of sets.
1.2 The Cartesian Product; Operations; Relations
In Section 1.1, when we discussed the power set of a set, we mentioned that this was one way to
create a new set from a previous one. The Cartesian product is another way to create new sets from
old ones. More importantly, the Cartesian product of two sets gives a rigorous model to the mental
notion of pairing or ordering elements from sets.
Definition 1.2.1
Let A and B be sets. The Cartesian product of A and B, denoted A × B, is the set that
consists of ordered pairs (a, b), where a ∈ A and b ∈ B. Hence,

A × B ≝ {(a, b) | a ∈ A and b ∈ B}.
The Cartesian coordinate system motivates the concept of the Cartesian product. The notation R2
stands for ordered pairs of real numbers, which we regularly use to locate points in the Euclidean
plane (in reference to a set of axes). Similarly, R3 is the set of triples of real numbers and represents
Euclidean 3-space.
Example 1.2.2. Let W be the set of students registered at Wheaton College right now. Let C be
the set of classes offered (at Wheaton College, right now). The set W × C represents all possible
pairings of registered students with classes offered. 4
Example 1.2.3. Let A = {1, 2, 3} and let B = {e, f }. We write out the sets A × B, B × B, and
B × A explicitly:

A × B = {(1, e), (1, f ), (2, e), (2, f ), (3, e), (3, f )},
B × B = {(e, e), (e, f ), (f, e), (f, f )},
B × A = {(e, 1), (e, 2), (e, 3), (f, 1), (f, 2), (f, 3)}. 4
The terminology of “product” in the name “Cartesian product” comes from the following fact.
Proposition 1.2.4
Let A and B be finite sets. Then A × B is also finite and
|A × B| = |A| · |B|.
Proof. For any element a0 ∈ A, there exist |B| pairs in A × B of the form (a0 , b) with b ∈ B. There
are |A| distinct elements in A. Furthermore, (a1 , b1 ) = (a2 , b2 ) if and only if a1 = a2 and b1 = b2 .
Hence, all |A| · |B| pairs are distinct.
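Proposition 1.2.4 is easy to confirm on the sets of Example 1.2.3 using Python's itertools.product (a quick computational check, not part of the text):

```python
# |A x B| = |A| * |B| on a small example.
from itertools import product

A = {1, 2, 3}
B = {"e", "f"}
AxB = set(product(A, B))

print(len(AxB), len(A) * len(B))  # 6 6
print(("e", 1) in AxB)            # False: A x B and B x A are different sets
```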
The Cartesian product of two sets, or of a finite collection of sets, is of considerable importance
throughout set theory. Below, we introduce a few concepts arising from the Cartesian product
that are essential for the rest of this book.
Definition 1.2.5
A binary operation on a set S is a function ⋆ : S × S → S. We typically write a ⋆ b instead
of ⋆(a, b) for the output of the binary operation on the pair (a, b) with a, b ∈ S.
The concept of a binary operation models a process by which any two objects in a set may be
combined to produce a specific other element in the set. This concept is ubiquitous in mathematics.
Some standard examples include +, −, and × on Z or R. Note that division ÷ is not a
binary operation on R because, for example, 2 ÷ 0 is not well-defined, while it is a binary operation on
R − {0}. Also, ÷ is not a binary operation on Z − {0} because, for example, 5 ÷ 2 is a rational number
but not an element of Z − {0}.
As yet another nonexample, consider the dot product · of two vectors in R3 . This is not a binary
operation because it is a function R3 × R3 → R and the codomain is not again R3 .
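Closure, the requirement that ⋆ land back in S, is exactly what these nonexamples violate, and for finite sets it can be tested by brute force. A sketch (the helper name is ours); a failure found inside a finite subset of Z − {0}, such as 1 ÷ 2 = 1/2, is a genuine witness for the ambient set:

```python
# Test whether op maps S x S back into S, for a finite S.
from fractions import Fraction

def is_closed(op, S):
    return all(op(a, b) in S for a in S for b in S)

S = {Fraction(n) for n in (-2, -1, 1, 2)}  # a finite subset of Z - {0}
print(is_closed(lambda a, b: a * b, {Fraction(1), Fraction(-1)}))  # True
print(is_closed(lambda a, b: a / b, S))  # False: 1/2 escapes Z - {0}
```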
As we will see throughout the book, many algebraic structures are defined as sets equipped with
some binary operations that satisfy certain properties. We list a few of the typical properties that
we consider later.
Definition 1.2.6
Let S be a set equipped with a binary operation ⋆. We say that the binary operation is
(1) associative if ∀a, b, c ∈ S, (a ⋆ b) ⋆ c = a ⋆ (b ⋆ c);
(2) commutative if ∀a, b ∈ S, a ⋆ b = b ⋆ a;
(3) equipped with an identity if ∃e ∈ S, ∀a ∈ S, a ⋆ e = e ⋆ a = a;
(4) idempotent if ∀a ∈ S, a ⋆ a = a.
It is important to note the order of the quantifiers in definition (3) for the identity element.
For example, let ⋆ be the operation of geometric average on R>0 , i.e., a ⋆ b = √(ab). Note that this
operation is commutative and idempotent. The operation of geometric average does not have an
identity because if we attempted to solve for b in a ⋆ b = a, we would obtain b = a. However, in
order for the geometric average to have an identity, this element b could not depend on a.
Proposition 1.2.7
Let S be a set equipped with a binary operation ⋆. If ⋆ has an identity element then ⋆ has
a unique identity element.
Proof. Suppose that there exist two identity elements e₁ and e₂ in S. Since e₁ is an identity element,
e₁ ⋆ e₂ = e₂. Since e₂ is an identity element, e₁ ⋆ e₂ = e₁. Thus, e₁ = e₂. There do not
exist two distinct identity elements, so S has a unique identity element.
Because of this proposition, we no longer say “an identity element” but “the identity element.”
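For a finite set, or a finite sample, one can search for identity elements directly; by Proposition 1.2.7 a search over the whole set returns at most one element. A sketch, reusing the geometric-average nonexample (helper names are ours):

```python
# List every e in S with e * a = a * e = a for all a in S.
import math

def identities(op, S):
    return [e for e in S if all(op(e, a) == a and op(a, e) == a for a in S)]

geo = lambda a, b: math.sqrt(a * b)  # geometric average
sample = [0.25, 1.0, 4.0]
print(identities(geo, sample))  # []: no sampled element is an identity
print(identities(min, sample))  # [4.0]: the largest element is the identity for min
```

On a sample, an empty result only shows that no sampled element is an identity; the quantifier argument above is what rules out an identity in all of R>0.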
Definition 1.2.8
Let S be a set equipped with a binary operation ⋆ that has an identity e. The operation is
said to have inverses if ∀a ∈ S, ∃b ∈ S, a ⋆ b = b ⋆ a = e.
For example, in R the operation + has inverses because for all a ∈ R, we have a + (−a) =
(−a) + a = 0. So (−a) is the (additive) inverse of a. In R∗ , the operation × also has inverses: For
all a ∈ R∗ , we have a × (1/a) = (1/a) × a = 1. So 1/a is the (multiplicative) inverse of a.
Definition 1.2.9
Let S be a set equipped with two binary operations ⋆ and ∗. We say that ∗ is left-distributive over ⋆ if ∀a, b, c ∈ S, a ∗ (b ⋆ c) = (a ∗ b) ⋆ (a ∗ c), and right-distributive over ⋆ if ∀a, b, c ∈ S, (b ⋆ c) ∗ a = (b ∗ a) ⋆ (c ∗ a). If ∗ is both left- and right-distributive over ⋆, we say that ∗ is distributive over ⋆.

For example, consider the operations ∩ and ∪ on P(S), where S is any set. The operation ∩ is associative and commutative, and ∩ is distributive over ∪:

(A ∩ B) ∩ C = A ∩ (B ∩ C),
A ∩ B = {x ∈ S | x ∈ A and x ∈ B} = {x ∈ S | x ∈ B and x ∈ A} = B ∩ A,
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).

(We leave the proof of this result as an exercise for the reader. See Exercise 1.2.17.) 4
where we choose the + in order for x to remain a positive number. Consequently, the binary
operation ⋆ does not have inverses.
We sometimes call an element x such that x ⋆ a is the identity a left-inverse of a, and an element
x such that a ⋆ x is the identity a right-inverse of a. In this example, the operation has a
left-inverse and a right-inverse for every element of R≥0 . However, since the left-inverse and the right-inverse
are not equal, ⋆ does not have inverses for any element. 4
1.2.3 – Relations
The everyday notion of a relationship between classes of objects is very general and somewhat
amorphous. Mathematics requires a concept as broad as that of a relation but with rigor. Cartesian
products offer a simple solution.
Definition 1.2.12
A relation from a set A to a set B is a subset R of A × B. A relation on a set A is a subset
of A2 . If (a, b) ∈ R, we often write a R b and say that a is related to b via R.
At first sight, this definition may appear strange. We typically think of a relation as some
statement about pairs of objects that is true or false. By gathering together all the true statements
about a relation into a subset of the Cartesian product, this definition gives the notion of a relation
(in mathematics) the same rigor as sets and as Boolean logic.
Example 1.2.13. Let W be the set of Wheaton College students registered now and C the set of
classes offered now. Let T be the relation of “taking classes.” T is a relation from W to C and we
write w T c if student w is taking class c. (A major function of the registrar is to keep track of T at
any given point in time.) 4
Example 1.2.14. Consider the relation ≤ on S = {1, 2, 3, 4, 5}. According to Definition 1.2.12, the
relation ≤ is the subset of S × S, given by
{(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 2), (2, 3), (2, 4), (2, 5), (3, 3), (3, 4), (3, 5), (4, 4), (4, 5), (5, 5)}. 4
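Definition 1.2.12 makes such relations directly computable: a relation on a small finite set is literally a set of pairs. Recomputing Example 1.2.14 as a quick check:

```python
# The relation <= on S = {1, ..., 5} as a subset of S x S.
S = {1, 2, 3, 4, 5}
leq = {(a, b) for a in S for b in S if a <= b}

print(len(leq))                      # 15, matching the listing above
print((2, 5) in leq, (5, 2) in leq)  # True False
```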
When we consider relations on reasonably small sets, we may depict them in a variety of ways.
We illustrate the following four descriptions with the same relation R from A = {1, 2, 3, 4, 5} to
B = {a, b, c}
R = {(1, b), (1, c), (2, a), (2, b), (4, a), (4, b), (4, c), (5, c)}.
Chart. In a chart with the columns labeled with the elements of A and the rows labeled with
the elements of B, mark a check in the box of column x and row y if x R y. The chart for our
running example is the following.

      1   2   3   4   5
  a       x       x
  b   x   x       x
  c   x           x   x
[Arrow diagram: the elements 1, 2, 3, 4, 5 of A listed on the left and the elements a, b, c of B on the right, with an arrow from x to y whenever x R y.]
Consider, for example, the relation t on R given by the system of equations x² + y = 10 and x² + y² = 16;
the relation t consists of the solution set. Setting x² = 10 − y and plugging into the second equation,
we have y² − y − 6 = 0, which has the two roots −2 and 3. Referring to the first equation, we find that

t = {(√7, 3), (−√7, 3), (√12, −2), (−√12, −2)}.
In this book, we introduced functions between sets before relations. Many presentations of set
theory reverse the order. An alternate definition of a function f from A to B is a relation satisfying
∀a ∈ A, ∃!b ∈ B, a f b. Function notation writes f (a) = b instead of a f b. We now see relations as a
generalization of functions. Furthermore, recasting our former definition of a function in this way
provides a rigorous definition as opposed to using the unclear term "association."
Example 1.2.16. Let P be the set of people who are living and let E be the set of working email
accounts. Let R be the relation from P to E so that p R e stands for person p owns the email e.
Some people own multiple email accounts, so R could not be a function. R also fails to be a function
because some people do not own any email accounts. Conversely, note that some email accounts are used
by more than one person so it would not be possible to create a function from E to P that accurately
described email ownership. 4
Definition 1.2.17
Let A, B, and C be sets. Let R1 be a relation from A to B and let R2 be a relation from
B to C. The composite relation of R2 with R1 is the relation R = R2 ◦ R1 from A to C
such that
aRc ⇐⇒ ∃b ∈ B, a R1 b and b R2 c.
Definition 1.2.18
If R is a relation from A to B, then the inverse relation R−1 is the relation from B to A
such that
b R−1 a ⇐⇒ a R b.
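Definitions 1.2.17 and 1.2.18 translate directly into set comprehensions for finite relations. A sketch with hypothetical sample data (the helper names are ours):

```python
# Composite and inverse of finite relations stored as sets of pairs.
def compose(R2, R1):
    """R2 o R1: a is related to c iff some b has (a, b) in R1 and (b, c) in R2."""
    return {(a, c) for (a, b) in R1 for (b2, c) in R2 if b == b2}

def inverse(R):
    return {(b, a) for (a, b) in R}

R1 = {(1, "x"), (2, "y")}      # a relation from A to B
R2 = {("x", "p"), ("x", "q")}  # a relation from B to C
print(compose(R2, R1))         # {(1, 'p'), (1, 'q')}
print(inverse(R1))             # {('x', 1), ('y', 2)}
```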
Certain classes of relations from a set A to itself play important roles due to a combination of
specific properties. A relation from a set A to itself is called a relation on A. In the next section, we
introduce equivalence relations and partial orders, both of which are essential in abstract algebra.
However, we list here below some of the properties for relations on a set A that are often of particular
interest.
Definition 1.2.19
Let R be a relation on a set A. The relation R is called
(1) reflexive if ∀a ∈ A, a R a (Reflexivity);
(2) symmetric if ∀a, b ∈ A, a R b =⇒ b R a (Symmetry);
(3) antisymmetric if ∀a, b ∈ A, a R b and b R a =⇒ a = b (Antisymmetry);
(4) transitive if ∀a, b, c ∈ A, a R b and b R c =⇒ a R c (Transitivity).
Example 1.2.20. Let L be the set of lines in R2 . The notion of perpendicularity ⊥ is a relation
on L. It is a symmetric relation but it does not satisfy any of the other properties described in
Definition 1.2.19. 4
Example 1.2.21. Let B be the set of blood types. Encode the blood types by B = {o, a, b, ab}
and consider the donor relation →, such that t1 → t2 means (disregarding all other factors) someone
with blood type t1 can donate to someone with blood type t2 . As a subset of B 2 , the donor relation
is
{(o, o), (o, a), (o, b), (o, ab), (a, a), (a, ab), (b, b), (b, ab), (ab, ab)}.
It is not hard to check exhaustively or logically that → is reflexive, antisymmetric, and transitive
but not symmetric. 4
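The exhaustive check mentioned in Example 1.2.21 can be delegated to a machine. A sketch of brute-force tests for the properties of Definition 1.2.19 (the helper names are ours):

```python
# Property checks for a finite relation R given as a set of pairs.
def is_reflexive(R, S):
    return all((a, a) in R for a in S)

def is_symmetric(R):
    return all((b, a) in R for (a, b) in R)

def is_antisymmetric(R):
    return all(a == b for (a, b) in R if (b, a) in R)

def is_transitive(R):
    return all((a, d) in R for (a, b) in R for (c, d) in R if b == c)

B = {"o", "a", "b", "ab"}
donor = {("o", "o"), ("o", "a"), ("o", "b"), ("o", "ab"), ("a", "a"),
         ("a", "ab"), ("b", "b"), ("b", "ab"), ("ab", "ab")}

print(is_reflexive(donor, B), is_symmetric(donor),
      is_antisymmetric(donor), is_transitive(donor))
# True False True True: reflexive, antisymmetric, transitive, not symmetric
```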
As an aside, consider the many nonalphabetical symbols that are used in mathematics. Many
of them either represent a relation or a binary operation. Some common relations symbols on real
numbers are =, ≤, ≥, <, >, and ≠. If S is any set, some common relation symbols on P(S) are ⊆,
⊊, and ⊈. Even the symbols ∈ and ∉ represent relations from S to P(S).
When defining relations, it is common to create a symbol to signify a relation. When selecting a
symbol to represent a relation, we usually use one that is left-to-right or centrally symmetric (e.g.,
∼, ≈, ↔) to represent a symmetric relation. Another standard for symbols is that if a symbol
represents a relation R, then the symbol with a slash through it represents the complement
relation: for example, ≠ is the symbol for "not equal," a ≰ b means that it is not true that a ≤ b,
and in general a slashed symbol denotes the complement of whatever relation the unslashed symbol represents.
For Exercises 1.2.8 through 1.2.16, determine (with proof or counterexample) if the binary operation is
associative, is commutative, has an identity, has inverses, and/or is idempotent.
8. The operation ∗ on vectors of Rⁿ defined by u ∗ v = proj_u(v), i.e., the projection of v onto u.
9. The operation ⋆ on the interval [0, 1) described by a ⋆ b = a + b − ⌊a + b⌋, where ⌊x⌋ is the greatest
integer less than or equal to x.
10. The operation △ on the nonnegative integers N defined by n △ m = |m − n|.
11. The operation ~ on points in the plane R2 where A ~ B is the midpoint of A and B.
12. The operation ⊕ on C defined by a ⊕ b = a + b + ab.
13. The operation △ (symmetric difference) on P(S), where S is any set.
14. The cross product on R3 .
15. The power operator a ∧ b = a^b on the set N∗ of positive integers.
16. The composition operator ◦ on the set F(A, A) of functions from a set A to A (where A is any set).
17. Prove that for all A, B, C ∈ P(S),
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).
18. Let S be a set with a binary operation ∗. Assume that (a ∗ b) ∗ a = b for all a, b ∈ S. Prove that
a∗(b∗a) = b for all a, b ∈ S. [This exercise occurred as Problem A-1 in the 2001 Putnam Mathematical
Competition.]
19. Consider the operations a ∧ b = a^b and a × b on N∗ . Prove that ∧ is right-distributive over × but not
left-distributive over ×.
20. Let S be a finite set with |S| = n. How many binary operations exist on S?
21. Let S = {1, 2}. How many binary operations on S are associative?
22. Let A and B be finite sets. Find the number of distinct relations from A to B.
23. Let A be a finite set with n elements. Prove that the number of reflexive relations on A is 2^(n²−n) and
that the number of symmetric relations on A is 2^(n(n+1)/2) .
For Exercises 1.2.24 through 1.2.30, determine (with proof ) which of the properties reflexivity, symmetry,
antisymmetry, and transitivity hold for each of the following relations.
24. For any set S, consider the relation G on P(S) defined by A G B to mean that A ∩ B ≠ ∅.
25. The relation % on the set S of people defined by p1 % p2 if p1 is taller than or the same height as p2 .
26. The relation R on Z defined by nRm if n ≥ m2 .
27. The relation R on S = R² defined by (x₁ , y₁ ) R (x₂ , y₂ ) to mean x₁² + y₁² ≤ x₂² + y₂² .
28. The relation $ on R defined by a $ b to mean ab = 0.
29. For any set S, consider the relation R on P(S) defined by A R B to mean that A ∪ B = S.
30. The relation R on the set of pairs of points in the plane S = R² × R² defined by (P₁ , Q₁ ) R (P₂ , Q₂ )
if the segment [P₁ , P₂ ] intersects [Q₁ , Q₂ ].
31. Let S be a set and let R be a relation on S. Prove that if R is reflexive, symmetric, and
antisymmetric, then it is the = relation on S.
32. Let P be the set of people who are living now. Let R be the relation on P defined by aRb if a and b
are in the same nuclear family, i.e., if a is a self, child, parent, sibling, or spouse of b.
(a) Decide whether R is reflexive, symmetric, antisymmetric, or transitive.
(b) List all the family relations included in R(2) = R ◦ R.
(c) Give four commonly used family terms for relations in R(3) = R ◦ R ◦ R though not in R(2) .
33. We can define the graph of a relation R from R to itself as the subset of R2
{(x, y) ∈ R2 | x R y}.
R = {(a, a), (a, c), (a, d), (b, c), (b, e), (c, b), (c, d), (e, a), (e, b)}.
Prove that the relation R is transitive if and only if R(n) ⊆ R for all n = 1, 2, 3, . . ..
36. Let R be a relation that is reflexive and transitive. Prove that R(n) = R for all n ∈ N∗ .
1.3 Equivalence Relations
An equivalence relation is a generalization of the concept of equality. Intuitively speaking, an
equivalence relation mentally models a notion of sameness or similarity, that is to say that two
elements are in relation to each other if they are “the same from a certain perspective.” This
concept is ubiquitous throughout mathematics and occurs frequently in algebra.
Definition 1.3.1
An equivalence relation on a set S is a relation ∼ that is reflexive, symmetric, and transitive.
Example 1.3.2. Let S be any set. The equality relation = is reflexive, symmetric, antisymmetric, and
transitive. In particular, = is an equivalence relation. Two elements are in relation via = if and
only if they are the same object. 4
Example 1.3.3. Let S be the set of lines in R3 . Consider the relation of parallelism, denoted k,
on S. This is reflexive, symmetric, and transitive and so is an equivalence relation on S. From an
intuitive perspective, all lines that are parallel have the same direction. 4
Example 1.3.4. Define C as intersections in Chicago and define the relation R on C to be “within
walking distance.” As stated, this is not well-defined so let us say that two intersections in Chicago
are within walking distance if and only if they are two miles or less apart. This relation is reflexive
and symmetric but not transitive. If three intersections a, b, and c lie successively in a straight line
with a and b two miles apart and b and c also two miles apart, then a and c are four miles apart.
This relation is not an equivalence relation. 4
Example 1.3.5. Let X = {1, 2, 3, . . . , 10} and consider S = P(X). For the following two relations,
we discuss whether they are equivalence relations.
(1) A ∼1 B if |A| = |B|. This is an equivalence relation. Two sets are equivalent to each other if
they have the same cardinality.
(2) A ∼2 B if 1 ∈ A ∩ B. The relation ∼2 is symmetric and transitive but is not reflexive. A set
that does not contain 1 is not in relation to itself. Sets that are in relation to each other in ∼2
have the similar property of all containing the element 1 but it feels natural to impose that in
any notion of sameness, an element should be “the same” as itself. 4
Definition 1.3.6
Let ∼ be an equivalence relation on a set S. For a ∈ S, the equivalence class of a is
def
[a] = {s ∈ S | s ∼ a}.
We sometimes write [a]∼ to clarify if a certain context considers more than one equivalence
relation at a time. A subset U ⊆ S is called an equivalence class if U = [a] for some a ∈ S.
An element a of an equivalence class U is called a representative of U .
Definition 1.3.7
Let S be a set and ∼ an equivalence relation on S. The set of ∼-equivalence classes on S
is called the quotient set of S by ∼ and is denoted by S/ ∼.
The quotient set S/ ∼ is a first example of what we called in the preface, a quotient object. As
we will see later at various points in the book, in a given algebraic structure, if we begin with one
object O that also has an equivalence relation ∼ such that O/ ∼ is again an object with the same
algebraic structure, we call O/ ∼ a quotient object. With a set S, any equivalence relation on S
leads to a quotient set. However, with other algebraic structures, not all equivalence relations are
such that the set of equivalence classes naturally produce another object with the same algebraic
structure. Therefore, the study of quotient objects will require some care.
We remark that there is a bijection between S/ ∼ and a complete set of representatives T of ∼
via the function
ψ : T → S/ ∼
a 7→ [a].
However, we do not consider these sets as equal since their objects are different.
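For a finite set, the quotient S/∼ can be computed by sorting elements into classes; by transitivity, comparing a new element against a single representative of each existing class suffices. A sketch (the helper name is ours):

```python
# Compute the equivalence classes of a finite set, given ~ as a predicate.
def quotient(S, related):
    classes = []
    for s in S:
        for cls in classes:
            if related(s, cls[0]):  # one representative suffices, by transitivity
                cls.append(s)
                break
        else:
            classes.append([s])
    return classes

# Congruence mod 3 on {0, ..., 8}.
print(quotient(range(9), lambda a, b: a % 3 == b % 3))
# [[0, 3, 6], [1, 4, 7], [2, 5, 8]]
```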
Example 1.3.8. The notion of cardinality of subsets of a given set can now be described in the
following way. Fix a set S, not necessarily finite or even countable, and consider the equivalence
relation ∼ on P(S) by A ∼ B if and only if there exists a bijection between A and B. It is not hard
to check that this is an equivalence relation on P(S). We saw earlier that we always write |A| instead
of [A]∼ but now the notion of cardinality |A| is well-defined as an element in the set P(S)/ ∼, even
if A is not a finite set. 4
Example 1.3.9 (Projective Space). Let L(R3 ) be the set of lines in R3 and consider the equiva-
lence relation of parallelism k on L(R3 ). If L is a line in L(R3 ), then [L] is the set of all lines parallel
to L. The set L(R3 )/ k is called the projective plane and is denoted as RP2 . It consists of all the
directions lines can possess.
There are other ways to understand the projective plane. Every line in R3 is parallel to a unique
line through the origin. Possible direction vectors for lines through the origin consist simply of
nonzero vectors, so we consider the set R3 − {(0, 0, 0)}. Lines through the origin are the same if and
only if their given direction vector differs by a nonzero multiple. Hence, we define the equivalence
relation ∼ on R³ − {(0, 0, 0)} by

(x₁ , x₂ , x₃ ) ∼ (y₁ , y₂ , y₃ ) ⇐⇒ ∃λ ∈ R − {0}, (y₁ , y₂ , y₃ ) = (λx₁ , λx₂ , λx₃ ).

Our comments on direction vectors show that, as sets, (R³ − {(0, 0, 0)})/∼ = RP². 4
Example 1.3.10 (Rational Numbers). Consider the set of pairs S = Z × Z∗ and the relation ∼
defined by
(a, b) ∼ (c, d) ⇐⇒ ad = bc.
We leave it to the reader to show that this is an equivalence relation. (See Exercise 1.3.20.) The
quotient set S/ ∼ is a rigorous definition for Q. The equivalence relation is precisely the condition
that is given when two fractions are considered equal. Hence, the fraction notation a/b for rational
numbers represents the equivalence class [(a, b)]∼ . 4
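Example 1.3.10 is concrete enough to compute with: pairs (a, b) ∈ Z × Z∗, the relation ad = bc, and a lowest-terms canonical representative of each class. A sketch (the helper names are ours):

```python
# Rational numbers as equivalence classes of Z x Z*.
from math import gcd

def related(p, q):
    (a, b), (c, d) = p, q
    return a * d == b * c  # (a, b) ~ (c, d) iff ad = bc

def canonical(a, b):
    """Lowest-terms representative of [(a, b)] with a positive denominator."""
    g = gcd(a, b)
    if b < 0:
        g = -g
    return (a // g, b // g)

print(related((1, 2), (3, 6)))            # True: 1/2 = 3/6
print(canonical(3, 6), canonical(2, -4))  # (1, 2) (-1, 2)
```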
Remark 1.3.11. When working with functions whose domains are quotient sets, it is natural to
wish to define a function of an equivalence class based on a representative of the class. More precisely,
if S and T are sets and ∼ is an equivalence relation on S, we may wish to define a function

F : S/∼ −→ T
[a] ↦ f (a)    (1.5)
where f : S → T is some function. This construction does not always produce a function. We say
that a function defined according to (1.5) is well-defined if whenever a ∼ b then f (a) = f (b). Thus,
using any representative of the equivalence of [a] will return the same value for F ([a]). 4
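For a finite quotient, well-definedness in the sense of (1.5) is a checkable condition: f must be constant on each equivalence class. A sketch (the helper name is ours):

```python
# F([a]) = f(a) makes sense only when f is constant on each class.
def descends_to_quotient(classes, f):
    return all(len({f(x) for x in cls}) == 1 for cls in classes)

classes = [{0, 3, 6}, {1, 4, 7}, {2, 5, 8}]  # classes of mod 3 on {0, ..., 8}
print(descends_to_quotient(classes, lambda x: x % 3))   # True: well-defined
print(descends_to_quotient(classes, lambda x: x // 3))  # False: depends on representative
```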
Suppose, for example, that we try to define a function g on the nonzero rationals by g(a/b) = (a² + b²)/(ab). We check
that this g is well-defined as follows. Suppose that a/b = c/d. Then (a, b) ∼ (c, d), so ad = bc. Then

(c² + d²)/(cd) = b²(c² + d²)/(b²cd)     since these two fractions satisfy ∼
              = (a²d² + b²d²)/(bad²)    because bc = ad
              = (a² + b²)/(ba)          since the last two fractions satisfy ∼.

Hence, the formula for g is independent of the choice of representative for the fraction a/b.
The above discussion leads to another nice property about quotient sets. Let ∼ be an equivalence
relation on a set S and let p : S → S/ ∼ be the “projection” defined by p(a) = [a]. For any set X
and any function f : S → X such that a ∼ b implies that f (a) = f (b), there exists a unique function
f′ : S/∼ → X such that f = f′ ◦ p. This function is simply defined by f′([a]) ≝ f (a), and we saw
that since f (a) = f (b) whenever a ∼ b, this f′ is well-defined. We often express the relationship
f = f′ ◦ p by calling the following diagram of sets and functions commutative.
[Commutative diagram: p : S → S/∼ across the top, with f : S → X and f′ : S/∼ → X both mapping down to X, so that f = f′ ◦ p.]
1.3.3 – Partitions
Let ∼ be an equivalence relation on a set S. For any two elements a, b ∈ S, by definition a ∈ [b] if
and only if a ∼ b. However, since ∼ is symmetric, this implies that b ∼ a and hence that b ∈ [a]. By
transitivity, if a ∈ [b], then s ∼ a implies that s ∼ b, so a ∈ [b] implies that [a] ⊆ [b]. Consequently,
we have proven that the following statements are logically equivalent:

a ∼ b,    a ∈ [b],    b ∈ [a],    [a] = [b].
Proposition 1.3.12
Let S be a set equipped with an equivalence relation ∼. Then
(1) distinct equivalence classes are disjoint;
(2) the union of the equivalence classes is all of S.
Proof. Suppose that [a] ∩ [b] ≠ ∅. Then there exists c ∈ [a] ∩ [b], so c ∈ [a] and c ∈ [b]. Hence,
[a] = [c] = [b]. Hence, if two equivalence classes overlap, then they are equal.
Let T be a complete set of representatives of ∼ in S. Obviously, since a ∈ [a], every element of
S is in some equivalence class. Thus, we have

S = ⋃a∈S [a] = ⋃a∈T [a].
The property of equivalence classes described in Proposition 1.3.12 has a particular name in set
theory.
[Figure 1.1: A set S of ten elements s1 , …, s10 drawn with the parts of a partition outlined.]
Definition 1.3.13
Let S be a set. A collection A = {Ai }i∈I of subsets of S is called a partition of S if
(1) Ai ∩ Aj ≠ ∅ =⇒ i = j and
(2) ⋃i∈I Ai = S.
Partitions of sets may be visualized by a diagram akin to Figure 1.1. In this figure, S is a set
with ten elements and the sets of the partition are {s1 , s2 , s3 }, {s4 }, {s5 , s6 }, and {s7 , s8 , s9 , s10 }. A
partition of S is a particular subset of P(S). A general subset of P(S) would consist of overlapping
subsets of S and possibly not cover all of S. Hence, a diagram like Figure 1.1 would not suffice to
visualize a general subset of P(S).
The concept of a partition simply models the mental construction of subdividing a set into parts
without losing any elements of the set and without any parts overlapping. Partitions and equivalence
relations are closely connected. Proposition 1.3.12 establishes that the set of distinct equivalence
classes of an equivalence relation on S forms a partition of S. The following proposition establishes
the converse.
Proposition 1.3.14
Let A = {Ai }i∈I be a partition of a set S. Define the relation ∼A on S by
a ∼A b ⇐⇒ ∃i ∈ I with a ∈ Ai and b ∈ Ai .
Then ∼A is an equivalence relation. Furthermore, the sets in A are the distinct equivalence
classes of ∼A .
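Proposition 1.3.14 can be illustrated on a finite example by materializing ∼A as a set of pairs (the helper name is ours):

```python
# The equivalence relation induced by a partition: a ~ b iff they share a part.
def relation_from_partition(parts):
    return {(a, b) for part in parts for a in part for b in part}

parts = [{1, 2}, {3, 4}, {5, 6}]
R = relation_from_partition(parts)
print((1, 2) in R, (2, 1) in R)  # True True: same part, and symmetric
print((2, 3) in R)               # False: different parts
print(len(R))                    # 12 = 2*2 + 2*2 + 2*2
```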
Example 1.3.15. Consider the set S = {1, 2, 3, 4, 5, 6, 7, 8}. The following are examples of parti-
tions on S:
{{1, 4, 5}, {2, 6, 7}, {3, 8}} , {{1, 3, 5, 7}, {2, 4, 6, 8}} , {{1, 2, 3, 4, 5, 6, 7, 8}} .
However, {{1, 3, 5}, {2, 8}, {4, 7}} is not a partition because the union of the subsets does not contain
6. On the other hand, {{1, 2, 3, 5}, {3, 6, 8}, {4, 6, 7}} is not a partition because some of the subsets
have nonempty intersections, namely {1, 2, 3, 5} ∩ {3, 6, 8} = {3} and {3, 6, 8} ∩ {4, 6, 7} = {6}. 4
Example 1.3.16. Consider the unit sphere in R3 , denoted by S2 . Consider the partition on S2
given by
A = {{p, −p} | p ∈ S2 }.
The partition A consists of pairs of points that are diametrically opposite each other. According
to Proposition 1.3.14, there exists a unique equivalence relation ∼1 on S2 that has A as a set of
distinct equivalence classes. Note that any line through the origin intersects S2 in two diametrically
opposite points. Hence, the quotient set S2 / ∼1 is equal to the projective space RP2 constructed in
Example 1.3.9. 4
For Exercises 1.3.3 through 1.3.15, prove or disprove whether the described relation is an equivalence relation.
If the relation is not an equivalence relation, determine which properties it lacks.
3. Let P be the set of living people. For all a, b ∈ P , define the relation a R b if a and b have met.
4. Let P be the set of living people. For all a, b ∈ P , define the relation a R b if a and b live in a common
town.
5. Let C be the set of circles in R² and let R be the relation on C of being concentric.
6. Let S = Z × Z and define the relation R on S by (m1 , m2 ) R (n1 , n2 ) if m1 m2 = n1 n2 .
7. Let S = Z × Z and define the relation R on S by (m1 , m2 ) R (n1 , n2 ) if m1 n1 = m2 n2 .
8. Let S = Z × Z and define the relation R on S by (m1 , m2 ) R (n1 , n2 ) if m1 n2 = m2 n1 .
9. Let P3 be the set of polynomials with real coefficients and of degree 3 or less. Define the relation R
on P3 by p(x) R q(x) to mean that q(x) − p(x) has 5 as a root.
10. Consider the set C 0 (R) of continuous functions over R. Define the relation R on C 0 (R) by f R g if
there exist some a, b ∈ R such that
11. Let Pfin (R) be the set of finite subsets of R and define the relation ∼ on Pfin (R) by A ∼ B if the sum
of elements in A is equal to the sum of elements in B. Prove that ∼ is an equivalence relation.
12. Let `∞ (R) be the set of sequences of real numbers. Define the relation R on `∞ (R) by (an ) R (bn ) if
lim n→∞ (bn − an ) = 0.
13. Let `∞ (R) be the set of sequences of real numbers. Define the relation R on `∞ (R) by (an ) R (bn ) if
the sequence (an + bn ) converges.
14. Let S be the set of lines in R2 and let R be the relation of perpendicular.
15. Let W be the set of words in the English language (i.e., those having an entry in the Oxford English Dictionary).
Define the relation R on W by w1 R w2 if w1 comes before w2 in alphabetical order.
16. Let C 0 ([0, 1]) be the set of continuous real-valued functions on [0, 1]. Define the relation ∼ on C 0 ([0, 1])
by

f ∼ g ⇐⇒ ∫₀¹ f (x) dx = ∫₀¹ g(x) dx.
Show that ∼ is an equivalence relation and describe (with a precise rule) a complete set of distinct
representatives of ∼.
17. Let C ∞ (R) be the set of all real-valued functions on R whose derivatives of all orders exist and are
continuous. Define the relation R on C ∞ (R) by f R g if f (n) (0) = g (n) (0) for all positive, even integers
n.
(a) Prove that R is an equivalence relation.
(b) Describe concisely all the elements in the equivalence class [sin x].
18. Let S = {1, 2, 3, 4}. The relation ∼ on P(S) defined by A ∼ B if and only if the sum of the elements
in A equals the sum of the elements in B is an equivalence relation. List the equivalence classes of ∼.
19. Let T be the set of (nondegenerate) triangles in the plane.
(a) Prove that the relation ∼ of similarity on triangles in T is an equivalence relation.
(b) Concisely describe a complete set of distinct representatives of ∼.
20. Prove that the relation defined in Example 1.3.10 is an equivalence relation.
21. Let S = {1, 2, 3, 4, 5, 6}. For the partitions of S given below, write out the equivalence relation as a
subset of S × S.
(a) {{1, 2}, {3, 4}, {5, 6}}
(b) {{1}, {2}, {3, 4, 5, 6}}
(c) {{1, 2}, {3}, {4, 5}, {6}}
22. Let S = {a, b, c, d, e}. For the partitions of S given below, write out the equivalence relation as a
subset of S × S.
(a) {{a, d, e}, {b, c}}
(b) {{a}, {b}, {c}, {d}, {e}}
(c) {{a, b, d, e}, {c}}
23. Let C 1 ([a, b]) be the set of continuously differentiable functions on the interval [a, b]. Define the relation
∼ on C 1 ([a, b]) as f ∼ g if and only if f ′ (x) = g ′ (x) for all x ∈ (a, b). Prove that ∼ is an equivalence
relation on C 1 ([a, b]). Describe the elements in the equivalence class of a given f ∈ C 1 ([a, b]).
24. Let Mn×n (R) be the set of n × n matrices with real coefficients. For two matrices A, B ∈ Mn×n (R),
we say that B is similar to A if there exists an invertible n × n matrix S such that B = SAS −1 .
(a) Prove that similarity ∼ is an equivalence relation on Mn×n (R).
(b) Prove that the function f : Mn×n (R)/ ∼ → R defined by f ([A]) = det A is a well-defined function
on the quotient set Mn×n (R)/ ∼.
(c) Determine with a proof or counterexample whether the function g : Mn×n (R)/ ∼ → R defined
by g([A]) = Tr A, the trace of A, is a well-defined function.
25. Define the relation ∼ on R by a ∼ b if and only if b − a ∈ Q.
(a) Prove that for all x ∈ R, there exist elements of [x]∼ arbitrarily close to x. (In other words, for
all ε > 0, there exists y with y ∼ x and |x − y| < ε.)
(b) (*) Prove that ∼ has an uncountable number of equivalence classes.
26. Let R1 and R2 be equivalence relations on a set S. Determine (with a proof or counterexample) which
of the following relations are also equivalence relations on S. (a) R1 ∩ R2 ; (b) R1 ∪ R2 ; (c) R1 △ R2 .
[Note that R1 ∪ R2 , and similarly for the others, is a relation as a subset of S × S.]
27. Which of the following collections of subsets of the integers form partitions? If it is not a partition,
explain which properties fail.
(a) {pZ | p is prime}, where kZ means all the multiples of k.
(b) {{3n, 3n + 1, 3n + 2} | n ∈ Z}.
(c) {{k | n² ≤ k ≤ (n + 1)²} | n ∈ N}.
(d) {{n, −n} | n ∈ N}.
28. Let S be a set. Prove that there is a bijection between the set of partitions of S and the set of
equivalence relations on S.
29. Call p(n) the number of equivalence relations (equivalently, by Exercise 1.3.28, partitions) on a set of
cardinality n. (The numbers p(n) are called the Bell numbers after the Scottish-born mathematician
E. T. Bell.)
(a) (*) Prove that p(0) = 1 and that for all n ≥ 1, p(n) satisfies the condition
p(n) = Σ_{j=0}^{n−1} C(n−1, j) p(n − j − 1),
where C(n−1, j) denotes the binomial coefficient (n−1 choose j).
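The recurrence in part (a) is easy to check numerically. A minimal Python sketch (the function name `bell` is ours, and `math.comb` requires Python 3.8+):

```python
from math import comb

def bell(n):
    """Bell number p(n) via the recurrence
    p(n) = sum_{j=0}^{n-1} C(n-1, j) * p(n-j-1), with p(0) = 1."""
    p = [1]  # p(0) = 1
    for m in range(1, n + 1):
        p.append(sum(comb(m - 1, j) * p[m - j - 1] for j in range(m)))
    return p[n]

print([bell(n) for n in range(7)])  # [1, 1, 2, 5, 15, 52, 203]
```

The list produced matches the first several Bell numbers.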
1.4 Partial Orders
1.4.1 – Partial Orders
Section 1.3.1 introduced equivalence relations as a generalization of the notion of equality. Equiva-
lence relations provide a mental model for calling certain objects in a set equivalent. Similarly, the
concept of a partial order generalizes the inequality ≤ on R to a mental model of ordering objects
in a set.
Definition 1.4.1
A partial order on a set S is a relation 4 that is reflexive, antisymmetric, and transitive. A
pair (S, 4), where S is a set and 4 is a partial order on S, is often succinctly called a poset.
The name poset, which abbreviates “partially ordered set,” emphasizes the perspective that
posets form an algebraic structure. As a first example of a nontrivial structure, we point out that a
poset consists of a set equipped with a relation with certain specified properties. The data for many
other algebraic structures will resemble Definition 1.4.1.
Motivated by the notations for inequalities over R, we use the symbol ≺ to mean
x ≺ y ⇐⇒ x 4 y and x ≠ y.
Example 1.4.2. Consider the relation ≤ on R. For all x ∈ R, x ≤ x so ≤ is reflexive. For all
x, y ∈ R, if x ≤ y and y ≤ x, then x = y and hence ≤ is antisymmetric. It is also true that x ≤ y
and y ≤ z implies that x ≤ z and hence ≤ is transitive. Thus, the inequality ≤ on R is a partial
order. 4
Note that ≥ is also a partial order on R but that the strict inequalities < and > are not. The
inequality < is not reflexive though it is both antisymmetric and transitive. (< is antisymmetric
because there do not exist any x, y ∈ R such that x < y and y < x so the conditional statement
“x < y and y < x implies x = y” is trivially satisfied.)
Equivalence relations generalize = and hence loosen up some properties of =. In a similar way,
though modeled after the relation of ≤ on R, a partial order on a set S has a number of additional
possibilities. For example, in a general poset (S, 4), given two arbitrary elements a, b ∈ S, it is
possible that neither a 4 b nor b 4 a.
Definition 1.4.3
Let (S, 4) be a poset. If a pair {a, b} of distinct elements satisfies either a 4 b or b 4 a, then
we say that a and b are comparable; otherwise a and b are called incomparable. A partial
order in which every pair of elements is comparable is called a total order .
The posets (N, ≤) and (R, ≤) are total orders. Many posets are not total orders as the following
examples illustrate.
Example 1.4.4. Consider the donor relation → defined on the set of blood types B = {o, a, b, ab}
as discussed in Example 1.2.21. We saw that → is reflexive, antisymmetric, and transitive. This
shows that (B, →) is a poset. Note that a and b are not comparable, meaning that neither can
donate to the other. 4
Example 1.4.5. Let S be any set. The subset relation ⊆ on P(S) is a partial order. In the partial
order, many pairs of subsets in S are incomparable. In fact, two subsets A and B are incomparable
if and only if A − B and B − A are both nonempty. 4
Example 1.4.6. Define the relation 4 on R2 by (x1 , y1 ) 4 (x2 , y2 ) if and only if (x1 , y1 ) = (x2 , y2 )
or 2x1 − y1 < 2x2 − y2 . That (x1 , y1 ) 4 (x1 , y1 ) is built into the definition, so 4 is reflexive. It is
impossible for 2x1 − y1 < 2x2 − y2 and 2x2 − y2 ≤ 2x1 − y1 to hold simultaneously, so the only way
(x1 , y1 ) 4 (x2 , y2 ) and (x2 , y2 ) 4 (x1 , y1 ) can occur is if (x1 , y1 ) = (x2 , y2 ). Finally, the relation
is also transitive, so 4 is a partial order on R2 .
In this poset on R2 , two elements (x1 , y1 ) and (x2 , y2 ) are incomparable if and only if 2x2 − y2 =
2x1 − y1 and (x1 , y1 ) ≠ (x2 , y2 ), namely when they are distinct points on the same line of slope 2. 4
Besides the dichotomy between totally ordered posets and partial orders with incomparable
elements, there is another dichotomy that already appears when comparing properties of the posets
(N, ≤) and (R, ≤). In (R, ≤), given any x ≤ y with x 6= y, there always exists an element z such
that z 6= x and z 6= y with x ≤ z ≤ y. In contrast, in (N, ≤), for example 2 ≤ 3 but for all z ∈ N, if
2 ≤ z ≤ 3, then z = 2 or z = 3.
Definition 1.4.7
Let (S, 4) be a poset and let x ∈ S. We call y ∈ S an immediate successor (resp. immediate
predecessor ) of x if y ≠ x with x 4 y (resp. y 4 x) and for all z ∈ S such that x 4 z 4 y
(resp. y 4 z 4 x), either z = x or z = y.
In (N, ≤) all elements have both immediate successors and immediate predecessors, except for 0
that does not have an immediate predecessor. In (Z, ≤) all elements have both immediate successors
and immediate predecessors. In contrast, as commented above, in (R, ≤) no element has either an
immediate successor or an immediate predecessor.
A partial order does not have to be a total order to have immediate successors or predecessors.
In the blood donor relation (B, →) in Example 1.4.4, o has two immediate successors, namely a and
b.
Example 1.4.8 (Another Order on Q). The usual partial order of ≤ on Q>0 is a total order,
but like for (R, ≤), no element has either an immediate successor or an immediate predecessor. We
define an alternate partial order 4 on Q>0 in which every element has an immediate successor and
an immediate predecessor, except for 1 which only has an immediate successor.
For fractions a/b and c/d given in reduced form, we define
a/b 4 c/d ⇐⇒ a + b ≤ c + d when a + b ≠ c + d, and
a/b 4 c/d ⇐⇒ a ≤ c when a + b = c + d.
It is not hard to check that: (1) 4 is a partial order on Q>0 ; (2) 4 is a total order; (3) every
element besides 1 has an immediate successor and an immediate predecessor. (See Exercise 1.4.6.)
We can visualize this total order in the following way. Organize all fractions in Q>0 as in the
chart below. For all positive integers n, define the subsets An by
An = { x/y ∈ Q>0 | x + y = n + 1, gcd(x, y) = 1, 1 ≤ x, y ≤ n }.
So for example, A7 = {1/7, 3/5, 5/3, 7/1}.
[Chart: the fractions x/y arranged in a grid, with numerators x = 1, 2, 3, . . . increasing to the
right and denominators y = 1, 2, 3, . . . increasing upward. Each set An consists of the reduced
fractions on the anti-diagonal x + y = n + 1; the labels A1 , . . . , A6 mark these anti-diagonals.]
We read the fractions in Q>0 successively according to 4 by first going through the An subsets
in increasing order of n and, within each An (each of which is finite), listing the fractions in
increasing order. 4
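The reading order just described can be simulated directly. A short Python sketch (the function names are ours):

```python
from fractions import Fraction
from math import gcd

def A(n):
    """A_n = { x/y : x + y = n + 1, gcd(x, y) = 1, 1 <= x, y <= n },
    listed with increasing numerator, which is the order within A_n."""
    return [Fraction(x, n + 1 - x) for x in range(1, n + 1)
            if gcd(x, n + 1 - x) == 1]

def first_terms(k):
    """The first k positive rationals in the total order of Example 1.4.8."""
    out, n = [], 1
    while len(out) < k:
        out.extend(A(n))
        n += 1
    return out[:k]

print([str(f) for f in first_terms(9)])
# ['1', '1/2', '2', '1/3', '3', '1/4', '2/3', '3/2', '4']
```

Each `A(n)` is finite, so every fraction is reached after finitely many steps, as the example asserts.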
1.4.2 – Subposets
Definition 1.4.9
Let S be a nonempty set equipped with a partial order 4 and let T be a subset of S. The
restriction of 4 to T is the relation on T defined by
4T := 4 ∩ (T × T ).
It is not difficult to see that 4T is reflexive, antisymmetric, and transitive on T , making (T, 4T )
into a poset in its own right. We call (T, 4T ) a subposet of (S, 4).
Though a generic poset (S, 4) need not be a total order, many of the terms associated to
inequalities in relation to subsets of R have corresponding definitions in any poset.
Definition 1.4.10
Let (S, 4) be a poset, and let A be a subset of S. A maximal element of A is an element
m ∈ A such that for all t ∈ A, m 4 t implies t = m. A minimal element of A is an element
m ∈ A such that for all t ∈ A, t 4 m implies t = m.
As an example, consider the blood donor relation (B, →) described in Example 1.4.4 and consider
the subset A = {o, a, b}. Then A has one minimal element o and two maximal elements a and b.
Definition 1.4.11
Let (S, 4) be a poset, and let A be a subset of S.
(1) An upper bound of A is an element u ∈ S such that ∀t ∈ A, t 4 u.
(2) A lower bound of A is an element ` ∈ S such that ∀t ∈ A, ` 4 t.
(3) A least upper bound of A is an upper bound u of A such that for all upper bounds u0
of A, we have u 4 u0 .
(4) A greatest lower bound of A is a lower bound ` of A such that for all lower bounds `0
of A, we have `0 4 `.
We say that a subset A ⊆ S is bounded above if A has an upper bound, is bounded below if A
has a lower bound, and is bounded if A is bounded above and bounded below.
If u1 and u2 are two least upper bounds to A, then by definition u1 4 u2 and u2 4 u1 . Thus,
u1 = u2 and we conclude that least upper bounds are unique. It is similar for greatest lower bounds.
Therefore, if a subset A has a least upper bound, we talk about the least upper bound of A and
denote this element by lub(A). Similarly, if a subset A has a greatest lower bound, we talk about
the greatest lower bound of A and denote this element by glb(A). If A is a subset of S given in
list form as A = {a1 , a2 , . . . , an }, we often write lub(a1 , a2 , . . . , an ) for lub(A) and similarly for the
greatest lower bound.
From the perspective of analysis, one of the most important differences between the posets (R, ≤)
and its subposet (Q, ≤) is that any bounded subset of R has a least upper bound whereas this does
not hold in Q. Consider, for example, the subset
A = { p/q ∈ Q | p² < 2q² }.
In R, lub(A) = √2, whereas in Q, for any upper bound u = r/s of A we have
√2 < (1/2)(r/s + 2s/r) < r/s.
(We leave the proof to the reader.) Hence, A has no least upper bound in (Q, ≤).
Example 1.4.12. Let S be any set and consider the power set P(S) equipped with the ⊆ partial
order. Let X be a subset of P(S). Then a maximal element of X is a set M in X such that no other
set in X contains M . Note that there may be more than one of these. An upper bound of X is any
subset of S that contains every element in every set in X. The least upper bound of X is the union
lub(X) = ⋃A∈X A. 4
Definition 1.4.13
In a poset (S, 4) any subposet (T, 4T ) that is a total order is called a chain.
The concept of a chain allows us to introduce a theorem that is essential in a variety of contexts
in algebra.
Theorem 1.4.14 (Zorn's Lemma)
Let (S, 4) be a nonempty poset in which every chain has an upper bound in S. Then S
contains at least one maximal element.
In the context of ZF set theory, Zorn's Lemma is equivalent to the Axiom of Choice. (See [62,
Theorem 5.13.1] for a proof.)
Definition 1.4.15
A poset (S, 4) is called a lattice if for all pairs (a, b) ∈ S × S, both lub(a, b) and glb(a, b)
exist.
Lattices are a particularly nice class of partially ordered sets. They occur frequently in various
areas of mathematics. Given any set S, the power set (P(S), ⊆) is a lattice with lub(A, B) = A ∪ B
and glb(A, B) = A ∩ B. In Section 3.6 we show how to utilize the lattice structure on the set of
subgroups of a group effectively to quickly answer questions about the internal structure of a group.
[Figure 1.2: The Hasse diagram for the donor relation on blood types: o at the bottom, a and b
(incomparable) above it, and ab at the top.]
Example 1.4.16. Consider the partial order on S = {a, b, c, d, e, f, g, h, i} described by the Hasse
diagram shown in Figure 1.3. The diagram makes it clear what relations hold between elements. For
example, notice that all elements in {a, b, c, d, e, f, g} are incomparable with the elements in {h, i}.
The maximal elements in S are d and i. The minimal elements are a, e, g, and h. As a least upper
bound calculation, lub(a, f ) = c because c is the first element in a chain above a that is also in
a chain above f . We also see that lub(e, h) does not exist because there is no chain above e that
intersects with a chain above h. 4
[Figure 1.3: The Hasse diagram for the poset of Example 1.4.16 on S = {a, b, c, d, e, f, g, h, i}.]
Hasse diagrams allow for easy visualization of properties of the poset. For example, a poset will
be a lattice if and only if taking any two points p1 and p2 in the diagram, there exists a chain rising
from p1 that intersects with a chain rising from p2 (existence of the least upper bound) and a chain
descending from p1 that intersects with a chain descending from p2 (existence of the greatest lower
bound).
Figure 1.4 illustrates three different lattices. The reader is encouraged to notice that the third
Hasse diagram corresponds to the partial order ⊆ on P({1, 2, 3}). (See Figure 1.5.)
Definition 1.4.17
Let (S, 41 ) and (T, 42 ) be two partially ordered sets. A function f : S → T is called
monotonic (or order-preserving) if
x 41 y =⇒ f (x) 42 f (y).
Example 1.4.18. Consider the function f : R → R defined by f (x) = ex . It is not hard to show
that since e > 1, then x1 ≤ x2 implies that ex1 ≤ ex2 . Thus, f is a monotonic function from (R, ≤)
to itself. 4
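For finite posets, monotonicity can be checked exhaustively. A small Python sketch (the function and variable names are ours), using the blood-type poset of Example 1.4.4 and the map from Exercise 1.4.19(a):

```python
def is_monotonic(f, rel_S, rel_T):
    """Check Definition 1.4.17: whenever x 4 y in S, f(x) 4 f(y) must hold in T.
    rel_S and rel_T are the relations given explicitly as sets of ordered pairs;
    f is a dictionary representing the function."""
    return all((f[x], f[y]) in rel_T for (x, y) in rel_S)

# The donor relation on B = {o, a, b, ab}: o donates to all, a and b to ab.
donor = {('o', 'o'), ('a', 'a'), ('b', 'b'), ('ab', 'ab'),
         ('o', 'a'), ('o', 'b'), ('o', 'ab'), ('a', 'ab'), ('b', 'ab')}
leq3 = {(i, j) for i in (1, 2, 3) for j in (1, 2, 3) if i <= j}
f = {'o': 1, 'a': 2, 'b': 2, 'ab': 3}

print(is_monotonic(f, donor, leq3))  # True
```

Reversing the map (sending o to 3 and ab to 1) fails the check, as one would expect.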
For functions over intervals of R, we typically say that f is increasing or decreasing functions to
mean
We do not distinguish between increasing and decreasing with posets generally because (R, ≥) is
also a poset. Hence, a decreasing function is simply a monotonic function from (R, ≤) to (R, ≥).
Definition 1.4.19
Let (S, 41 ) and (T, 42 ) be two partially ordered sets. A function f : S → T is called a
poset isomorphism if it is bijective and x 41 y if and only if f (x) 42 f (y).
The etymology of the term “isomorphism” comes from the Greek, meaning “same shape.” If
there exists an isomorphism between two posets they are the same object from the perspective of
the poset algebraic structure; only labels of elements change under f .
1.4.5 – Well-Orderings
A few algorithms presented in this textbook involve a strictly decreasing sequence of elements in a
poset. In many instances, it is crucial to know that these algorithms terminate.
Definition 1.4.20
A total order (S, 4) is called a well-ordering (or well-order, or well-ordered set) if every
nonempty subset A ⊂ S contains a least element.
The poset (N, ≤) is a well-ordering. (See Axiom 2.1.1 and the surrounding discussion.) In
contrast, (Z, ≤) is not a well-ordering because, for example, Z itself has no least element.
Proposition 1.4.21
A total order 4 on S is a well-ordering if and only if every strictly decreasing sequence in
S eventually terminates.
Proof. We prove the contrapositive statement. Suppose that 4 is not a well-ordering. Then there
exists a nonempty subset A ⊆ S that has no least element. Pick a1 ∈ A. Since a1 is not a least
element of A, there exists a2 ∈ A such that a1 ≻ a2 (i.e., a2 4 a1 and a2 ≠ a1 ). Then a2 is also
not a least element, so there exists a3 ∈ A such that a2 ≻ a3 . Continuing similarly, we find that
there exists an infinite strictly decreasing sequence
a1 ≻ a2 ≻ a3 ≻ · · · .
Conversely, suppose that (S, 4) is a total order and that there exists a strictly decreasing sequence
as described above. Then the subset {a1 , a2 , a3 , . . .} ⊆ S contains no least element and therefore
(S, 4) is not a well-ordered set.
The concept of well-ordering is valuable for a number of profound properties, not the least of
which is the principle of mathematical induction. However, Proposition 1.4.21 is important for some
algorithms we encounter in algebra: If an algorithm involves steps that produce a sequence that is
strictly decreasing with respect to some well-ordering, then by this proposition, the algorithm must
terminate.
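A classic instance is the Euclidean algorithm for greatest common divisors: the successive remainders form a strictly decreasing sequence in the well-ordering (N, ≤), so Proposition 1.4.21 guarantees the loop stops. A minimal Python sketch (the function name is ours):

```python
def gcd_euclid(a, b):
    """Each iteration replaces (a, b) by (b, a % b). Since 0 <= a % b < b,
    the successive values of b form a strictly decreasing sequence in the
    well-ordering (N, <=), so the loop must terminate."""
    a, b = abs(a), abs(b)
    while b != 0:
        a, b = b, a % b
    return a

print(gcd_euclid(252, 198))  # 18
```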
Example 1.4.22. Given posets (A1 , 41 ), . . . , (An , 4n ), the lexicographic order 4 on A1 × · · · × An
declares
(a1 , a2 , . . . , an ) 4 (b1 , b2 , . . . , bn )
if the tuples are equal or if ai 4i bi at the first index i where ai ≠ bi . On Z3 , with each copy of Z
equipped with ≤, we have
(2, 1700, −5) 4 (4, −300, 2) because at the first entry where they differ, 2 ≤ 4,
(−5, 4, −10) 4 (−5, 4, 0) because at the first entry where they differ, −10 ≤ 0. 4
Example 1.4.23. Consider the posets (Z, ≤) and (P({1, 2, 3, 4}), ⊆) and let 4 be the lexicographic
ordering on Z × P({1, 2, 3, 4}). Then, for example,
(3, {1, 3}) 4 (5, {2, 3, 4}) because at the first entry where they differ, 3 ≤ 5,
(−2, {4}) 4 (−2, {2, 3, 4}) because at the first entry where they differ, {4} ⊆ {2, 3, 4}.
On the other hand (−2, {1, 4}) and (−2, {2, 3, 4}) are incomparable. 4
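When every coordinate is totally ordered, Python's built-in tuple comparison is exactly this lexicographic order, so the numerical examples above can be checked directly:

```python
# Tuples are compared entry by entry; the first differing entries decide.
print((2, 1700, -5) <= (4, -300, 2))   # True, since 2 <= 4
print((-5, 4, -10) <= (-5, 4, 0))      # True, since -10 <= 0
print((4, -300, 2) <= (2, 1700, -5))   # False
```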
Taking the contrapositive of the above implication, we deduce that x 4 y implies that either
y′ 4 x′ fails or x′ ∼ y′ . Note that if x′ ∼ y′ , then x ∼ y in the first place and thus [x] = [y]. Also,
the statement that y′ 4 x′ fails is equivalent to saying that x′ ≺ y′ or that x′ and y′ are incomparable.
Conversely, suppose that ∼ satisfies Condition (1.6). Then defining
[x] 4inh [y] if and only if ∃x′ ∈ [x], ∃y′ ∈ [y] such that x′ 4 y′ (1.7)
makes the projection p : S → S/ ∼ monotonic. We call 4inh the partial order inherited from 4. This
proves the following proposition.
Proposition 1.4.24
Let (S, 4) be a poset and let ∼ be an equivalence relation on S satisfying Condition
(1.6). Then the partial order 4inh on S/ ∼ inherited from 4 on S defined by (1.7) makes
(S/ ∼, 4inh ) into a poset.
Definition 1.4.25
We call Condition (1.6) the poset quotient condition and we call the poset (S/ ∼, 4inh )
established in Proposition 1.4.24 the quotient poset of (S, 4) by ∼.
Example 1.4.26. Consider the Hasse diagrams in Figure 1.6. Consider the poset depicted on the
left and suppose that the gray bubbles indicate the equivalence classes of an equivalence relation ∼
on S. Then ∼ satisfies the condition in (1.6). The Hasse diagram on the right shows the resulting
quotient poset S/ ∼. 4
Example 1.4.27. Consider the poset (R, ≤) and the equivalence relation ∼ on R defined by x ∼ y
if and only if bxc = byc, where bxc is the greatest integer less than or equal to x. Equivalence classes
of this partition consist of intervals [n, n + 1) with n ∈ Z where [x] = [n, n + 1) if and only if bxc = n.
where 4lex is the lexicographic order on Z3 (with each copy of Z equipped with the partial order ≤).
Prove that 4 is a partial order on Z3 . Prove also that 4 is a total order.
13. Let (Ai , 4i ) be posets for i = 1, 2, . . . , n and define 4lex as the lexicographic order on A1 × A2 × · · · An .
Prove that 4lex is a total order if and only if 4i is a total order on Ai for all i.
14. Let 4 be the lexicographic order on R3 , where each R is equipped with the usual ≤. Prove or disprove
the following statement: For all vectors ~a, ~b, ~c, ~d, if ~a 4 ~b and ~c 4 ~d, then ~a + ~c 4 ~b + ~d.
15. Answer the following questions pertaining to the poset described by the Hasse diagram below.
[Hasse diagram on the set {a, b, c, d, e, f, g, h, i, j}.]
16. Consider the partial order on R2 given in Example 1.4.6. Let A be the unit disk
A = {(x, y) ∈ R2 | x2 + y 2 ≤ 1}.
(a) Show that A has both a maximal and minimal element. Find all of them.
(b) Find all the upper bounds and all the lower bounds of A.
17. Consider the lexicographic order on R2 coming from the standard (R, ≤). Let A be the closed disk of
center (1, 2) and radius 5.
(a) Show that A has both a maximal and minimal element. Find all of them.
(b) Find all the upper bounds and all the lower bounds of A.
(c) Show that A has both a least upper bound and a greatest lower bound.
18. Prove that in a finite lattice, there exists exactly one maximal element and one minimal element.
19. Let (B, →) be the poset of blood types equipped with the donor relation. (See Example 1.4.4.)
(a) Consider the poset ({1, 2, 3}, ≤). Show that the function f : B −→ {1, 2, 3} defined by f (o) = 1,
f (a) = 2, f (b) = 2 and f (ab) = 3 is a monotonic function.
(b) Show that there exists no isomorphism between (B, →) and ({1, 2, 3, 4}, ≤).
20. Let (S, 4), (T, 40 ), and (U, 400 ) be three posets. Let f : S → T and g : T → U be monotonic functions.
Prove that the composition g ◦ f : S → U is monotonic.
21. Prove that the poset (R, ≤) is not isomorphic to (R − {0}, ≤).
22. Prove that the poset of integers greater than a fixed number k (with partial order ≤) is isomorphic to
(N, ≤).
23. Let (S, 41 ) and (T, 42 ) be two partially ordered sets and let f : S → T be a monotonic function.
(a) Prove that if A is a subset of S with an upper bound u, then f (u) is an upper bound of f (A).
(b) Show with a counterexample that f (lub(A)) is not necessarily equal to lub(f (A)).
(c) Prove that if f is an isomorphism, then f (lub(A)) = lub(f (A)).
24. Prove or disprove that (Z, ≤) and (Q, ≤) are isomorphic as posets.
25. Determine whether the posets corresponding to the following Hasse diagrams are lattices. If they are
not, explain why.
[Four Hasse diagrams, labeled (a)–(d), on vertex sets drawn from {a, b, c, d, e, f, g, h}.]
26. Explain under what conditions a flow chart may be viewed as a partial order.
27. Let (S, 4) and (T, 42 ) be two partial orders. Suppose that ∼ is an equivalence relation on S that
satisfies Condition (1.6). Prove that for any monotonic function f : S → T such that f (a) = f (b)
whenever a ∼ b, there exists a unique monotonic function f 0 : S/ ∼ → T such that f = f 0 ◦ p, where
p : S → S/ ∼ is the projection p(a) = [a] and the partial order on S/ ∼ is defined in (1.7). In the
terminology of diagrams, prove that the diagram below is commutative.
[Commutative diagram: p : (S, 4) → (S/ ∼, 4′ ) across the top, with f : (S, 4) → (T, 42 ) and
f ′ : (S/ ∼, 4′ ) → (T, 42 ), so that f = f ′ ◦ p.]
(The equivalence relation has the effect of considering a and b as the same element.) Equivalence
classes have either one element or three elements.
(a) Show that ∼ satisfies the poset quotient condition.
(b) Show that (S/ ∼, ⊆inh ) is isomorphic to the lattice of subsets of a set of four elements.
1.5
Projects
Project I. Discriminants of Polynomials. The reader should be familiar with the discriminant
of a quadratic polynomial, which determines whether a quadratic has 0, 1, or 2 real roots. We
investigate discriminants for cubic polynomials and polynomials of higher degree.
(1) Thinking about the shape of the parabola corresponding to f (x) = ax² + bx + c and
where it can intersect the x-axis, determine a condition, using calculus, that tells you
when the equation ax² + bx + c = 0 has zero, one, or two distinct roots.
(2) Show that by substituting x = su + t for appropriate constants s and t, any cubic
equation ax³ + bx² + cx + d = 0 can be rewritten as u³ + pu + q = 0.
(3) Using calculus and thinking about the possible maxima and minima of the graph of
a cubic, find conditions depending on p and q that determine when a cubic of the form
f (x) = x³ + px + q has one, two, or three distinct real roots.
(4) Repeat the above question with f (x) = x3 + ax2 + bx + c.
(5) Can anything be said for a quartic (degree 4) polynomial? Assume that the discriminant
is an expression in the coefficients of the polynomial that is 0 if and only if the polynomial
has a multiple root.
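For experimenting with part (3), the following Python sketch counts distinct real roots of x³ + px + q using the critical points ±√(−p/3). Treat the criterion it encodes as a conjecture to verify with calculus, not as a given; the function name is ours:

```python
import math

def distinct_real_roots(p, q):
    """Count distinct real roots of x^3 + p*x + q. For p >= 0 the cubic is
    nondecreasing, so there is exactly one distinct real root. For p < 0,
    the critical points are x = +/- sqrt(-p/3); the sign of the product of
    the two critical values decides among 1, 2, or 3 distinct roots."""
    if p >= 0:
        return 1
    c = math.sqrt(-p / 3)
    f = lambda x: x**3 + p * x + q
    prod = f(c) * f(-c)
    if prod > 0:
        return 1
    if prod == 0:
        return 2  # a repeated root
    return 3

print(distinct_real_roots(-3, 2))  # 2: x^3 - 3x + 2 = (x - 1)^2 (x + 2)
print(distinct_real_roots(-1, 0))  # 3: x^3 - x = x(x - 1)(x + 1)
```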
Project II. Fuzzy Set Theory. Let S be any set. Exercise 1.1.28 establishes a bijection between
P(S) and the set of functions from S to {0, 1}. The one defining characteristic of a set is that it
has a clear, unequivocal rule determining whether any object is or is not in it.
Fuzzy set theory offers a model for when, instead of knowing unequivocally whether an object
is in or not in the set, we only know with a certain likelihood whether an object is in it. Given
a set S, a fuzzy subset of S is a function p : S → [0, 1]. Hence, for all s ∈ S, the value p(s)
is a real number between 0 and 1, inclusive. We can make the connection with ordinary sets
via the interpretation that p(a) = 0 means that a is not in the set and p(b) = 1 means that b is
in the set. A fuzzy subset of S is tantamount to assigning a probability to each element of S.
In this project, you are encouraged to develop a theory of fuzzy subsets. Define and make sense
of the usual set operations: intersection, union, complement, set difference, and symmetric
difference. Does the concept of a function between fuzzy sets make sense? Does the concept
of a relation make sense? Can you make sense of the usual relation A ⊆ B for fuzzy sets?
Describe some natural situations where fuzzy set theory may legitimately find more use than
usual set theory.
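As a starting point, one common answer (Zadeh's standard operations, an assumption here rather than the project's prescribed one) takes pointwise min, max, and 1 − p for intersection, union, and complement. A Python sketch with fuzzy subsets stored as dictionaries, treating missing keys as membership 0:

```python
def f_union(p, q):
    """Standard fuzzy union: membership is the pointwise max."""
    return {s: max(p.get(s, 0), q.get(s, 0)) for s in set(p) | set(q)}

def f_intersection(p, q):
    """Standard fuzzy intersection: pointwise min."""
    return {s: min(p.get(s, 0), q.get(s, 0)) for s in set(p) | set(q)}

def f_complement(p):
    """Standard fuzzy complement: s -> 1 - p(s)."""
    return {s: 1 - v for s, v in p.items()}

p = {'a': 0.2, 'b': 0.9}
q = {'a': 0.5, 'b': 0.4}
print(f_union(p, q))         # memberships 0.5 for 'a' and 0.9 for 'b'
print(f_intersection(p, q))  # memberships 0.2 for 'a' and 0.4 for 'b'
```

When every membership value is 0 or 1, these reduce to the usual set operations, which is one sanity check the project suggests.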
Project III. Associahedron. Let S be a set and ? a binary operation on S. If ? is not associative,
then the expressions with three terms a1 ? (a2 ? a3 ) and (a1 ? a2 ) ? a3 are not necessarily equal.
The nth associahedron Kn is a convex polytope (arbitrary dimensional generalization of a
polyhedron) in which each vertex corresponds to a way of properly placing parentheses in an
operation expression with n terms and each edge corresponds to a single application of the
associativity rule. With three terms, K3 consists of two vertices and one edge, and so is a line
segment.
a1 ? (a2 ? a3 ) (a1 ? a2 ) ? a3
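As a hint of the combinatorics involved, the number of vertices of Kn, i.e., the number of full parenthesizations of n terms, is the Catalan number C_{n−1}. A quick Python check (the function name is ours):

```python
from math import comb

def parenthesizations(n):
    """Number of ways to fully parenthesize a ?-expression with n terms:
    the Catalan number C_{n-1} = binom(2(n-1), n-1) / n."""
    k = n - 1
    return comb(2 * k, k) // (k + 1)

# K_3 has 2 vertices, K_4 has 5, K_5 has 14
print([parenthesizations(n) for n in (3, 4, 5)])  # [2, 5, 14]
```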
Some mathematicians say tongue-in-cheek that abstract algebra is a service branch of mathematics.
By “service” they mean that applications in other branches of mathematics shape the development
and topics of interest of abstract algebra. Algebraists chafe at such comments because, as a branch of
mathematics, algebra has a life of its own. However, there is some historical basis for this sentiment.
Besides approaches to solving the quadratic equation dating back to antiquity, geometry and
number theory (arithmetic) formed the heart of mathematics until the 16th century. As algebra
became a branch of its own, work centered on studying properties of polynomials with an emphasis
on finding solutions to polynomial equations. However, certain conjectures in geometry and number
theory resisted proofs using classical methods. Some of these difficult problems galvanized math-
ematical investigations, leading to countless discoveries and to the development of whole fields of
inquiry.
Consider Fermat’s Last Theorem that states that for any integer n ≥ 3, there do not exist
nonzero integers a, b, and c such that
an + bn = cn .
In 1637, Fermat wrote in the margin of his copy of Diophantus's Arithmetica that he had found a
proof of this fact but that the margin was too small to contain it. For hundreds of years, mathematicians attempted
to prove this conjecture. Though simple to state, a rigorous proof of Fermat’s Last Theorem eluded
mathematicians until 1994 when, building on the work of scores of others, Andrew Wiles proved
the Taniyama-Shimura Conjecture, which implies Fermat’s Last Theorem. During their quest for
this Holy Grail of classical number theory, mathematicians contributed to areas now labeled as ring
theory and algebraic geometry.
Throughout this book, we will occasionally mention when classical conjectures motivated certain
directions of investigation and when algebra offered solutions to some of these great problems.
Consequently, many motivating examples for various types of algebraic structures come from
geometry and number theory. Though this text does not offer a review of geometry, it does offer three
sections of elementary number theory that cover only the topics absolutely necessary to introduce the
algebraic structures we will study. Section 2.1 reviews terms and theorems concerning divisibility of
integers. Section 2.2 introduces modular arithmetic, a useful technique to answer many divisibility
problems in number theory. Finally, Section 2.3 gives an optional review of mathematical induction.
2.1
Basic Properties of Integers
The following results about integers arise in a basic course on number theory. Indeed, some of
these topics are taught as early as late elementary school and developed to different degrees in high
school. However, in primary and secondary school, many basic theorems in number theory are
taught without giving proofs and are introduced in an order that is not appropriate for providing
proofs. In this section, we discuss elementary divisibility properties of integers, first because we
need these results, and second as a reference for how to develop similar topics in general algebraic
structures.
Defining the integers precisely is harder than it first appears because it presupposes that we are
clear about what the integers are. Since the set of integers is infinite, we cannot list them out or
perceive them at a glance. Hence, we need a structural definition, a set of axioms, that defines the
set of integers.
There exist a few different, though equivalent formulations of the set of axioms for the integers.
Though we do not provide a list of axioms for the integers here, [53, Appendix A] and [59, Section 2.1]
give two slightly different formulations. Almost all of the axioms are well-known, even by elementary
school children. The only axiom that does not feel immediately obvious is the well-ordering property.
Example 2.1.2. Let A = {n ∈ N | n² > 167}. The set A is nonempty because, for example,
20² = 400 > 167, so 20 ∈ A. Theorem 2.1.1 allows us to conclude that A has a minimal element.
By trial and error or using some basic algebra, we find that the minimal element of A is 13. 4
Example 2.1.3. The rational numbers, equipped with the usual partial order ≤, do not satisfy
the well-ordering principle. Let S = Q≥0 and let A be the set of positive rational numbers. The
set A does not contain a minimal element: no matter what positive rational number p/q we take,
the rational p/(2q) is less than p/q. 4
Example 2.1.4. Let A be the set of integers that can be written as the sum of two positive cubes in
three different ways. Since A consists of positive numbers, Theorem 2.1.1 allows us to conclude that
A is either empty or has a minimal element. However, unless there exists a number-theoretic method
to find the minimal element of A (besides running a computer algorithm that performs an exhaustive
search), Theorem 2.1.1 offers no way to determine this minimal element. 4
Example 2.1.4 shows that Theorem 2.1.1 is a nonconstructive result as it affirms the existence of
an element with a certain property without offering a method to “construct” or to find this element.
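Such an exhaustive search is, however, easy to script. The Python sketch below (the names and the bound `limit` are our choices) finds the smallest integer that is a sum of two positive cubes in at least k ways; the result is only trustworthy when it is below limit³, so that no representation is missed:

```python
from collections import defaultdict

def min_sum_of_two_cubes_in_k_ways(k, limit=450):
    """Smallest integer expressible as a^3 + b^3 with 0 < a <= b <= limit
    in at least k ways. A result n is valid only when n < limit**3, since
    then every representation of n has both parts at most limit."""
    counts = defaultdict(int)
    for a in range(1, limit + 1):
        for b in range(a, limit + 1):
            counts[a**3 + b**3] += 1
    candidates = [n for n, c in counts.items() if c >= k and n < limit**3]
    return min(candidates)

print(min_sum_of_two_cubes_in_k_ways(2))  # 1729, Ramanujan's taxicab number
print(min_sum_of_two_cubes_in_k_ways(3))  # 87539319
```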
2.1.2 – Divisibility
The notion of divisibility of integers is often introduced in elementary school. However, in order to
prove theorems about divisibility, we need a rigorous definition.
Definition 2.1.5
If a, b ∈ Z with a ≠ 0, we say that a divides b if ∃k ∈ Z such that b = ak. We write a | b if
a divides b and a ∤ b otherwise. We also say that a is a divisor (or a factor ) of b and that
b is a multiple of a.
Proposition 2.1.6
Let a, b, and c be integers.
(1) Any nonzero integer divides 0.
(2) Suppose a ≠ 0. If a | b and a | c, then a | (b + c).
(3) Suppose a, b ≠ 0. If a | b and b | c, then a | c.
(4) Suppose a, b ≠ 0. If a | b and b | a, then a = ±b.
Proof. For (1), set k = 0; then any nonzero integer a satisfies ak = 0. Thus, a divides 0.
For (2), since a | b and a | c, there exist k, ℓ ∈ Z such that ak = b and aℓ = c. Then
b + c = ak + aℓ = a(k + ℓ),
so a | (b + c).
Suppose that we restrict the relation of divisibility to positive integers N∗ . Since any positive
integer divides itself, divisibility on N∗ is reflexive, antisymmetric (by Proposition 2.1.6(4)), and
transitive (by Proposition 2.1.6(3)). Thus, (N∗, |) is a partially ordered set. This poset has a
least element, namely 1, because 1 | n for all positive integers n, but it has no maximal element.
To discuss how divisibility interacts with the sign of integers, note that since dk = a implies
d(−k) = −a, then d is a divisor of a if and only if d is a divisor of −a. Therefore, when discussing
the set of divisors of a number a 6= 0, we can assume without loss of generality that a > 0.
Let d be a nonzero integer. To get a positive multiple dk of d, we need k ≠ 0 of the same sign
as d. Then
|dk| = |d||k| ≥ |d|
because |k| ≥ 1. Thus, any nonzero multiple a of d satisfies |a| ≥ |d|. By the same token, any divisor
d of a nonzero integer a satisfies |d| ≤ |a|.
Theorem (Integer Division)
Let a be a positive integer and let b ∈ Z. Then there exist unique integers q and r such
that b = aq + r with 0 ≤ r < a.

Proof. Consider the set
S = {b − ka ∈ N | k ∈ Z}.
The set S is nonempty because b − ka ≥ 0 whenever k is a sufficiently large negative integer. By
Theorem 2.1.1, S has a least element r = b − qa for some q ∈ Z. Note that any two elements of S
differ by a multiple of a since
(b − k1 a) − (b − k2 a) = (k2 − k1)a.
If r ≥ a, then
b − a(q + 1) = b − aq − a = r − a ≥ 0
and so r − a contradicts the minimality of r as a least element of S. Since any two elements of S
differ by a multiple of a, then r is the unique element n of S with 0 ≤ n < a. Hence, the integers q
and r satisfy the conclusion of the theorem.
Techniques for integer division are taught in elementary school. It is obvious that d|n if and only
if the remainder of the integer division of n by d is 0.
Borrowing notation from some programming languages, it is common to write b mod a to stand
for the remainder of the division of b by a.
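Integer division is easy to experiment with. The following Python sketch (an illustration of ours, not part of the text) uses the fact that Python's built-in divmod returns a nonnegative remainder whenever the divisor is positive, exactly matching the conclusion of the theorem:

```python
def integer_division(b, a):
    """Return (q, r) with b = a*q + r and 0 <= r < a, for a positive divisor a."""
    if a <= 0:
        raise ValueError("divisor must be positive")
    # Python's floor division guarantees 0 <= b % a < a when a > 0,
    # even for negative b, which is the remainder the theorem describes.
    q, r = divmod(b, a)
    return q, r

print(integer_division(47, 5))    # (9, 2) since 47 = 5*9 + 2
print(integer_division(-47, 5))   # (-10, 3) since -47 = 5*(-10) + 3
```

Note that "b mod a" in the text corresponds to `b % a` in Python only for a positive modulus; some other languages return a negative remainder for negative b.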
46 CHAPTER 2. NUMBER THEORY
Definition 2.1.8
If a, b ∈ Z with (a, b) ≠ (0, 0), a greatest common divisor of a and b is an element d ∈ Z
such that:
(1) d | a and d | b (d is a common divisor);
(2) if d′ | a and d′ | b, then d′ | d.
Because of condition (2) in this definition, it is not obvious that two integers not both 0 have
a greatest common divisor. (If we had said d′ ≤ d in the definition, the proof that two integers
not both 0 have a greatest common divisor would be a simple application of the Well-Ordering
Principle on Z.) The key to showing that integers possess a greatest common divisor relies on the
Euclidean Algorithm, which we describe below.
Let a and b be two positive integers with a ≥ b. The Euclidean Algorithm starts by setting
r0 = a and r1 = b and then repeatedly performs the following integer divisions:
r0 = r1 q1 + r2 (where 0 ≤ r2 < r1)
r1 = r2 q2 + r3 (where 0 ≤ r3 < r2)
...
rn−2 = rn−1 qn−1 + rn (where 0 ≤ rn < rn−1)
rn−1 = rn qn + 0 (and rn > 0).
This process terminates because the sequence r1 , r2 , r3 , . . . is a strictly decreasing sequence of positive
integers and hence has at most r1 = b terms in it. Note that if b | a, then n = 1 and the Euclidean
Algorithm has one line.
To see what the Euclidean Algorithm tells us, consider the positive integer rn . By the last line
of the Euclidean Algorithm we see that rn | rn−1. From the second-to-last row of the Euclidean
Algorithm, rn | rn−1 qn−1, and by Proposition 2.1.6(2), rn | rn−2. Repeatedly applying this process
(n − 1 times, and hence a finite number), we see that rn | r1 and rn | r0 , so rn is a common divisor
of a and b.
Also, suppose that d′ is a common divisor of a and b. Then d′k0 = a = r0 and d′k1 = b = r1.
We have
r2 = r0 − r1 q1 = d′k0 − d′k1 q1 = d′(k0 − k1 q1).
Hence, d′ divides r2 with d′k2 = r2. Repeating this process (n − 1 times), we deduce that d′ | rn.
Thus, rn is a positive greatest common divisor of a and b. Consequently, the Euclidean Algorithm
leads to the following theorem.
Proposition 2.1.9
There exists a unique positive greatest common divisor for all pairs of integers (a, b) ∈
Z × Z − {(0, 0)}.
Proof. First suppose that either a or b is 0. Without loss of generality, assume that b = 0 and
a 6= 0. Since any integer divides 0, common divisors of a and b consist of divisors of a. Then greatest
common divisors of a and 0 consist of a and −a and |a| is the unique positive greatest common
divisor of a and 0.
Now suppose neither a nor b is 0. Since the set of divisors of an integer c is the same set as
the divisors of −c, then we can assume without loss of generality that a and b are both positive.
Applying the Euclidean Algorithm to a and b shows that the pair (a, b) has a greatest common
divisor. Now suppose d1 and d2 are two positive greatest common divisors of a and b. Then d1 |d2
and d2 |d1 and according to Proposition 2.1.6(4), d1 = d2 since they are both positive. Hence, a and
b possess a unique positive greatest common divisor.
Because of Proposition 2.1.9, we regularly refer to “the” greatest common divisor of two integers
as this unique positive one and we use the notation gcd(a, b). The proof of Proposition 2.1.9 tells us
how to calculate the greatest common divisor: (1) if a 6= 0, then gcd(a, 0) = |a|; (2) if a, b 6= 0, then
gcd(a, b) is the result of the Euclidean Algorithm applied to |a| and |b|.
Example 2.1.10. We perform the Euclidean Algorithm to find gcd(522, 408).
522 = 408 × 1 + 114
408 = 114 × 3 + 66
114 = 66 × 1 + 48
66 = 48 × 1 + 18
48 = 18 × 2 + 12
18 = 12 × 1 + 6
12 = 6 × 2 + 0
According to Proposition 2.1.9 and the Euclidean Algorithm, gcd(522, 408) = 6. 4
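The computation in Example 2.1.10 mechanizes directly. The following Python sketch (an illustration of ours; the function name is arbitrary) repeats the integer divisions of the Euclidean Algorithm until the remainder is 0:

```python
def euclidean_gcd(a, b):
    """Greatest common divisor of positive integers via the Euclidean Algorithm."""
    r0, r1 = a, b
    while r1 != 0:
        # Each pass performs one line of the algorithm: r0 = r1*q + r2,
        # then shifts the pair of remainders down by one index.
        r0, r1 = r1, r0 % r1
    return r0

print(euclidean_gcd(522, 408))  # 6, as in Example 2.1.10
```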
Two integers always have 1 and −1 as common divisors. If gcd(a, b) = 1, then we say
that a and b are relatively prime.
Lemma 2.1.11
Let a and b be positive integers. If k and l are integers such that a = k gcd(a, b) and
b = l gcd(a, b), then k and l are relatively prime.
Proof. Consider c = gcd(k, l), and write k = ck′ and l = cl′ for some integers k′ and l′. Then
a = k′c gcd(a, b) and b = l′c gcd(a, b).
Therefore, c gcd(a, b) is a common divisor of a and b and hence a divisor of gcd(a, b), so for some
integer h, we have c gcd(a, b)h = gcd(a, b). Hence, ch = 1. Since c is a positive integer, this is only
possible if c = 1.
There is an alternative characterization of the greatest common divisor. Let a, b ∈ Z∗ and define
Sa,b as the set of integer linear combinations of a and b, i.e.,
Sa,b = {sa + tb | s, t ∈ Z}.
Proposition 2.1.12
The set Sa,b is the set of all integer multiples of gcd(a, b). Consequently, gcd(a, b) is the
least positive integer linear combination of a and b.
Proof. By Proposition 2.1.6, any common divisor of a and b divides sa, tb, and sa + tb. This shows
that Sa,b ⊆ gcd(a, b)Z. We need to show the reverse inclusion.
By the Well-Ordering Principle, the set Sa,b has a least positive element. Call this element d0
and write d0 = s0 a + t0 b for some s0 , t0 ∈ Z. We show by contradiction that d0 is a common divisor
of a and b. Suppose that d0 does not divide a. Then by integer division,
a = qd0 + r where 0 < r < d0 .
Then r = a − qd0 = a − q(s0 a + t0 b) = (1 − qs0)a − qt0 b. This writes r, which is positive and less
than d0, as a linear combination of a and b. This contradicts the assumption that d0 is the minimal
positive element in Sa,b. Hence, the assumption that d0
does not divide a is false, so d0 divides a. By a symmetric argument, d0 divides b as well. Thus, d0
is a common divisor of a and b, so d0 | gcd(a, b). Since kd0 = (ks0)a + (kt0)b, every multiple of d0
is in Sa,b; in particular, every multiple of gcd(a, b) is a multiple of d0 and hence lies in Sa,b too.
Hence, gcd(a, b)Z ⊆ Sa,b. We conclude that Sa,b = gcd(a, b)Z and the proposition follows.
Proposition 2.1.12 does not offer a way to find the integers s and t such that gcd(a, b) = sa + tb.
If a and b are small then one can find s and t by inspection. For example, by inspecting the divisors
of 22, it is easy to see that gcd(22, 14) = 2. A linear combination that illustrates Proposition 2.1.12
for 22 and 14 is
2 × 22 − 3 × 14 = 44 − 42 = 2.
However, it is possible to backtrack the steps of the Euclidean Algorithm and find s and t such that
sa + tb = gcd(a, b). The following example illustrates this.
Example 2.1.13 (Extended Euclidean Algorithm). In Example 2.1.10, the Euclidean Algo-
rithm gives gcd(522, 408) = 6. We start from the penultimate line in the algorithm and work back-
ward, in such a way that each line gives 6 as a linear combination of the intermediate remainders ri
and ri+1 .
6 = 18 − 12 × 1
= 18 − (48 − 18 × 2) × 1 = 18 × 3 − 48 × 1
= (66 − 48 × 1) × 3 − 48 × 1 = 66 × 3 − 48 × 4
= 66 × 3 − (114 − 66 × 1) × 4 = 66 × 7 − 114 × 4
= (408 − 114 × 3) × 7 − 114 × 4 = 408 × 7 − 114 × 25
= 408 × 7 − (522 − 408 × 1) × 25 = 408 × 32 − 522 × 25
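Rather than backtracking by hand, one can carry the coefficients forward through the divisions. The following Python sketch (ours, not the text's) maintains, at every step, an expression of the current remainder as an integer linear combination of a and b:

```python
def extended_gcd(a, b):
    """Return (g, s, t) with g = gcd(a, b) = s*a + t*b."""
    # Invariant: old_r = old_s*a + old_t*b and r = s*a + t*b throughout.
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r   # one step of the Euclidean Algorithm
        old_s, s = s, old_s - q * s   # update the coefficient of a
        old_t, t = t, old_t - q * t   # update the coefficient of b
    return old_r, old_s, old_t

g, s, t = extended_gcd(522, 408)
print(g)                         # 6, as in Example 2.1.10
assert s * 522 + t * 408 == g    # the linear combination of Proposition 2.1.12
```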
The characterization of the greatest common divisor as given in Proposition 2.1.12 leads to many
consequences about the greatest common divisor. The following proposition gives one such example.
Proposition 2.1.14
Let a and b be nonzero integers that are relatively prime. For any integer c, if a|bc, then
a|c.
Proof. Since a and b are relatively prime, then gcd(a, b) = 1. By Proposition 2.1.12, there exist
integers s, t ∈ Z such that sa + tb = 1. Since a | bc, there exists k ∈ Z such that ak = bc. Then
c = c(sa + tb) = sac + tbc = sac + t(ak) = a(sc + tk),
and hence a | c.
Definition 2.1.15
If a, b ∈ Z∗, a least common multiple of a and b is an element m ∈ Z such that:
• a | m and b | m (m is a common multiple);
• if a | m′ and b | m′, then m | m′.
Similar to our presentation of the greatest common divisor, we should note that from this defi-
nition, it is not obvious that a least common multiple always exists. Again, we must show that it
exists.
Proposition 2.1.16
There exists a unique positive least common multiple m for every pair of nonzero integers
(a, b) ∈ Z∗ × Z∗.
Proof. If m1 and m2 are least common multiples of a and b, then m1 |m2 and m2 |m1 . Therefore, by
Proposition 2.1.6(4), if a and b have a least common multiple m, the integer −m is the only other
least common multiple.
Without loss of generality, assume that a and b are positive in the rest of the proof.
Since gcd(a, b) divides a and divides b, then gcd(a, b)|ab. Also, we can write a = k gcd(a, b) and
b = l gcd(a, b). Let M be the positive integer such that M gcd(a, b) = ab. From M gcd(a, b) =
gcd(a, b)kb, we get M = bk and similarly M = al and hence M is a common multiple of a and b.
Let m′ be another common multiple of a and b with m′ = pa and m′ = qb. Since pa = qb, then
pk gcd(a, b) = ql gcd(a, b)
and hence pk = ql. Since gcd(k, l) = 1, by Proposition 2.1.14 we conclude that k | q with q = kc for
some integer c. Then m′ = (kc)b = c(bk) = cM and we deduce that M | m′.
This shows that M = ab/ gcd(a, b) satisfies the criteria of a least common multiple and the
proposition follows.
We regularly call this unique positive least common multiple of a and b “the” least common
multiple. We denote this positive integer as lcm(a, b). The proof of Proposition 2.1.16 establishes
the following important result: for positive integers a and b,
gcd(a, b) lcm(a, b) = ab.
When restricted to the positive integers N∗ , the existence theorems given in Propositions 2.1.9
and 2.1.16 show that the poset (N∗ , |) is a lattice. The greatest common divisor gcd(a, b) of two
positive integers a and b is the greatest lower bound of {a, b} in the terminology of posets and the
least common multiple is the least upper bound of {a, b}. Note that (N∗, |) is an infinite lattice
while for n ≥ 3, the subposet ({1, 2, . . . , n}, |) is not a lattice.
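The identity lcm(a, b) = ab/gcd(a, b) extracted from the proof of Proposition 2.1.16 gives an immediate computation. A minimal Python sketch (ours), using the standard library's math.gcd:

```python
from math import gcd

def lcm(a, b):
    """Least common multiple of positive integers via lcm(a, b) = a*b / gcd(a, b)."""
    # Dividing before multiplying (a // gcd) would avoid large intermediates;
    # for an illustration, the direct form mirrors the identity in the text.
    return a * b // gcd(a, b)

print(lcm(522, 408))  # 212976 // 6 = 35496
print(lcm(4, 6))      # 12
```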
Definition 2.1.17
An element p ∈ Z is called a prime number if p > 1 and the only positive divisors of p are 1
and p. If an integer n > 1 is not prime, then n is called composite.
For short, we often say “p is prime” instead of “p is a prime number.” As simple as the concept
of primality is, properties of prime numbers have intrigued mathematicians since Euclid.
Theorem (Euclid)
There exist infinitely many prime numbers.

Proof. Assume that the set of prime numbers is finite. Write the set as {p1, p2, . . . , pn}. Consider
the integer
Q = (p1 p2 · · · pn ) + 1.
The integer Q is obviously larger than 1 so it is divisible by a prime number, say pk . Then since
1 = Q − (p1 p2 · · · pn),
the prime pk divides both terms on the right-hand side, so pk | 1, which is a contradiction since
pk > 1. Hence, the set of prime numbers is infinite.
The problem of finding the fastest algorithm for determining whether a given integer n is prime is a
difficult one. This problem is not just one of simple curiosity but has applications to industrial
information security. It is possible simply to take in sequence all integers 1 < d ≤ n and perform
an integer division of n by d. The least integer d > 1 that divides n is a prime number. This d satisfies
d = n if and only if n is prime. This method offers an exhaustive algorithm to determine whether n
is prime but many improvements can be made. We can shorten the exhaustive algorithm with the
following result.
Proposition 2.1.19
If n is composite, then it has a divisor d such that 1 < d ≤ √n.

Proof. Suppose that all the divisors of n greater than 1 are greater than √n. Since n is composite,
there exist positive integers a and b greater than 1 with n = ab. The supposition that a > √n and
b > √n implies that ab > n, a contradiction. The proposition follows.
Corollary 2.1.20
An integer n > 1 is prime if and only if no integer d with 1 < d ≤ √n divides n.
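Corollary 2.1.20 shortens the exhaustive algorithm considerably. A Python sketch of the resulting trial-division test (an illustration of ours, not an optimized primality algorithm):

```python
def is_prime(n):
    """Trial division: test only candidate divisors d with 1 < d <= sqrt(n)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:      # d <= sqrt(n), avoiding floating-point square roots
        if n % d == 0:
            return False   # a divisor d with 1 < d <= sqrt(n) exists
        d += 1
    return True

print([p for p in range(2, 30) if is_prime(p)])
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```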
Techniques for finding prime numbers have grown increasingly technical. See [36] for a recent
text on advanced prime detecting techniques.
We mention two other key properties about prime numbers. We omit here a proof for the second
theorem since we will discuss these topics in greater generality in Section 6.4.
The following proposition gives an alternative characterization for primality.
Proposition (Euclid's Lemma)
Let p be a prime number and let a, b ∈ Z. If p | ab, then p | a or p | b.

Proof. Suppose that p | ab. If p | a then the proof is done. Suppose instead that p ∤ a. Then, since the
only positive divisors of p are 1 and p, gcd(p, a) = 1. By Proposition 2.1.14, p | b and the proposition
follows.
In the factorizations in (2.2), we do not assume that the prime numbers pi are all distinct. It is
common to write the generic factorization of integers as
n = p1^α1 p2^α2 · · · pn^αn (2.3)
with the primes pi all distinct and the αi positive integers. It is also common to list the primes in
increasing order. Using these latter two conventions, we call the expression in (2.3) the prime factorization
of n. The prime factorization inspires the so-called prime order function ordp : N∗ → N defined by
ordp(n) = k ⇐⇒ p^k | n and p^{k+1} ∤ n. (2.4)
Example 2.1.23. Let n = 2016. By dividing by appropriate primes, we find that the prime
factorization is 2016 = 2^5 × 3^2 × 7. Thus,
ord2(2016) = 5, ord3(2016) = 2, ord7(2016) = 1,
and ordp(2016) = 0 for all p ∉ {2, 3, 7}. 4
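The prime order function of (2.4) can be computed by repeated division. The following Python sketch (the function name ord_p is ours) illustrates it on 2016:

```python
def ord_p(p, n):
    """Largest k with p**k dividing n, i.e., the prime order function of (2.4)."""
    if n == 0:
        raise ValueError("ord_p is undefined at 0")
    k = 0
    while n % p == 0:   # strip factors of p one at a time
        n //= p
        k += 1
    return k

print(ord_p(2, 2016), ord_p(3, 2016), ord_p(7, 2016))  # 5 2 1, as in Example 2.1.23
print(ord_p(5, 2016))                                   # 0, since 5 does not divide 2016
```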
The order function extends to a function ordp : Q>0 → Z in the following way. Let m/n be a
fraction written in reduced form. Then
ordp(m/n) = ordp(m) − ordp(n).
Example 2.1.24. Let m/n = 48/55. We have
ord2(48/55) = 4, ord3(48/55) = 1, ord5(48/55) = −1, ord11(48/55) = −1,
and ordp(48/55) = 0 for all p ∉ {2, 3, 5, 11}. 4
Definition 2.1.25
Euler’s totient function (or Euler’s φ-function) is the function φ : N∗ → N∗ such that φ(n)
is the number of positive integers less than n that are relatively prime to n. In other words,
def
φ(n) = {a ∈ N∗ | 1 ≤ a ≤ n and gcd(a, n) = 1} . (2.5)
(2) φ(20) = 8, because for 1 ≤ a ≤ 20, the integers relatively prime to 20 are those that are not
divisible by 2 or by 5. Thus, these are 1, 3, 7, 9, 11, 13, 17, 19.
(3) φ(243) = φ(3^5) = 3^5 − 3^4 = 243 − 81 = 162, because the integers relatively prime to 243 are
precisely those that are not divisible by 3. 4
Exercises 2.1.29 through 2.1.31 guide the reader to develop the following formula for Euler’s
totient function.
Proposition 2.1.27
If a positive integer n has the prime decomposition n = p1^α1 p2^α2 · · · pℓ^αℓ, then
φ(n) = (p1^α1 − p1^{α1−1})(p2^α2 − p2^{α2−1}) · · · (pℓ^αℓ − pℓ^{αℓ−1}). (2.6)
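The formula of Proposition 2.1.27 turns into a short program once n has been factored. A Python sketch (ours) that factors n by trial division and multiplies the contribution p^α − p^{α−1} of each prime power:

```python
def phi(n):
    """Euler's totient via the prime-factorization formula of Proposition 2.1.27."""
    result = 1
    d = 2
    while d * d <= n:
        if n % d == 0:
            k = 0
            while n % d == 0:   # extract the full power d**k dividing n
                n //= d
                k += 1
            result *= d**k - d**(k - 1)   # the factor p^alpha - p^{alpha-1}
        d += 1
    if n > 1:                   # one prime factor > sqrt(original n) may remain
        result *= n - 1
    return result

print(phi(20), phi(243))  # 8 162, matching Example 2.1.26
```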
5. Use the Euclidean Algorithm to find the greatest common divisor of the following pairs of integers.
(a) a = 234, and b = 84
(b) a = 5241, and b = 872
(c) a = 1010101, and b = 1221
6. Use the Euclidean Algorithm to find the greatest common divisor of the following pairs of integers.
(a) a = 55, and b = 34
(b) a = 4321, and b = 1234
(c) a = 54321, and b = 1728
7. Define the Fibonacci sequence {fn }n≥0 by f0 = 0, f1 = 1 and fn = fn−1 + fn−2 for all n ≥ 2. Let fn
and fn+1 be two consecutive terms in the Fibonacci sequence. Prove that gcd(fn+1 , fn ) = 1 and show
that for all n ≥ 2, the Euclidean algorithm requires exactly n − 1 integer divisions (including the last
one that has a remainder of 0).
8. Let a, b, c ∈ Z. Prove that a|b implies that a|bc.
9. Perform the Extended Euclidean Algorithm on the three pairs of integers in Exercise 2.1.5.
10. Perform the Extended Euclidean Algorithm on the three pairs of integers in Exercise 2.1.6.
11. Suppose that a, b ∈ Z∗ and that s, t ∈ Z∗ such that sa+tb = gcd(a, b). Show that s and t are relatively
prime.
12. Consider the relation of “relatively prime” on Z∗ . Determine whether it is reflexive, symmetric,
antisymmetric, or transitive.
13. Let a, b, c be positive integers. Prove that gcd(ab, ac) = a gcd(b, c).
14. Let a and b be positive integers. Show that the set of common multiples of a and b is lcm(a, b)Z, i.e.,
the set of multiples of lcm(a, b).
15. Prove that any integer greater than 3 that is 1 less than a square cannot be prime.
16. Prove that if 2n − 1 is prime then n is prime. [Hint: Recall that for all real numbers,
Prime numbers of the form 2p − 1, where p is prime, are called Mersenne primes and have been
historically of great research interest. The converse implication is not true however. For example,
211 − 1 = 2047 = 23 × 89.]
17. Prove or disprove that p1 p2 · · · pn + 1 is a prime number where p1 , p2 , . . . , pn are the n smallest
consecutive prime numbers.
18. Prove that the product of two consecutive positive integers is even.
19. Prove that the product of four consecutive positive integers is divisible by 24.
20. Suppose that the prime factorizations of a and b are
a = p1^α1 p2^α2 · · · pn^αn and b = p1^β1 p2^β2 · · · pn^βn,
25. Find ord5 (200!). Use this to determine the number of 0s to the right in the decimal expansion of 200!.
26. Let p be a prime number. Prove that the function ordp : Q → Z defined in (2.4) satisfies the following
logarithmic-type properties.
(a) ordp (mn) = ordp (m) + ordp (n) for all m, n ∈ Z;
(b) ordp (mk ) = k ordp (m) for all m ∈ Z and k ∈ N∗ .
27. For the following integers, calculate ϕ(n) by directly listing the set in (2.5).
28. Prove that for any positive integer n,
∑_{d|n} φ(d) = n,
where this summation notation means we sum over all positive divisors d of n. [Hint: Consider the
set of fractions 1/n, 2/n, 3/n, . . . , n/n written in reduced form.]
29. Prove that for any prime p, the following identities hold.
(a) φ(p) = p − 1
(b) φ(pk ) = pk − pk−1
30. Prove that if a and b are relatively prime, then Euler’s totient function satisfies φ(ab) = φ(a)φ(b).
31. Using Exercises 2.1.29 and 2.1.30, prove Proposition 2.1.27.
2.2
Modular Arithmetic
In this section we assume that n represents an integer greater than or equal to 2.
One of the theorems often presented as a highlight in an introduction to modular arithmetic,
Fermat's Little Theorem, dates back to 1640. However, many properties of congruences, a
fundamental notion in modular arithmetic, appear in Leonhard Euler’s early work on number theory,
circa 1736 (see [9, p.131] and [57, p.45]). The modern formulation of congruences first appeared in
Gauss’ Disquisitiones Arithmeticae in 1801 [28]. He applied the theory of congruences to the study
of Diophantine equations, algebraic equations in which we look for only integer solutions.
We introduce modular arithmetic here for its fundamental value in number theory and because
modular arithmetic will provide relatively easy examples of groups.
2.2.1 – Congruence
Definition 2.2.1
Let a and b be integers. We say that a is congruent to b modulo n if n | (b − a) and we
write
a ≡ b (mod n).
If n is understood from context, we simply write a ≡ b. The integer n is called the modulus.
Proposition 2.2.2
The congruence modulo n relation on Z is an equivalence relation.
Section 1.3.1 introduced notation that is standard for equivalence classes and quotient sets in the
context of generic equivalence relations. However, the congruence relation has such a long history,
that it carries its own notations.
When the modulus n is clear from context, we denote by \overline{a} the equivalence class of a mod n and
call it the congruence class of a. If we consider the integer division of a by n with a = nq + r, we see
that n | (a − r). Hence, r ∈ \overline{a}. In fact, we can characterize the equivalence class of a in a few different
ways:
\overline{a} = {b ∈ Z | b ≡ a (mod n)}
= {b ∈ Z | a and b have the same remainder when divided by n}
= {a + kn | k ∈ Z} = a + nZ.
a ≡ 0 (mod n) ⇐⇒ n | a.
Instead of writing Z/≡ for the quotient set for the congruence relation, we always write
Z/nZ def= {\overline{0}, \overline{1}, . . . , \overline{n − 1}}
for the set of equivalence classes modulo n. We pronounce this set as “Z mod n Z.”
Example 2.2.3. Suppose n = 15, then we have the equalities \overline{2} = \overline{17} = \overline{−13} and many others
because these numbers are all congruent to each other. We will also say that 2, 17, −13 are repre-
sentatives of the congruence class \overline{2}. 4
The set {0, 1, 2, . . . , n − 1} is not the only useful complete set of distinct representatives for
congruence modulo n. If n is odd, the set
{−(n − 1)/2, . . . , −2, −1, 0, 1, 2, . . . , (n − 1)/2}
is another natural complete set of distinct representatives.
Proposition 2.2.4
Fix a modulus n. Let a, b, c, d ∈ Z such that a ≡ c and b ≡ d. Then a + b ≡ c + d and
ab ≡ cd.

Proof. Since a ≡ c and b ≡ d, there exist k, ℓ ∈ Z such that
c − a = nk (2.7)
d − b = nℓ. (2.8)
Adding (2.7) and (2.8) gives
(d + c) − (b + a) = nk + nℓ = n(k + ℓ),
so n | ((c + d) − (a + b)) and hence a + b ≡ c + d (mod n). For the product, (2.7) and (2.8) give
cd − ab = d(c − a) + a(d − b) = dnk + anℓ = n(dk + aℓ).
This illustrates that n | (cd − ab), which means that ab ≡ cd (mod n).
Corollary 2.2.5
Let n be a modulus and let a, b ∈ Z. If we define \overline{a} + \overline{b} (resp. \overline{a} · \overline{b}) as the set consisting of
all sums (resp. products) of an element from \overline{a} with an element from \overline{b}, then
\overline{a} + \overline{b} = \overline{a + b} and \overline{a} · \overline{b} = \overline{a · b}.
To use the term “arithmetic” connotes the ability to do addition, multiplication, subtraction, and
division; to solve equations; and to study various properties among these operations. Subtraction of
two elements is defined by
\overline{a} − \overline{b} def= \overline{a} + (−\overline{b}),
where −\overline{b} is the additive inverse of \overline{b}. The additive inverse of \overline{b} is an element \overline{c} such that
\overline{b} + \overline{c} = \overline{b + c} = \overline{0}. We can take −\overline{b} = \overline{−b}. If we use {0, 1, 2, . . . , n − 1} as the complete set of
distinct representatives, then we would write −\overline{b} = \overline{n − b}.
However, as the multiplication table for Z/6Z in Example 2.2.6 illustrates, there exist nonzero el-
ements that do not have multiplicative inverses. This is just one of the differences between arithmetic
in Z and Q and modular arithmetic.
Definition 2.2.8
If \overline{a} has a multiplicative inverse, it is called a unit. We denote the set of units in Z/nZ
as U(n).
Proposition 2.2.9
As sets, U(n) = {\overline{a} ∈ Z/nZ | gcd(a, n) = 1}.
Proof. Suppose that ac = 1. Then ac ≡ 1 (mod n) and so there exists k ∈ Z such that ac = 1 + kn.
Thus, ac − kn = 1. Hence, there is a linear combination of a and n that is 1. The number 1 is the
least positive integer so by Proposition 2.1.12, gcd(a, n) = 1. So far, this shows that if a is a unit
modulo n, then a is relatively prime to n.
To show the converse, suppose now that gcd(a, n) = 1. Then again by Proposition 2.1.12, there
exist s, t ∈ Z such that sa + tn = 1. Then sa = 1 − tn and so sa ≡ 1 (mod n). The proposition
follows.
Corollary 2.2.10
The number of units in Z/nZ is |U (n)| = ϕ(n) (Euler’s totient function).
Proof. We use {0, 1, 2, . . . , n − 1} as a complete set of distinct representatives for congruence modulo
n. Euler’s totient function ϕ(n) gives the number of integers a ∈ {0, 1, 2, . . . , n − 1} such that
gcd(a, n) = 1. Hence, by Proposition 2.2.9, we have |U (n)| = ϕ(n).
Note that it does not make sense to say, for example, that the inverse of 2 modulo 5 is 1/2. The
fraction 1/2 is a specific element in Q. The following sentences are proper. In Q, 2^{−1} = 1/2. In Z/5Z,
\overline{2}^{−1} = \overline{3}. In Z/6Z, \overline{2}^{−1} does not exist.
Finding the inverse of a in Z/nZ is not easy, especially for large values of n. If n is small, then
we can find an inverse by inspection. The proof of Proposition 2.2.9 shows that s = a−1 in the linear
combination
sa + tn = 1,
which must hold for some integers s and t if a has an inverse modulo n. The Extended Euclidean
Algorithm described in Example 2.1.13 provides a method to find such s and t.
Example 2.2.11. We look for the inverse of \overline{79} in Z/123Z. We write the Euclidean Algorithm and,
to the right, the Extended Euclidean Algorithm applied to 123 and 79. (The following should be
read top to bottom down the left half and then bottom to top up the right half.)

123 = 79 × 1 + 44     1 = 123 × 9 − 79 × 14
 79 = 44 × 1 + 35     1 = 44 × 9 − 79 × 5
 44 = 35 × 1 + 9      1 = 44 × 4 − 35 × 5
 35 = 9 × 3 + 8       1 = 9 × 4 − 35 × 1
  9 = 8 × 1 + 1       1 = 9 − 8 × 1
  8 = 1 × 8 + 0

According to Proposition 2.2.9, that the Euclidean Algorithm leads to gcd(123, 79) = 1 establishes
that \overline{79} is a unit in Z/123Z. The identity 1 = 123 × 9 − 79 × 14 gives that 1 ≡ −14 × 79 ≡ 109 × 79
(mod 123). Thus, in Z/123Z, we have \overline{79}^{−1} = \overline{109}. 4
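The procedure of Example 2.2.11 can be automated: run the Extended Euclidean Algorithm and reduce the resulting coefficient modulo n. A Python sketch (ours; it returns None for non-units, in line with Proposition 2.2.9):

```python
def mod_inverse(a, n):
    """Inverse of a modulo n via the Extended Euclidean Algorithm, or None."""
    # Invariant: old_r = old_s * a (mod n) and r = s * a (mod n).
    old_r, r = a % n, n
    old_s, s = 1, 0
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
    if old_r != 1:
        return None            # gcd(a, n) > 1, so a is not a unit
    return old_s % n           # the coefficient of a, reduced to {0, ..., n-1}

print(mod_inverse(79, 123))    # 109, as in Example 2.2.11
print(mod_inverse(2, 6))       # None: 2 is not a unit in Z/6Z
```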
Example 2.2.12. Let n = 13. We calculate \overline{3}^{−1}(\overline{6} − \overline{11}). First note that \overline{3}^{−1} = \overline{9} because
3 × 9 = 27 ≡ 1 (mod 13). Thus,
\overline{3}^{−1}(\overline{6} − \overline{11}) = \overline{9} × (\overline{6} − \overline{11}) = \overline{9}(−\overline{5}) = \overline{−45} = \overline{−6} = \overline{7}. 4
Example 2.2.13. Consider the equation 3x + 7 ≡ 5 (mod 11). We can solve it by searching ex-
haustively through {0, 1, . . . , 10} to find which values of x solve it. Otherwise, we can solve it using
standard methods of algebra as follows.
3x + 7 ≡ 5 ⇐⇒ 3x ≡ 5 − 7 ≡ 9
⇐⇒ 4 × 3x ≡ 4 × 9 4 is the inverse of 3 modulo 11
⇐⇒ x ≡ 3
All integers that solve the congruence equation are x ≡ 3 (mod 11). 4
Example 2.2.14. Suppose we are in Z/15Z. We show how to solve the equation 7x + 10 = y.
Note first that 2 · 7 = 14 = −1. So −2 = 13 is the multiplicative inverse of 7 modulo 15. Now
we have
7x + 10 = y =⇒ 7x = y − 10 =⇒ −2 · 7x = −2(y − 10) =⇒ x = −2y + 20 = 13y + 5. 4
If an integer a is greater than 1 in absolute value, then in Z the absolute values of the powers a^k
increase without bound. However, since Z/nZ is a finite set, powers of elements in Z/nZ demonstrate
interesting patterns.
Example 2.2.15. We calculate the powers of \overline{2} and \overline{3} in Z/7Z.

k   | 0 1 2 3 4 5 6 7 8
2^k | 1 2 4 1 2 4 1 2 4
3^k | 1 3 2 6 4 5 1 3 2
We notice that the powers follow a repeating pattern. This is because if a ∈ Z/nZ and a^k = a^{k+l},
then
a^{k+2l} = a^{k+l} a^l = a^k a^l = a^{k+l} = a^k
and, by induction, we can prove that a^{k+ml} = a^k for all m ∈ N. Observing the pattern for 3, we see
for example that, in congruences,
3^3201 = 3^{6×533+3} ≡ (3^6)^{533} × 3^3 ≡ 3^3 ≡ 6 (mod 7).
Therefore, using congruences, we have easily calculated the remainder of 3^3201 when divided by 7,
without ever calculating 3^3201, which has ⌊log10(3^3201)⌋ + 1 = ⌊3201 log10 3⌋ + 1 = 1528 digits. 4
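Beyond exploiting periodicity by hand, large powers modulo n are computed efficiently by repeated squaring. A Python sketch (ours; Python's built-in three-argument pow(a, k, n) does the same thing):

```python
def power_mod(a, k, n):
    """Compute a**k mod n by repeated squaring, never forming a**k itself."""
    result = 1
    base = a % n
    while k > 0:
        if k & 1:                      # current binary digit of k is 1
            result = (result * base) % n
        base = (base * base) % n       # square for the next binary digit
        k >>= 1
    return result

print(power_mod(3, 3201, 7))                      # 6, agreeing with the congruence argument
print(power_mod(3, 3201, 7) == pow(3, 3201, 7))   # True: matches the built-in
```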
Some of the patterns in powers of a number in a given modulus are not always easy to detect.
The following theorem, Fermat's Little Theorem, gives a general result: if p is a prime number
and a is an integer with p ∤ a, then a^{p−1} ≡ 1 (mod p).
Proof. By Proposition 2.2.9 and since a prime p is relatively prime to any positive integer less than
itself, then p - a if and only if a is a unit modulo p. Note that |U (p)| = p − 1.
Consider the sequence of congruence classes a, a2 , a3 , . . . This infinite sequence stays in (Z/pZ)×
and so the sequence must repeat some terms. Thus, there exist i, j ∈ N with i < j such that
a^j = a^i. Then multiplying both sides by (a^{−1})^i, we get
a^{j−i} = 1.
Let k be the smallest positive integer such that a^k = 1. Note that the elements {1, a, a^2, . . . , a^{k−1}}
are all distinct.
Define the equivalence relation on U(p) by b ∼ c to mean b = ca^j for some j ∈ Z. Each equivalence
class is of the form {c, ca, ca^2, . . . , ca^{k−1}} for some c and in particular has k elements. Since
equivalence classes partition U (p), the fraction (p − 1)/k counts the number of equivalence classes
and hence is an integer. Thus, k | (p − 1) and hence,
a^{p−1} = 1.
As a point of notation, Z/nZ comes from quotient sets (discussed in Section 1.3.1)
and is consistent with quotient groups (discussed in Section 4.3). However, if p is a prime number,
then Z/pZ has the special structure of a field. (See Definition 5.1.22.) Because of the particular
importance of modular arithmetic over a prime, we also denote Z/pZ by Fp .
15. Show that the powers of 7 in Z/31Z account for exactly half of the elements in U (31).
16. Show that a number is divisible by 11 if and only if the alternating sum of its digits is divisible by
11. (An alternating sum means that we alternate the signs in the sum + − + − . . ..) [Hint: 10 ≡ −1
(mod 11).]
17. Prove that if n is odd then n2 ≡ 1 (mod 8).
18. Show that the difference of two consecutive cubes (an integer of the form n3 ) is never divisible by 3.
19. Use Fermat’s Little Theorem to determine the remainder of 734171 modulo 13.
20. Find the units digit of 78357 .
21. Let {bn}n≥1 be the sequence of integers defined by b1 = 1, b2 = 11, b3 = 111, and in general
bn = 111 · · · 1 (the repunit with n digits, all equal to 1).
Prove that for all prime numbers p different from 2 or 5, there exists a positive n such that p | bn .
22. Show that 3 | n(n + 1)(n + 2) for all integers n.
23. Let p be a prime. Prove that p divides the binomial coefficient \binom{p}{k} for all k with
1 ≤ k ≤ p − 1. Use the binomial theorem to conclude that (a + b)^p ≡ a^p + b^p (mod p) for all
a, b ∈ Z.
where t1 ≡ n2^{−1} (mod n1) and t2 ≡ n1^{−1} (mod n2).
25. Apply the result of Exercise 2.2.24 to solve the system
x ≡ 2 (mod 9)
x ≡ 4 (mod 11).
27. Prove that if ac ≡ bc (mod m) then a ≡ b (mod m/d), where d = gcd(m, c).
28. Consider the sequence of integers {cn }n≥0 defined by
2.3
Mathematical Induction
2.3.1 – Weak Induction
In logic, a predicate P(x) is a statement that is true or false depending on the specific instance of
the variable x. For example, the algebraic expression “x ≥ 1” is a predicate because it is neither
true nor false in itself but has a truth value that depends on the numerical value of x. We say that a
predicate P (x) is instantiated when x is given a value.
Theorem 2.3.1 (Principle of Induction)
Let n0 ∈ Z and let P(n) be a predicate defined for integers n ≥ n0. If P(n0) is true, and
P(n) implies P(n + 1) for all n ≥ n0, then P(n) is true for all integers n ≥ n0.

Proof. Without loss of generality, we prove Theorem 2.3.1 with the assumption that n0 = 0. A
linear shift in the meaning of the predicate P then establishes the general statement with arbitrary
n0.
Let P(n) be a predicate on the integers such that P(0) is true and such that P(n) implies
P(n + 1). Let S = {n ∈ N | P(n) is true} and consider the complement S̄ = N − S.
Suppose that S̄ ≠ ∅. Then by the well-ordering property, S̄ contains a least element m. Note
that m ≠ 0 because 0 ∈ S. Then m − 1 ∈ S, so P(m − 1) is true. But then P(m) = P((m − 1) + 1)
is true, so m ∈ S. This is a contradiction. Hence, S̄ = ∅, S = N, and P(n) is true for all n ≥ 0.
Section 2.1.1 pointed out that there are a few different but ultimately equivalent sets of axioms
for the integers. Some formulations, for example Peano’s axioms for the integers [59, Section 2.1],
have the principle of induction as an axiom and prove the well-ordering property on the nonnegative
integers as a theorem.
A common mental image for induction is a chain of dominoes that begins at a certain spot but
continues ad infinitum. Suppose that the n0 th domino falls and suppose also that if one domino falls
then the subsequent one falls. Then the n0 th domino and all dominoes after it fall. In Figure 2.1,
the first domino to fall is the second one, but all subsequent dominoes fall as well.
The Principle of Induction as stated above is also called weak induction in contrast to strong
induction, discussed in Section 2.3.2. In an induction proof where n0 is given, we call the step
of proving P (n0 ) the basis step. The basis step is usually easy, especially when it requires just a
calculation check. The part of an induction proof that involves proving P (n) → P (n + 1) for all
n ≥ n0 is called the induction step. During the induction step, one commonly refers to P (n) as the
induction hypothesis.
As an example, we prove by induction that for all integers n ≥ 1,
∑_{i=1}^{n} i^2 = n(n + 1)(2n + 1)/6. (2.12)
Set n = 1. The left-hand side of (2.12) is 1^2 = 1 while the right-hand side is (1 × 2 × 3)/6 = 1. The formula
holds. (This is the basis step.) Now suppose that (2.12) is true for some n ≥ 1. Then
∑_{i=1}^{n+1} i^2 = (∑_{i=1}^{n} i^2) + (n + 1)^2 = n(n + 1)(2n + 1)/6 + (n + 1)^2
= (n + 1)(n(2n + 1) + 6(n + 1))/6 = (n + 1)(2n^2 + 7n + 6)/6
= (n + 1)(n + 2)(2n + 3)/6,
which is (2.12) with n replaced by n + 1. By induction, the formula holds for all n ≥ 1.
The Principle of Strong Induction replaces the hypothesis Q(n) in the induction step with the
conjunction Q(n0) and Q(n0 + 1) and · · · and Q(n): if Q(n0) is true, and this conjunction implies
Q(n + 1) for all n ≥ n0, then Q(n) is true for all n ≥ n0.
At first glance, the principle of strong induction appears more powerful than the first principle
of induction. The conjunctive statement
Q(n0) and Q(n0 + 1) and · · · and Q(n)
is false not only when Q(n) is false but when any of the other instantiated predicates are false. A
conditional statement p → q, meaning “if p then q” or “p implies q,” is false when p is true but q is
false and is true otherwise. Hence, the conditional statement
Q(n0 ) and Q(n0 + 1) and · · · and Q(n) implies Q(n + 1)
will be false if all of the Q(k) with n0 ≤ k ≤ n are true and Q(n + 1) is false, whereas
Q(n) implies Q(n + 1)
is false only when Q(n) is true and Q(n + 1) is false. Thus, the induction step of strong induction
is less likely to fail than the induction step of weak induction.
However, strong induction and weak induction are in fact equivalent. If the
induction step holds in weak induction, then the strong induction step holds. On the
other hand, by setting the predicate P(n) to be "Q(k) is true for all integers k with n0 ≤ k ≤ n,"
we see that the principle of strong induction is simply an instance of weak induction.
Example 2.3.6. Consider the sequence {an }n≥0 defined by a0 = 0, a1 = 1, and an+2 = an+1 + 2an
for all n ≥ 0. We prove that for all n ≥ 0,
$$a_n = \frac{1}{3}\left(2^n - (-1)^n\right). \tag{2.14}$$
Notice that (2.14) holds for n = 0 and n = 1. We will use n = 1 as our basis step. For the induction
step, assume that (2.14) is true for all indices k with 0 ≤ k ≤ n. Note that if n = 0, the induction
step is true because Q(0) and Q(1) is true so the induction hypothesis is true. Hence, a general
proof is needed only for n ≥ 1. According to the induction hypothesis,
$$a_n = \frac{1}{3}\left(2^n - (-1)^n\right) \qquad\text{and}\qquad a_{n-1} = \frac{1}{3}\left(2^{n-1} - (-1)^{n-1}\right).$$
Thus,
$$\begin{aligned}
a_{n+1} = a_n + 2a_{n-1} &= \frac{1}{3}\left(2^n - (-1)^n\right) + \frac{2}{3}\left(2^{n-1} - (-1)^{n-1}\right) \\
&= \frac{1}{3}\left(2^n + 2\cdot 2^{n-1} - (-1)^n - 2(-1)^{n-1}\right) = \frac{1}{3}\left(2^n + 2^n - (-1)^n + 2(-1)^n\right) \\
&= \frac{1}{3}\left(2^{n+1} + (-1)^n\right) = \frac{1}{3}\left(2^{n+1} - (-1)^{n+1}\right).
\end{aligned}$$
This proves the induction step and hence, by strong induction, (2.14) is true for all n ≥ 0. △
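The closed form (2.14) is also easy to spot-check against the recurrence directly; a minimal sketch (function names are ours):

```python
# Compare the recurrence a_0 = 0, a_1 = 1, a_{n+2} = a_{n+1} + 2 a_n
# with the closed form a_n = (2^n - (-1)^n) / 3 from (2.14).
def a_recurrence(n):
    a, b = 0, 1  # a_0, a_1
    for _ in range(n):
        a, b = b, b + 2 * a
    return a

def a_closed(n):
    return (2 ** n - (-1) ** n) // 3  # the division is exact

for n in range(50):
    assert a_recurrence(n) == a_closed(n)
print([a_closed(n) for n in range(8)])  # [0, 1, 1, 3, 5, 11, 21, 43]
```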
In Example 2.3.6, the induction step only required using the formulas for Q(n)
and Q(n − 1) to establish Q(n + 1). However, since the proof used more than just the one previous
step Q(n) to establish Q(n + 1), it is not a (weak) induction proof.
The following proposition is important in its own right for future sections. We provide it here
because its proof relies on strong induction.
Proposition 2.3.7
Let S be a set and let ⋆ be a binary operation on S that is associative. In an operation
expression with a finite number of terms,
$$a_1 \star a_2 \star \cdots \star a_n \quad\text{with } n \geq 3, \tag{2.15}$$
all possible orders in which we pair the operations (i.e., all parenthesizations) yield the same result.
Proof. Before starting the proof, we define a temporary but useful notation. Given a sequence
a_1, a_2, . . . , a_k of elements in S, by analogy with the Σ notation, we define
$$\bigstar_{i=1}^{k} a_i \overset{\text{def}}{=} \left(\cdots\left(\left(a_1 \star a_2\right) \star a_3\right) \cdots \star a_{k-1}\right) \star a_k.$$
In this notation, we perform the operations in (2.15) from left to right. Note that if k = 1, the
expression is equal to the element a_1.
We prove by (strong) induction on n that every operation expression in (2.15) is equal to $\bigstar_{i=1}^{n} a_i$.
The basis step, n = 3, is precisely the assumption that ⋆ is associative.
We now assume that the proposition is true for all integers k with 3 ≤ k ≤ n. Consider an
operation expression (2.15) involving n + 1 terms. Suppose without loss of generality that the last
operation performed occurs between the jth and (j + 1)th term, i.e.,
$$q = \underbrace{(\text{operation expression}_1)}_{j \text{ terms}} \star \underbrace{(\text{operation expression}_2)}_{n+1-j \text{ terms}}.$$
Since both operation expressions involve n terms or fewer, by the induction hypothesis
$$q = \left(\bigstar_{i=1}^{j} a_i\right) \star \left(\bigstar_{i=j+1}^{n+1} a_i\right).$$
Furthermore,
$$\begin{aligned}
q &= \left(\bigstar_{i=1}^{j} a_i\right) \star \left(a_{j+1} \star \bigstar_{i=j+2}^{n+1} a_i\right) && \text{by the induction hypothesis} \\
&= \left(\left(\bigstar_{i=1}^{j} a_i\right) \star a_{j+1}\right) \star \bigstar_{i=j+2}^{n+1} a_i && \text{by associativity} \\
&= \left(\bigstar_{i=1}^{j+1} a_i\right) \star \bigstar_{i=j+2}^{n+1} a_i.
\end{aligned}$$
Repeating this regrouping until the right-hand factor is the single term a_{n+1}, we obtain
$$q = \bigstar_{i=1}^{n+1} a_i. \qquad\square$$
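Proposition 2.3.7 can be illustrated computationally: the function below (our own sketch; the names are not the book's) evaluates every parenthesization of a list of terms, and for an associative operation the set of results collapses to a single value.

```python
def all_parenthesizations(terms, op):
    """Set of values of every way of parenthesizing terms[0] * ... * terms[-1]."""
    if len(terms) == 1:
        return {terms[0]}
    results = set()
    for j in range(1, len(terms)):  # the last operation splits the terms after position j
        for left in all_parenthesizations(terms[:j], op):
            for right in all_parenthesizations(terms[j:], op):
                results.add(op(left, right))
    return results

terms = (3, 1, 4, 1, 5)
print(all_parenthesizations(terms, lambda a, b: (a + b) % 7))  # associative: {0}
print(all_parenthesizations(terms, lambda a, b: a - b))        # not associative: several values
```

Addition modulo 7 is associative, so every parenthesization agrees; subtraction is not, and the result depends on the grouping.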
We show the reverse inclusion S ⊆ gcd(a, b)Z as follows. Note that 0 ∈ gcd(a, b)Z. Furthermore,
if x ∈ gcd(a, b)Z, then by properties of divisibility, all four integers x + a, x − a, x + b, x − b will be
divisible by gcd(a, b). Hence, every integer in S satisfies the property of being divisible by gcd(a, b),
and thus S ⊆ gcd(a, b)Z.
We can now conclude that S = gcd(a, b)Z. △
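The characterization S = gcd(a, b)Z can be explored computationally by generating the closure of {0} under x ↦ x ± a and x ↦ x ± b inside a bounded window; a sketch (the bound and function names are ours):

```python
from math import gcd

def closure(a, b, bound):
    """Elements of S (built from 0, closed under x±a and x±b) lying in [-bound, bound]."""
    seen, stack = {0}, [0]
    while stack:
        x = stack.pop()
        for y in (x + a, x - a, x + b, x - b):
            if abs(y) <= bound and y not in seen:
                seen.add(y)
                stack.append(y)
    return seen

a, b, bound = 12, 18, 60
g = gcd(a, b)  # 6
assert closure(a, b, bound) == {k * g for k in range(-bound // g, bound // g + 1)}
```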
We should note that the definition of S in Example 2.3.8 differs from both of the standard ways
to define sets as presented in Section 1.1.1. A recursive definition of a subset S in a context set U
neither explicitly lists all the elements of S nor provides a property that can be immediately tested
on each element of U . Instead a recursive definition contains a basis step and a recursive step of the
following form.
Basis Step. a1 , a2 , . . . , ak ∈ S (for some specific elements in U );
Recursion Step. if x1 , x2 , . . . , xm ∈ S, then
which we can also think of as the smallest set (by inclusion) that satisfies both the basis step and
the recursion step of the recursive definition.
4. Prove that 1 + nh ≤ (1 + h)^n for all h ≥ −1 and all nonnegative integers n.
5. Prove that 5 | (n^5 − n) for all nonnegative integers n in the following two ways:
(a) Using Fermat’s Little Theorem.
(b) By induction on n.
6. Prove by induction that $\sum_{i=0}^{n} (2i + 1) = (n + 1)^2$.
In Exercises 2.3.7 through 2.3.11, let {fn }n≥0 be the sequence of Fibonacci numbers.
7. Prove that $f_{n-1} f_{n+1} - f_n^2 = (-1)^n$ for all n ≥ 1.
8. Prove that $f_n^2 + f_{n-1}^2 = f_{2n-1}$ for all n ≥ 1.
9. Prove that $\sum_{i=0}^{n} f_i = f_{n+2} - 1$.
13. Prove that 13 divides $3^{n+1} + 4^{2n-1}$ for all n ≥ 1.
14. A set of lines in the plane is said to be in general position if no two lines are parallel and no three lines
intersect at a single point. Prove that for any set {L1 , L2 , . . . , Ln } of lines in R2 in general position,
the complement R² − (L1 ∪ L2 ∪ · · · ∪ Ln) consists of (n² + n + 2)/2 disjoint regions in the plane.
15. Let $H_n = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n}$ be the nth harmonic number. Prove that $H_{2^n} \leq 1 + n$.
16. Prove that $n! \leq n^{n-1}$ for all positive integers n.
17. Show that $\sum_{i=1}^{n} i(i!) = (n + 1)! - 1$.
18. Show that any amount of postage of value 48 cents or higher can be formed using just 5-cent and
12-cent stamps.
19. Let A1 , A2 , . . . , An and B be sets. Use mathematical induction to prove that
2.4 Projects
Project I. Sums of Powers and Divisibility. Revisit Exercise 2.3.13. Attempt to generalize
this result in as many ways as possible.
Project II. A Diophantine Equation. It is not hard to show that √2 is not a rational number.
(See Exercise 2.1.22.) In other words, the equation
$$a^2 - 2b^2 = 0$$
has no solutions with a and b nonzero integers. In this project, consider instead a modified
equation. Attempt to find integer solutions to
Try to find some individual solutions. It is known that there exists an infinite number of
solutions to any of the three equations. Attempt to find an infinite number of solutions by
giving a pattern that generates some. Try to prove that you have all of them. Consider other
modifications to the problem as you deem interesting.
[Note: A Diophantine equation is an equation in which we attempt to find all the solutions
under the assumption that the variables take on only integer values.]
Project III. Prime Factorization. Without consulting other sources, use the theorems in this
section to write an algorithm to obtain the prime factorization of a positive integer. Attempt
to make your algorithm as time-efficient as possible and argue why it is time-efficient.
Project IV. Properties of ϕ(2^n − 1). Try to discern patterns in the sequence of numbers ϕ(2^n − 1).
If you find a pattern, attempt to generalize it to numbers of the form ϕ(a^n − 1). If you arrive at
conjectures, attempt to prove them. Can you generalize your results even further?
for some integers A and B. The relation (2.16) is called a second-order recurrence relation.
Try to find other sequences defined by second-order recurrence relations that are strong
divisibility sequences.
(4) Try to find an infinite family of strong divisibility sequences that are second-order recur-
rence relations.
3. Groups
As a field in mathematics, group theory did not develop in the order that this and subsequent
chapters follow. The first definition of a group is generally credited to Évariste Galois. As he
studied methods to find the roots of polynomials of high degree, he considered polynomials with
symmetries in their roots and functions on the set of roots that preserved those symmetries. Galois’
approach to studying polynomials turned out to be exceedingly fruitful and this book covers Galois
theory in Chapter 11.
As the dust settled and mathematicians separated the concept of a group from Galois’ application,
mathematicians realized two things. First, groups occur naturally in many areas of mathematics.
Second, group theory presents many challenging problems and profound results in its own right.
Like modern calculus texts that slowly lead to the derivative concept after a rigorous definition of
the limit, we begin with the definition of a group and methodically prove results with a view towards
as many applications as possible, rather than a single application.
In comparison to some algebraic structures presented later in the book, groups are particularly
easy to define. Whereas a poset involves a set and a relation with certain properties, a group involves
a set and one binary operation with certain properties. It may come as a surprise to some readers
that despite the brevity of the definition of a group, group theory is a vast branch of mathematics
with applications to many other areas.
In Section 3.1 we precede a general introduction to groups with an interesting example from
geometry. Section 3.2 introduces the axioms for groups and presents many elementary examples.
Section 3.3 presents some elementary properties of groups and introduces the notion of a classification
theorem. Section 3.4 introduces the symmetric group, a family of groups that plays a central role
in group theory.
Section 3.5 studies subgroups, how to describe them or how to prove that a given subset is a
subgroup, while Section 3.6 borrows from the Hasse diagrams of a partial order to provide a visual
representation of subgroups within a group. Section 3.7 introduces the concept of a homomorphism
between groups, functions that preserve the group structure. Section 3.8 introduces a particular
method to describe the content and structure of a group and introduces the fundamental notion of
a free group.
The last three sections are optional to a concise introduction to group theory. Sections 3.9
and 3.10 provide two applications of group theory, one to patterns in geometry and the other to
information security. Finally, Section 3.11 introduces the concept of a monoid, offering an example
of another not uncommon algebraic structure, similar to groups but with fewer properties.
3.1 Symmetries of the Regular n-gon
3.1.1 – Dihedral Symmetries
Let n ≥ 3 and consider a regular n-sided polygon, Pn. Recall that by regular we mean that all the
edges of the polygon have the same length and all the interior angles at the vertices are
equal. The set of vertices V = {v1, v2, . . . , vn} of the regular polygon Pn is a subset of the Euclidean
plane R2 . For simplicity, assume that the center of Pn is at the origin and that one of its vertices is
on the x-axis.
A symmetry of a regular n-gon is a bijection σ : V → V that is the restriction of a bijection
F : R2 → R2 , that leaves the overall vertex-edge structure of Pn in place, i.e., if the pair (vi , vj ) is
an edge of the regular n-gon then the pair (σ(vi ), σ(vj )) is also an edge of the n-gon.
Consider, for example, a regular hexagon P6 and the bijection σ : V → V such that σ(v1 ) = v2 ,
σ(v2 ) = v1 and σ stays fixed on all the other vertices. Then σ is not a symmetry of P6 . Figure 3.1
shows that σ fails to preserve the vertex-edge structure of the hexagon because for example, the
segment joining σ(v2 ) and σ(v3 ) is not an edge of the original hexagon.
[Figure 3.1: The hexagon P6 with vertices v1, . . . , v6, and the image of the vertices under σ.]
In contrast, consider the bijection τ on the vertices of the regular hexagon such that
τ (v1 ) = v2 , τ (v2 ) = v1 , τ (v3 ) = v6 , τ (v4 ) = v5 , τ (v5 ) = v4 , τ (v6 ) = v3 .
This bijection on the vertices is a symmetry of the hexagon because it preserves the edge structure of
the hexagon. Figure 3.2 shows that τ can be realized as the reflection through the line L as drawn.
[Figure 3.2: The hexagon P6, the line L, and the image of the vertices under the reflection τ.]
Definition 3.1.1
We denote by Dn the set of symmetries of the regular n-gon and call it the set of dihedral
symmetries.
To count the number of bijections on the set V = {v1, v2, . . . , vn}, we note that a bijection
f : V → V can map
f(v1) to any element in V,
f(v2) to any element in V − {f(v1)},
  ⋮
f(vn) to any element in V − {f(v1), f(v2), . . . , f(v_{n−1})}.
Hence, there are
n × (n − 1) × (n − 2) × · · · × 2 × 1 = n!
bijections on V.
Proposition 3.1.2
The cardinality of Dn is |Dn | = 2n.
It is not difficult to identify these symmetries by their geometric meaning. Half of the symmetries
in Dn are rotations that shift vertices k spots counterclockwise, for k ranging from 0 to n − 1. We
denote by Rα the rotation symmetry of angle α. The rotation R_{2πk/n} of angle 2πk/n acts on the
set of vertices by
$$R_{2\pi k/n}(v_i) = v_{((i-1+k) \bmod n)+1} \quad\text{for all } 1 \leq i \leq n.$$
Note that R0 is the identity function on V .
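The index formula for R_{2πk/n} is easy to implement and check against small cases; a minimal sketch (the function name is ours):

```python
def rotate(i, k, n):
    """Image of vertex v_i under the rotation R_{2*pi*k/n}: v_i -> v_{((i-1+k) mod n)+1}."""
    return ((i - 1 + k) % n) + 1

n = 6
# R_{2*pi/3} (k = 2) shifts every vertex of the hexagon two spots counterclockwise.
print([rotate(i, 2, n) for i in range(1, n + 1)])  # [3, 4, 5, 6, 1, 2]
# R_0 (k = 0) is the identity on V.
assert all(rotate(i, 0, n) == i for i in range(1, n + 1))
```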
As remarked above, regular n-gons possess symmetries that correspond to reflections through
lines that pass through the center of the polygon. Assuming the regular n-gon is centered at the origin
and with a vertex on the x-axis, then there are n distinct reflection symmetries, each corresponding
to a line through the origin and making an angle of πk/n with the x-axis, for 0 ≤ k ≤ n − 1. The
reflection symmetry in Figure 3.2 is through a line that makes an angle of π/6 with respect to the
x-axis. We denote by Fβ the reflection symmetry through the line that makes an angle of β with
the x-axis.
Since |Dn | = 2n, the rotations and reflections account for all dihedral symmetries.
If two bijections on V preserve the polygon structure, then their composition does as well. Con-
sequently, the function composition of two dihedral symmetries is again another dihedral symmetry
and thus ◦ is a binary operation on Dn . However, having listed the dihedral symmetries as rota-
tions or reflections, it is interesting to determine the result of the composition of two symmetries as
another symmetry.
First, it is easy to see that rotations compose as follows:
Rα ◦ Rβ = Rα+β ,
where we subtract 2π from α + β if α + β ≥ 2π. However, the composition of a given rotation and
a given reflection or the composition of two reflections is not as obvious. There are various ways
to calculate the compositions. The first is to determine how the function composition acts on the
vertices. For example, let n = 6 and consider the compositions of R2π/3 and Fπ/6 .
    v_i                           v1  v2  v3  v4  v5  v6
    R_{2π/3}(v_i)                 v3  v4  v5  v6  v1  v2
    F_{π/6}(v_i)                  v2  v1  v6  v5  v4  v3
    (F_{π/6} ∘ R_{2π/3})(v_i)     v6  v5  v4  v3  v2  v1
    (R_{2π/3} ∘ F_{π/6})(v_i)     v4  v3  v2  v1  v6  v5
From this table, by inspecting how the compositions act on the vertices, we can identify each
composition as a single dihedral symmetry. We notice with this example that the composition
operation is not commutative.
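The table above can be reproduced by modeling each symmetry as a tuple of vertex images and composing. The rotation formula is the one given earlier; the index formula used here for F_{πk/n} is our own assumption, chosen so that it matches the F_{π/6} row of the table:

```python
def rot(k, n):
    """R_{2*pi*k/n} as a tuple: entry i-1 is the index of the image of v_i."""
    return tuple(((i + k) % n) + 1 for i in range(n))

def refl(k, n):
    """F_{pi*k/n} as a tuple (assumed index formula: v_i -> v_{((k-(i-1)) mod n)+1})."""
    return tuple(((k - i) % n) + 1 for i in range(n))

def compose(f, g):
    """Function composition f o g on vertex indices: apply g first, then f."""
    return tuple(f[g[i] - 1] for i in range(len(g)))

n = 6
r, f = rot(2, n), refl(1, n)             # R_{2*pi/3} and F_{pi/6}
print(compose(f, r))                      # (6, 5, 4, 3, 2, 1), matching the table
print(compose(r, f))                      # (4, 3, 2, 1, 6, 5): composition is not commutative
```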
Another approach to determining the composition of elements comes from linear algebra. Ro-
tations about the origin by an angle α and reflections through a line through the origin making an
angle β with the x-axis are linear transformations. With respect to the standard basis, these two
types of linear transformations respectively correspond to the following 2 × 2 matrices
cos α − sin α cos 2β sin 2β
Rα : and Fβ : . (3.1)
sin α cos α sin 2β − cos 2β
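With the matrices in (3.1), compositions reduce to matrix products. The numeric check below confirms that the product of two reflections is a rotation, namely F_{β₁}F_{β₂} = R_{2(β₁−β₂)}, a fact the reader can also derive from (3.1) and the angle-addition formulas (helper names are ours):

```python
import math

def rot_matrix(alpha):
    return [[math.cos(alpha), -math.sin(alpha)],
            [math.sin(alpha),  math.cos(alpha)]]

def refl_matrix(beta):
    return [[math.cos(2 * beta),  math.sin(2 * beta)],
            [math.sin(2 * beta), -math.cos(2 * beta)]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def close(A, B, tol=1e-9):
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(2) for j in range(2))

# F_{pi/6} o F_{pi/3} is the rotation by 2(pi/6 - pi/3) = -pi/3, i.e., R_{5*pi/3}.
prod = matmul(refl_matrix(math.pi / 6), refl_matrix(math.pi / 3))
assert close(prod, rot_matrix(-math.pi / 3))
```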
For example, let n = 6 and consider the composition of Fπ/6 and Fπ/3. The composition
symmetry corresponds to the matrix product
Whether we use one method or the other, the following table gives the composition a ◦ b for the
symmetries of the hexagon.
a\b R0 Rπ/3 R2π/3 Rπ R4π/3 R5π/3 F0 Fπ/6 Fπ/3 Fπ/2 F2π/3 F5π/6
R0 R0 Rπ/3 R2π/3 Rπ R4π/3 R5π/3 F0 Fπ/6 Fπ/3 Fπ/2 F2π/3 F5π/6
Rπ/3 Rπ/3 R2π/3 Rπ R4π/3 R5π/3 R0 Fπ/6 Fπ/3 Fπ/2 F2π/3 F5π/6 F0
R2π/3 R2π/3 Rπ R4π/3 R5π/3 R0 Rπ/3 Fπ/3 Fπ/2 F2π/3 F5π/6 F0 Fπ/6
Rπ Rπ R4π/3 R5π/3 R0 Rπ/3 R2π/3 Fπ/2 F2π/3 F5π/6 F0 Fπ/6 Fπ/3
R4π/3 R4π/3 R5π/3 R0 Rπ/3 R2π/3 Rπ F2π/3 F5π/6 F0 Fπ/6 Fπ/3 Fπ/2
R5π/3 R5π/3 R0 Rπ/3 R2π/3 Rπ R4π/3 F5π/6 F0 Fπ/6 Fπ/3 Fπ/2 F2π/3
F0 F0 F5π/6 F2π/3 Fπ/2 Fπ/3 Fπ/6 R0 R5π/3 R4π/3 Rπ R2π/3 Rπ/3
Fπ/6 Fπ/6 F0 F5π/6 F2π/3 Fπ/2 Fπ/3 Rπ/3 R0 R5π/3 R4π/3 Rπ R2π/3
Fπ/3 Fπ/3 Fπ/6 F0 F5π/6 F2π/3 Fπ/2 R2π/3 Rπ/3 R0 R5π/3 R4π/3 Rπ
Fπ/2 Fπ/2 Fπ/3 Fπ/6 F0 F5π/6 F2π/3 Rπ R2π/3 Rπ/3 R0 R5π/3 R4π/3
F2π/3 F2π/3 Fπ/2 Fπ/3 Fπ/6 F0 F5π/6 R4π/3 Rπ R2π/3 Rπ/3 R0 R5π/3
F5π/6 F5π/6 F2π/3 Fπ/2 Fπ/3 Fπ/6 F0 R5π/3 R4π/3 Rπ R2π/3 Rπ/3 R0
(3.2)
From this table, we can answer many questions about the composition operator on D6 . For
example, if asked which f ∈ Dn satisfies R2π/3 ◦ f = F5π/6, we simply look in the row corresponding
to a = R2π/3 for the b that gives F5π/6. A priori, without any further theory,
there does not have to exist such an f , but in this case there does and f = Fπ/2 .
Some other properties of the composition operation on Dn are not as easy to identify directly
from the table in (3.2). For example, by Proposition 1.1.15, ◦ is associative on Dn. Verifying
associativity from the table in (3.2) would require checking 12³ = 1,728 equalities. Also, ◦ has an
identity on Dn , namely R0 . Indeed R0 is the identity function on V . Finally, every element in Dn
has an inverse: the inverse to R2πk/n is R2π(n−k)/n and the inverse to Fπk/n is itself. We leave the
proof of the following proposition as an exercise.
Proposition 3.1.3
Let n be a fixed integer with n ≥ 3. Then the dihedral symmetries Rα and Fβ satisfy the
following relations:
In abstract notation, similar to our habit of notation for multiplication of real variables, we simply
write ab to mean a ◦ b for any two elements a, b ∈ Dn . Since ◦ is associative, by Proposition 2.3.7,
an expression such as rrsr is well-defined, regardless of the order in which we pair terms to perform
the composition. In this example, still with n = 6,
Hence, we could write r²sr for rrsr. It is important to note that since composition ◦ is not
commutative, r3 s is not necessarily equal to r2 sr. Finally, note that R0 is the identity function so
R0 ◦ f = f ◦ R0 = f for all f ∈ Dn . Consequently, we will denote R0 by ι to stand for the identity
function. Though we use multiplicative notation, we must continue to think of symbols as functions
and not as representing real variables.
It is not hard to see that
$$r^k = \underbrace{R_{2\pi/n} \circ R_{2\pi/n} \circ \cdots \circ R_{2\pi/n}}_{k \text{ times}} = R_{2\pi k/n}.$$
Similarly,
$$r^k s = F_{\pi k/n},$$
where k satisfies 0 ≤ k ≤ n − 1. The result of Exercise 3.1.7 proves this. Consequently, as a set,
$$D_n = \{\iota, r, r^2, \ldots, r^{n-1}, s, rs, r^2 s, \ldots, r^{n-1} s\}.$$
The symbols r and s have a few interesting properties. First, rn = ι and s2 = ι. These are
obvious as long as we do not forget the geometric meaning of the functions r and s. Less obvious is
the equality in the following proposition.
Proposition 3.1.4
Let n be an integer with n ≥ 3. Then in Dn equipped with the composition operation,
$$sr = r^{n-1} s.$$
Proof. We first prove that rsr = s. By Exercise 3.1.7, the composition of a rotation with a reflection
is a reflection. Hence, (rs)r is a reflection. For any n, r maps v1 to v2, then s maps v2 to vn, and
then r maps vn to v1. Hence, rsr is a reflection that keeps v1 fixed. There is only one reflection
in Dn that keeps v1 fixed, namely s.
Since rsr = s, we multiply both sides on the left by r^{n−1} and get r^{n−1}(rsr) = r^{n−1}s, i.e., sr = r^{n−1}s.
Corollary 3.1.5
Consider the dihedral symmetries Dn with n ≥ 3. Then
$$s r^k = r^{n-k} s.$$
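Both Proposition 3.1.4 and Corollary 3.1.5 can be verified exhaustively for small n by modeling r and s as permutations of vertex indices. Here s is taken to be the reflection F₀ fixing v₁, and the index formulas are our own assumptions:

```python
def rot(k, n):
    """r^k on vertex indices: v_i -> v_{((i-1+k) mod n)+1}."""
    return tuple(((i + k) % n) + 1 for i in range(n))

def s_refl(n):
    """s = F_0, the reflection fixing v_1: v_i -> v_{((1-i) mod n)+1} (assumed formula)."""
    return tuple(((-i) % n) + 1 for i in range(n))

def compose(f, g):
    """f o g: apply g first, then f."""
    return tuple(f[g[i] - 1] for i in range(len(g)))

for n in range(3, 12):
    s = s_refl(n)
    assert compose(s, rot(1, n)) == compose(rot(n - 1, n), s)       # sr = r^{n-1}s
    for k in range(n):
        assert compose(s, rot(k, n)) == compose(rot(n - k, n), s)   # sr^k = r^{n-k}s
print("relations hold for n = 3..11")
```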
In Figure 3.4, the shape on the left displays D7 symmetry while the shape on the right displays
D5 symmetry.
11. List all the symmetries and describe the compositions between them for the infinitely long pattern
shown below:
[figure: infinitely repeating pattern]
12. List all the symmetries and describe the compositions between them for the infinitely long pattern
shown below:
[figure: infinitely repeating pattern]
13. Determine the set of symmetries for each of the following shapes (ignoring shading):
[figures (a)–(f)]
14. Sketch a pattern/shape (possibly a commonly known logo) that has D8 symmetry but does not have
Dn symmetry for n > 8.
15. Sketch a pattern/shape (possibly a commonly known logo) that has rotational symmetry of angle π/2
but does not have full D4 symmetry.
16. Consider a regular tetrahedron. We call a rigid motion of the tetrahedron any rotation or composition
of rotations in R3 that map a regular tetrahedron back into itself, though possibly changing specific
vertices, edges, and faces. Rigid motions of solids do not include reflections through a plane.
(a) Prove that there are exactly 12 rigid motions of the tetrahedron. Call this set R.
(b) Using a labeling of the tetrahedron, explicitly list all rigid motions of the tetrahedron.
(c) Explain why function composition ◦ is a binary operation on R.
(d) Write down the composition table of ◦ on R.
17. Consider the hexagonal tiling pattern on the plane drawn below.
Define r as the rotation by 60° about O and t as the translation that moves the whole plane one
hexagon to the right. We denote by ◦ the operation of composition on functions R2 → R2 and we
denote by r−1 and t−1 the inverse functions to r and t.
19. Consider the diagram S′ below. Sketch the diagram S that has S′ as a fundamental region with (a)
D6 symmetry; (b) D3 symmetry. [Assume reflection through the x-axis is one of the reflections.]
[figure: the fundamental region S′]
3.2 Introduction to Groups
In the preface, we claimed that abstract algebra does not study properties of just one particular
algebraic object but rather studies properties of all objects with a given algebraic structure. An alge-
braic structure typically consists of a set equipped with various properties: a relation with specified
properties, a binary operation with specified properties, or some other set theoretic construction. In
Section 1.4, we presented posets as an algebraic structure. A group is an algebraic structure that
involves a set and one binary operation with certain properties.
At first glance, it may seem arbitrary that we deem a certain set of properties more
important than another. However, the long list of examples we will develop, the numerous connections
to other branches of math, and the fruitful areas of research in group theory have given groups
a place of prominence in mathematics.
Definition 3.2.1
A group is a pair (G, ∗) where G is a set and ∗ is a binary operation on G that satisfies the
following properties:
(1) associativity: (a ∗ b) ∗ c = a ∗ (b ∗ c) for all a, b, c ∈ G;
(2) identity: there exists an element e ∈ G such that e ∗ a = a ∗ e = a for all a ∈ G;
(3) inverses: for each a ∈ G, there exists an element b ∈ G such that a ∗ b = b ∗ a = e.
Proposition 1.2.7 showed that if any binary operation has an identity, then that identity is unique.
Similarly, any element in a group has exactly one inverse element.
Proposition 3.2.2
Let (G, ∗) be a group. Then for all a ∈ G, there exists a unique inverse element to a.
Proof. Let a ∈ G be arbitrary and suppose that b1 and b2 satisfy the properties of the inverse axiom
for the element a. Then
b1 = b1 ∗ e by identity axiom
= b1 ∗ (a ∗ b2 ) by inverse axiom
= (b1 ∗ a) ∗ b2 by associativity
= e ∗ b2 by definition of b1
= b2 by identity axiom.
Therefore, for all a ∈ G there exists a unique inverse.
Since every group element has a unique inverse, our notation for inverses can reflect this. We
denote the inverse element of a by a−1 .
The defining properties of a group are often called the group axioms. In logic, one often uses the
term “axiom” to mean a truth that is self-evident or for which there can exist no further justification.
That is not the sense in which we use the term axiom in this case. In algebra, when we say that
such and such are the axioms of a given algebraic structure, we mean the defining properties of the
algebraic structure.
In the group axioms, there is no assumption that the binary operation ∗ is commutative. We say
that two particular elements a, b ∈ G commute (or commute with each other) if a ∗ b = b ∗ a. The
following property is named after Niels Henrik Abel, one of the founders of group theory.
Definition 3.2.3
A group (G, ∗) is called abelian if for all a, b ∈ G, a ∗ b = b ∗ a.
Usually, the groups we encounter possess a binary operation with a natural description. Some-
times, however, it is useful or even necessary to list out all operation pairings. If (G, ∗) is a finite
group and if we label all the elements as G = {g1, g2, . . . , gn}, then the Cayley table (also called
the operation table) of the group is the n × n array in which the (i, j)th entry is the result of the
operation gi ∗ gj.
When listing the elements in a group it is customary that g1 be the identity element of the group.
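Generating a Cayley table is mechanical once the element list and the operation are given; a minimal sketch for (Z/nZ, +), with function names of our own choosing:

```python
def cayley_table(elements, op):
    """Return the Cayley table: entry [i][j] is op(g_i, g_j)."""
    return [[op(a, b) for b in elements] for a in elements]

n = 5
table = cayley_table(list(range(n)), lambda a, b: (a + b) % n)
for row in table:
    print(*row)
# First row: 0 1 2 3 4 (g_1 = 0 is the identity, as is customary).
```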
Example 3.2.7. In Section 2.2, we introduced modular arithmetic. Recall that Z/nZ represents
the set of congruence classes modulo n and that U (n) is the subset of Z/nZ of elements with
multiplicative inverses. Given any integer n ≥ 2, both (Z/nZ, +) and (U(n), ×) are groups. The
element 0 is the identity in Z/nZ and the element 1 is the identity in U(n).
The tables for addition in (2.9) and (2.10) are the Cayley tables for (Z/5Z, +) and (Z/6Z, +).
By ignoring the column and row for 0 in the multiplication table in Equation (2.9), we obtain the
Cayley table for (U (5), ×).
    ×   1  2  3  4
    1   1  2  3  4
    2   2  4  1  3
    3   3  1  4  2
    4   4  3  2  1
△
All the examples so far are of abelian groups. We began this chapter by introducing the dihedral
symmetries precisely because they offer an example of a nonabelian group.
Example 3.2.8 (Dihedral Groups). Let n ≥ 3 be an integer. The pair (Dn , ◦), where Dn is the
set of dihedral symmetries of a regular n-gon and ◦ is function composition, is a group. We call
(Dn, ◦) the nth dihedral group. Since rs = sr^{−1} and r^{−1} ≠ r for any n ≥ 3, the group (Dn, ◦) is
not abelian. The table given in Equation (3.2) is the Cayley table for D6. △
Example 3.2.9. The pair (R³, ×), where × is the vector cross product, is not a group. First of
all, × is not associative. Indeed, if ı⃗, ȷ⃗, and k⃗ are respectively the unit vectors in the x-, y-, and
z-directions, then
$$\vec{\imath} \times (\vec{\imath} \times \vec{\jmath}) = \vec{\imath} \times \vec{k} = -\vec{\jmath} \neq (\vec{\imath} \times \vec{\imath}) \times \vec{\jmath} = \vec{0} \times \vec{\jmath} = \vec{0}.$$
Furthermore, × has no identity element. For any nonzero vector a⃗ and any other vector v⃗, the
product a⃗ × v⃗ is perpendicular to a⃗ or is 0⃗. Hence, for no vector v⃗ do we have a⃗ × v⃗ = a⃗. △
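The failure of associativity is concrete enough to check by machine; a short sketch using integer coordinates:

```python
def cross(u, v):
    """Vector cross product in R^3."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

i, j = (1, 0, 0), (0, 1, 0)
assert cross(i, cross(i, j)) == (0, -1, 0)   # i x (i x j) = -j
assert cross(cross(i, i), j) == (0, 0, 0)    # (i x i) x j = 0
```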
Example 3.2.10. Let S be a set with at least one element. The pair (P(S), ∪) is not a group. The
union operation ∪ on P(S) is associative and has identity ∅. However, if A ≠ ∅, there does
not exist a set B ⊆ S such that A ∪ B = ∅. Hence, (P(S), ∪) does not have inverses. △
Example 3.2.11 (Matrix Groups). Let n be a positive integer. The set of n × n invertible
matrices with real coefficients is a group with the multiplication operation. In this group, the
identity is the identity matrix and the inverse of a matrix A is the matrix inverse A−1 . This group
is called the nth general linear group over R and is denoted by GLn (R).
In Section 5.3, we discuss properties of matrices in more generality. However, without yet pro-
viding the full algebraic theory, we point out that matrix operations are well-defined if we consider
matrices with only rational coefficients, matrices with complex coefficients or even matrices with
coefficients from Fp , modular arithmetic modulo a prime number p. In fact, matrix addition, matrix
multiplication, matrix inversion, and the Gauss-Jordan elimination algorithm only require that the
coefficients are in a field. (See Definition 5.1.22.)
Suppose that F represents Q, R, C, or Fp . Of key importance is the fact that an n × n matrix A
with coefficients in F is invertible if and only if the columns are linearly independent if and only if
det(A), the determinant of A, is nonzero. We denote by GLn (F ) the nth general linear group over
F and we always denote the identity matrix by I.
As an explicit example, consider GL2(F5). The number of 2 × 2 matrices over F5 is 5⁴ = 625.
However, not all of them are invertible. To determine the cardinality of GL2(F5), we consider the
columns of a matrix A, which must be linearly independent. The only condition on the first column
is that it is not 0. Hence, there are 5² − 1 = 24 options for the first column. Given the first column,
the only necessary condition on the second column is that it is not an F5-multiple of the first column.
This accounts for 5² − 5 = 20 options (all columns minus the 5 multiples of the first column). Hence,
GL2(F5) has 24 × 20 = 480 elements.
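The count |GL2(F5)| = 480 can be confirmed by brute force, since a 2 × 2 matrix over F_p is invertible exactly when its determinant is nonzero mod p:

```python
from itertools import product

p = 5
# Count matrices [[a, b], [c, d]] over F_5 with det = ad - bc nonzero mod 5.
count = sum(1 for a, b, c, d in product(range(p), repeat=4)
            if (a * d - b * c) % p != 0)
print(count)  # 480 = (5^2 - 1)(5^2 - 5) = 24 * 20
```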
The reader should verify that all the matrices involved in the above calculations have a determinant
in F5 that is different from 0̄. △
The above examples give us an initial repertoire of groups from which to draw intuition. Oc-
casionally, we will also encounter methods to create new groups from old ones. The following
construction is one such method.
The direct sum generalizes to any finite number of groups. For example, the group (R³, +) is
the triple direct sum of the group (R, +) with itself.
We extend the power notation so that a⁰ = e and a^{−k} = (a^{−1})^k for any positive integer k.
Groups that involve addition are an exception to the above habit of notation. In that case, we
always write a + b for the operation, −a for the inverse, and, if k is a positive integer,
$$k \cdot a \overset{\text{def}}{=} \underbrace{a + a + \cdots + a}_{k \text{ times}}. \tag{3.3}$$
We refer to k·a as a multiple of a instead of as a power. Again, we extend the notation to nonpositive
“multiples” just as above with powers.
Proposition 3.2.13
Let G be a group and let x ∈ G. For all n, m ∈ Z, the following identities hold:
(a) $x^m x^n = x^{m+n}$;  (b) $(x^m)^n = x^{mn}$.
The process of simply considering the successive powers of an element gives rise to an important
class of groups.
Definition 3.2.14
A group G is called cyclic if there exists an element x ∈ G such that every element of G is
a power of x. The element x is called a generator of G.
For example, we notice that for all integers n ≥ 2, the group Z/nZ (with addition as the
operation) is a cyclic group because all elements of Z/nZ are multiples of 1. As we saw in Section 2.2,
one of the main differences with usual arithmetic is that n·1 = 0. The intuitive sense that the powers
of an element cycle back motivates the terminology. The group Z (with addition) is also cyclic
because every element in Z is n · 1 with n ∈ Z.
Example 3.2.15 (Finite Cyclic Groups). Let n be a positive integer. We denote by Zn the
group with elements {e, x, x², . . . , x^{n−1}}, where x has the property that x^n = e. We point out two
things about this notation.
First, we do not define this group as existing in any previously known arithmetic context. The
element x does not represent some complex number or matrix or any other object; we have simply
defined how it operates symbolically.
Second, whether we use the variable name x or any other letter of the alphabet, the group is
the same. In fact, if a certain discussion involves two groups Zm and Zn with m 6= n, then we will
commonly use different letters for this variable name for which all other elements exist as powers.
For example, if we work with the group Z4 ⊕ Z2 we might write
We take the opportunity in this example to point out that Z4 ⊕ Z2 is not another cyclic group.
If we were to consider all the powers of all the elements (x^i, y^j) with 0 ≤ i ≤ 3 and 0 ≤ j ≤ 1, we
would find that no element of the group is such that the set of all its powers gives the whole group.
For example, if g = (x³, y), then
$$g = (x^3, y), \quad g^2 = (x^6, y^2) = (x^2, e), \quad g^3 = (x^5, y) = (x, y), \quad g^4 = (x^4, y^2) = (e, e).$$
Hence, the set of powers {e, g, g², . . .} contains only 4 elements and not all eight elements of Z4 ⊕ Z2. △
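A brute-force order computation confirms that Z4 ⊕ Z2 is not cyclic while, say, Z5 ⊕ Z2 is. The helper below (our own sketch, written additively) tests whether some element generates the whole direct sum:

```python
from itertools import product
from math import prod

def is_cyclic(ns):
    """Is Z_{n_1} + ... + Z_{n_k} (direct sum, written additively) cyclic?"""
    order = prod(ns)
    for g in product(*(range(n) for n in ns)):
        multiples = set()
        x = tuple(0 for _ in ns)
        for _ in range(order):
            multiples.add(x)
            x = tuple((xi + gi) % n for xi, gi, n in zip(x, g, ns))
        if len(multiples) == order:  # g generates the whole group
            return True
    return False

assert not is_cyclic([4, 2])   # no element of Z_4 + Z_2 has order 8
assert is_cyclic([5, 2])       # (1, 1) has order 10 in Z_5 + Z_2
```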
Proposition 3.2.16
Let (G, ∗) be a group.
(1) The identity in G is unique.
(2) For each a ∈ G, the inverse of a is unique.
(3) For each a ∈ G, (a^{−1})^{−1} = a.
(4) For all a, b ∈ G, (a ∗ b)^{−1} = b^{−1} ∗ a^{−1}.
Proof. We have already seen (1) and (2) in Proposition 1.2.7 and Proposition 3.2.2, respectively.
For (3), by definition of the inverse of a we have a ∗ a^{−1} = a^{−1} ∗ a = e. However, this shows
that a satisfies the inverse axiom for the element a^{−1}.
For (4), we have
(a ∗ b)−1 ∗ (a ∗ b) = e ⇐⇒ ((a ∗ b)−1 ∗ a) ∗ b = e (associativity)
⇐⇒ (((a ∗ b)−1 ∗ a) ∗ b) ∗ b−1 = e ∗ b−1 (operate on right by b−1 )
⇐⇒ ((a ∗ b)−1 ∗ a) ∗ (b ∗ b−1 ) = b−1 (associativity and identity)
⇐⇒ ((a ∗ b)−1 ∗ a) ∗ e = b−1 (inverse axiom)
⇐⇒ (a ∗ b)−1 ∗ a = b−1 (identity axiom)
⇐⇒ (a ∗ b)−1 = b−1 ∗ a−1 . (operate on right by a−1 )
Another consequence of the group axioms is the pair of cancellation laws: for all a, b, u, v ∈ G,
au = av =⇒ u = v, (Left cancellation)
ub = vb =⇒ u = v. (Right cancellation)
In Exercises 3.2.1 through 3.2.14, decide whether the given pair of a set and an operation forms a group. If
it does, prove it. If it does not, determine which axioms fail. You should always check that the given symbol
is in fact a binary operation on the given set.
1. The pair (N, +).
2. The pair (Q − {−1}, ∗), where ∗ is defined by a ∗ b = a + b + ab.
3. The pair (Q − {0}, ÷), with a ÷ b = a/b.
4. The pair (A, +), where A = {x ∈ Q | |x| < 1}.
5. The pair (Z × Z, ∗), where (a, b) ∗ (c, d) = (ad + bc, bd).
6. The pair ([0, 1), ∗), where x ∗ y = x + y − ⌊x + y⌋.
7. The pair (A, +), where A is the set of rational numbers that when reduced have a denominator of 1
or 3.
8. The pair (A, +), where A = {a + b√5 | a, b ∈ Q}.
9. The pair (A, ×), where A = {a + b√5 | a, b ∈ Q}.
10. The pair (A, ×), where A = {a + b√5 | a, b ∈ Q and (a, b) ≠ (0, 0)}.
11. The pair (U (20), +).
12. The pair (P(S), 4), where S is any set and 4 is the symmetric difference of two sets.
13. The pair (G, ×), where G = {z ∈ C | |z| = 1}.
14. The pair (D, ∗), where D is the set of open disks in R2 , including the empty set ∅, and where D1 ∗ D2
is the unique open disk of least radius that encloses both D1 and D2 .
15. Show that Z5 ⊕ Z2 is cyclic.
16. Show that Z4 ⊕ Z2 is not cyclic.
17. Prove Proposition 3.2.13. [Hint: Pay careful attention to when powers are negative, zero, or positive.]
18. Is U (11) a cyclic group?
19. Is U (10) a cyclic group?
20. Prove that (Q, +) is not a cyclic group.
21. Construct the Cayley table for U (15).
22. Construct the Cayley table for Z3 ⊕ Z3 .
23. Prove that a group is abelian if and only if its Cayley table is symmetric across the main diagonal.
24. Prove that S = {2a 5b | a, b ∈ Z} as a subset of rational numbers is a group under multiplication.
25. Prove that the set {1, 13, 29, 41} is a group under multiplication modulo 42.
26. Prove that if xn = e, then x−1 = xn−1 .
27. Let A and B be groups. Prove that the direct sum A ⊕ B is abelian if and only if A and B are both
abelian.
28. Prove that if a group G satisfies x2 = e for all x ∈ G, then G is abelian.
29. Prove that if a group G satisfies (xy)−1 = x−1 y −1 for all x, y ∈ G, then G is abelian.
3.3
Properties of Group Elements
As we progress through group theory, we will encounter more and more internal structure to groups
that is not readily apparent from the three axioms for groups. This section introduces a few ele-
mentary properties of group operations.
Definition 3.3.1
Let G be a group.
(1) If G is finite, we call the cardinality |G| the order of the group.
(2) Let x ∈ G. If xk = e for some positive integer k, then we call the order of x, denoted
|x|, the smallest positive value of n such that xn = e. If there exists no positive n
such that xn = e, then we say that the order of x is infinite.
Note that the order of a group element g is |g| = 1 if and only if g is the group’s identity element.
As a reminder, we list the orders of a few groups we have encountered so far:
|Dn | = 2n, (Proposition 3.1.2)
|Zn | = n,
|U (n)| = φ(n), (Corollary 2.2.10)
| GLn (Fp )| = (pn − 1)(pn − p)(pn − p2 ) · · · (pn − pn−1 ). (Exercise 3.2.32)
Example 3.3.2. Consider the group G = (Z/20Z, +). We calculate the orders of 5̄ and 3̄.
For 5̄, we calculate directly that 2 · 5̄ = 10, 3 · 5̄ = 15, and 4 · 5̄ = 20 = 0̄. Hence, |5̄| = 4.
For 3̄, we calculate that
k · 3̄ = 3k for 0 ≤ k ≤ 6,
k · 3̄ = 3(k − 7) + 1 for 7 ≤ k ≤ 13,
k · 3̄ = 3(k − 14) + 2 for 14 ≤ k ≤ 20.
This shows that the first positive integer k such that k · 3̄ = 0̄ is 20. Hence, |3̄| = 20. 4
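These hand computations are easy to confirm by machine. A minimal Python sketch (function name is ours) for the order of an element of (Z/nZ, +):

```python
from math import gcd

def additive_order(a, n):
    """Order of the class of a in (Z/nZ, +): the least k >= 1 with k*a ≡ 0 (mod n)."""
    k, total = 1, a % n
    while total != 0:
        total = (total + a) % n
        k += 1
    return k

print(additive_order(5, 20))  # 4
print(additive_order(3, 20))  # 20
# Closed form for comparison: the order of a in Z/nZ is n / gcd(a, n).
print(20 // gcd(5, 20), 20 // gcd(3, 20))  # 4 20
```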
Example 3.3.3. Consider the group GL2 (F3 ). We calculate the order of
g = [ 2 1 ; 1 0 ].
Computing powers with entries reduced modulo 3 gives g 2 = [ 2 2 ; 2 1 ], then g 4 = (g 2 )2 =
[ 2 0 ; 0 2 ] = 2I, and g 8 = (2I)2 = 4I = I. Since g 2 ≠ I and g 4 ≠ I, the order of g is |g| = 8. 4
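For matrix groups the same brute-force approach works with modular matrix multiplication. A sketch in plain Python (no external libraries; helper names are ours) that finds the order of an element of GL2 (F3 ):

```python
def matmul_mod(A, B, p):
    """Product of two 2x2 matrices with entries reduced modulo p."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) % p for j in range(2)]
            for i in range(2)]

def order_mod(g, p):
    """Order of g in GL2(F_p): least k >= 1 with g^k = I."""
    I = [[1, 0], [0, 1]]
    power, k = g, 1
    while power != I:
        power = matmul_mod(power, g, p)
        k += 1
    return k

g = [[2, 1], [1, 0]]
print(order_mod(g, 3))  # 8
```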
The orders of elements in groups will become a particularly useful property to consider in a
variety of situations. We present a number of propositions concerning the powers and orders of
elements.
Proposition 3.3.4
Let G be any group and let x ∈ G. Then |x−1 | = |x|.
Proposition 3.3.5
Let G be a group and let x ∈ G. If xn = e and xm = e, then xd = e, where d = gcd(m, n).
Proof. From Proposition 2.1.12, the greatest common divisor d = gcd(m, n) can be written as a
linear combination sm + tn = d, for some s, t ∈ Z. Then
xd = xsm+tn = (xm )s (xn )t = es et = e.
Corollary 3.3.6
Suppose that x is an element of a group G with xm = e. Then the order |x| divides m.
Proposition 3.3.7
Let G be a group, let x ∈ G, and let a ∈ N∗ . Then we have the following results about orders:
(1) If |x| = ∞, then |xa | = ∞.
(2) If |x| = n is finite, then |xa | = n/ gcd(a, n).
Proof. For (1), suppose that xa had finite order k. Then (xa )k = xak = e. This contradicts |x| = ∞.
Hence, xa has infinite order.
For (2), let y = xa and d = gcd(n, a). Writing n = db and a = dc, by Proposition 2.1.11, we
know that gcd(b, c) = 1. Then
y b = xab = xdcb = (xn )c = e.
By Corollary 3.3.6, the order |y| divides b. Now suppose that |y| = k, so that k divides b.
Since y k = xak = e, then n|ak. Thus db|dck, which implies that b|ck. However, gcd(b, c) = 1, so we
conclude that b|k. However, since b|k and k|b and b, k ∈ N∗ , we can conclude that b = k.
Hence, |y| = |xa | = b = n/d = n/ gcd(a, n).
Proposition 3.3.7 presents two noteworthy cases. If gcd(a, n) = 1, then |xa | = n. Second, if a
divides n, then |xa | = n/a.
It is important to point out that in a general group, there is very little that can be said about the
relationship between |xy|, |x|, and |y| for two elements x, y ∈ G. For example, consider the dihedral
group Dn . Both s and sr are reflections through lines and hence have order 2. However,
s(sr) = s2 r = r, and r has order n. Thus, given any integer n, there exists a group with elements
g1 and g2 where |g1 | = 2 and |g2 | = 2 and |g1 g2 | = n.
Example 3.3.8 (Infinite Dihedral Group). As a more striking example, consider the group D∞
that contains elements labeled as x and y with x2 = ι and y 2 = ι and no other conditions. Since
there are no other conditions between the elements, the element xy has infinite order. Hence, the
group contains the infinitely many distinct elements obtained as alternating products of x and y. In
fact, by Exercise 3.3.19, all elements can be written as (xy)n or (xy)n x for n ∈ Z. We denote
this group by D∞ because it is called the infinite dihedral group. 4
There is a particular case in which we can calculate the orders of a product of elements from the
orders of the original elements.
Theorem 3.3.9
Let G1 , G2 , . . . , Gn be groups and let (g1 , g2 , . . . , gn ) ∈ G1 ⊕ G2 ⊕ · · · ⊕ Gn be an element
in the direct sum. Then the order of (g1 , g2 , . . . , gn ) is
|(g1 , g2 , . . . , gn )| = lcm(|g1 |, |g2 |, . . . , |gn |).
Proof. We have (g1 , g2 , . . . , gn )m = (e1 , e2 , . . . , en ) if and only if gim = ei for all i = 1, 2, . . . , n.
Note that each symbol ei represents the identity in the group Gi . By Corollary 3.3.6, |gi | divides m
for all i = 1, 2, . . . , n. Thus, lcm(|g1 |, |g2 |, . . . , |gn |) divides m.
Suppose now that k is the order of (g1 , g2 , . . . , gn ), namely the least positive integer m such
that (g1 , g2 , . . . , gn )m = (e1 , e2 , . . . , en ). Then lcm(|g1 |, |g2 |, . . . , |gn |) divides k = |(g1 , g2 , . . . , gn )|.
However, lcm(|g1 |, |g2 |, . . . , |gn |) is a multiple of |gi | for each i, and hence,
(g1 , g2 , . . . , gn )lcm(|g1 |,|g2 |,...,|gn |) = (e1 , e2 , . . . , en ).
Hence, by Corollary 3.3.6, k divides lcm(|g1 |, |g2 |, . . . , |gn |). Since lcm(|g1 |, |g2 |, . . . , |gn |) and k are
positive numbers that divide each other, they are equal. The theorem follows.
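Theorem 3.3.9 can be spot-checked numerically. The sketch below models each Gi as a cyclic group Z/mZ under addition (an assumption made only for the test) and compares the brute-force order of a tuple with the lcm of the component orders:

```python
from math import gcd

def lcm(*xs):
    """Least common multiple of several positive integers."""
    out = 1
    for x in xs:
        out = out * x // gcd(out, x)
    return out

def tuple_order(g, moduli):
    """Brute-force order of g in Z/m1 x ... x Z/mr under componentwise addition."""
    k = 1
    cur = tuple(a % m for a, m in zip(g, moduli))
    while any(cur):
        cur = tuple((c + a) % m for c, a, m in zip(cur, g, moduli))
        k += 1
    return k

moduli = (4, 6, 10)
g = (1, 1, 1)                                   # component orders are 4, 6, 10
orders = [m // gcd(a, m) for a, m in zip(g, moduli)]
print(tuple_order(g, moduli), lcm(*orders))     # the two values agree
```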
This group is often denoted by V4 and called the Klein-4 group. Our approach covered all possible
cases for orders of elements in a group of order 4 so we conclude that Z4 and V4 are the only two
groups of order 4. 4
The conclusion of the previous example might seem striking at first. In particular, we already
know two groups of order 4 that have different properties: Z4 and Z2 ⊕ Z2 . Consequently, V4 and
Z2 ⊕ Z2 must in some sense be the same group. However, we do not yet have the background to
fully develop an intuitive concept of sameness for groups. We return to that issue in Section 3.7.
We will discuss classification theorems then and at various points in later sections.
Though not fully cast as a classification question, the next two examples illustrate how we can
discover new groups through abstract reasoning as used in the previous example.
Example 3.3.11. Suppose that G is a group of order 8 that contains an element x of order 4. Let
y be another element in G that is distinct from any power of x. With these criteria, we know so far
that G contains the distinct elements e, x, x2 , x3 , y. The element xy cannot be
e because that would imply y = x3 ;
x because that would imply y = e;
x2 because that would imply y = x;
x3 because that would imply y = x2 ;
y because that would imply x = e.
So xy is a new element of G. By similar reasoning, which we leave to the reader, the elements x2 y
and x3 y are distinct from all the others. Hence, G must contain the 8 distinct elements
e, x, x2 , x3 , y, xy, x2 y, x3 y. (3.4)
Now let us assume that |y| = 2. Consider the question of the value of yx. By the identical
reasoning by cases provided above, yx cannot be e, x, x2 , x3 , or y. Thus, there are three cases: (1)
yx = x3 y; (2) yx = xy; and (3) yx = x2 y.
Case 1. If yx = x3 y, then the group is in fact D4 , the dihedral group of the square, where x serves
the role of r and y serves the role of s.
Case 2. If yx = xy, then the group is abelian and is in fact Z4 ⊕ Z2 , where x serves the role of
the generator of the Z4 summand and y that of the Z2 summand.
Case 3. Suppose that yx = x2 y, so that yxy −1 = x2 . Since y 2 = e, we get
x = y 2 xy −2 = y(yxy −1 )y −1 = yx2 y −1 = (yxy −1 )2 = x4 = e.
We conclude that x = e, which contradicts the assumption that x has order 4. Hence, there
exists no group of order 8 with an element x of order 4 and an element y of order 2 with
yx = x2 y.
Assume now that |y| = 3. Consider the element y 2 in G. A quick proof by cases shows that y 2
cannot be any of the eight distinct elements listed in (3.4). Hence, there exists no group of order 8
containing an element of order 4 and one of order 3.
Assume now that |y| = 4. Again, we consider the possible value of y 2 . If there exists a group with
all the conditions we have so far, then y 2 must be equal to an element in (3.4). Now |y 2 | = 4/2 = 2
so y 2 cannot be e, x, x3 , or y, which have orders 1, 4, 4, 4, respectively. Furthermore, y 2 cannot be
equal to xy, (respectively x2 y or x3 y) because that would imply x = y (respectively x2 = y or
x3 = y), which is against the assumptions on x and y. We have not ruled out the possibility that
y 2 = x2 .
We focus on this latter possibility, namely a group G containing x and y with |x| = 4, |y| = 4,
y∈ / {e, x, x2 , x3 } and x2 = y 2 . If we now consider possible values of the element yx, we can quickly
eliminate all possibilities except xy and x3 y. If G = Z4 ⊕ Z2 = {(z, w) | z 4 = e and w2 = e}, then
setting x = (z, e) and y = (z, w), it is easy to check that Z4 ⊕ Z2 satisfies x2 = y 2 and yx = xy. On
the other hand, if yx = xy 3 , then G is a nonabelian group in which x, x3 , y, y 3 = x2 y are elements
of order 4. However, D4 is the only nonabelian group of order 8 that we have encountered so far
and in D4 only r and r3 have order 4. Hence, G must be a new group. 4
We now introduce the new group identified in this example but using the symbols traditionally
associated to it.
Example 3.3.12 (The Quaternion Group). The Quaternion group, denoted by Q8 , contains
the following eight elements:
1, −1, i, −i, j, −j, k, −k.
The operations on the elements are in part inspired by the arithmetic of the imaginary unit i.
In particular, 1 is the identity element and multiplication by (−1) changes the sign of any element.
We also have
i2 = −1, i3 = −i, i4 = 1,
j 2 = −1, j 3 = −j, j 4 = 1,
k 2 = −1, k 3 = −k, k 4 = 1,
ij = k, jk = i, ki = j,
ji = −k, kj = −i, ik = −j.
Matching symbols to Example 3.3.11, note that i4 = j 4 = 1, i2 = −1 = j 2 , and ji = −k = (−i)j =
i3 j. This shows that Q8 is indeed the new group of order 8 discovered at the end of the previous
example. 4
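The multiplication rules above can be double-checked against actual quaternions: encode a + bi + cj + dk as a 4-tuple and use the Hamilton product (a standard formula, used here only as a consistency check of the table, not as the text's definition of Q8 ):

```python
def hamilton(q, r):
    """Hamilton product of quaternions (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = q
    a2, b2, c2, d2 = r
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

one, i, j, k = (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
neg = lambda q: tuple(-t for t in q)

# The defining relations of Q8 from the table above:
assert hamilton(i, i) == hamilton(j, j) == hamilton(k, k) == neg(one)
assert hamilton(i, j) == k and hamilton(j, k) == i and hamilton(k, i) == j
assert hamilton(j, i) == neg(k) and hamilton(k, j) == neg(i) and hamilton(i, k) == neg(j)
print("all Q8 relations hold")
```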
A bigger classification question would involve finding all the groups of order 8. We could solve this
problem at this point, but we will soon encounter theorems that establish more internal structure on
groups that would make such questions easier. Consequently, we delay most classification questions
until later.
33. Let G = {e, t, u, v, w, x, y, z} be a group of order 8. For the following partial table, decide if it can be
completed to the Cayley table of some G and if so fill it in. [Hint: You may need to use associativity.]
e t u v w x y z
e − − − − − − − −
t − − − − − − − e
u − − e − − y x t
v − − − u − t − −
w − x v − − − − y
x − − − z − − − −
y − − − t z − − −
z − − − − x − − u
34. Let {Gi }i∈I be a collection of groups, indexed by a set I that is not necessarily finite. We define the
direct sum of this collection, denoted by ⊕i∈I Gi .
3.4
Symmetric Groups
Symmetric groups play a key role in group theory and applications of group theory. This section
introduces the terminology and elementary properties of symmetric groups.
3.4.1 – Permutations
Definition 3.4.1
Let A be a nonempty set. Define SA as the set of all bijections from A to itself.
Proposition 3.4.2
The pair (SA , ◦) is a group, where the operation ◦ is function composition.
We call SA the symmetric group on A. In the case that A = {1, 2, · · · , n}, then we write Sn
instead of the cumbersome S{1,2,...,n} . We call the elements of SA permutations of A.
In Section 3.1, we discussed the symmetries of a regular n-gon in the plane. Though we described
the symmetries (reflections, rotations) with terms pertaining to the whole plane, we also described
the transformations as the symmetries on the set of vertices that preserve the n-gon incidence
structure. The symmetric group on a given set is simply the group of all bijections on that set
without imposing any conditions. In contrast to the regular n-gon that is not preserved by all of
Sn , the complete graph on n vertices (see Figure 3.7 with n = 6) is preserved as a graph under any
permutation in Sn .
Figure 3.7: The complete graph on the six vertices 1, 2, . . . , 6.
Proposition 3.4.3
|Sn | = n!.
Proof. A function from {1, 2, . . . , n} to itself is injective if and only if its image has n elements,
that is, if and only if it is also surjective. Hence, a function from {1, 2, . . . , n} to itself is a bijection
if and only if it is an injection.
The order of Sn is the number of distinct bijections on {1, 2, . . . , n}. To enumerate the bijections,
we count the injections from {1, 2, . . . , n} to itself. Note that there are n options for f (1). Since
f (2) ≠ f (1), for each choice of f (1), there are n − 1 choices for f (2). Given values for f (1) and f (2),
there are n − 2 possible choices for f (3) and so on. Since an enumeration of injections requires an
n-part decision, we use the product rule. Hence,
|Sn | = n(n − 1)(n − 2) · · · 2 · 1 = n!.
Symmetric groups arise in a variety of natural contexts. In a 100-meter Olympic race, eight
runners are given lane numbers. The function from the runner’s lane number to the rank they place
in the race is a permutation of S8 . A cryptogram is a word game in which someone writes a message
but replacing each letter of the alphabet with another letter and a second person attempts to recover
the original message. The first person’s choice of how to scramble the letters of the alphabet is a
permutation in Sa , where a is the number of letters in the alphabet used. When someone shuffles a
deck of 52 cards, the resulting reordering of the cards represents a permutation in S52 .
We need a few convenient ways to visualize and represent a permutation on {1, 2, . . . , n}.
Chart Notation. Another way of writing a permutation is to record in a chart or matrix the
outputs like
σ = ( 1 2 · · · n ; σ(1) σ(2) · · · σ(n) ),
where the top row lists the inputs and the bottom row lists the corresponding outputs.
Figure 3.8: The permutation σ of {1, 2, . . . , 8} with outputs (3, 8, 7, 4, 6, 2, 1, 5).
n-tuple. If the value n is clear from context, then the top row of the chart notation is redundant.
Hence, we can represent the permutation σ by the n-tuple (σ(1), σ(2), . . . , σ(n)). Using the
n-tuple notation, the permutation in Figure 3.8 is written as σ = (3, 8, 7, 4, 6, 2, 1, 5).
Cycle Notation. A different notation turns out to be more useful for the purposes of group theory.
In cycle notation, the expression
σ = (a1 a2 · · · am1 )(am1 +1 am1 +2 · · · am2 ) · · · (amk−1 +1 amk−1 +2 · · · amk ),
where a` are distinct elements in {1, 2, . . . , n}, means that for any index i,
σ(ai ) = ai+1 if i ≠ mj for all j, and σ(ai ) = amj−1 +1 if i = mj for some j,
where m0 = 0. Any of the expressions (amj−1 +1 amj−1 +2 · · · amj ) is called a cycle because σ
“cycles” through these elements in order as σ iterates.
Using the cycle notation for the permutation in Figure 3.8, we note that
σ(1) = 3, σ(3) = 7, and σ(7) = 1;
then σ(2) = 8, σ(8) = 5, σ(5) = 6, σ(6) = 2;
and then σ(4) = 4.
Therefore, in cycle notation, the permutation in Figure 3.8 is written as σ = (1 3 7)(2 8 5 6)(4).
There are many different ways of expressing a permutation using the cycle notation. For example,
as cycles, (1 3 7) = (3 7 1) = (7 1 3). Standard cycle notation imposes four additional conventions. (1) If
σ is the identity function, we just write σ = id. (Advanced texts commonly refer to the identity
permutation as 1 but, for the moment, we will use id or idn in order to avoid confusion.) (2) We
write each cycle of σ starting with the lowest integer in the cycle. (3) The order in which we list
the cycles of σ is such that initial elements of each cycle are in increasing order. (4) Finally, we
omit any cycle of length 1. We say that a permutation in Sn is written in standard cycle notation
if it satisfies these requirements. The standard cycle notation for the permutation in Figure 3.8 is
σ = (1 3 7)(2 8 5 6).
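These conventions translate directly into code. A sketch (function name is ours) that converts the n-tuple of outputs into standard cycle notation:

```python
def standard_cycles(outputs):
    """Given the n-tuple (sigma(1), ..., sigma(n)), return the standard cycle
    notation as a string: each cycle starts at its least element, cycles are
    listed by increasing first element, and 1-cycles are omitted."""
    n = len(outputs)
    sigma = {i + 1: outputs[i] for i in range(n)}
    seen, cycles = set(), []
    for start in range(1, n + 1):          # increasing starts give conventions (2), (3)
        if start in seen:
            continue
        cycle, x = [], start
        while x not in cycle:
            cycle.append(x)
            seen.add(x)
            x = sigma[x]
        if len(cycle) > 1:                 # convention (4): omit 1-cycles
            cycles.append("(" + " ".join(map(str, cycle)) + ")")
    return "".join(cycles) if cycles else "id"   # convention (1)

print(standard_cycles((3, 8, 7, 4, 6, 2, 1, 5)))  # (1 3 7)(2 8 5 6)
```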
An m-cycle is a permutation that in standard cycle notation consists of only one cycle of length
m. Two cycles are called disjoint if they involve no common integers. By the construction of the
standard cycle notation for a permutation, we notice that the cycles of σ must be disjoint. A 2-cycle
is also called a transposition because it simply interchanges (transposes) two elements and leaves
the rest fixed.
Example 3.4.4. To illustrate the cycle notation, we list all the permutations in S4 in standard
cycle notation:
id, (1 2), (1 3), (1 4), (2 3), (2 4), (3 4),
(1 2 3), (1 3 2), (1 2 4), (1 4 2), (1 3 4), (1 4 3), (2 3 4), (2 4 3),
(1 2 3 4), (1 2 4 3), (1 3 2 4), (1 3 4 2), (1 4 2 3), (1 4 3 2),
(1 2)(3 4), (1 3)(2 4), (1 4)(2 3).
We can verify that we have all the 3-cycles by calculating how many we should have. Each cycle
consists of 3 integers. The number of ways of choosing 3 from 4 integers is (4 choose 3) = 4. For each
selection of 3 integers, we list the least one first in the cycle. Then, there are 2 options for how to
order the remaining two integers in the 3-cycle. Hence, there are 2 · 4 = 8 three-cycles in S4 .
4
The cycle type of a permutation describes how many disjoint cycles of a given length make up the
standard cycle notation of that permutation. Hence, we say that (1 3)(2 4) is of cycle type (a b)(c d).
Example 3.4.5. As another example, consider the symmetric group S6 . There are 6! = 720 ele-
ments in S6 . We count how many permutations there are in S6 of a given cycle type. In order to
count the 6-cycles, note that every integer from 1 to 6 appears in the cycle notation of a 6-cycle. In
standard cycle notation, we write 1 first and then all 5! = 120 orderings of {2, 3, 4, 5, 6} give distinct
6-cycles. Hence, there are 120 6-cycles in S6 .
We now count the permutations in S6 of the form σ = (a1 a2 a3 )(a4 a5 a6 ). The standard cycle
notation of a permutation has a1 = 1. To choose the values in a2 and a3 , there are 5 choices for
a2 and then 4 remaining choices for a3 . With a1 , a2 , and a3 chosen, we know that {a4 , a5 , a6 } =
{1, 2, 3, 4, 5, 6} − {a1 , a2 , a3 }. The value of a4 must be the minimum value of {a4 , a5 , a6 }. Then there
are two ways to order the two remaining elements in the second 3-cycle. Hence, there are 5 · 4 · 2 = 40
permutations that consist of the product of two disjoint 3-cycles.
Similar counting arguments for each cycle type give the following tally:
Cycle type           Number of elements
id                   1
(a b)                15
(a b)(c d)           45
(a b)(c d)(e f )     15
(a b c)              40
(a b c)(d e)         120
(a b c)(d e f )      40
(a b c d)            90
(a b c d)(e f )      90
(a b c d e)          144
(a b c d e f )       120
Total                720
The above table counts all the different permutations in S6 , organized by lengths of cycles in standard
cycle notation. 4
Consider, for example, σ = (1 4 2 6)(3 5) and τ = (2 6 3) in S6 . To calculate the composition στ ,
we write the product of cycles
στ = (1 4 2 6)(3 5)(2 6 3)
and read from right to left how στ maps the integers as a composition of cycles, not necessarily
disjoint now. Working one integer at a time:
τ (1) = 1 and σ(1) = 4, so στ (1) = 4 and στ = (1 4 . . .
τ (4) = 4 and σ(4) = 2, so στ (4) = 2 and στ = (1 4 2 . . .
τ (2) = 6 and σ(6) = 1, so στ (2) = 1 and στ = (1 4 2)(3 . . .
Note that since στ (2) = 1, we closed the first cycle and start a new cycle with the smallest integer
not already appearing in any previous cycle of στ .
τ (3) = 2 and σ(2) = 6, so στ (3) = 6 and στ = (1 4 2)(3 6 . . .
τ (6) = 3 and σ(3) = 5, so στ (6) = 5 and στ = (1 4 2)(3 6 5).
And we are done. We closed the cycle at the end of 5 because all integers 1 through 6 already appear
in the standard cycle notation of στ so the cycle must be closed. However, it is a good practice to
verify by the same method that στ (5) = 3.
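The right-to-left bookkeeping above is exactly function composition, which a few lines of Python can reproduce (the helper names here are ours):

```python
def cycles_to_map(cycles, n):
    """Build the map of a permutation in S_n given as a list of cycles."""
    sigma = {i: i for i in range(1, n + 1)}
    for cyc in cycles:
        for a, b in zip(cyc, cyc[1:] + cyc[:1]):
            sigma[a] = b
    return sigma

def compose(sigma, tau):
    """Composition sigma∘tau: apply tau first, then sigma (right to left)."""
    return {x: sigma[tau[x]] for x in tau}

sigma = cycles_to_map([(1, 4, 2, 6), (3, 5)], 6)
tau = cycles_to_map([(2, 6, 3)], 6)
st = compose(sigma, tau)
print(st)  # maps 1->4, 4->2, 2->1, 3->6, 6->5, 5->3, i.e., (1 4 2)(3 6 5)
```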
It should be obvious that the cycle notation expresses a permutation as the composition of
disjoint cycles.
Proposition 3.4.6
Disjoint cycles in Sn commute.
The fact that disjoint cycles commute implies that to understand powers and inverses of permuta-
tions, understanding of how powers and inverses work on cycles is sufficient. Indeed, if τ1 , τ2 , . . . , τk
are disjoint cycles and σ = τ1 τ2 · · · τk , then
σ m = τ1m τ2m · · · τkm , for all m ∈ Z, and in particular σ −1 = τ1−1 τ2−1 · · · τk−1 .
Consequently, some properties about a permutation σ and its powers depend only on the cycle type
of σ. We leave a number of these results for the exercises but we state one proposition here because
of its importance.
Proposition 3.4.7
For all σ ∈ Sn , the order |σ| is the least common multiple of the lengths of the disjoint
cycles in the standard cycle notation of σ.
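Proposition 3.4.7 can be tested on the permutation of Figure 3.8, whose standard cycle notation (1 3 7)(2 8 5 6) has cycle lengths 3 and 4. A small brute-force sketch:

```python
from math import gcd

# sigma = (1 3 7)(2 8 5 6) as a map on {1, ..., 8}
sigma = {1: 3, 3: 7, 7: 1, 2: 8, 8: 5, 5: 6, 6: 2, 4: 4}

def perm_order(sigma):
    """Least k >= 1 such that the k-th power of sigma is the identity."""
    power, k = dict(sigma), 1
    while any(power[x] != x for x in power):
        power = {x: sigma[power[x]] for x in power}
        k += 1
    return k

lengths = (3, 4)
lcm = lengths[0] * lengths[1] // gcd(*lengths)
print(perm_order(sigma), lcm)  # 12 12
```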
The cycle notation also makes it easy to find the inverse of a permutation. The inverse function
to a permutation σ simply involves reading the cycle notation backwards.
Example 3.4.8. Let σ = (1 3 7)(2 5 4)(6 10) in S10 . We propose to calculate σ −1 and then to
determine the order of σ by calculating all the powers of σ.
To calculate σ −1 , we read the cycles backwards so
σ −1 = (7 3 1)(4 5 2)(10 6) = (1 7 3)(2 4 5)(6 10).
The second equality follows by rewriting each cycle with its lowest integer first. This is equivalent to
starting at the lowest integer in the cycle and reading the cycle backwards.
For the powers of σ we have
σ 2 = (1 7 3)(2 4 5), σ 3 = (6 10), σ 4 = (1 3 7)(2 5 4), σ 5 = (1 7 3)(2 4 5)(6 10), σ 6 = id.
Hence, |σ| = 6.
We briefly consider the product of cycles that are not disjoint. Some of the simplest products
involve two transpositions, for example
(1 2)(1 3) = (1 3 2) while (1 3)(1 2) = (1 2 3).
The fact that these products are different establishes the following proposition.
Proposition 3.4.9
The group Sn is nonabelian for all n ≥ 3.
Definition 3.4.10
Let n be an integer with n ≥ 2. Define Tn as the set
Tn = {(i, j) ∈ {1, 2, . . . , n}2 | i < j},
and for σ ∈ Sn define inv(σ) as the number of pairs (i, j) ∈ Tn with σ(i) > σ(j).
In other words, Tn consists of all possible pairs (i, j) of indices, where the first index is less than
the second, and inv(σ) is the number of times σ would reverse the order of the pair. The set Tn has
cardinality
|Tn | = Σ (n − i) = n(n − 1) − Σ i = n(n − 1) − (n − 1)n/2 = (n − 1)n/2,
where both sums run over i = 1, 2, . . . , n − 1.
Definition 3.4.11
A permutation σ ∈ Sn is called even (resp. odd ) if inv(σ) is an even (resp. odd) integer.
The designation even or odd is called the parity of the permutation σ.
We conclude this section with a few propositions about the parity of permutations.
Proposition 3.4.12
The number of inversions of a transposition is odd. More precisely, inv((a b)) = 2(b − a) − 1.
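Both the definition of inv and the formula of Proposition 3.4.12 are easy to check computationally; a sketch (helper names are ours):

```python
def inversions(sigma, n):
    """inv(sigma): number of pairs (i, j) with i < j and sigma(i) > sigma(j)."""
    return sum(1 for i in range(1, n + 1) for j in range(i + 1, n + 1)
               if sigma[i] > sigma[j])

def transposition(a, b, n):
    """The 2-cycle (a b) in S_n as a map, assuming a < b."""
    sigma = {i: i for i in range(1, n + 1)}
    sigma[a], sigma[b] = b, a
    return sigma

# inv((a b)) = 2(b - a) - 1; for (2 6) in S_7 this is 2*(6 - 2) - 1 = 7
print(inversions(transposition(2, 6, 7), 7))  # 7
```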
Proposition 3.4.13
Let σ, τ ∈ Sn . Then
inv(στ ) ≡ inv(σ) + inv(τ ) (mod 2).
Proof. For each pair (i, j) ∈ Tn , the image pair {τ (i), τ (j)} is again a pair of distinct integers and,
since τ is a bijection, these image pairs account for every pair in Tn exactly once. Let k11 be the
number of pairs in Tn that τ inverts and whose image pair σ also inverts, let k12 be the number of
pairs that τ inverts but σ does not, and let k21 be the number of pairs that σ inverts (as image pairs)
but τ does not. A pair is inverted by στ if and only if exactly one of τ and σ inverts it along the way.
Notice that inv(στ ) = k12 + k21 , inv(σ) = k11 + k21 , and inv(τ ) = k11 + k12 . Hence,
inv(σ) + inv(τ ) = k12 + k21 + 2k11 ≡ k12 + k21 ≡ inv(στ ) (mod 2).
The following theorem about how the parity of permutations relates to composition is an imme-
diate corollary of Proposition 3.4.13.
Theorem 3.4.14
The composition of two even permutations or of two odd permutations is an even permu-
tation. The composition of an odd and an even permutation is an odd permutation.
This product has n(n − 1)/2 terms (and hence has degree n(n − 1)/2). Note that each term in the product
Some authors give this property of the Vandermonde polynomial as the definition for sign(σ), without
reference to inversions. 4
We conclude this section with a characterization of even and odd permutations that further
justifies the terminology.
Theorem 3.4.16
A permutation is even (resp. odd) if and only if it can be written as a product of an even
(resp. odd) number of transpositions.
Proof. Suppose that σ ∈ Sn is written as a product of m transpositions,
σ = τ1 τ2 · · · τm .
By Proposition 3.4.13,
inv(σ) ≡ inv(τ1 ) + inv(τ2 ) + · · · + inv(τm ) (mod 2).
By Proposition 3.4.12, inv(τi ) is odd for all i and hence inv(σ) is even if and only if m is even. The
theorem follows.
24. Suppose that n ≥ 4. Prove that the number of permutations of cycle type (a b)(c d) in Sn is
n(n − 1)(n − 2)(n − 3)/8.
25. Show that the function f : Z/10Z → Z/10Z defined by f (a) = a3 is a permutation on Z/10Z and write
f in cycle notation as an element of S10 . (Use the bijection g : {1, 2, . . . , 10} → Z/10Z with g(a) = a
to set up numerical labels for elements in Z/10Z.)
27. Calculate the set {|σ| | σ ∈ S7 }, i.e., the set of orders of elements in S7 .
28. We work in the group Sn for n ≥ 3. Prove or disprove that if σ1 and σ2 have the same cycle type
and τ1 and τ2 have the same cycle type, then σ1 τ1 has the same cycle type as σ2 τ2 . Does your answer
depend on n?
29. In a six-contestant steeple race, the horses arrived in the order C, B, F, A, D, E. Suppose someone
predicted they would arrive in the order, F, E, C, B, D, A. How many inversions are in the guessed
ordering?
30. In S5 , count the number of inversions of the following permutations: (a) σ = (1 4 2 5); (b) τ =
(1 4 3)(2 5); (c) ρ = (1 5)(2 3).
31. In S6 , count the number of inversions of the following permutations: (a) σ = (1 3 5 6 2); (b) τ =
(1 6)(2 3 4); (c) ρ = (1 3 5)(2 4 6).
32. Consider a 2-cycle in Sn of the form τ = (a b). Prove that inv(τ ) = 2(b − a) − 1.
33. Let A be an n × n matrix and let σ ∈ Sn . Suppose that A0 (respectively A00 ) is the matrix obtained
from A by permuting the columns (respectively rows) of A according to the permutation σ. Prove
that det(A0 ) = det(A00 ) = sign(σ) det(A).
34. (Challenge) Prove that the sum of inversions of all the permutations in Sn is n!n(n − 1)/4. In other
words, prove that the sum of inv(σ) over all σ ∈ Sn equals n!n(n − 1)/4.
35. Show by example that a permutation can be written in more than one way as a product of transposi-
tions. Prove that if σ = τ1 τ2 · · · τm and σ = ε1 ε2 · · · εn are two different expressions of σ as a product
of transpositions, then m and n have the same parity.
36. Show that for all σ ∈ Sn , inv(σ −1 ) = inv(σ). Conclude that σ and σ −1 have
the same parity.
37. Show that for all σ, τ ∈ Sn , the element στ σ −1 has the same parity as τ .
3.5
Subgroups
In any algebraic structure, it is common to consider a subset that carries the same algebraic struc-
ture. In linear algebra, for example, we encounter subspaces of a vector space. In Section 1.4, we
encountered subposets. This section presents subgroups.
Definition 3.5.1
Let G be a group. A nonempty subset H ⊆ G is called a subgroup if
(1) ∀x, y ∈ H, xy ∈ H (closed under operation);
(2) ∀x ∈ H, x−1 ∈ H (closed under taking inverses).
If H is a subgroup of G, we write H ≤ G.
Example 3.5.3. Any group G always has at least two subgroups, the trivial subgroup {e} and all
of G. 4
Example 3.5.5. Let G = Sn and consider the subset of permutations that leave the elements
m + 1, m + 2, . . . , n fixed. This is a subgroup of Sn that is a natural copy of Sm inside Sn . 4
Example 3.5.6 (Alternating Group). Theorem 3.4.14 shows that the set of even permutations
in Sn is closed under composition. Furthermore, any permutation σ ∈ Sn inverts the same
number of pairs in Tn (as defined in Definition 3.4.10) as its inverse σ −1 does. Hence, the subset of
even permutations is closed under taking inverses. Thus, the set of even permutations is a subgroup
of Sn . The subset of even
permutations in Sn is called the alternating group on n elements and is denoted by An .
If n = 4, then the elements of A4 are
A4 = {id, (1 2 3), (1 2 4), (1 3 2), (1 3 4), (1 4 2), (1 4 3), (2 3 4), (2 4 3), (1 2)(3 4), (1 3)(2 4), (1 4)(2 3)}. 4
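The listing of A4 can be regenerated by filtering the 24 permutations of S4 for even parity. A short sketch using the inversion count (helper name is ours):

```python
from itertools import permutations

def inversions(p):
    """Number of inversions of a permutation given as a tuple of outputs."""
    return sum(1 for i in range(len(p)) for j in range(i + 1, len(p))
               if p[i] > p[j])

# A4 = even permutations of {1, 2, 3, 4}
a4 = [p for p in permutations((1, 2, 3, 4)) if inversions(p) % 2 == 0]
print(len(a4))  # 12, matching the listing of A4 above
```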
Example 3.5.7 (A Nonexample). As a nonexample, note that U (n) is not a subgroup of Z/nZ.
Even though U (n) is a subset of Z/nZ, the former involves the multiplication operation in modular
arithmetic while the latter involves the addition operation in modular arithmetic. If we considered
the pair (U (n), +), we have 1 and n − 1 in U (n) but 1 + n − 1 = 0 ∈ / U (n). Hence, U (n) is not
closed under addition. 4
The definition of a subgroup has two criteria. It turns out that these two can be combined into
one. This result shortens a number of subsequent proofs.
Proposition 3.5.8 (One-Step Subgroup Criterion)
Let G be a group. A nonempty subset H ⊆ G is a subgroup of G if and only if xy −1 ∈ H
for all x, y ∈ H.
Proof. (=⇒) If H is a subgroup, then for all x, y ∈ H, the element y −1 ∈ H and hence xy −1 ∈ H.
(⇐=) Suppose that H is a nonempty subset with the condition described in the statement of
the proposition. First, since H is nonempty, ∃x ∈ H. Using the one-step criterion, xx−1 = e ∈ H.
Second, since e and x ∈ H, using the one-step criterion, ex−1 = x−1 ∈ H. We have now proven
that H is closed under taking inverses. Finally, for all x, y ∈ H, we know that y −1 ∈ H so, using
the one-step criterion again, x(y −1 )−1 = xy ∈ H. Hence, H is closed under the group operation.
The following example introduces an important group but uses the One-Step Subgroup Criterion
to prove it is a group.
Example 3.5.9 (Special Linear Group). Let F denote Q, R, C, or Fp . Define the subset of
GLn (F ) by
SLn (F ) = {A ∈ GLn (F ) | det A = 1}.
Obviously, SLn (F ) is not empty because the identity matrix has determinant 1, so it is in SLn (F ).
Furthermore, according to properties of the determinant, for all A, B ∈ SLn (F ),
det(AB −1 ) = (det A)(det B)−1 = 1 · 1−1 = 1,
so AB −1 ∈ SLn (F ).
Hence, according to the One-Step Subgroup Criterion, SLn (F ) ≤ GLn (F ). We call SLn (F ) the
special linear group. 4
Proposition 3.5.10
Let G be a group and let H be a nonempty finite subset of G that is closed under the
operation of G. Then H is a subgroup of G.
Proof. Given the hypotheses of the proposition, in order to establish that H is a subgroup of G, we
only need to show that it is closed under taking inverses.
Let x ∈ H. Since H is closed under the operation, xn ∈ H for all positive integers n. Thus,
the set S = {xn | n ∈ N∗ } is a subset of H and hence finite. Therefore, there exist m, n ∈ N∗ with
n ≠ m such that xm = xn . Without loss of generality, suppose that n > m. Then n − m is a positive
integer and xn−m = e. Thus, x has finite order, say |x| = k. If x = e, then it is its own inverse. If
x ≠ e, then k ≥ 2. Since k − 1 is a positive integer, xk−1 = x−1 ∈ S ⊆ H. This shows that H is
closed under taking inverses.
It is also useful to be aware of the interaction between subgroups and subset operations.
Proposition 3.5.11
Let G be a group and let H and K be two subgroups of G. Then H ∩ K ≤ G.
On the other hand, the union of two subgroups is not necessarily another subgroup and the set
difference of two subgroups is never another subgroup. Consequently, in relation to the operations
of union and intersection on subsets, subgroups behave in a similar way as subspaces of a vector
space do: Intersections are again subgroups while unions usually are not and set differences are never
subgroups.
By an induction reasoning, knowing that the intersection of two subgroups is again a subgroup
implies that any intersection of a finite number of subgroups is again a subgroup. However, it is
also true that a general intersection (not necessarily finite) of a collection of subgroups is again a
subgroup. (See Exercise 3.5.25.)
Definition 3.5.12
The center Z(G) is the subset of G consisting of all elements that commute with every
other element in G. In other words,
Z(G) = {g ∈ G | gx = xg for all x ∈ G}.
Proposition 3.5.13
Let G be any group. The center Z(G) is a subgroup of G.
Definition 3.5.15
Let G be a group and let A be a nonempty subset of G. The centralizer of A in G is the
subset
CG (A) = {g ∈ G | gag −1 = a for all a ∈ A}.
The expression gag^{−1} occurs in many different areas of group theory. The element gag^{−1} is called
the conjugate of a by g. The condition gag^{−1} = a is tantamount to ga = ag. Consequently,
the centralizer consists of all elements in G that commute with every element of the subset A.
The center Z(G) of a group is a particular example of a centralizer, namely Z(G) = CG(G). If
A = {a} is a singleton set, then we write CG(a) instead of CG({a}).
Proposition 3.5.16
For any subset A of G, the set CG(A) is a subgroup of G.
Proof. Since eae^{−1} = a for all a ∈ A, the identity e belongs to CG(A), so CG(A) is nonempty.
Let x, y ∈ CG(A). Then for all a ∈ A,

(xy)a(xy)^{−1} = x(yay^{−1})x^{−1} = xax^{−1} = a.

Thus, xy ∈ CG(A). Now let x ∈ CG(A). Then for all a ∈ A, since xax^{−1} = a, we have xa = ax and
hence a = x^{−1}ax = x^{−1}a(x^{−1})^{−1}. Thus, x^{−1} ∈ CG(A).
We conclude that CG(A) is a subgroup of G.
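These definitions are concrete enough to check by brute force on a small group. The sketch below (the helper names are our own, not from the text) represents elements of S3 as 0-indexed tuples in one-line notation and computes Z(S3) and the centralizer of a 3-cycle:

```python
from itertools import permutations

# Elements of S3 as tuples in one-line notation: p[i] is the image of i.
def compose(p, q):
    """(p ∘ q)(i) = p(q(i)): apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(p)))

def inverse(p):
    q = [0] * len(p)
    for i, pi in enumerate(p):
        q[pi] = i
    return tuple(q)

S3 = list(permutations(range(3)))

def centralizer(group, A):
    """CG(A) = {g in G | g a g^{-1} = a for all a in A}."""
    return [g for g in group
            if all(compose(compose(g, a), inverse(g)) == a for a in A)]

def center(group):
    """Z(G) = CG(G): the elements commuting with everything."""
    return centralizer(group, group)

three_cycle = (1, 2, 0)   # the 3-cycle 0 -> 1 -> 2 -> 0
print(center(S3))                       # [(0, 1, 2)]: Z(S3) is trivial
print(centralizer(S3, [three_cycle]))   # the cyclic subgroup generated by it
```

As the text notes, the centralizer of the 3-cycle turns out to be exactly the cyclic subgroup it generates.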
In order to present the next construction that always gives a subgroup, we introduce some
notation. Let A ⊆ G and let g ∈ G. Then we define the subsets gA, Ag, and gAg^{−1} as

gA = {ga | a ∈ A},   Ag = {ag | a ∈ A},   gAg^{−1} = {gag^{−1} | a ∈ A}.

For the set gA (and similarly for the other two sets), the function f : A → gA defined by f(a) = ga
is a bijection with inverse function f^{−1}(x) = g^{−1}x. Consequently, gA, Ag, and gAg^{−1} have the same
cardinality as A.
Definition 3.5.17
Let A be any subset of G. We also define the normalizer of A in G as

NG(A) = {g ∈ G | gAg^{−1} = A}.
Proposition 3.5.18
The normalizer NG (A) is a subgroup of G.
If a subgroup H of G contains a subset S, then all elements obtained by repeated operations or inverses from elements in S must also be in H. The
concept of generating a subgroup by a subset makes this idea precise.
Definition 3.5.19
Let S be a nonempty subset of a group G. We define ⟨S⟩ as the subset of “words” made
from elements in S, that is to say

⟨S⟩ = {s_1^{α_1} s_2^{α_2} · · · s_n^{α_n} | n ∈ N, s_i ∈ S, α_i ∈ Z}.

The subset ⟨S⟩ is called the subgroup of G generated by S. (Note that the s_i are not
necessarily distinct.)
It is not hard to see that ⟨S⟩ is indeed a subgroup of G. Since S ≠ ∅ and S ⊆ ⟨S⟩, the set ⟨S⟩ is
not empty. For two expressions x = s_1^{α_1} s_2^{α_2} · · · s_n^{α_n} and y = t_1^{β_1} t_2^{β_2} · · · t_m^{β_m} in ⟨S⟩, we have

xy^{−1} = s_1^{α_1} s_2^{α_2} · · · s_n^{α_n} t_m^{−β_m} · · · t_2^{−β_2} t_1^{−β_1}.

This product is again an element of ⟨S⟩, so by the One-Step Subgroup Criterion, ⟨S⟩ ≤ G.
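For a finite group, the subgroup generated by a set can be computed by closing the set under the operation alone, since (as shown earlier in this section) inverses then appear automatically as positive powers. A minimal sketch, with permutations of {0, …, n−1} as tuples and function names of our own choosing:

```python
def compose(p, q):
    # One-line notation, 0-indexed: apply q first, then p.
    return tuple(p[q[i]] for i in range(len(p)))

def generated_subgroup(gens):
    """Closure of a set of permutations under composition.

    For a finite group, closure under the operation alone suffices:
    inverses show up automatically as positive powers."""
    elems = {tuple(range(len(gens[0])))}   # start from the identity
    frontier = set(gens)
    while frontier:
        elems |= frontier
        frontier = {compose(g, h) for g in elems for h in elems} - elems
    return elems

# A transposition and a 3-cycle together generate all of S3:
t = (1, 0, 2)   # swaps 0 and 1
c = (1, 2, 0)   # the 3-cycle 0 -> 1 -> 2 -> 0
print(len(generated_subgroup([t, c])))   # 6
print(len(generated_subgroup([c])))      # 3
```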
Example 3.5.20. Consider the dihedral group on the hexagon G = D6. The subgroup ⟨r⟩ consists
of all powers of the element r. Hence, ⟨r⟩ = {ι, r, r^2, r^3, r^4, r^5}. Notice that ⟨r⟩ is the subgroup of
rotations of D6.
The subgroup ⟨s⟩ = {ι, s} consists of only two elements, the reflection across the reference axis of
symmetry and the identity transformation.
The subgroup ⟨s, r^2⟩ contains the elements ι, s, r^2, and r^4 simply by taking powers of elements in
{s, r^2}. However, ⟨s, r^2⟩ also contains sr^2 and sr^4. The defining relations on s and r give r^a s = sr^{6−a}.
Hence, as we apply this relation, the parity of the power of r does not change, so

⟨s, r^2⟩ = {ι, r^2, r^4, s, sr^2, sr^4}.

The subgroup ⟨s, sr⟩ obviously contains s but also contains r = s(sr). Hence, ⟨s, sr⟩ = D6
because it contains all rotations and all reflections. ♦
Obviously, for any element a in a group G, the subgroup ⟨a⟩ is a cyclic subgroup of G whose
order is precisely the order |a|. It is important to note that distinct sets of generators may give the
same subgroup. In the previous example, we noted that ⟨s, sr⟩ = ⟨s, r⟩. This occurs even with cyclic
subgroups. For example, in D6, the rotation subgroup is ⟨r⟩ = ⟨r^5⟩. In D6, we also have ⟨r^2⟩ = ⟨r^4⟩.
Example 3.5.21. Consider the group S4. Let H = ⟨(1 3), (1 2 3 4)⟩. We list out all the elements of
H. By taking powers of the generators, we know that

id, (1 3), (1 2 3 4), (1 2 3 4)^2 = (1 3)(2 4), (1 2 3 4)^3 = (1 4 3 2)

are all in H. By taking products of generators and their powers, H also contains

(1 3)(1 2 3 4) = (1 2)(3 4),  (1 3)(1 2 3 4)^2 = (2 4),  (1 3)(1 2 3 4)^3 = (1 4)(2 3).

Hence, H contains

{id, (1 3), (1 2 3 4), (1 3)(2 4), (1 4 3 2), (2 4), (1 2)(3 4), (1 4)(2 3)},

but we have not yet proven that H does not have any other elements. However, the identity
(1 3)(1 2 3 4) = (1 4 3 2)(1 3) shows that though (1 3) and (1 2 3 4) do not commute, it is possible to
pass (1 2 3 4) to the left of (1 3) by changing the power on the 4-cycle. Hence, every element in H
can be written as (1 2 3 4)^a (1 3)^b where a = 0, 1, 2, 3 and b = 0, 1. Thus, we have indeed found all
the elements in H. ♦
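The normal-form claim of Example 3.5.21 can be verified mechanically. The sketch below (0-indexed one-line notation; the representation is our own) builds the eight products r^a s^b for r = (1 2 3 4) and s = (1 3) and confirms that they are pairwise distinct and closed under composition:

```python
def compose(p, q):
    # One-line notation, 0-indexed: apply q first, then p.
    return tuple(p[q[i]] for i in range(len(p)))

def power(p, n):
    result = tuple(range(len(p)))
    for _ in range(n):
        result = compose(p, result)
    return result

r = (1, 2, 3, 0)   # the 4-cycle (1 2 3 4), 0-indexed
s = (2, 1, 0, 3)   # the transposition (1 3), 0-indexed

# The normal forms r^a s^b for a = 0..3, b = 0..1.
H = {compose(power(r, a), power(s, b)) for a in range(4) for b in range(2)}
print(len(H))                                          # 8: pairwise distinct
print(all(compose(x, y) in H for x in H for y in H))   # True: H is closed
```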
Definition 3.5.22
A group (or a subgroup) is called finitely generated if it is generated by a finite subset.
A finite group is always finitely generated. Indeed, a finite group G is generated (not minimally
so) by G itself. On the other hand, in the group (Z, +) we have ⟨1⟩ = Z, which gives a simple
example of an infinite group that is finitely generated. It is not hard to find a group that is not
finitely generated.
Example 3.5.23. Let (Q>0 , ×) be the multiplicative group of positive rational numbers and let P
be the set of prime numbers. Every positive rational number r can be written in the form

r = p_1^{α_1} p_2^{α_2} · · · p_m^{α_m},

where p_i ∈ P and α_i ∈ Z. In our usual way of writing fractions, any prime p_i with α_i < 0 would be in the
prime factorization of the denominator and any p_i with α_i > 0 would be in the prime factorization of the
numerator. Consequently, P is a generating set of Q>0.
This does not yet imply that (Q>0, ×) is not finitely generated. We now show this by contradiction.
Assume that it is generated by a finite set {r_1, r_2, . . . , r_k} of rational numbers. The prime
factorizations of the numerators and denominators of all the r_i (written in reduced form) involve
a finite number of primes, say {p_1, p_2, . . . , p_n}. Let p_0 be a prime not in {p_1, p_2, . . . , p_n}. Then
p_0 ∈ Q>0 but p_0 ∉ ⟨r_1, r_2, . . . , r_k⟩. Hence, (Q>0, ×) is not finitely generated. ♦
In Exercises 3.5.1 through 3.5.16, prove or disprove that the given subset A of the given group G is a subgroup.
1. G = Z with addition and A is the set of multiples of 5.
2. G = (Q, +) and A is the set of rational numbers with odd denominators (when written in reduced
form).
3. G = (Q∗, ×) and A is the set of rational numbers of the form p^2/q^2.
4. G = (C∗ , ×) and A = {a + ai | a ∈ R}.
25. Prove that the intersection of an arbitrary collection of subgroups of G is a subgroup of G.
26. Prove that Z(G) = ⋂_{a∈G} CG(a).
27. Prove that the center Z(Dn) of the dihedral group is {ι} if n is odd and {ι, r^{n/2}} if n is even.
28. Prove that for all n ≥ 3, the center of the symmetric group is Z(Sn ) = {id}.
29. In the group Dn , calculate the centralizer and the normalizer for each of the subsets
30. For the given group G, find CG(A) and NG(A) for the respective sets A.
(a) G = S3 and A = {(1 2 3)}.
(b) G = S5 and A = {(1 2 3)}.
(c) G = S4 and A = {(1 2)}.
31. For the given group G, find CG(A) and NG(A) for the respective sets A.
(a) G = D6 and A = {s, r2 }.
[Figure: a tetrahedron with vertices labeled 1 through 4, showing the 120° rotation σ = (1 3 4).]
[Hint: Use Exercise 3.5.25. From this exercise, we conclude that ⟨S⟩ is the smallest subgroup by
inclusion that contains S.]
39. Prove that if A ⊆ B are subsets in a group G, then ⟨A⟩ ≤ ⟨B⟩.
40. Prove that in S4, the subgroup ⟨(1 2 3), (1 2)(3 4)⟩ is A4.
41. This exercise finds generating subsets of Sn .
(a) Prove that Sn is generated by {(1 2), (2 3), (3 4), . . . , (n − 1 n)}.
(b) Prove that Sn is generated by {(1 2), (1 3), . . . , (1 n)}.
(c) Prove that Sn is generated by {(1 2), (1 2 3 · · · n)}.
(d) Show that S4 is not generated by {(1 2), (1 3 2 4)}.
42. Prove that for any prime number p, the symmetric group Sp is generated by any transposition and
any p-cycle.
43. Label the vertices of a tetrahedron with integers {1, 2, 3, 4}. Prove that the group of rigid motions of
a tetrahedron is A4 . (See Figure 3.9.)
44. Show that if p is prime, then in the alternating group Ap, we have Ap = ⟨(1 2 3), (1 2 3 · · · p)⟩.
45. Show that (R, +) is not finitely generated.
46. Describe the elements in Tor(C∗ ), the torsion subgroup of C∗ . (See Exercise 3.5.18.) Show that
Tor(C∗ ) is not finitely generated.
47. Let H and K be two subgroups of G. Prove that H ∪ K is a subgroup of G if and only if H ⊆ K or
K ⊆ H.
48. Let H be a subgroup of a group G and let g ∈ G. Prove that if n is the smallest positive integer such
that g n ∈ H, then n divides |g|.
3.6 Lattice of Subgroups
In order to develop an understanding of the internal structure of a group, listing all the subgroups of
a group has some value. However, showing how these subgroups are related carries more information.
The lattice of subgroups offers a visual representation of relationships among subgroups.
Let Sub(G) be the set of all subgroups of the group G. Note that Sub(G) ⊆ P(G). The
pair (Sub(G), ≤) is a poset, in fact the subposet of (P(G), ⊆) on the subset Sub(G). Indeed, if
H, K ∈ Sub(G), then H ⊆ K if and only if H ≤ K.
Proposition 3.6.1
For all groups G, the poset (Sub(G), ≤) is a lattice.
Proof. We know that (P(G), ⊆) is a lattice in which the least upper bound of any two subsets A and B is A ∪ B
and the greatest lower bound is A ∩ B. By Proposition 3.5.11, for any two subgroups H, K ≤ G,
the set H ∩ K is also a subgroup. Hence, H ∩ K is the greatest lower bound of H and K in the poset
(Sub(G), ≤).
The difficulty lies in the fact that H ∪ K is not necessarily a subgroup. Consequently, if H and K have
a least upper bound in (Sub(G), ≤), it must be something else. The generating subsets formalism
gives us an answer. By Exercise 3.5.38, ⟨H ∪ K⟩ is the smallest (by inclusion) subgroup of G that
contains both H and K. Thus, ⟨H ∪ K⟩ is the least upper bound of H and K in (Sub(G), ≤). Since
every pair of subgroups of G has a least upper bound and a greatest lower bound in (Sub(G), ≤),
the result follows.
The construction given in the above proof for a least upper bound of H and K, namely ⟨H ∪ K⟩,
is called the join of H and K.
Since (Sub(G), ≤) is a poset, we can create the Hasse diagram for it. By a common abuse of
language, we often say “draw the lattice of G” for “draw the Hasse diagram of the poset (Sub(G), ≤).”
The lattice of a group shows all subgroups and their containment relationships.
The following list of examples gives a flavor of some group lattices.
Example 3.6.2 (Prime Cyclic Groups). The groups with the least internal structure are groups
Zp where p is a prime number. The lattice of Zp is:
Zp
|
{e}
♦
Example 3.6.3. Consider the cyclic group Z8 = ⟨z | z^8 = e⟩. It has a total of 4 subgroups, namely
{e}, ⟨z^4⟩, ⟨z^2⟩, and Z8. The lattice of Z8 is:

Z8
|
⟨z^2⟩
|
⟨z^4⟩
|
{e}

At first pass, one might wonder why we did not consider subgroups generated by other elements, say
for example z^3. However, ⟨z^3⟩ = {z^3, z^6, z, z^4, z^7, z^2, z^5, e}, so ⟨z^3⟩ = Z8. Also ⟨z^6⟩ = {z^6, z^4, z^2, e} =
⟨z^2⟩. All subgroups of Z8 do appear in the above diagram. ♦
Example 3.6.4. Consider the cyclic group Z24 = ⟨z | z^24 = e⟩. Its subgroups are ⟨z^d⟩ for the
divisors d of 24, namely Z24 = ⟨z⟩, ⟨z^2⟩, ⟨z^3⟩, ⟨z^4⟩, ⟨z^6⟩, ⟨z^8⟩, ⟨z^12⟩, and {e}. [The Hasse diagram
of these subgroups appears here in the original.] ♦
By looking at the above examples of groups, we notice a trend in the subgroups of a cyclic group.
We make explicit the pattern of subgroups in cyclic groups.
Proposition 3.6.5
Every subgroup of a cyclic group G is cyclic. Furthermore, if G is finite with |G| = n, then
the subgroups of G are exactly ⟨z^d⟩ for the divisors d of n, where z is a generator of G.
Proof. Let G be a cyclic group (not necessarily finite) generated by an element z. Let H be a
subgroup of G. If H = {e}, then H = ⟨e⟩ is cyclic, so suppose H ≠ {e}. Let S = {a ∈ N∗ | z^a ∈ H},
the set of positive powers of z in H; since H contains some z^a with a ≠ 0 along with its inverse z^{−a},
the set S is nonempty. By the well-ordering principle, S has a least element, say c. We prove by
contradiction that H = ⟨z^c⟩.
Suppose that H contains an element z^k where c does not divide k. Then by integer division, there
exist an integer q and an integer r with 0 < r < c such that k = cq + r. Then the element
z^r = z^{k−qc} = z^k (z^c)^{−q} is in H, which contradicts the minimality of c in S since 0 < r < c. Hence,
H = ⟨z^c⟩.
Now consider the case with G finite and |G| = n. Since e = z^n ∈ H, using the argument in the
previous paragraph, we see that c must divide n.
If G is a cyclic group of order n generated by z, then for any d that divides n, we have |z^d| = n/d.
So we can also say that G contains exactly one subgroup of order k for each divisor k of n. For
example, in Z/45Z, which is generated by 1, the subgroup ⟨36⟩ has order 45/gcd(45, 36) = 45/9 = 5.
Thus, ⟨36⟩ = ⟨9⟩, since this is the subgroup of Z/45Z with 5 elements.
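The arithmetic in this paragraph is easy to reproduce by machine. A small sketch (the function name is ours) computes ⟨k⟩ in (Z/45Z, +) directly:

```python
from math import gcd

def cyclic_subgroup(k, n):
    """The subgroup generated by k in (Z/nZ, +): all multiples of k mod n."""
    return {(i * k) % n for i in range(n)}

n = 45
# The subgroup generated by 36 has order 45/gcd(45, 36) = 5,
# and it coincides with the subgroup generated by 9.
print(len(cyclic_subgroup(36, n)))                       # 5
print(n // gcd(n, 36))                                   # 5
print(cyclic_subgroup(36, n) == cyclic_subgroup(9, n))   # True
```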
We give a few more examples of lattices of noncyclic groups.
Example 3.6.6 (Quaternion Group). The following diagram gives the lattice of Q8 :
Q8
/   |   \
⟨i⟩  ⟨j⟩  ⟨k⟩
\   |   /
⟨−1⟩
|
{1}
♦
Example 3.6.7. The lattice of A4, the alternating group on four elements (which has order 12),
contains, besides A4 and {id}, the subgroup ⟨(1 2)(3 4), (1 3)(2 4)⟩ of order 4, three subgroups of
order 2 generated by the double transpositions, and four subgroups of order 3 generated by the
3-cycles. [The Hasse diagram of these subgroups appears here in the original.] ♦
Example 3.6.8. The lattice of D6, the hexagonal dihedral group, runs from D6 at the top, through
subgroups such as the rotation subgroup ⟨r⟩ and ⟨r^2⟩, down to {ι}. [The Hasse diagram of Sub(D6)
appears here in the original.] ♦
As the size or internal complexity of groups increases, the subgroup lattice diagrams become un-
wieldy. Furthermore, in specific examples and in proofs of theoretical results, we are often interested
in only a small number of subgroups. Consequently, we often restrict our attention to a sublattice
of the full subgroup lattice. In the full subgroup lattice, an unbroken edge indicates that there
exists no subgroup strictly between the subgroups on either end of the edge. However, in a subdiagram of a
subgroup lattice, an unbroken edge no longer means that there does not exist a subgroup between
the endpoints of that edge. For example, given any two subgroups H, K ≤ G, there is a sublattice
of (Sub(G), ≤) containing the following:

⟨H ∪ K⟩
/     \
H       K
\     /
H ∩ K
|
{e}
Having the subgroup lattice of a group G facilitates many calculations related to subgroups.
Calculating intersections and joins of two subgroups is easy. For the intersection of H and K, we
find the highest subgroup L of G such that there is a path up from L to H and a path up from L
to K. For joins, the process is merely reversed. The lattice of subgroups also helps in determining
centralizers and normalizers because we can usually follow a path up or down in the lattice,
testing at each subgroup whether it continues to have the appropriate properties.
3.7 Group Homomorphisms
The concept of a function is ubiquitous in mathematics. However, in different branches we often
impose conditions on the functions we consider. For example, in linear algebra we do not study
arbitrary functions from one vector space to another but limit our attention to linear transformations.
As exhibited in Section 1.4, when studying posets it is common to restrict attention to monotonic
functions.
Given two objects A and B with a particular algebraic structure, if we consider an arbitrary
function f : A → B, a priori, the only type of information that f carries is set theoretic. In other
words, information about algebraic properties of A would be lost under f . However, if we impose
certain properties on the function, it can, intuitively speaking, preserve the structure.
3.7.1 – Homomorphisms
Definition 3.7.1
Let (G, ∗) and (H, •) be two groups. A function ϕ : G → H is called a homomorphism
from G to H if for all g_1, g_2 ∈ G,

ϕ(g_1 ∗ g_2) = ϕ(g_1) • ϕ(g_2).    (3.6)

Note that the operation on the left-hand side is an operation in G while the operation on the
right-hand side occurs in the group H. With abstract group notation, we write (3.6) as
ϕ(g_1 g_2) = ϕ(g_1)ϕ(g_2), but take care to remember that the group operations occur in different groups.
Example 3.7.2. Fix a positive real number b and consider the function f(x) = b^x. Power rules
state that for all x, y ∈ R,

b^{x+y} = b^x b^y.

In the language of group theory, this identity can be restated by saying that the exponential function
f(x) = b^x is a homomorphism from (R, +) to (R∗, ×). ♦
Example 3.7.3. The inclusion function f : (Z, +) → (R, +) given by f(x) = x is a homomorphism. ♦
Example 3.7.5. Consider the direct sum Z2 ⊕ Z2, where each Z2 has generator z. Consider the
function ϕ : Q8 → Z2 ⊕ Z2 defined by

ϕ(±1) = (e, e),   ϕ(±i) = (z, e),   ϕ(±j) = (e, z),   ϕ(±k) = (z, z).

This is a homomorphism, but in order to verify it, we must check that ϕ satisfies (3.6) for all 64
products of terms in Q8. However, we can cut down the work. First notice that for all terms
a, b ∈ {1, i, j, k}, the products satisfy (±a)(±b) = ±(ab) with the sign as appropriately defined. The
following table shows ϕ(ab) with a in the columns and b in the rows.
±1 ±i ±j ±k
±1 (e, e) (z, e) (e, z) (z, z)
±i (z, e) (e, e) (z, z) (e, z)
±j (e, z) (z, z) (e, e) (z, e)
±k (z, z) (e, z) (z, e) (e, e)
All the entries of the table are precisely ϕ(a)ϕ(b), which confirms that ϕ is a homomorphism. ♦
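The sign-saving argument of Example 3.7.5 can also be replayed by machine. In the sketch below we encode the quaternion units as (sign, letter) pairs with a hand-entered multiplication table for {1, i, j, k}, write the images of ϕ additively as pairs in Z2 ⊕ Z2, and check (3.6) for all 64 products (the data structures are our own, not the book's):

```python
# Quaternion units as (sign, letter) pairs; products of the basic units,
# e.g. i*j = k and j*i = -k.
TABLE = {
    ('1', '1'): (1, '1'), ('1', 'i'): (1, 'i'), ('1', 'j'): (1, 'j'), ('1', 'k'): (1, 'k'),
    ('i', '1'): (1, 'i'), ('i', 'i'): (-1, '1'), ('i', 'j'): (1, 'k'), ('i', 'k'): (-1, 'j'),
    ('j', '1'): (1, 'j'), ('j', 'i'): (-1, 'k'), ('j', 'j'): (-1, '1'), ('j', 'k'): (1, 'i'),
    ('k', '1'): (1, 'k'), ('k', 'i'): (1, 'j'), ('k', 'j'): (-1, 'i'), ('k', 'k'): (-1, '1'),
}

def mult(a, b):
    sa, la = a
    sb, lb = b
    s, l = TABLE[(la, lb)]
    return (sa * sb * s, l)

Q8 = [(s, l) for l in '1ijk' for s in (1, -1)]

# phi collapses signs; its values, written additively in Z2 + Z2.
PHI = {'1': (0, 0), 'i': (1, 0), 'j': (0, 1), 'k': (1, 1)}

def phi(a):
    return PHI[a[1]]

def add(u, v):
    return ((u[0] + v[0]) % 2, (u[1] + v[1]) % 2)

# Check phi(ab) = phi(a) + phi(b) for all 64 products.
print(all(phi(mult(a, b)) == add(phi(a), phi(b)) for a in Q8 for b in Q8))  # True
```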
Example 3.7.6. Let n be an integer greater than 1. The function ϕ(a) = ā that maps an integer
to its congruence class in Z/nZ is a homomorphism. This holds because of Proposition 2.2.4 and
the definition of addition in Z/nZ: the sum of the congruence classes of a and b is the congruence
class of a + b. ♦
This result also applies with modular arithmetic base p, when p is a prime number. The determinant
function det : GL(F_p) → U(p) is a homomorphism because of the same identity. ♦
Example 3.7.8. Recall the sign of a permutation σ ∈ Sn,

sign(σ) = (−1)^{inv(σ)},

where inv(σ) denotes the number of inversions of σ. In other words, sign(σ) = 1 if σ is even and
sign(σ) = −1 if σ is odd. Now for all σ, τ ∈ Sn,

sign(στ) = (−1)^{inv(στ)} = (−1)^{inv(σ)+inv(τ)} = sign(σ) sign(τ),

where the second equality holds because of Proposition 3.4.13. Thus, the sign function is a homomorphism
sign : Sn → ({1, −1}, ×). This sign function plays a crucial role in many applications of
the symmetric group and we will revisit it often. ♦
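The multiplicativity of sign can be checked exhaustively on a small symmetric group. A sketch, counting inversions on 0-indexed one-line notation (helper names ours):

```python
from itertools import permutations

def inv_count(p):
    """Number of inversions of p in one-line notation."""
    n = len(p)
    return sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])

def sign(p):
    return (-1) ** inv_count(p)

def compose(p, q):
    # apply q first, then p
    return tuple(p[q[i]] for i in range(len(p)))

S4 = list(permutations(range(4)))
# sign is multiplicative over all 576 pairs in S4.
print(all(sign(compose(a, b)) == sign(a) * sign(b) for a in S4 for b in S4))  # True
```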
As a first property preserved by homomorphisms, the following proposition shows that homo-
morphisms map powers of elements to powers of corresponding elements.
Proposition 3.7.9
Let ϕ : G → H be a homomorphism of groups.
(1) ϕ(eG ) = eH .
(2) For all x ∈ G, ϕ(x−1 ) = ϕ(x)−1 .
(3) For all x ∈ G and all n ∈ Z, ϕ(xn ) = ϕ(x)n .
Definition 3.7.10
Let ϕ : G → H be a homomorphism between groups. The kernel of ϕ is the set
Ker ϕ = {g ∈ G | ϕ(g) = eH}, and the image of ϕ is the set Im ϕ = ϕ(G) = {ϕ(g) | g ∈ G}.
Proposition 3.7.11
Let ϕ : G → H be a homomorphism of groups. The kernel Ker ϕ is a subgroup of G.
Proof. The kernel Ker ϕ is nonempty since eG ∈ Ker ϕ. Now let x, y ∈ Ker ϕ. Then

ϕ(xy^{−1}) = ϕ(x)ϕ(y^{−1})    since ϕ is a homomorphism
          = ϕ(x)ϕ(y)^{−1}    by Proposition 3.7.9(2)
          = eH eH^{−1} = eH    since x, y ∈ Ker ϕ.

Hence, xy^{−1} ∈ Ker ϕ. Thus, Ker ϕ ≤ G by the One-Step Subgroup Criterion.
Proposition 3.7.12
Let ϕ : G → H be a homomorphism of groups. The image Im ϕ is a subgroup of H.
Proposition 3.7.15
Let ϕ : G → H be a homomorphism of groups. Then
(1) ϕ is injective if and only if Ker ϕ = {eG }.
3.7.3 – Isomorphisms
We have seen some examples where groups, though presented differently, actually look strikingly
the same. For example, (Zn, ·) and (Z/nZ, +) behave identically, and likewise for (Z, +) and (2Z, +),
where 2Z denotes the set of all even integers. This raises two questions: (1) when should we call two
groups the same, and (2) what precisely does it mean to call two groups the same?
Definition 3.7.16
Let G and H be two groups. A function ϕ : G → H is called an isomorphism if
(1) ϕ is a homomorphism;
(2) ϕ is a bijection.
If there exists an isomorphism between two groups G and H, then we say that G and H
are isomorphic and we write G ≅ H.
When two groups are isomorphic, they are for all intents and purposes of group theory the
same. We could have defined an isomorphism as a bijection ϕ such that both ϕ and ϕ^{−1} are
homomorphisms. However, this turns out to be heavier than necessary, as the following proposition
shows.
Proposition 3.7.17
If ϕ is an isomorphism (as defined in Definition 3.7.16), then ϕ−1 : H → G is a homomor-
phism.
so f is a homomorphism. By Theorem 3.3.9, the element (x, y) has order lcm(2, 3) = 6. Hence,
Im f = G and so f is surjective. Since f is a surjection between finite sets of the same cardinality,
f is a bijection. We conclude that f is an isomorphism. ♦
Example 3.7.19. Let b be a positive real number. We know that f(x) = b^x is a bijection between
R and R>0 with inverse function f^{−1}(x) = log_b x = (ln x)/(ln b). Example 3.7.2 showed that f is
a homomorphism and thus it is an isomorphism between (R, +) and (R>0, ×). Proposition 3.7.17
implies that f^{−1}(x) = log_b x is a homomorphism from (R>0, ×) to (R, +). ♦
Example 3.7.20. In this example, we provide an isomorphism between GL2(F2) and S3. Writing
[a b; c d] for the matrix with rows (a b) and (c d), consider the following function ϕ:

[1 0; 0 1] ↦ id,   [1 1; 0 1] ↦ (1 2),   [1 0; 1 1] ↦ (1 3),
[0 1; 1 0] ↦ (2 3),   [1 1; 1 0] ↦ (1 2 3),   [0 1; 1 1] ↦ (1 3 2).

If we compare the group table of GL2(F2) and the group table of S3 (see Exercise 3.7.17), we
find that this particular function ϕ preserves how group elements operate, establishing that ϕ is an
isomorphism. ♦
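One way to rediscover such an isomorphism, different from comparing group tables, is to let each invertible matrix permute the three nonzero vectors of F2^2. The following sketch (our own construction, not the book's) confirms that this action gives a bijective homomorphism onto S3:

```python
from itertools import product

# The three nonzero vectors of F_2^2, in a fixed (sorted) order.
VECTORS = [(0, 1), (1, 0), (1, 1)]

def apply_mat(m, v):
    """Multiply a 2x2 matrix over F_2 by a column vector."""
    (a, b), (c, d) = m
    return ((a * v[0] + b * v[1]) % 2, (c * v[0] + d * v[1]) % 2)

def mat_mult(m, n):
    return tuple(tuple(sum(m[i][k] * n[k][j] for k in range(2)) % 2
                       for j in range(2)) for i in range(2))

def compose(p, q):
    # apply q first, then p
    return tuple(p[q[i]] for i in range(len(p)))

# A matrix is invertible iff it permutes the nonzero vectors.
to_perm = {}
for entries in product((0, 1), repeat=4):
    m = (entries[0:2], entries[2:4])
    images = [apply_mat(m, v) for v in VECTORS]
    if sorted(images) == VECTORS:          # bijective on nonzero vectors
        to_perm[m] = tuple(VECTORS.index(w) for w in images)

GL2 = list(to_perm)
print(len(GL2))   # 6 invertible matrices, matching |S3| = 6

# Matrix multiplication corresponds to composition of the permutations.
print(all(to_perm[mat_mult(m, n)] == compose(to_perm[m], to_perm[n])
          for m in GL2 for n in GL2))   # True
```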
Proposition 3.7.21
Let ϕ : G → H be an isomorphism of groups. Then
(1) |G| = |H|.
(2) G is abelian if and only if H is abelian.
(3) ϕ preserves orders of elements, i.e., |x| = |ϕ(x)| for all x ∈ G.
Proof. Part (1) follows immediately from the requirement that ϕ is a bijection.
For (2), suppose that G is abelian. Let h_1, h_2 ∈ H. Since ϕ is surjective, there exist g_1, g_2 ∈ G
such that ϕ(g_1) = h_1 and ϕ(g_2) = h_2. Then

h_1 h_2 = ϕ(g_1)ϕ(g_2) = ϕ(g_1 g_2) = ϕ(g_2 g_1) = ϕ(g_2)ϕ(g_1) = h_2 h_1.

Hence, we have shown that if G is abelian, then H is abelian. Repeating the argument with ϕ^{−1}
establishes the converse, namely that if H is abelian, then G is abelian.
For (3), consider first the case in which the order |x| = n is finite. We have already seen
that ϕ(x^n) = ϕ(x)^n by Proposition 3.7.9(3). Then 1_H = ϕ(1_G) = ϕ(x^n) = ϕ(x)^n. Hence, by
Corollary 3.3.6, the order |ϕ(x)| is finite and divides |x|. Applying the same argument to ϕ^{−1}
and the element ϕ(x), we deduce that |x| divides |ϕ(x)|. Hence, since |x| and |ϕ(x)| are
both positive and divide each other, |x| = |ϕ(x)|.
Consider now the case in which the order of x is infinite. Suppose that ϕ(x)^m = 1_H for some
m > 0. Then ϕ(x^m) = 1_H and, since ϕ is injective, x^m = 1_G. Hence, the order of x is finite,
which is a contradiction. So if |x| is infinite, then |ϕ(x)| is infinite. Conversely, applying the same
argument to ϕ^{−1} establishes that if |ϕ(x)| is infinite, then |x| is infinite.
Proposition 3.7.21 is particularly useful to prove that two groups are not isomorphic. If either
condition (1) or (2) fails, then the groups cannot be isomorphic. Also, if two groups have a different
number of elements of a given order, then Proposition 3.7.21(3) cannot hold for any isomorphism
and thus the two groups are not isomorphic.
However, we underscore that the three conditions in Proposition 3.7.21 are necessary conditions
but not sufficient: even if two groups satisfy all three conditions, we cannot deduce that they are
isomorphic. Example 3.7.24 illustrates this.
Remark 3.7.22. If two groups are isomorphic then they have isomorphic (as posets) subgroup
lattices. However, the converse is not true. There are many pairs of nonisomorphic groups with
subgroup lattices that are isomorphic as posets. See Figure 3.10 for an example. ♦
  Z/15Z           Z/21Z
  /    \          /    \
⟨3⟩    ⟨5⟩      ⟨3⟩    ⟨7⟩
  \    /          \    /
   {0}             {0}

Figure 3.10: Two nonisomorphic groups whose subgroup lattices are isomorphic as posets.
Example 3.7.23. We prove that D4 and Q8 are not isomorphic. They are both of order 8 and
they are both nonabelian. However, in D4 only the elements r and r^3 have order 4, while in Q8 the
elements i, −i, j, −j, k, −k are all of order 4. Consequently, there cannot exist a bijection between
D4 and Q8 that satisfies Proposition 3.7.21(3). Hence, D4 ≇ Q8. ♦
Example 3.7.24. The partial order in Example 1.4.8 establishes a bijection between Q>0 and
N∗. From this bijection, it is easy to prove that there is a bijection between Q and Z. However,
there does not exist an isomorphism between (Z, +) and (Q, +). This result does not follow from
Proposition 3.7.21. Indeed, |Z| = |Q|, Z and Q are both abelian, and all nonzero elements of both
groups have infinite order.
Suppose there does exist an isomorphism f : Q → Z. If r is a rational number and n ∈ Z, then
by Proposition 3.7.9(3) with addition, f(n · r) = n · f(r). Let f(1) = a; since f is injective and
f(0) = 0, the integer a is nonzero. Then

a = f(1) = f(2a · (1/(2a))) = 2a · f(1/(2a)).

Since a ≠ 0, this implies that 1 = 2f(1/(2a)). But this is a contradiction, since f(1/(2a)) ∈ Z and
no integer doubled equals 1. Hence, there exists no isomorphism between Z and Q. ♦
The motivating example that Zn ≅ Z/nZ offered at the beginning of this subsection generalizes
to a broader result for arbitrary cyclic groups.
Proposition 3.7.25
Two cyclic groups of the same cardinality are isomorphic.
Proof. Recall from Exercise 3.2.31 that cyclic groups are abelian.
First suppose that G and H are finite cyclic groups, both of order n. Suppose that G is generated
by x and H is generated by an element y. Define the function ϕ : G → H by ϕ(x^a) = y^a. This is
well defined and a homomorphism: x^a = x^b exactly when a ≡ b (mod n), in which case y^a = y^b
since |y| = n, and ϕ(x^a x^b) = y^{a+b} = ϕ(x^a)ϕ(x^b). We need to prove that ϕ is a bijection.
The image of ϕ is {ϕ(x^k) | 0 ≤ k ≤ n − 1} = {y^k | 0 ≤ k ≤ n − 1} = H, so ϕ is a surjection. A
surjection between finite sets of the same cardinality is a bijection. Hence, ϕ is an isomorphism.
The proof is similar if G and H are infinite cyclic groups. (We leave the proof to the reader. See
Exercise 3.7.27.)
For example, in D6 we have ⟨r⟩ ≅ Z6, but ⟨s, r^2⟩ ≅ ⟨sr, r^2⟩ ≅ D3.
Definition 3.7.26
A homomorphism ϕ : G → G of a group into itself is called an endomorphism. An
isomorphism ψ : G → G is called an automorphism of G. The set of all automorphisms of a
group G is denoted by Aut(G).
Proposition 3.7.27
The set of automorphisms of a group G is a group with the operation of composition. In
fact, Aut(G) ≤ SG .
Proof. Recall that SG denotes the group of bijections on G. The set Aut(G) is a nonempty subset
of SG since the identity function idG is an automorphism. Now suppose that ϕ, ψ ∈ Aut(G) and
let x, y ∈ G. By Proposition 3.7.17, ψ^{−1} is a homomorphism, so

(ϕ ◦ ψ^{−1})(xy) = ϕ(ψ^{−1}(x) ψ^{−1}(y)) = ϕ(ψ^{−1}(x)) ϕ(ψ^{−1}(y)) = (ϕ ◦ ψ^{−1})(x) (ϕ ◦ ψ^{−1})(y).

Thus, ϕ ◦ ψ^{−1} ∈ Aut(G) for arbitrary ϕ, ψ ∈ Aut(G) and, by the One-Step Subgroup Criterion,
Aut(G) is a subgroup of SG.
Proposition 3.7.28
Let n be an integer greater than 2. Then Aut(Zn) ≅ U(n).
Proof. (The proof is left as a guided exercise for the reader. See Exercise 3.7.40.)
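The heart of this proposition can be checked numerically: writing Zn additively, the map ψ_a(x) = ax mod n is always a homomorphism, and it is bijective exactly when gcd(a, n) = 1. A sketch for n = 12 (names ours):

```python
from math import gcd

def is_bijective(a, n):
    """psi_a(x) = a*x mod n on (Z/nZ, +) is always a homomorphism;
    check whether it is also a bijection."""
    return len({(a * x) % n for x in range(n)}) == n

n = 12
units = [a for a in range(1, n) if gcd(a, n) == 1]
autos = [a for a in range(1, n) if is_bijective(a, n)]
print(units)            # [1, 5, 7, 11]
print(autos == units)   # True: psi_a is an automorphism iff gcd(a, n) = 1
```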
The proof of Cayley’s Theorem shows that G is isomorphic to a subgroup of Sn where |G| = n.
It is important to realize that the order |G| is not necessarily the least integer m such that G is
isomorphic to a subgroup of Sm . See the second half of Examples 3.7.30.
The proof of Cayley’s Theorem is not particularly profound. However, it has important conse-
quences especially in regards to using a computer algebra system (CAS) to perform group theoretic
calculations. Implementing function composition of bijections on finite sets is simple to program.
Consequently, defining a group G as some subgroup of some Sn makes it possible (if not easy) to
perform the group operations on G in a CAS.
Example 3.7.30. Let G = D4, the dihedral group on the square. The proof of Cayley's Theorem
establishes an isomorphism between D4 and a subgroup of S8. Labeling the eight elements of D4
with the integers 1 through 8, one such isomorphism ϕ gives

D4 ≅ ⟨(1 2 3 4)(5 8 7 6), (1 5)(2 6)(3 7)(4 8)⟩.
This is not the only natural isomorphism between D4 and a subgroup of a symmetric group.
Consider how we introduced Dn in Section 3.1.2 as depicted below.
[Figure: the square with vertices labeled 1 through 4, together with the rotation r and the reflection s.]
By considering how r and s operate on the vertices, we can view r and s as elements in S4. The
appropriate way to understand this perspective is to define a homomorphism ϕ : D4 → S4 by
ϕ(r) = (1 2 3 4) and ϕ(s) = (2 4). For all elements r^a s^b ∈ D4, the function ϕ is given by

ϕ(r^a s^b) = (1 2 3 4)^a (2 4)^b.

By exhaustively checking all operations, we can verify that this function is a homomorphism. However,
we know that r and s were originally defined as the functions (1 2 3 4) and (2 4), so this
homomorphism is natural. The homomorphism ϕ is injective and hence establishes an isomorphism
between D4 and the subgroup ⟨(1 2 3 4), (2 4)⟩ in S4. ♦
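We can confirm that ϕ(r) = (1 2 3 4) and ϕ(s) = (2 4) satisfy the dihedral relations, which is the heart of checking that ϕ respects the structure of D4. A sketch with 0-indexed permutations (the representation is ours):

```python
def compose(p, q):
    # One-line notation, 0-indexed: apply q first, then p.
    return tuple(p[q[i]] for i in range(len(p)))

def inverse(p):
    q = [0] * len(p)
    for i, pi in enumerate(p):
        q[pi] = i
    return tuple(q)

r = (1, 2, 3, 0)   # phi(r) = (1 2 3 4), written 0-indexed
s = (0, 3, 2, 1)   # phi(s) = (2 4), written 0-indexed

identity = (0, 1, 2, 3)
r4 = compose(r, compose(r, compose(r, r)))

print(r4 == identity)                           # True: r^4 is the identity
print(compose(s, s) == identity)                # True: s^2 is the identity
print(compose(s, r) == compose(inverse(r), s))  # True: s r = r^{-1} s
```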
10. Let ϕ : G → H be a homomorphism. Prove that for all g ∈ G, the order |ϕ(g)| divides the order |g|.
11. Prove Proposition 3.7.15.
12. Prove that ϕ : Z⊕Z → Z defined by ϕ(x, y) = 2x+3y is a homomorphism. Determine Ker ϕ. Describe
the fiber ϕ−1 (6).
13. Let ϕ and ψ be homomorphisms between two groups G and H. Prove or disprove that
{g ∈ G | ϕ(g) = ψ(g)}
is a subgroup of G.
14. Consider the function f : U (33) → U (33) defined by f (x) = x2 . Show that f is a homomorphism and
find the kernel and the image of f .
15. Prove that the function f : GL2(R) → GL3(R) with

f( a b )   ( a^2   2ab       b^2 )
 ( c d ) = ( ac    ad + bc   bd  )
           ( c^2   2cd       d^2 )

is a homomorphism.
[An automorphism of the form ψg is called an inner automorphism. The image of Ψ in Aut(G) is
called the group of inner automorphisms and is denoted Inn(G).]
39. Prove that Aut((Q, +)) ≅ (Q∗, ×).
40. This exercise determines the automorphism group Aut(Zn ). Suppose that Zn is generated by the
element z.
(a) Prove that every homomorphism ψ : Zn → Zn is completely determined by where it sends the
generator z.
(b) Prove that every homomorphism ψ : Zn → Zn is of the form ψ(g) = g^a for some integer a
with 0 ≤ a ≤ n − 1. For the scope of this exercise, denote by ψ_a the homomorphism such that
ψ_a(g) = g^a.
(c) Prove that ψa ∈ Aut(Zn ) if and only if gcd(a, n) = 1.
(d) Show that the function Ψ : U(n) → Aut(Zn) with Ψ(a) = ψ_a is an isomorphism to conclude that
U(n) ≅ Aut(Zn).
3.8 Group Presentations
In the organizing principles outlined in the preface, one of the objectives for each algebraic structure
involved how to “conveniently describe an object with the given (algebraic) structure.” For the
groups we have encountered so far, we have introduced individual notation for each one. This was
reasonable because these groups arose in a natural manner, from a context that already possessed
some notational habits. In order to study all groups, and not just those we encounter by happenstance,
it is desirable to have some form of consistent notation to describe elements and operations in
arbitrary groups.
For the dihedral group, we write

Dn = ⟨r, s | r^n = 1, s^2 = 1, sr = r^{−1}s⟩    (3.7)

to express that every element in Dn can be obtained by a finite number of repeated operations
between r and s and that every algebraic relation between r and s can be deduced from the relations
r^n = 1, s^2 = 1, and sr = r^{−1}s. The expression (3.7) is the standard presentation of Dn.
The relation sr = r^{−1}s shows that in any term s^k r^l it is possible to “move” each s to the right
past a power of r by appropriately changing the power on r. Hence, in any expression involving the
generators r and s, it is possible, by appropriate changes of powers, to move all the powers of s to
the right and all the powers of r to the left. Hence, every expression in r and s can be rewritten
as r^l s^k with 0 ≤ k ≤ 1 and 0 ≤ l ≤ n − 1. Though we already knew |Dn| = 2n from geometry,
this reasoning shows that there are at most 2n terms in the group given by this presentation.
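The normal form r^l s^k turns the presentation into a concrete multiplication rule, since moving a power of r to the left past s flips the sign of its exponent. The sketch below (our own encoding of elements as pairs (l, k)) realizes Dn for n = 6 and checks that the 2n normal forms are closed under the product:

```python
def dihedral_mult(x, y, n):
    """Multiply normal forms r^l s^k in the presentation
    <r, s | r^n = 1, s^2 = 1, s r = r^(-1) s>.

    Moving r^l2 left past s^k1 negates its exponent once per s:
    (r^l1 s^k1)(r^l2 s^k2) = r^(l1 + (-1)^k1 * l2) s^(k1 + k2)."""
    l1, k1 = x
    l2, k2 = y
    return ((l1 + (-1) ** k1 * l2) % n, (k1 + k2) % 2)

n = 6
elements = [(l, k) for l in range(n) for k in range(2)]
closed = all(dihedral_mult(x, y, n) in elements for x in elements for y in elements)
print(len(elements), closed)   # 12 True: the 2n normal forms suffice
```

For example, `dihedral_mult((0, 1), (1, 0), 6)` computes s · r and returns (5, 1), i.e., r^5 s = r^{−1}s, exactly as the relation demands.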
Definition 3.8.1
Let G be a group. A presentation of G is an expression

G = ⟨g_1, g_2, . . . , g_k | R_1, R_2, . . . , R_s⟩,

where each R_i is an equation in the elements g_1, g_2, . . . , g_k. This means that every element
in G can be obtained as a combination of operations on the generators g_1, g_2, . . . , g_k and
that any relation between these elements can be deduced from the relations R_1, R_2, . . . , R_s.
Example 3.8.2 (Cyclic Groups). The cyclic group of order n has the presentation

Zn = ⟨z | z^n = 1⟩.

The distinct elements are {1, z, z^2, . . . , z^{n−1}} and the operation is z^a z^b = z^{a+b}, where z^n = 1. ♦
Example 3.8.3. Consider an alternate presentation for D5, namely ⟨a, b | a^2 = b^2 = 1, (ab)^5 = 1⟩,
where a = rs and b = s. The relation (ab)^5 = 1 corresponds to r^5 = 1, and the relation a^2 = 1 is
equivalent to rsrs = 1, which can be rewritten as sr = r^{−1}s. This shows that
D5 = ⟨a, b | a^2 = b^2 = 1, (ab)^5 = 1⟩. ♦
This last example illustrates two important properties. First, Example 3.8.3 and the standard
presentation for D5 give two distinct presentations for the same group. Second, in the standard
presentation of the dihedral group D5, the relations r^5 = 1 and s^2 = 1, coupled with the fact that
|D5| = 10, may lead a reader to speculate that the order of a group is equal to the product of the
orders of the generators. This is generally not the case, as Example 3.8.3 shows: there |D5| = 10
while the generators a and b each have order 2.
In the examples given so far, we began with a well-defined group and gave a presentation of it.
More importantly, we can define a group by a presentation. However, before providing examples,
it is useful to introduce free groups. Free groups possess interesting properties but force us to be
precise in how we use symbols in abstract groups.
Let S be any set of symbols. A word in S is an expression of the form

w = s_1^{α_1} s_2^{α_2} · · · s_m^{α_m},    (3.8)

where m ∈ N∗, the s_i ∈ S are not necessarily distinct, and α_i ∈ Z − {0} for 1 ≤ i ≤ m. (The order of the s_i
matters.) If the word is not the empty word 1, we call m the length of the word. For example, if
S = {x, y, z}, then

x^2 y^{−4} z x^{13} x^{−2} y y y z^2,    y z z z y z x^{−2},    y z^2 y^{−3} z

are examples of words in S. We call a word reduced if s_{i+1} ≠ s_i for all 1 ≤ i ≤ m − 1. In the above
examples of words from {x, y, z}, only the last word is reduced.
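Reduced forms can be computed with a small stack-based routine. The sketch below (our own representation of words as lists of (symbol, exponent) pairs) reduces the first sample word and checks that a word concatenated with its formal inverse collapses to the empty word:

```python
def reduce_word(word):
    """Stack-based reduction of a word given as (symbol, exponent) pairs:
    adjacent powers of the same symbol collect, and zero exponents vanish,
    which may expose new adjacent pairs to collect."""
    stack = []
    for sym, exp in word:
        if exp == 0:
            continue
        if stack and stack[-1][0] == sym:
            exp += stack.pop()[1]
            if exp == 0:
                continue
        stack.append((sym, exp))
    return stack

# The first sample word x^2 y^-4 z x^13 x^-2 y y y z^2 reduces as expected:
w = [('x', 2), ('y', -4), ('z', 1), ('x', 13), ('x', -2),
     ('y', 1), ('y', 1), ('y', 1), ('z', 2)]
print(reduce_word(w))
# [('x', 2), ('y', -4), ('z', 1), ('x', 11), ('y', 3), ('z', 2)]

# Concatenating a reduced word with its formal inverse cancels completely.
inverse = [(sym, -exp) for sym, exp in reversed(reduce_word(w))]
print(reduce_word(reduce_word(w) + inverse))   # []
```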
We define F (S) as the set of all reduced words of S. We define the operation · of concatenation of
reduced words by concatenating the expressions and then eliminating adjacent symbols with powers
that collect or cancel. More precisely, for all w ∈ F (S), w · 1 = 1 · w = w and for two nonempty
reduced words a = s1^{α1} s2^{α2} · · · sm^{αm} and b = t1^{β1} t2^{β2} · · · tn^{βn}, the concatenation

(s1^{α1} s2^{α2} · · · sm^{αm}) · (t1^{β1} t2^{β2} · · · tn^{βn})

is as follows:

Case 1. the empty word 1 if m = n and s_{m+1−i} = ti with βi = −α_{m+1−i} for all i with 1 ≤ i ≤ m;

Case 2. the reduced word s1^{α1} s2^{α2} · · · s_{m−k}^{α_{m−k}} t_{k+1}^{β_{k+1}} t_{k+2}^{β_{k+2}} · · · tn^{βn} if s_{m−k} ≠ t_{k+1}, and s_{m+1−i} = ti with βi = −α_{m+1−i} for 1 ≤ i ≤ k;

Case 3. the reduced word s1^{α1} s2^{α2} · · · s_{m−k}^{α_{m−k}+β_{k+1}} t_{k+2}^{β_{k+2}} · · · tn^{βn} if s_{m−k} = t_{k+1} with α_{m−k} + β_{k+1} ≠ 0, and s_{m+1−i} = ti with βi = −α_{m+1−i} for 1 ≤ i ≤ k.

We call k the overlap of the pair (a, b). A few examples of this concatenation-reduction are

(x^2 y^{−1}) · (y x^3) = x^5 and (xy) · (y^{−1} x^{−1}) = 1.
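The case analysis above can be sketched in code. The following is an illustrative implementation (our own names and representation, not from the text): a reduced word is stored as a list of (symbol, exponent) pairs, and concatenation repeatedly collects or cancels adjacent powers.

```python
# A sketch of concatenation-reduction on F(S). A reduced word is a list of
# (symbol, exponent) pairs with nonzero exponents and no two adjacent pairs
# sharing the same symbol; the empty list [] represents the empty word 1.

def reduce_concat(a, b):
    """Concatenate reduced words a and b, collecting or cancelling powers."""
    result = list(a)
    for sym, exp in b:
        if result and result[-1][0] == sym:
            merged = result[-1][1] + exp
            if merged == 0:
                result.pop()                 # full cancellation of this symbol
            else:
                result[-1] = (sym, merged)   # powers collect (cf. Case 3)
        else:
            result.append((sym, exp))        # symbols differ (cf. Case 2)
    return result

# (x^2 y^-1) . (y x^3) = x^5
assert reduce_concat([('x', 2), ('y', -1)], [('y', 1), ('x', 3)]) == [('x', 5)]
# (x y) . (y^-1 x^-1) = 1, the empty word (cf. Case 1)
assert reduce_concat([('x', 1), ('y', 1)], [('y', -1), ('x', -1)]) == []
```

Note that after a pop, the loop continues comparing against the newly exposed last pair, so cascading cancellations are handled.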
Theorem 3.8.4
The operation · of concatenation-reduction on F (S) is a binary operation and the pair
(F(S), ·) is a group with identity 1, and for w ≠ 1 expressed as in (3.8),

w^{−1} = sm^{−αm} · · · s2^{−α2} s1^{−α1}.
Proof. In all three cases for the definition of concatenation of reduced words, the resulting word is
such that no successive symbol is equal. Hence, the word is reduced and · is a binary operation on
F (S).
That the empty word is the identity is built into the definition of concatenation. Furthermore,
Case 1 establishes that the inverse of s1^{α1} s2^{α2} · · · sm^{αm} is sm^{−αm} · · · s2^{−α2} s1^{−α1}.
126 CHAPTER 3. GROUPS
The difficulty of the proof resides in proving that concatenation is associative. For the rest of
this proof, we denote the length of a word w ∈ F (S) as L(w). Let a, b, and c be three arbitrary
reduced words in F (S). Let h be the overlap of (a, b) and let k be the overlap of (b, c).
First, suppose that h + k < L(b). Then we can write a = a′a′′, b = b′b′′b′′′, and c = c′c′′, where a′′b′ are the symbols reduced out or reduced to a word of length 1 in the concatenation a · b, and b′′′c′ are the symbols reduced out or reduced to a word of length 1 in the concatenation b · c. We write a · b = a′[a′′b′]b′′b′′′, where [a′′b′] stands for removed or reduced to a word of length 1, depending on Case 2 or Case 3 of the concatenation-reduction. Then, as reduced words,

(a · b) · c = (a′[a′′b′]b′′b′′′) · c = a′[a′′b′]b′′[b′′′c′]c′′ = a · (b′b′′[b′′′c′]c′′) = a · (b · c).
Suppose next that h + k = L(b). We write a = a′a′′, b = b′b′′, and c = c′c′′ so that a · b = a′[a′′b′]b′′ and b · c = b′[b′′c′]c′′. Then

(a · b) · c = (a′[a′′b′]b′′) · c = (a′[a′′b′]) · ([b′′c′]c′′) = a · (b′[b′′c′]c′′) = a · (b · c).
Finally, suppose that h + k > L(b). Now we subdivide each reduced word into three parts as a = a′a′′a′′′, b = b′b′′b′′′, and c = c′c′′c′′′, where

a · b = a′[a′′a′′′b′b′′]b′′′ and b · c = b′[b′′b′′′c′c′′]c′′′.

Now any of these subwords can be the empty word. However, in order for the reductions to occur as these subwords are defined, a few relations must hold. For the lengths of the subwords, we must have

L(b′′) = L(a′′) = L(c′′) = h + k − L(b),
L(a′′′) = L(b′) = h − L(b′′),
L(c′) = L(b′′′) = k − L(b′′).

Furthermore, we must have b′ = (a′′′)^{−1} and c′ = (b′′′)^{−1} and

a′′ = s1^{α1} s2^{α2} · · · s_{m−1}^{α_{m−1}} sm^{αm},   b′′ = sm^{−αm} s_{m−1}^{−α_{m−1}} · · · s2^{−α2} s1^{β},   c′′ = s1^{−β} s2^{α2} · · · s_{m−1}^{α_{m−1}} sm^{γ}.

Hence,

(a · b) · c = (a′ s1^{α1} s2^{α2} · · · s_{m−1}^{α_{m−1}} sm^{αm} a′′′) · (b′ sm^{−αm+γ} c′′′)
= a · (b′ sm^{−αm+γ} c′′′)
= a · (b · c).
In all three possible cases of combinations of overlap, associativity holds.
We conclude that (F (S), ·) is a group.
Definition 3.8.5
Let S be a set of symbols. The set F(S), equipped with the operation · of concatenation, is called the free group on S. The cardinality of S is called the rank of the free group F(S).
The term “free” refers to the fact that its generators do not have any relations among them
besides the relations imposed by power rules,
xα xβ = xα+β , for all x ∈ G, and all α, β ∈ Z. (3.9)
In fact, we defined the concatenation operation on reduced words as we did in order to satisfy (3.9).
Because it has no relations, the free group on a given set of symbols is as complicated as a group
can get when generated by those symbols. All free groups are infinite because each symbol in a reduced word can carry any nonzero integer power and also because there exist reduced words of arbitrary length.
Example 3.8.6. Consider the following three groups. To illustrate the effect of various sets of relations, we consider presentations that have the same number of generators, with generators of the same orders.
G1 = ⟨x, y | x^3 = y^7 = 1, xy = yx⟩,
G2 = ⟨a, b | a^3 = b^7 = 1, ab = b^2 a⟩,
G3 = ⟨u, v | u^3 = v^7 = 1, uv = v^2 u^2⟩.
In G1, the relation xy = yx implies that every element can be written in the form x^k y^ℓ. Furthermore,

x^k y^ℓ = x^m y^n ⟺ x^{k−m} = y^{n−ℓ},

and this equality only holds when 3 divides k − m and 7 divides n − ℓ. Hence, G1 is a group of order 21 in which the elements operate as (x^k y^ℓ)(x^m y^n) = x^{k+m} y^{ℓ+n}. In fact, it is easy to see that G1 ≅ Z3 ⊕ Z7 and, by Exercise 3.7.19, we deduce that G1 ≅ Z21.
In G2, from the relation ab = b^2 a, we see that all the a symbols may be moved to the right of any b symbols, though possibly changing the power on b. In particular,

a^n b = a^{n−1} b^2 a = a^{n−2}(ab)ba = a^{n−2} b^2 aba = a^{n−2} b^4 a^2 = · · · = b^{2^n} a^n,

and also

a^n b^k = b^{2^n} a^n b^{k−1} = b^{2^n} b^{2^n} a^n b^{k−2} = · · · = b^{k·2^n} a^n.

Thus, every element in G2 can be written as b^m a^n. Also, since a^3 = b^7 = 1, the elements b^i a^j with 0 ≤ i ≤ 6 and 0 ≤ j ≤ 2 give all the elements of the group. The same reasoning used for G1 shows that all 21 of these elements are distinct. Hence, G2 is a group of order 21, but one in which the group elements operate according to

(b^k a^ℓ)(b^m a^n) = b^{k+m·2^ℓ} a^{ℓ+n}.
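The multiplication rule for G2 can be checked mechanically. This sketch (our own, assuming the normal form b^k a^ℓ derived above) verifies closure and associativity of the operation on the 21 normal forms:

```python
# Elements of G2 = <a, b | a^3 = b^7 = 1, ab = b^2 a> in normal form b^k a^l,
# stored as pairs (k, l) with k mod 7 and l mod 3. The operation is
# (b^k a^l)(b^m a^n) = b^{k + m*2^l} a^{l + n}.
import itertools

def mult(x, y):
    """Multiply (k, l) * (m, n), elements written as b^k a^l."""
    (k, l), (m, n) = x, y
    return ((k + m * 2**l) % 7, (l + n) % 3)

elements = [(i, j) for i in range(7) for j in range(3)]
assert len(elements) == 21

# closure, and associativity on all 21^3 triples
for x in elements:
    for y in elements:
        assert mult(x, y) in elements
for x, y, z in itertools.product(elements, repeat=3):
    assert mult(mult(x, y), z) == mult(x, mult(y, z))
```

Reducing l mod 3 is consistent here because 2^3 = 8 ≡ 1 (mod 7).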
In G3, the relation uv = v^2 u^2 does not readily show that in every expression in u and v, the v's can be moved to the left of the u's.
The presentation of G3 in Example 3.8.6 shows that the combination of relations in a presentation
can lead to implicit relations.
The following example is a more extreme illustration of the size of a group in relation to the orders of its generators.
Example 3.8.7. Recall the infinite dihedral group D∞ . Example 3.3.8 described a presentation of
D∞ as
D∞ = ⟨x, y | x^2 = y^2 = 1⟩.

The element xy has infinite order. Setting z = xy, we have xz = y = y^{−1} = y^{−1}x^{−1}x = z^{−1}x. Hence, since the set {x, y} can be obtained from the set {x, z} and vice versa by group operations, another presentation of D∞ is

D∞ = ⟨x, z | x^2 = 1, xz = z^{−1}x⟩.

This presentation resembles the standard presentation of a dihedral group, except that z (which plays the role of r) has infinite order. ♦
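A numerical sanity check (our own construction, not from the text) realizes D∞ by reflections of the real line, x(t) = −t and y(t) = 1 − t, and confirms the relation xz = z^{−1}x for z = xy:

```python
# Realize D-infinity as functions on the real line: x and y are reflections
# about the points 0 and 1/2, and z = x o y is the translation t -> t - 1.

x = lambda t: -t
y = lambda t: 1 - t
z = lambda t: x(y(t))        # z(t) = t - 1, an element of infinite order
z_inv = lambda t: t + 1

# verify x^2 = y^2 = 1 and the relation x z = z^{-1} x on sample points
for t in [-3, 0, 2, 7.5]:
    assert x(x(t)) == t and y(y(t)) == t
    assert x(z(t)) == z_inv(x(t))
```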
It is always possible to find a presentation of any finite group. We can take the set of generators
to be the set of all elements in the group and the set of relations as all the calculations in the Cayley
table. More often than not, however, we are interested in describing the group with a small list
of generators and relations. Depending on the group, it may be a challenging problem to find a
minimal generating subset.
A profound result illustrating the possible complexity of working with generators and relations is the Novikov-Boone Theorem [50, 10], which states that, in the context of a given presentation, there exists no algorithm to decide whether two given words w1 and w2 are equal. With certain specific relations,
it may be possible to decide if two words are equal. For example, with the dihedral group, there is
an algorithm to reduce any word to one in a complete list of distinct words.
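For the dihedral group, such a reduction algorithm is easy to sketch. The following illustrative code (our own, for D5, using the relation sr = r^{−1}s from Example 3.8.3) reduces any word in r, r^{−1}, s to the normal form r^i s^j:

```python
# Reduce words in D5 to the normal form r^i s^j, 0 <= i < 5 and j in {0, 1}.
# Input is a string over {'r', 'R', 's'}, where 'R' stands for r^{-1}.

def normal_form(word, n=5):
    """Return (i, j) such that the word equals r^i s^j in D_n."""
    i, j = 0, 0
    for ch in word:
        if ch == 's':
            j = (j + 1) % 2                 # s^2 = 1
        else:
            step = 1 if ch == 'r' else -1
            # appending r on the right of r^i s^j: if j = 1, then
            # s r = r^{-1} s flips the sign of the step
            i = (i + (step if j == 0 else -step)) % n
    return (i, j)

assert normal_form('rrrrr') == (0, 0)          # r^5 = 1
assert normal_form('ss') == (0, 0)             # s^2 = 1
assert normal_form('srs') == normal_form('R')  # s r s = r^{-1}
```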
Proof. We define the function ϕ : G → H by ϕ(gi) = hi for i = 1, 2, . . . , k and, for each element g ∈ G, if g = u1^{α1} u2^{α2} · · · uℓ^{αℓ} with uj ∈ {g1, g2, . . . , gk}, then

ϕ(g) := ϕ(u1)^{α1} ϕ(u2)^{α2} · · · ϕ(uℓ)^{αℓ}.
By construction, ϕ satisfies the homomorphism property ϕ(xy) = ϕ(x)ϕ(y) for all x, y ∈ G. However,
since different words can be equal, we have not yet determined if ϕ is a well-defined function.
Two words v and w in the generators g1, g2, . . . , gk are equal if and only if there is a finite sequence of words w1, w2, . . . , wn such that v = w1, w = wn, and consecutive words wi and wi+1 are related to each other by either one application of a power rule (as given in Proposition 3.2.13) or one application
of a relation Rj . Since the elements h1 , h2 , . . . , hk ∈ H satisfy the same relations R1 , R2 , . . . , Rs as
g1 , g2 , . . . , gk , then the same equalities apply between the words ϕ(wi ) and ϕ(wi+1 ) as between wi
and wi+1 . This establishes the chain of equalities
ϕ(v) = ϕ(w1 ) = ϕ(w2 ) = · · · = ϕ(wn ) = ϕ(w).
Hence, if v = w are words in G, then ϕ(v) = ϕ(w). Thus, ϕ is a well-defined function and hence is
a homomorphism.
Example 3.8.9. We use Theorem 3.8.8 to prove that the subgroup ⟨(1 2 3)(4 5), (1 2)⟩ of S5 is isomorphic to D6. We set up a function from {r, s}, the standard generators of D6, to S5 by

r ↦ (1 2 3)(4 5) and s ↦ (1 2).
Obviously r^6 = 1 and ((1 2 3)(4 5))^6 = 1, while s^2 = 1 and (1 2)^2 = 1. In D6, we also have the relation rs = sr^{−1}, while in S5,

(1 2 3)(4 5)(1 2) = (1 3)(4 5) and (1 2)((1 2 3)(4 5))^{−1} = (1 2)(1 3 2)(4 5) = (1 3)(4 5).
Thus, (1 2 3)(4 5) and (1 2) satisfy the same relations as r and s. Hence, by Theorem 3.8.8, this map-
ping on generators extends to a homomorphism ϕ : D6 → S5 . Obviously, ϕ(D6 ) = h(1 2 3)(4 5), (1 2)i.
However, it is not hard to verify that ⟨(1 2 3)(4 5), (1 2)⟩ consists of exactly 12 elements. Hence, ϕ is injective and, by Exercise 3.7.24, D6 ≅ ϕ(D6) = ⟨(1 2 3)(4 5), (1 2)⟩. ♦
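The claim that ⟨(1 2 3)(4 5), (1 2)⟩ has exactly 12 elements can be checked by brute force. This sketch (our own) generates the closure of the two generators under composition:

```python
# Generate the subgroup <(1 2 3)(4 5), (1 2)> of S5 and count its elements.
# Permutations act on {0, ..., 4} and are stored as tuples p with p[i] = image of i.

def compose(p, q):
    """(p o q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(5))

r = (1, 2, 0, 4, 3)   # the permutation (1 2 3)(4 5), written 0-indexed
s = (1, 0, 2, 3, 4)   # the permutation (1 2)

group = {r, s}
frontier = {r, s}
while frontier:
    new = {compose(p, g) for p in frontier for g in (r, s)} - group
    group |= new
    frontier = new

assert len(group) == 12
```

The identity arises along the way (for instance as s composed with s), so the closure of all nonempty words in the generators is the whole subgroup.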
[Figure: the Cayley graph of D6 with generators r and s; the vertices are 1, r, r^2, . . . , r^5 and s, rs, r^2 s, . . . , r^5 s.]
The Cayley graph of a group presentation depends only on the set of generators of the group.
Hence, for a known group G, a set of generators (without necessarily supplying all the relations
between them) is sufficient to define a Cayley graph of G. For example, it is not hard to show
that S4 = h(1 2 3), (1 2 3 4)i. Figure 3.12 shows the Cayley graph for S4 using these generators. The
double edges correspond to left multiplication by (1 2 3) and the single edges to left multiplication
by (1 2 3 4). This Cayley graph has the adjacency structure of the Archimedean solid named a
rhombicuboctahedron.
[Figure 3.12: the Cayley graph of S4 with generators (1 2 3) and (1 2 3 4); the vertices are the 24 elements of S4.]
25. Consider the sculpture entitled Quintrino depicted in Figure 3.13 and let G be its group of symmetries.
(a) Show that |G| = 60.
(b) Show that G can be generated by two elements σ and τ .
(c) Show that G can be viewed as a subgroup of S12 by writing σ and τ explicitly as elements in
S12 .
(d) (*) Show that G is isomorphic to A5 .
[Other sculptures by the same artist can be found at http://bathsheba.com.]
3.9 Groups in Geometry
Groups arise naturally in many areas of mathematics, often as groups of certain functions.
Geometry in particular offers many examples of groups. The dihedral group, which we introduced
in Section 3.1 as a motivation for groups, comes from geometry. However, there are countless
connections between group theory and geometry, ranging from generalizations of dihedral groups
(reflection groups, e.g., [40]) to applications in advanced differential geometry and topology. Elliptic
curves, a certain family of curves in the plane, themselves possess a natural group structure. In fact,
group theory became so foundational to geometry that in 1872, Felix Klein proposed the Erlangen
program: to classify all geometries using projective geometry and groups of allowed transformations.
In this section, we introduce a few instances in which groups arise in geometry in an elementary
way. This section only offers a glimpse of these topics.
Recall that the distance between two points ~x = (x1, x2) and ~y = (y1, y2) in R^2 is d(~x, ~y) = √((y1 − x1)^2 + (y2 − x2)^2). The plane equipped with this distance function is called the Euclidean plane. More generally, the
Euclidean n-space is the set R^n equipped with the Euclidean distance function

d(~x, ~y) = √((y1 − x1)^2 + (y2 − x2)^2 + · · · + (yn − xn)^2).
Definition 3.9.1
An isometry of Euclidean space is a function f : Rn → Rn that preserves the distance,
namely f satisfies
d(f (~x), f (~y )) = d(~x, ~y ) for all ~x, ~y ∈ Rn . (3.10)
The Greek etymology of isometry is “same measure.” It is possible to broaden the concept of
Euclidean distance to more general notions of distance. This is formalized by metric spaces.
Definition 3.9.2
A metric space is a pair (X, d) where X is a set and d is a function d : X × X → R≥0
satisfying
(1) (identity of equal elements) d(x, y) = 0 if and only if x = y;
(2) (symmetry) d(x, y) = d(y, x) for all x, y ∈ X;
(3) (triangle inequality) d(x, z) ≤ d(x, y) + d(y, z) for all x, y, z ∈ X.
More generally than Definition 3.9.1, an isometry between metric spaces (X, d) and (Y, d′) is a bijection f : X → Y that satisfies d′(f(x), f(y)) = d(x, y) for all x, y ∈ X. However, for the rest of this section, we only discuss isometries of Euclidean spaces.
Some examples of isometries in the plane include a rotation about some fixed point A by an
angle α and reflection about a line L. Many others exist. Figure 3.14 shows an isometry obtained
by reflecting about a line L followed by a translation. However, without knowing all isometries, we
can nonetheless establish the following proposition.
[Figure 3.14: the square ABCD is reflected through the line L to A′B′C′D′ and then translated to A′′B′′C′′D′′.]
Proposition 3.9.3
The set of isometries on Euclidean space Rn is a group under composition.
The following properties of isometries are helpful for the proof of Proposition 3.9.3.
Proposition 3.9.4
Let f be an isometry of Euclidean space. For arbitrary points A, B, C ∈ Rn we denote by
A0 = f (A), B 0 = f (B), and C 0 = f (C).
(1) f preserves betweenness;
(2) f preserves collinearity;
(3) if →AC = λ→AB, then →A′C′ = λ→A′B′;
(4) △A′B′C′ is congruent to △ABC;
(5) if ABCD is a nondegenerate parallelogram, then A′B′C′D′ is a nondegenerate parallelogram.
Proof. A point B is said to be between two points A and C if d(A, B) + d(B, C) = d(A, C). Since
an isometry preserves the distance function, then d(A0 , B 0 ) + d(B 0 , C 0 ) = d(A0 , C 0 ). Hence, B 0 is
between A0 and C 0 .
In Euclidean geometry, three points are collinear if one is between the other two. Since isometries
preserve betweenness, they preserve collinearity.
The vector equation →AC = λ→AB with λ ≥ 0 holds when {A, B, C} is a set of collinear points, with A not between B and C, and when d(A, C) = λd(A, B). Since f is an isometry, d(A′, C′) = d(A, C) = λd(A, B) = λd(A′, B′). Hence, →A′C′ = λ→A′B′. When λ ≤ 0, this vector equality means that {A, B, C} is a set of collinear points, with A between B and C, and that d(A, C) = |λ|d(A, B). Then d(A′, C′) = d(A, C) = |λ|d(A, B) = |λ|d(A′, B′) and thus →A′C′ = λ→A′B′.
That 4A0 B 0 C 0 is congruent to 4ABC follows from the Side-Side-Side Theorem of Euclidean
Geometry.
Let ABCD be a nondegenerate parallelogram. Then △ABD and △BDC are congruent to each other and to △A′B′D′ and △B′D′C′, respectively. Hence, A′B′C′D′ is a nondegenerate parallelogram, congruent to ABCD.
Proof (of Proposition 3.9.3). We provide the proof for the Euclidean plane (n = 2) but the proof
generalizes to arbitrary n.
We first must check that function composition is a binary operation on the set of isometries of
R^n. Let f, g be two isometries of R^n. Then for any P, Q ∈ R^n,

d((f ◦ g)(P), (f ◦ g)(Q)) = d(f(g(P)), f(g(Q))) = d(g(P), g(Q)) = d(P, Q).

Thus, f ◦ g is an isometry.
Function composition is always associative (Proposition 1.1.15).
The identity function id : Rn → Rn is an isometry and satisfies the group axioms for an identity
element.
In order to show that the set of isometries is closed under taking inverses, we need to show that
an arbitrary isometry f is a bijection and that the inverse function is again an isometry. Suppose
that f (P ) = f (Q) for two points P, Q ∈ R2 . Then d(f (P ), f (Q)) = 0. Since f is an isometry, then
d(P, Q) = 0 and hence P = Q. This shows that every isometry is injective.
Establishing that an isometry is a surjection requires the most work. Let O be a point in the domain and let A1, A2, . . . , An be points so that →OA1, →OA2, . . . , →OAn form a basis of R^n. Let O′ = f(O) and A′i = f(Ai). By Proposition 3.9.4(5) and an induction argument, we deduce that →O′A′1, →O′A′2, . . . , →O′A′n is a basis of the codomain. Let Q be an arbitrary point in R^n. There exist real numbers λ1, λ2, . . . , λn such that

→O′Q = λ1 →O′A′1 + λ2 →O′A′2 + · · · + λn →O′A′n.

We define a sequence of points P1, P2, . . . , Pn by →OP1 = λ1 →OA1 and

→OPi = →OPi−1 + λi →OAi for 1 < i ≤ n.

Then, with P′n = f(Pn),

→O′P′n = λ1 →O′A′1 + λ2 →O′A′2 + · · · + λn →O′A′n = →O′Q,

so f(Pn) = Q. Hence every isometry is surjective and therefore a bijection. Moreover, for any points P′ and Q′ in the codomain, d(f^{−1}(P′), f^{−1}(Q′)) = d(f(f^{−1}(P′)), f(f^{−1}(Q′))) = d(P′, Q′).
Thus, the inverse function is also an isometry. This proves that the set of isometries is closed under
taking inverses.
Theorem 3.9.5
A function f : Rn → Rn is an isometry of Euclidean space if and only if
f (~x) = A~x + ~b
for some matrix A such that A^T A = I and any constant vector ~b. In particular, isometries of the Euclidean plane are of the form

~x ↦ ( a1  −s a2 ) ~x + ( b1 )
     ( a2   s a1 )      ( b2 ),

where a1^2 + a2^2 = 1 and s = ±1.
Proof. Parts (3) and (5) of Proposition 3.9.4 show that an isometry acts as a linear transformation on displacement vectors (though not on position vectors). Suppose that f : R^n → R^n is an isometry and that f(O) = (b1, b2, . . . , bn). Let P be an arbitrary point with coordinates (x1, x2, . . . , xn) and let P′ = f(P). Then

→OP′ = →OO′ + →O′P′ = ~b + A→OP = ~b + A~x

for some matrix A, since →O′P′ is related to →OP by a linear transformation.
Let P and Q be arbitrary points with coordinates (x1 , x2 , . . . , xn ) and (y1 , y2 , . . . , yn ). For the
distance, we have d(P, Q) = ‖~y − ~x‖. Since f is an isometry,

‖~y − ~x‖ = d(P, Q) = d(f(P), f(Q)) = ‖(A~y + ~b) − (A~x + ~b)‖ = ‖A(~y − ~x)‖.
Recall that the dot product can be recovered from the norm by

~x · ~y = (1/2)( ‖~x‖^2 + ‖~y‖^2 − ‖~y − ~x‖^2 ).

Hence, since A is a matrix satisfying ‖A(~y − ~x)‖ = ‖~y − ~x‖ for arbitrary ~x and ~y, then (A~x) · (A~y) = ~x · ~y. Since this holds for arbitrary vectors, in particular the basis vectors, we deduce that A^T A = I, the identity matrix.
If n = 2, we consider the identity A^T A = I with a generic matrix A = ( a c ; b d ):

( 1 0 )   ( a b ) ( a c )   ( a^2 + b^2   ac + bd  )
( 0 1 ) = ( c d ) ( b d ) = ( ac + bd    c^2 + d^2 ).

Hence,

a^2 + b^2 = 1,
ac + bd = 0,
c^2 + d^2 = 1.
Note first that if a = 0, then b = ±1, d = 0, and c = ±1, with the ±'s on b and c independent. If a ≠ 0, then c = −bd/a, and the third equation gives (−bd/a)^2 + d^2 = 1, which gives b^2 d^2 + a^2 d^2 = a^2 and then a^2 = d^2, using the first equation. But then 0 = (a^2 + b^2) − (c^2 + d^2) = b^2 − c^2, so b^2 = c^2 also. Finally, using the second equation, we deduce that d = εa and c = −εb with ε ∈ {−1, 1}, the condition a^2 + b^2 = 1 still holding. The theorem follows.
A matrix A ∈ GLn(R) satisfying A^T A = I is called an orthogonal matrix. The set of all real orthogonal n × n matrices is a subgroup of GLn(R), called the orthogonal group and denoted by O(n).
If we set a point as the origin, then by Theorem 3.9.5 we see that O(n) is the subgroup of
isometries that leaves the origin fixed. Now let ~p ∈ R^n be any other point. The set of isometries that leave ~p fixed is the set

t_{~p} O(n) t_{~p}^{−1} = { t_{~p} ◦ f ◦ t_{~p}^{−1} | f ∈ O(n) },
where t_{~p} is the translation by the vector ~p. By Exercise 3.7.37, we see that the subgroup of isometries that leave a given point fixed is conjugate, and hence isomorphic, to the subgroup of isometries that leave the origin fixed.
Because det(A^T) = det(A) for all n × n real matrices, every orthogonal matrix satisfies
det(A)2 = 1. This gives two possibilities for the determinant of an orthogonal matrix, namely 1 or
−1.
Definition 3.9.6
An isometry f : R^n → R^n as described in Theorem 3.9.5 is called direct (resp. indirect) if det(A) = 1 (resp. det(A) = −1).
Example 3.9.7. In linear algebra, we find that equations of transformation for rotation of an angle
of α about the origin are
( x )     ( cos α  −sin α ) ( x )
( y ) ↦  ( sin α   cos α ) ( y ) .
It is easy to check that rotations are direct isometries. The equations of transformation for reflection
through a line through the origin making an angle of β with the x-axis are
( x )     ( cos 2β   sin 2β ) ( x )
( y ) ↦  ( sin 2β  −cos 2β ) ( y ) .
In contrast to rotations, reflections through lines are indirect isometries. ♦
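These determinant claims are easy to confirm numerically; a small sketch (our own, not from the text):

```python
import math

def rotation(alpha):
    """Matrix of rotation by alpha about the origin, as a list of rows."""
    return [[math.cos(alpha), -math.sin(alpha)],
            [math.sin(alpha),  math.cos(alpha)]]

def reflection(beta):
    """Matrix of reflection through the line through the origin at angle beta."""
    return [[math.cos(2 * beta),  math.sin(2 * beta)],
            [math.sin(2 * beta), -math.cos(2 * beta)]]

def det(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

for angle in [0.0, 0.7, math.pi / 3, 2.0]:
    assert abs(det(rotation(angle)) - 1) < 1e-12    # rotations are direct
    assert abs(det(reflection(angle)) + 1) < 1e-12  # reflections are indirect
```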
Example 3.9.8. To find the equations of transformation for the rotation of angle α about a point ~p = (p1, p2) other than the origin, we obtain it as a composition of a translation by −~p, followed by a rotation about the origin of angle α, followed by a translation by ~p. The equations then are

( x )     ( p1 )   ( cos α  −sin α ) ( x − p1 )
( y ) ↦  ( p2 ) + ( sin α   cos α ) ( y − p2 ) .   ♦
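A sketch (our own) of this composite translate-rotate-translate map, checking that the center ~p is fixed and that a quarter turn behaves as expected:

```python
import math

def rotate_about(p, alpha, point):
    """Rotate `point` by angle alpha about the point p = (p1, p2)."""
    x, y = point[0] - p[0], point[1] - p[1]             # translate by -p
    xr = math.cos(alpha) * x - math.sin(alpha) * y      # rotate about the origin
    yr = math.sin(alpha) * x + math.cos(alpha) * y
    return (p[0] + xr, p[1] + yr)                       # translate back by p

p = (2.0, -1.0)
# the center p is fixed, and a quarter turn sends p + (1, 0) near p + (0, 1)
assert rotate_about(p, math.pi / 2, p) == p
image = rotate_about(p, math.pi / 2, (3.0, -1.0))
assert abs(image[0] - 2.0) < 1e-12 and abs(image[1] - 0.0) < 1e-12
```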
From Theorem 3.9.5 applied to the case of the Euclidean plane, we can deduce (see Exercise 3.9.6)
that an isometry is uniquely determined by how it maps a triangle 4ABC into its image 4A0 B 0 C 0 .
In other words, knowing how an isometry maps three noncollinear points is sufficient to determine
the isometry uniquely.
The orthogonal group is an important subgroup of the group of isometries. In the rest of the
section, we consider two other types of subgroups of the group of isometries of the Euclidean plane.
Definition 3.9.9
A (discrete) frieze group is a subgroup of the group of Euclidean plane isometries whose subgroup of translations is isomorphic to Z.
In the usual group of isometries in the plane, the translations form a subgroup isomorphic to R2 .
We sometimes use the description of “discrete” for frieze groups in contrast to “continuous” because
there is a translation of least positive displacement.
Example 3.9.10. Consider for example the following pattern and let G be the group of isometries
of the plane that preserve the structure of the pattern.
[Figure: a frieze pattern with marked points P, Q, and Q3 and reflection lines L1 and L2.]
The subgroup of translations of G consists of all translations that are an integer multiple of 2→PQ.
Some other transformations in G include
because s1^2 = 1 and s3^2 = 1. Hence, s3 commutes with s1. Similarly, s3 commutes with s2. Hence, all elements in G can be written as an alternating string of s1 and s2, or as an alternating string of s1 and s2 followed by s3. For example, rotation by π about Q3 is
• t^k s1 s3 for k ∈ Z≤0 (rotation through the point k→PQ from P); or
• t^k s2 s3 for k ∈ Z≥0 (rotation through the point k→PQ from Q).
This gives us a full description of all elements in G, and it also shows that there exist no relations in G not implied by those in the presentation in (3.11). ♦
Following the terminology in Section 3.1.3, if a pattern has a frieze group G of symmetries, we call a subset of the pattern a fundamental region (or fundamental pattern) if the entire pattern is obtained from the fundamental region by applying elements of G and this region is minimal among all subsets that generate the whole pattern. For example, a fundamental region for the
pattern in Example 3.9.10 is the following.
Frieze patterns are ubiquitous in artwork and architecture throughout the world. Figures 3.15
through 3.18 show a few such patterns.
Definition 3.9.11
A wallpaper group is a subgroup of the group of Euclidean plane isometries whose subgroup of translations is isomorphic to Z ⊕ Z.
Consider for example, Figures 3.19a and 3.19b. These two figures illustrate patterns covering
the Euclidean plane, each preserved by a different wallpaper group.
To first see that each is a wallpaper group, notice that in Figure 3.19a, the fundamental region
consisting of two starfish
can be translated to an identical pattern along any translation ~t = a~ı + b~ȷ, where a, b ∈ Z and ~ı corresponds to one horizontal unit translation and ~ȷ to one vertical unit. Clearly, the translation
subgroup of isometries preserving the pattern is Z ⊕ Z. In Figure 3.19b, a flower can be translated
into any other flower (ignoring shading) by

a( (√3/2) ~ı + (1/2) ~ȷ ) + b~ȷ for a, b ∈ Z.
The two unit translation vectors involved are linearly independent and again the translation subgroup
for this pattern of symmetry is isomorphic to Z ⊕ Z.
To see that the wallpaper groups for Figures 3.19a and 3.19b are not isomorphic, observe that in
Figure 3.19b, the pattern is preserved under a rotation by π/3 around any center of a flower, while
Figure 3.19a is not preserved under any isometry of order 6.
We point out that Figure 3.19a displays an interesting isometry, called a glide reflection. We can
pass from one fundamental region to another copy thereof via a reflection through a line, followed
by a vertical translation.
The glide reflection composed with itself is equal to the vertical translation with distance equal
to the least vertical gap between identical regions. Note that in a glide reflection, the translation
(glide) is always assumed to be parallel to the line through which the reflection occurs. One must
still specify the length of the translation. For example, a glide reflection f through the x-axis with
a translation distance of +2 has for equations
  ( x )   ( 1   0 ) ( x )   ( 2 )   ( x + 2 )
f ( y ) = ( 0  −1 ) ( y ) + ( 0 ) = (  −y   ) .
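We can check the claim that this glide reflection squares to a translation; a minimal sketch (our own, for the worked example f(x, y) = (x + 2, −y)):

```python
# The glide reflection f(x, y) = (x + 2, -y) through the x-axis with
# translation distance +2. Composed with itself it is translation by (4, 0).

def glide(pt):
    x, y = pt
    return (x + 2, -y)

for pt in [(0, 0), (1.5, -3), (-2, 7)]:
    gx, gy = glide(glide(pt))
    assert (gx, gy) == (pt[0] + 4, pt[1])
```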
The Dutch artist M. C. Escher (1898-1972) is particularly well known for artwork exploring wallpaper symmetry groups. Part of the genius of his artwork resided in devising interesting and recognizable patterns that were tessellations: patterns in which, unlike Figure 3.19a, there is no blank space. In a tessellation, the fundamental region tiles and completely covers R^2.
It is possible to classify all wallpaper groups. Since this section only offers a glimpse of applications of group theory to geometry, we do not offer a proof here or give the classification.
Over the centuries, the study of symmetry patterns of the plane has drawn considerable interest from artists and mathematicians alike. Many books study this topic from a variety of directions.
From a geometer’s perspective, [32] offers a careful and encyclopedic analysis of planar symmetry
patterns.
Frieze groups and wallpaper groups are examples of crystallographic groups. A crystallographic
group of Rn is a subgroup G of the group of isometries on Rn such that the subgroup of translations
in G is isomorphic to Zn . The adjective “crystallographic” is motivated by the fact that regular
crystals fill Euclidean space in a regular pattern whose translation subgroup is isomorphic to Z3 .
5. We work in R3 . Show by direct matrix calculations that rotation by θ about the x-axis, followed by
rotation by α about the y-axis is not generally equal to the rotation by α about the y-axis followed
by rotation by θ about the x-axis.
6. Use Theorem 3.9.5 to prove the claim that it suffices to know how an isometry f maps three non-
collinear points to know f uniquely.
7. Let L1 and L2 be two parallel lines in the Euclidean plane. Prove that reflection through L1 followed
by reflection through L2 is a translation of vector ~v that is twice the perpendicular displacement from
L1 to L2 .
8. Let L1 and L2 be two lines in the plane that are not parallel. Prove that reflection through L1 followed
by reflection through L2 is a rotation about their point of intersection of an angle that is double the
angle from L1 to L2 (in a counterclockwise direction).
9. Let A = (0, 0), B = (1, 0), and C = (0, 1). Determine the isometry obtained by composing a rotation
about A of angle 2π/3, followed by a rotation about B of angle 2π/3, followed by a rotation about C
of angle 2π/3. Find the equations of transformation and describe it in simpler terms.
10. Let f be the isometry of rotation by α about the point A = (a1 , a2 ) and let g be the isometry of
rotation by β about the point B = (b1 , b2 ). Show that f ◦ g may be described by a rotation by α + β
about B followed by a translation and give this translation vector.
11. Let f be the plane Euclidean isometry of rotation by α about a point A and let t be a translation by a
vector ~v . Prove that the conjugate f ◦ t ◦ f −1 is a translation and determine this translation explicitly.
12. Let g be the plane Euclidean isometry of reflection through a line L and let t be a translation by a
vector ~v . Prove that the conjugate g ◦ t ◦ g −1 is a translation and determine this translation explicitly.
13. Prove that orthogonal matrices have determinant 1 or −1. Prove also that

SO(n) = { A ∈ O(n) | det(A) = 1 }

is a subgroup of O(n) and hence of GLn(R). [The subgroup SO(n) is called the special orthogonal group.]
14. Prove that SO(2) (see Exercise 3.9.13) consists of the matrices

( cos θ  −sin θ )
( sin θ   cos θ ) ,   θ ∈ [0, 2π).
15. Prove that the function d_t(~x, ~y) = |y1 − x1| + |y2 − x2| satisfies the conditions of a metric on R^2 as defined in Definition 3.9.2. Prove also that the set of surjective isometries for d_t is a group and show that it is not equal to the group of Euclidean isometries. [Hint: This metric on R^2 is called the taxi metric because it calculates the distance a taxi would travel between two points if it could only drive along north-south and east-west streets.]
16. Find a presentation for the frieze group associated to the pattern in Figure 3.15.
17. Find a presentation for the frieze group associated to the pattern in Figure 3.16.
18. Find a presentation for the frieze group associated to the pattern in Figure 3.18. Show that it is the
same as the frieze group associated to the pattern in Figure 3.17.
19. Find a presentation for the frieze group associated to the following pattern and then sketch the
fundamental pattern.
20. (*) Prove that there are only 7 nonisomorphic frieze groups.
21. Prove that the wallpaper group for the following pattern is not isomorphic to the wallpaper groups
for either Figure 3.19a or 3.19b.
22. Prove that the wallpaper group for the following pattern is not isomorphic to the wallpaper groups
for either Figure 3.19a or 3.19b.
For Exercises 3.9.23 through 3.9.26, sketch a reasonable portion of the pattern generated by the following fundamental pattern and the group indicated in each exercise. Assume the minimum distance for any translation is 2.
3.10 Diffie-Hellman Public Key

3.10.1 A Brief Background on Cryptography
In this section, we will study an application of group theory to cryptography, the science of keeping
information secret.
Cryptography has a long history, with one of the first documented uses attributed to Julius Caesar. When writing messages he wished to keep in confidence, the Roman general would shift each letter by 3 to the right, wrapping around at the end of the alphabet. In other words, he would substitute A with D, B with E, and so forth, down to replacing Z with C. To anyone who intercepted the modified message, it would look like nonsense. This was particularly valuable if Caesar thought there existed a chance that an enemy could intercept orders sent to his military commanders.
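Caesar's shift can be written in a few lines; a minimal sketch (our own, uppercase letters only):

```python
# Caesar's cipher: shift each letter by 3, wrapping around at the end.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def caesar(message, shift=3):
    return "".join(ALPHABET[(ALPHABET.index(c) + shift) % 26] for c in message)

assert caesar("Z") == "C"                      # the alphabet wraps around
ciphertext = caesar("ATTACK")
assert ciphertext == "DWWDFN"
assert caesar(ciphertext, shift=-3) == "ATTACK"  # decryption shifts back
```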
After Caesar's cipher, there came letter wheels in the early Renaissance, letter codes during the American Civil War, the Navajo code talkers during World War II, the Enigma machine used by the Nazis, and then a whole plethora of techniques since. For centuries, cryptographic techniques have protected military communications, financial data, and intellectual property. For a long time, the science of cryptography remained the knowledge of a few experts because both governments and companies held that keeping their cryptographic techniques secret would make it even harder for "an enemy" to learn one's information security tactics.
Today, electronic data storage, telecommunication, and the Internet require increasingly complex
cryptographic algorithms. Commonplace activities, like conversing on a cellphone, opening a car remotely, or purchasing something online, all use cryptography so that a conversation cannot be intercepted, someone else cannot easily unlock your car, and an eavesdropper cannot steal your credit card information.
Because of the proliferation of applications of cryptography in modern society, no one should
assume that the cryptographic algorithm used in any given instance remains secret. In fact, modern
cryptographers do not consider an information security algorithm at all secure if part of its effectiveness relies on the algorithm remaining secret. But not everything about a cryptographic algorithm
can be known to possible eavesdroppers if parties using the algorithm hope to keep some message
secure. Consequently, most, if not all, cryptographic techniques involve an algorithm but also a
“key,” which can be a letter, a number, a string of numbers, a string of bits, a matrix or some other
mathematical object. The security of the algorithm does not depend on the algorithm staying secret
but rather on the key remaining secret. Users can change keys from time to time without changing
the algorithm and have confidence that their messages remain secure.
A basic cryptographic system involves the following objects.
(1) A message space M. This can often be an n-tuple of elements from some alphabet A (so
M = An ) or any sequence from some alphabet A (so M = AN ). The original message is
called plaintext.
(2) A ciphertext space C. This is the set of all possible hidden messages. It is not uncommon for
C to be equal to M.
(3) A keyspace K that provides the set of all possible keys to be used in a cryptographic algorithm.
(4) An encryption procedure E ∈ Fun(K, Fun(M, C)) such that for each key k ∈ K, there is an
injective function Ek : M → C.
(5) A decryption procedure D ∈ Fun(K, Fun(C, M)) such that for each key k ∈ K, there exists a
key k′ ∈ K, with a function Dk′ : C → M satisfying
Dk′ (Ek (m)) = m for all m ∈ M.
In many algorithms, k′ = k but that is not necessarily the case. (The requirement that Ek be
injective makes the existence of Dk′ possible.)
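As a toy instance of these five objects (our own illustration): take M = C = strings over a 27-symbol alphabet, K = {0, 1, . . . , 26}, let E_k shift every symbol by k, and let D_k′ with k′ = −k mod 27 undo the shift, so that D_k′(E_k(m)) = m.

```python
ALPHABET = " ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # 27 symbols: space plus A-Z

def E(k, m):
    """Encryption E_k : M -> C; each E_k is injective (in fact a bijection)."""
    return ''.join(ALPHABET[(ALPHABET.index(ch) + k) % 27] for ch in m)

def D(k, c):
    """Decryption D_{k'} with k' = -k mod 27, so that D_{k'}(E_k(m)) = m."""
    return E((-k) % 27, c)

m = "MEET ME AT DAWN"
c = E(5, m)
assert D(5, c) == m  # the round trip recovers the plaintext
```

Here the injectivity of each E_k is what guarantees the decryption map exists, exactly as the parenthetical remark above states.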
In an effective cryptographic algorithm, it should be very difficult to recover the keys k or k′ given
just ciphertext c = Ek (m) (called a “ciphertext only attack”) or even given ciphertext c = Ek (m)
and the corresponding plaintext m (called a “known plaintext attack”).
From the mathematician’s viewpoint, it is interesting that all modern cryptographic techniques
rely on number theory and advanced abstract algebra that is beyond the understanding of the vast
majority of people. Companies involved in designing information security products or protocols
must utilize advanced mathematics.
Now imagine that you begin a communication with a friend at a distance (electronically or
otherwise) and that other people can listen in on everything that you communicate to each other.
Would it be possible for that communication to remain secret? More specifically, would it be
possible for the two of you to agree on a key k (for use in a subsequent cryptographic algorithm) so that people
who are eavesdropping on the whole communication do not learn what that key is? It seems very
counterintuitive that this should be possible, but such algorithms do exist and are called public key
cryptography techniques.
In this section, we present the Diffie-Hellman protocol. Devised in 1976, it was one of the first
public key algorithms. An essential component of the effectiveness of the Diffie-Hellman protocol is
the Fast Exponentiation Algorithm.
which involves n − 1 operations. (In fact, when we implement this in a computer algorithm, since
we must take into account the operation of incrementing a counter, the above direct calculation
takes a minimum of 2n − 1 computer operations.) If the order |g| and the power n are large, one
may not notice any patterns in the powers of g that would give us any shortcuts to determining g^n
with fewer than n − 1 group operations.
The Fast Exponentiation Algorithm allows one to calculate g^n with many fewer group operations
than n, thus significantly reducing the calculation time.
The reason that x has the value of g^n at the end of the for loop is that when the algorithm
terminates,
x = g^(b_k·2^k + b_{k−1}·2^{k−1} + ··· + b_1·2 + b_0),
which is precisely g^n. Note that if we write n in binary with bits n = (b_k b_{k−1} ··· b_1 b_0)_2, then there
is an assumption that b_k = 1.
Each time through the for loop, we do either one or two group operations. Hence, we do at
most 2k = 2⌊log2 n⌋ group operations. In practice, when implementing this algorithm, getting the
binary expansion for n takes k + 1 = ⌊log2 n⌋ + 1 operations (integer divisions) and the operation of
decrementing the counter i takes a total of k operations. This gives a total of at most 4⌊log2 n⌋ + 1
computer operations.
Example 3.10.1. Let G = U (311). Note that 311 is a prime number. We propose to calculate 7̄^39.
The binary expansion of 39 is 39 = (100111)_2 = 2^5 + 2^2 + 2^1 + 2^0. Following the steps of the
algorithm, we
• assign x := 7̄;
• for i = 4, since b_4 = 0, assign x := x^2 = 49;
• for i = 3, since b_3 = 0, assign x := x^2 = 49^2 = 224;
• for i = 2, since b_2 = 1, assign x := 7̄x^2 = 7̄ × 224^2 = 113;
• for i = 1, since b_1 = 1, assign x := 7̄x^2 = 7̄ × 113^2 = 126;
• for i = 0, since b_0 = 1, assign x := 7̄x^2 = 7̄ × 126^2 = 105.
In the above example, we performed 8 group calculations as opposed to the necessary 38 had
we simply multiplied 7̄ to itself 38 times. This certainly sped up the process for calculating 7̄39 by
hand. However, running a for loop with 39 iterations obviously does not come close to straining a
computer’s capabilities.
Example 3.10.2. We propose to calculate the power 379^1234567890123
in U (435465768798023). Using the standard way of finding powers, it would take 1234567890122
operations in G, a number that begins to require a significant computing time. However, using the
Fast Exponentiation Algorithm, we only need to do at most 2(⌊log2 1234567890123⌋ + 1) = 82 group
operations. For completeness, we give the result of the calculation
379^1234567890123 = 370162048004176.
Both of the above examples involve groups of the form U (p) but Fast Exponentiation applies in
any group.
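The square-and-multiply loop analyzed above can be sketched as follows (our own rendering in code; the bit-scanning order matches the description in the text):

```python
def fast_pow(g, n, modulus):
    """Compute g^n in U(modulus) by the Fast Exponentiation Algorithm (n >= 1)."""
    bits = bin(n)[2:]            # n = (b_k b_{k-1} ... b_1 b_0)_2 with b_k = 1
    x = g % modulus              # the leading bit b_k contributes one factor of g
    for b in bits[1:]:           # loop i = k-1 down to 0
        x = (x * x) % modulus    # always square
        if b == '1':
            x = (x * g) % modulus  # multiply by g exactly when b_i = 1
    return x

print(fast_pow(7, 39, 311))  # 105, as in Example 3.10.1
```

Python's built-in three-argument `pow` performs the same modular exponentiation, so the sketch can be checked against it on large inputs.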
Alice                                  Bob
(2) Choose a ∈ N                       (3) Choose b ∈ N
            —— Send g^a ——→
            ←—— Send g^b ——
(4) Calculate (g^b)^a                  (5) Calculate (g^a)^b
(1) Alice (on the left) and Bob (on the right) settle on a group G and a group element g, called
the base. Ideally, the order of g should be very large. (If you are doing calculations by hand,
the order of g should be in the hundreds. If we are using computers, the order of g is ideally
larger than what a typical for loop runs through, say 1015 or much more.)
(2) Alice chooses a relatively large integer a and sends to Bob the group element g^a, calculated
with Fast Exponentiation.
(3) Bob chooses a relatively large integer b and sends to Alice the group element g^b, calculated
with Fast Exponentiation.
(4) Alice calculates (g^b)^a = g^ab with Fast Exponentiation.
(5) Bob calculates (g^a)^b = g^ab with Fast Exponentiation.
(6) Alice and Bob will now use the group element g^ab as the key.
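Played out with small toy numbers of our own choosing (far too small for real security), the exchange looks like this; Python's built-in `pow` stands in for Fast Exponentiation:

```python
p, g = 311, 7          # (1) agree publicly on G = U(311) and a base g
a, b = 25, 11          # (2), (3) Alice's and Bob's private exponents

A = pow(g, a, p)       # Alice sends g^a to Bob
B = pow(g, b, p)       # Bob sends g^b to Alice

key_alice = pow(B, a, p)   # (4) Alice computes (g^b)^a
key_bob = pow(A, b, p)     # (5) Bob computes (g^a)^b

assert key_alice == key_bob    # (6) both now hold the same key g^ab
```

An eavesdropper sees p, g, A, and B but not a or b, so she faces the Discrete Log Problem described below.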
The reason why Eve cannot easily figure out g ab is simply a matter of how long it would take
her to do so. We should assume that Eve intercepts the group G, the base g, the element g a , and
the element g b . However, Eve does not know the integers a or b. In fact, Alice does not know b and
Bob does not know a. The reason why this is safe is that Fast Exponentiation makes it possible to
calculate powers of group elements quickly while, on the other hand, the problem of determining n
from knowing g and g n could take as long as n. If the power n is very large, it simply takes far too
long. Not knowing a or b, Eve cannot quickly determine the key g ab if she only knows g, g a , and g b .
The fact that the security of the Diffie-Hellman public key exchange relies on the speed of one
calculation versus the relative slowness of a reverse algorithm may seem unsatisfactory at first.
However, suppose that it takes only microseconds for Alice and Bob to establish the key g^ab but
centuries (at the present computing power) for Eve to recover a from g and g^a. The secret message
between Alice and Bob would have lost its relevance long before Eve could recover the key.
The problem of finding n given g and g^n in some group is called the Discrete Log Problem. Many
researchers in cryptography study the Discrete Log Problem, especially in groups of the form U (n),
for n ∈ N. For reasons stemming from modular arithmetic, there are algorithms that calculate n
from g and g^n using far fewer than n operations. However, Diffie-Hellman implemented in certain other groups
does not have such weaknesses. A popular technique called elliptic curve cryptography is one such
example of the Diffie-Hellman protocol in a group that does not possess some of the weaknesses of U (n).
In any Diffie-Hellman public key exchange, some choices of group G and base g are wiser than
others. It is preferable to choose a situation in which |g| is large. Otherwise, an exhaustive calculation
of the powers of g would make a brute-force solution to the Discrete Log Problem a possibility for
Eve. Furthermore, given a specific g, some choices of a and b are not wise either. For example, if g^a
(or g^b) has a very low order, then g^ab would also have a very low order. Then, even if Eve does not
know the key for certain, she would only need to check a small number of possible keys of the form (g^a)^k.
(1) Alice and Bob settle on a group G and on the base, a group element g. The plaintext space
M can be any set, the ciphertext space C will be sequences of elements in G, and the keyspace
is K = G.
(2) Alice and Bob also choose a method of encoding the message into a sequence of elements in
G, i.e., they choose an injective function h : M → Fun(N∗ , G).
(3) They run Diffie-Hellman to obtain their common key k = g^ab.
(4) Alice encodes her message m with a sequence of group elements h(m) = (m1 , m2 , . . . , mn ).
(5) Alice sends to Bob the group elements Ek (m) = (km1 , km2 , . . . , kmn ) = (c1 , c2 , . . . , cn ). Note
that for each i, we have ci = g^ab mi .
(6) To decipher the ciphertext, we have k′ = k = g^ab. Bob calculates the mi from mi = ci k^(−1) =
ci (g^ab)^(−1).
[We point out that with a group element of large order, it is not always obvious how to
determine the inverse of a group element. We use Corollary 4.1.12, which says that g^|G| = 1.
Hence, to calculate g^(−ab), without knowing a but only knowing g^a, using Fast Exponentiation,
Bob calculates
(g^a)^(|G|−b) = g^(a|G|−ab) = g^(−ab).]
(7) Since h is injective, Bob can find Alice’s plaintext message m from (m1 , m2 , . . . , mn ).
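Steps (3) through (7) can be sketched as follows in a group U(p), with toy numbers of our own choosing; the list `h_m` simply stands in for the encoded message h(m):

```python
p = 3001                 # a prime, so |U(p)| = p - 1
order = p - 1
g, a, b = 2, 101, 457    # base and the two private exponents

ga, gb = pow(g, a, p), pow(g, b, p)   # exchanged in the open
k = pow(gb, a, p)                     # (3) the common key k = g^ab, Alice's side

h_m = [809, 2, 1502]                        # (4) h(m): message as group elements
cipher = [(k * mi) % p for mi in h_m]       # (5) E_k(m) = (k m_1, ..., k m_n)

# (6) Bob computes k^{-1} = g^{-ab} without knowing a: (g^a)^(|G| - b) = g^{-ab}
k_inv = pow(ga, order - b, p)
recovered = [(ci * k_inv) % p for ci in cipher]
assert recovered == h_m   # (7) Bob recovers the encoded plaintext
```

The decryption step uses exactly the trick from the bracketed remark: raising g^a to the power |G| − b yields the inverse of the key.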
Example 3.10.3. We use the group G = U (3001) and choose for a base the group element g = 2̄.
(It turns out that |g| = 1500. In general, one does not have to know the order of g, but merely
hope that it is high.) The message space is the set of sequences M = Fun(N∗ , A) where A is the set
consisting of the 26 English letters and the space character.
Alice and Bob decide to encode their messages (define h : M → Fun(N∗ , G)) as follows. Ignore all
punctuation, encode a space with the integer 0 and each letter of the alphabet with its corresponding
ordinal, so A is 1, B is 2, and so on, where we allow for two digits for each letter. Hence, a space
is actually 00 and A is 01. Then group pairs of letters simply by adjoining them to make (up to) a
four-digit number. Thus, “GOOD-BYE CRUEL WORLD” becomes the finite sequence
715, 1504, 225, 500, 318, 2105, 1200, 2315, 1812, 400
where we completed the last pair with a space. We now view these numbers as elements in U (3001).
Alice chooses a = 723 while Bob chooses b = 1238. In binary, a = (1, 0, 1, 1, 0, 1, 0, 0, 1, 1)_2 and
b = (1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0)_2 . Using Fast Exponentiation, Alice calculates that g^a = 2̄^723 = 1091
and Bob calculates that g^b = 2̄^1238 = 1056.
Alice just wants to say, “HI BOB.” She first calculates g^ab = 1056^723 = 2442. Her corresponding
message in group elements is: 809, 2, 1502. The ciphertext mi g^ab for i = 1, 2, 3 is the string of group
elements: 920, 1883, 662.
On his side, Bob now first calculates the element (g^a)^(|G|−b) = 1091^1762 = 102. The deciphered
code is
920 × 102 = 809,  1883 × 102 = 2̄,  662 × 102 = 1502.
Bob then easily recovers “HI BOB” as the original message.
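The arithmetic of Example 3.10.3 can be checked directly; Python's three-argument `pow` plays the role of Fast Exponentiation:

```python
p, order = 3001, 3000
g, a, b = 2, 723, 1238

ga = pow(g, a, p)            # 1091, what Alice sends
gb = pow(g, b, p)            # 1056, what Bob sends

k = pow(gb, a, p)            # Alice's key g^ab = 2442
message = [809, 2, 1502]     # "HI BOB" encoded as group elements
cipher = [(mi * k) % p for mi in message]   # 920, 1883, 662 as in the example

k_inv = pow(ga, order - b, p)   # Bob's (g^a)^(3000 - 1238) = g^(-ab) = 102
assert [(ci * k_inv) % p for ci in cipher] == message
```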
3.10. DIFFIE-HELLMAN PUBLIC KEY 149
In the following exercises, the reader should understand that the situations presented are all
small enough to make it simple for a computer to perform a brute force attack to find a from g^a
or b from g^b. In real applications, the group G, the base g, and the exponents a and b are chosen large
enough by the parties to make the Discrete Log Problem intractable by a brute force attack.
in GL2 (R).
6. Suppose that Alice and Bob decide to use the group U (3001) as their group G and the element 5̄ as
the base. If Alice chooses a = 73 and Bob chooses b = 129, then what will be their common key using
the Diffie-Hellman public key exchange algorithm?
7. Suppose that Alice and Bob decide to use the group U (4237) as their group G and the element 11
as the base. If Alice chooses a = 100 and Bob chooses b = 515, then what will be their common key
using the Diffie-Hellman public key exchange algorithm?
In Exercises 3.10.8 to 3.10.11, use the method in Example 3.10.3 to take strings of letters to strings of
numbers. Each time, use the same method to change a letter to a number and collect two letters at a time to
make an integer that is at most 4 digits.
8. Play the role of Alice. Use the group G = U (3001) and the base g = 2̄. Change a and b to a = 437
and b = 1000. Send to Bob the ciphertext for the following message: “MEET ME AT DAWN”
9. Play the role of Alice. Use the group G = U (3001) but use the base g = 7̄. Bob sends you g^b = 2442
and you decide to use a = 2319. Send to Bob the ciphertext for the following message: “SELL ENRON
STOCKS NOW”
10. Play the role of Bob. Use the group G = U (3517) and use the base g = 11. Alice sends you g^a = 1651
and you tell Alice you will use b = 789. You receive the following ciphertext from Alice:
Show the steps using fast exponentiation to recover the decryption key g^(−ab) and use this to recover
the plaintext for the message that Alice sent you.
11. Play the role of Bob. Use the group G = U (7522) and use the base g = 3. Alice sends you g^a = 2027
and you tell Alice you will use b = 2013. You receive the following ciphertext from Alice:
(b0 , b1 , . . . , b9 ) −→ b0 + b1 · 2 + · · · + bk · 2^k + · · · + b9 · 2^9 = mi .
This is a plaintext unit mi and we view it as an element in U (p). With this setup to convert strings
of bits to elements in U (p), we then apply the usual Diffie-Hellman key exchange and the ElGamal
150 CHAPTER 3. GROUPS
encryption. As one extra layer, when Alice sends the ciphertext to Bob, she writes it as a bit string,
but with the difference that since each number c = mg^ab can be expressed uniquely as an integer less than
2^11, blocks of 10 bits become blocks of 11 bits.
Here is the exercise. You play the role of Alice. You and Bob decide to use p = 1579 and the base
of g = 7. Bob sends you g^b = 993 and you decide to use a = 78. Show all the steps to create the
Diffie-Hellman key g^ab. Use this to create the ciphertext as described in the previous paragraph for
the following string of bits:
the following string of bits:
1011010110 1011111001 1111100010.
Show all the work.
13. Use the setup as described in Exercise 3.10.12 to encipher bit strings. This time, however, you play
the role of Bob. You and Alice use the group U (1777) and the base g = 10, Alice sends you g^a = 235,
and you choose to use b = 1573. Alice has sent you the following string of bits:
Show all the steps to create the Diffie-Hellman decryption key g^(−ab). Turn the ciphertext into strings of
elements in U (1777). By multiplying by g^(−ab), recover the list of elements in U (1777) that correspond
to plaintext. (These should be integers mi with 0 ≤ mi ≤ 1023.) Convert them to binary to recover
the plaintext message in bit strings.
14. We design the following application of Diffie-Hellman. We choose to encrypt 29 characters: 26 letters
of the alphabet, space, the period “.”, and the comma “,”. We associate the number 0 to a space
character, 1 through 26 for each of the letters, 27 to the period, and 28 to the comma. We use the
group G = GL2 (F29 ), the general linear group on modular arithmetic base 29. Given a message in
English, we write the numerical values of the characters in a 2 × n matrix, reading the characters of
the alphabet by successive columns. Hence, “SAY FRIEND AND ENTER” would become the matrix
M = [ 19 25  6  9 14  0 14  0 14  5
       1  0 18  5  4  1  4  5 20 18 ] ∈ M2×10 (F29 )
where we have refrained from putting the congruence bars over the top of the elements only for brevity.
Then given a key K ∈ GL2 (F29 ), we encrypt the message into ciphertext by calculating the matrix
C = KM . Hence, with
K = [ 3 4
      5 9 ]
the ciphertext matrix becomes
C = [  3 17  3 18  0  4  0 20  6  0
      17  9 18  3 19  9 19 16 18 13 ]
and “CQQICRRC SDI STPFR M” is the ciphertext message in characters. [Note that this enciphering
scheme is not an ElGamal enciphering scheme as described in Section 3.10.4. Here, the ciphertext space C
equals the message space M and we do not use a function h, so the enciphering function Ek does not involve
products in the group G.]
Here is the exercise. You play the role of Alice. You and Bob decide on the above enciphering scheme.
You will choose the key K in the usual Diffie-Hellman manner. You use the group G = GL2 (F29 ) and
the base
g = [ 1 2
      3 5 ].
Bob sends you
g^b = [ 27 24
         7 17 ]
and you choose to use a = 17. Calculate the
Diffie-Hellman key and use this to determine the ciphertext corresponding to “COME HERE, NOW.”
(Since there is an odd number of characters in the message, append a space on the end to make a
message of even length.)
15. We use the communication protocol as described in Exercise 3.10.14. You use the same group and the
same base but this time you play the role of Bob. Alice sends you
g^a = [  5 27
        26  1 ]
and you choose to use b = 12. After you send your g^b to Alice, she then creates the public key and sends you the message
“LPSKQIMBW.ECRBHL” in ciphertext. Show all the steps with fast exponentiation to calculate
the deciphering key g^(−ab) and recover the plaintext message. (Note that since we know how to take
inverses of matrices in GL2 (F29 ), it suffices to calculate g^ab and then find the inverse as opposed to
calculating (g^a)^(|G|−b).)
16. We design the following Diffie-Hellman/ElGamal setup. We choose to encrypt 30 characters: 26 letters
of the alphabet, space, the period “.”, the comma “,” and the exclamation point “!”. We associate
the number 0 to a space character, 1 through 26 for each of the letters, 27 to the period, 28 to the
comma and 29 to “!”. We choose to compress triples of characters as follows: (b1 , b2 , b3 ), where each
bi ∈ {0, 1, . . . , 29}, corresponds to the number
b1 × 30^2 + b2 × 30 + b3 .
The resulting possible numbers are between 0 and 30^3 − 1 = 26,999. Now, the smallest prime bigger
than 30^3 is p = 27011. We will work in the group U (27011) and we will view messages as sequences
of elements in G encoded as described above. For example: “HI FRANK!” uses the compression of
Using Fast Exponentiation, determine the inverse of the public key, g −ab . Decipher the sequence of
numbers corresponding to Alice’s plaintext message. From the message coding scheme, determine
Alice’s original message (in English).
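The triple-compression scheme of Exercise 3.10.16 can be sketched as follows (the helper names are ours):

```python
CHARS = " ABCDEFGHIJKLMNOPQRSTUVWXYZ.,!"   # indices 0 through 29

def pack(triple):
    """(b1, b2, b3) -> b1*30^2 + b2*30 + b3, an integer in [0, 26999]."""
    b1, b2, b3 = (CHARS.index(ch) for ch in triple)
    return b1 * 900 + b2 * 30 + b3

def unpack(n):
    """Invert pack: read off the three base-30 digits of n."""
    return CHARS[n // 900] + CHARS[(n // 30) % 30] + CHARS[n % 30]

print(pack("HI "))   # 8*900 + 9*30 + 0 = 7470
assert unpack(7470) == "HI "
```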
17. Use the encoding of text to strings of group elements as described in Exercise 3.10.16. Use the same group
G = U (27011) but use the base g = 5. Play the role of Alice. Select a = 10,000 while Bob sends you
g^b = 15128. Show all the steps to create the string of elements in G that are the ciphertext for the
message “I WILL SURVIVE.”
3.11 Semigroups and Monoids
In this section, we present two more algebraic structures closely related to groups. They possess
value in their own right but we present them here to illustrate more examples of algebraic structures.
Monoids in particular are regularly studied in far more detail than our brief overview. Consequently,
for each structure we follow the outline in the preface of this book.
3.11.1 – Semigroups
Definition 3.11.1
A semigroup is a pair (S, ◦), where S is a set and ◦ is an associative binary operation on S.
Having introduced groups already, we can see that a semigroup resembles a group but with only
the associativity axiom. Note that Proposition 2.3.7 holds in any semigroup. Obviously, every group
is a semigroup.
In every semigroup (S, ◦), because of associativity, the order in which we group the operations
in an expression of the form a ◦ a ◦ · · · ◦ a does not change the result. Hence, we denote by a^k the
(unique) element a ◦ a ◦ · · · ◦ a (with k factors of a).
Example 3.11.2. The set of positive integers equipped with addition (N>0 , +) is a semigroup. 4
Example 3.11.3. All integers equipped with multiplication (Z, ×) also form a semigroup. The fact that not
all elements have inverses prevents (Z, ×) from being a group, but that does not matter for a
semigroup.
[Figure 3.20: two rectangles R1 and R2 and their intersection R1 ∩ R2 .]
Example 3.11.4. Let S = Z and suppose that a ◦ b = max{a, b}. It is easy to see that for all
integers a, b, c,
(a ◦ b) ◦ c = max{a, b, c} = a ◦ (b ◦ c),
so ◦ is associative and (S, ◦) is a semigroup. Note that (S, ◦) has no identity element: an identity
would be an integer e with max{e, a} = a for all a ∈ Z, i.e., an integer less than or equal to every integer.
Note that this includes finite vertical (if a = b) or horizontal (if c = d) lines as well as the empty set
∅ (if b < a or d < c). The intersection of two elements in S is
([a, b] × [c, d]) ∩ ([a′, b′] × [c′, d′]) = [max{a, a′}, min{b, b′}] × [max{c, c′}, min{d, d′}],
so ∩ is a binary operation on S. (See Figure 3.20.) We know that ∩ is associative so the pair (S, ∩)
is a semigroup. As in Example 3.11.4, there is no identity element. If an identity element U existed,
then U ∩ R = R for all R ∈ S. Hence, R ⊆ U for all R ∈ S and thus
⋃_{R ∈ S} R = R^2 ⊆ U,
which is impossible since U must itself be a bounded rectangle in S.
Out of the outline given in the preface for the study of different algebraic structures, we briefly
mention direct sum semigroups, subsemigroups, generators, and homomorphisms.
Definition 3.11.6
Let (S, ◦) and (T, ⋆) be two semigroups. Then the direct sum semigroup of (S, ◦) and (T, ⋆)
is the pair (S × T, ·), where S × T is the usual Cartesian product of sets and where
(s1 , t1 ) · (s2 , t2 ) = (s1 ◦ s2 , t1 ⋆ t2 ).
Definition 3.11.7
A subset A of a semigroup (S, ◦) is called a subsemigroup if a ◦ b ∈ A for all a, b ∈ A.
A subsemigroup is a semigroup in its own right, using the operation inherited from the containing
semigroup. In the theory of groups, in order for a subset H of a group G to be a group in its own
right, H needed to be closed under the operation and taking inverses. For semigroups, the condition
on taking an inverse does not apply.
Definition 3.11.8
Let (S, ◦) and (T, ?) be two semigroups. A function f : S → T is called a semigroup
homomorphism if
f (a ◦ b) = f (a) ? f (b) for all a, b ∈ S.
A semigroup homomorphism that is also a bijection is called a semigroup isomorphism.
Example 3.11.9. Let p be a prime number and recall the ordp : N∗ → N function from (2.4). The
ordp function is a semigroup homomorphism from (N∗ , ×) to (N, +). This claim means that for all
a, b ∈ N∗ , we have
ordp (ab) = ordp (a) + ordp (b).
This was proven in Exercise 2.1.26. 4
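The homomorphism property of ordp can be checked numerically (a sketch; the function name ord_p is ours):

```python
def ord_p(p, n):
    """The exponent of the prime p in the factorization of the positive integer n."""
    count = 0
    while n % p == 0:
        n //= p
        count += 1
    return count

a, b, p = 12, 18, 3
# ord_3(12) = 1 and ord_3(18) = 2, while ord_3(12 * 18) = ord_3(216) = 3
assert ord_p(p, a * b) == ord_p(p, a) + ord_p(p, b)
```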
Example 3.11.10. Consider the following four functions fi : {1, 2, 3} → {1, 2, 3} expressed in chart
notation (see Section 3.4.1) as
f1 = [1 2 3; 1 2 2],  f2 = [1 2 3; 3 2 3],  f3 = [1 2 3; 2 2 2],  f4 = [1 2 3; 3 2 2],
where the top row of each chart lists the inputs and the bottom row the corresponding outputs.
We claim that the set S = {f1 , f2 , f3 , f4 } together with function composition is a semigroup. We
know that function composition is associative. To prove the claim, we simply need to show that ◦
is a binary operation on S. The table of composition (with the row entry composed on the left of the column entry) is:
     | f1  f2  f3  f4
  f1 | f1  f3  f3  f3
  f2 | f4  f2  f3  f4
  f3 | f3  f3  f3  f3
  f4 | f4  f3  f3  f3
This last example illustrates the use of a Cayley table to easily see the operations in a semigroup.
Though we introduced Cayley tables in Section 3.2.1 in reference to groups, it makes sense to discuss
a Cayley table in the context of any finite set S equipped with a binary operation. In such a general
Cayley table, the elements that appear on the top row or leftmost column are the elements of the
set S and all the entries of the table must again be elements of the set S. However, for semigroups,
we see that there need not be a row and column corresponding to an identity element. Furthermore,
since a semigroup does not necessarily contain inverses to elements, the Cayley table of a semigroup
is not necessarily a Latin square.
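Closure of the set S in Example 3.11.10 can be verified mechanically; here each function is represented by the tuple of its images of 1, 2, 3 (the representation is ours):

```python
f1, f2, f3, f4 = (1, 2, 2), (3, 2, 3), (2, 2, 2), (3, 2, 2)
S = [f1, f2, f3, f4]

def compose(f, g):
    """(f o g)(x) = f(g(x)) on the set {1, 2, 3}."""
    return tuple(f[g[i] - 1] for i in range(3))

# Every composition lands back in S, so composition is a binary operation on S.
assert all(compose(f, g) in S for f in S for g in S)

# Two sample entries of the Cayley table: f1 o f2 = f3 and f2 o f1 = f4.
assert compose(f1, f2) == f3 and compose(f2, f1) == f4
```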
3.11.2 – Monoids
Definition 3.11.11
A monoid is a pair (M, ∗), where M is a set and ∗ is a binary operation on M , such that
(1) the operation ∗ is associative, and
(2) there exists an identity element 1 ∈ M , i.e., 1 ∗ a = a ∗ 1 = a for all a ∈ M .
Definition 3.11.12
A monoid (M, ∗) is called commutative if a ∗ b = b ∗ a for all a, b ∈ M .
If a monoid is commutative, we typically write the operation with an addition symbol and denote
the identity by 0. Consequently, it is not uncommon to say “let (M, +) be a commutative monoid”
when discussing a commutative monoid.
Definition 3.11.13
A monoid (M, ∗) is said to possess the cancellation property if for all a, b, c ∈ M ,
a ∗ c = b ∗ c ⇐⇒ a = b and
c ∗ a = c ∗ b ⇐⇒ a = b.
It is important to note that a monoid may have the cancellation property without possessing
inverses, as we will see in some examples below.
Example 3.11.14. The nonnegative integers with addition (N, +) is a monoid. The presence of 0
in N gives the nonnegative integers the needed identity. This monoid is commutative. This monoid
also possesses the cancellation property.
Example 3.11.15. Positive integers with multiplication (N∗ , ×) form a monoid. This monoid possesses the
necessary identity but its elements other than 1 have no inverses. Note that (N, ×) is also a monoid: Associativity still holds
and, because 1 × 0 = 0, the element 1 is still an identity for all of N. This monoid is commutative.
Observe that (N∗ , ×) has the cancellation property but (N, ×) does not. A counterexample for
the latter is 0 × 1 = 0 × 2 but 1 ≠ 2.
Example 3.11.16. Let A be a set and let F = Fun(A, A) be the set of all functions f : A → A. Since
function composition is associative, (F, ◦) is a (noncommutative) monoid under function composition
◦ with the identity function idA as the identity element. Functions that are bijections have inverses,
but in a monoid not every element must have an inverse. 4
Example 3.11.17. Let (M, ∗) be a monoid. If S, T are subsets of M , then we define
S ∗ T = {s ∗ t | s ∈ S and t ∈ T }.
Then (P(M ), ∗) is a monoid. This result is not obvious and we leave it as an exercise for the reader
(Exercise 3.11.10). This is called the power set monoid. 4
Example 3.11.18 (Free Monoid, Monoid of Strings). Let Σ be a set of characters. Consider
the set of finite strings of elements from Σ, including the empty string, denoted by 1. Authors who
work with this structure in the area of theoretical computer science usually denote this set of strings
by Σ∗ . We define the operation of concatenation · on Σ∗ by
a1 a2 · · · am · b1 b2 · · · bn = a1 a2 · · · am b1 b2 · · · bn .
The pair (Σ∗ , ·) is a monoid where the empty string is the identity. This is called the free monoid
on the set of characters Σ. 4
The concepts of product structures, subobjects, and homomorphisms for monoids are nearly identical to those for
groups or semigroups. We give an explicit description here for completeness.
Definition 3.11.19
Let (M, ∗) and (N, ◦) be two monoids. Then the direct sum monoid of (M, ∗) and (N, ◦)
is the pair (M × N, ·), where M × N is the usual Cartesian product of sets and where
(m1 , n1 ) · (m2 , n2 ) = (m1 ∗ m2 , n1 ◦ n2 ).
We commonly say that the operation is taken componentwise. Note that as with groups and
other algebraic structures, the product structure generalizes immediately to a product of any finite
number of monoids. In fact, it is also possible to make sense of the direct product of an infinite
collection of monoids but this requires some technical care and we leave it for later.
Definition 3.11.20
A nonempty subset A of a monoid (M, ∗) is called a submonoid if a ∗ b ∈ A for all a, b ∈ A
(closed under ∗) and if 1M ∈ A.
A submonoid is a monoid in its own right, using the operation inherited from the containing
monoid. If we compare this definition to Definition 3.5.1, we observe that in Definition 3.5.1 we
required the subgroup to be closed under the operation and closed under taking inverses. Then a
simple proof showed that the identity must be in every subgroup. However, that simple proof involved
taking an inverse. Since elements do not necessarily have inverses in a monoid, the definition of a
submonoid explicitly needed to refer to the identity being in the submonoid.
Definition 3.11.21
Let (M, ∗) and (N, ◦) be two monoids. A function f : M → N is called a monoid homo-
morphism if
f (a ∗ b) = f (a) ◦ f (b) for all a, b ∈ M, and f (1M ) = 1N .
Definition 3.11.22
Let (M, ∗) be a monoid. We define the opposite monoid as the pair (M op , ∗op ) where, as
sets, M op = M and the operation is
a ∗op b = b ∗ a.
Obviously, if (M, ∗) is a commutative monoid then (M op , ∗op ) = (M, ∗). More precisely, the
identity function id : M → M op is a monoid isomorphism from (M, ∗) to (M op , ∗op ) if and only if the monoid is commutative.
Example 3.11.23 (State Machine). In theoretical computer science, a state machine or semiautomaton
is a triple (Q, Σ, T ) where Q is a set of states, Σ is a set of input symbols, and T associates to each
input symbol s ∈ Σ a transition function Ts : Q → Q.
This set theoretic construct models a machine that can possess various states and depending on an
input from Σ changes from one state to another.
Now for every word w ∈ Σ∗ we define a function Tw : Q → Q as follows: if w = s1 s2 · · · sm , then
Tw = Tsm ◦ Tsm−1 ◦ · · · ◦ Ts1 , and the empty word acts as the identity function idQ .
The set of functions gives all possible finite compositions of transition functions arising from tran-
sitions produced from one input symbol at a time. This set of functions is denoted by M (Q, Σ, T ).
It is a monoid under function composition and it is called the transition monoid or input monoid.
This shows that t is a monoid homomorphism. Hence, we have shown t is a monoid isomorphism.
Note that the use of the opposite monoid was necessary primarily because we are reading a word
w = s1 s2 · · · sm from left to right, while we apply the functions Tsi from right to left. 4
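A small state machine makes the construction concrete (a toy example of our own devising):

```python
Q = (0, 1)                       # two states
T = {'a': {0: 1, 1: 1},          # reading 'a' sends every state to 1
     'b': {0: 0, 1: 0}}          # reading 'b' sends every state to 0

def T_word(w):
    """T_w: read the word w left to right, applying each T_s in turn,
    so that T_w = T_{s_m} o ... o T_{s_1} for w = s_1 s_2 ... s_m."""
    result = {}
    for q in Q:
        state = q
        for s in w:
            state = T[s][state]
        result[q] = state
    return result

assert T_word("ab") == {0: 0, 1: 0}   # 'a' then 'b' leaves every state at 0
assert T_word("") == {0: 0, 1: 1}     # the empty word acts as the identity
```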
We leave the proof that this is an equivalence relation as an exercise. (See Exercise 3.11.19.) The
presence of the “+k for some k ∈ M ” may seem strange but we explain it later. We write [(m1 , m2 )]
for the equivalence class of the pair (m1 , m2 ) and we denote by M̃ = (M ⊕ M )/∼ the set of all
equivalence classes of ∼ on M × M .
Now suppose that (a1 , a2 ) ∼ (b1 , b2 ) and (c1 , c2 ) ∼ (d1 , d2 ). Then
a1 + b2 + k = a2 + b1 + k for some k ∈ M,
c1 + d2 + ℓ = c2 + d1 + ℓ for some ℓ ∈ M.
Adding these equations shows that (a1 + c1 , a2 + c2 ) ∼ (b1 + d1 , b2 + d2 ), so the addition of classes
[(a1 , a2 )] + [(c1 , c2 )] = [(a1 + c1 , a2 + c2 )]  (3.13)
is in fact well-defined.
Proposition 3.11.24
The set M̃ equipped with the operation + defined in (3.13) is a group. The identity in M̃
is [(0, 0)] and the inverse of [(m, 0)] is [(0, m)]. Furthermore, if M possesses the cancellation
property, then M is isomorphic to the submonoid {[(m, 0)] ∈ M̃ | m ∈ M }.
Suppose now that M possesses the cancellation property. Because of (3.13), the function f : M →
M̃ with f (m) = [(m, 0)] is a monoid homomorphism. The image of f is {[(m, 0)] ∈ M̃ | m ∈ M }.
However, f (m) = f (n) is equivalent to (m, 0) ∼ (n, 0), which is equivalent to
m+k =n+k
for some k ∈ M . Thus, m = n by the cancellation property. Then f is an injective function and is
thus a monoid isomorphism between M and {[(m, 0)] ∈ M̃ | m ∈ M }.
We now see why we needed the “+k for some k ∈ M ” in (3.12). It is possible in M for two
unequal elements m and n to admit a k such that m + k = n + k. In an abelian group,
m + k = n + k implies m = n. Hence, in M̃, we would need f (m) and f (n) to be the same element.
In the Grothendieck construction applied to (N, +), elements of Ñ are equivalence classes of pairs
(a, b). Since N has the cancellation property, (a, b) ∼ (c, d) if and only if a + d = b + c. We can
think of such a pair as a displacement whose magnitude is the difference of a and b, pointing to the right
(positive) if a ≥ b and to the left (negative) if b ≥ a. Identifying Z = Ñ, we view positive and negative integers
as displacements with a direction.
Though we have illustrated the Grothendieck construction for a simple example of a way to
properly define negative integers, the construction first arose in a far more abstract context. It
appears again in abstract algebra in a variety of interesting places.
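The construction applied to (N, +) can be sketched in a few lines (our own encoding: a pair (a, b) stands for the displacement a − b; since (N, +) has the cancellation property, no extra k is needed in the relation):

```python
def equivalent(p, q):
    """(a, b) ~ (c, d) iff a + d = b + c (valid because (N, +) is cancellative)."""
    (a, b), (c, d) = p, q
    return a + d == b + c

def add(p, q):
    """Addition of representatives: [(a, b)] + [(c, d)] = [(a + c, b + d)]."""
    return (p[0] + q[0], p[1] + q[1])

x, y = (2, 7), (9, 4)                 # representatives of -5 and +5
assert equivalent(add(x, y), (0, 0))  # their sum lies in the identity class [(0, 0)]
assert equivalent((2, 7), (0, 5))     # both pairs represent -5
```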
a ≼op b ⇐⇒ b ≼ a.
Prove that the function f : L → Lop defined by f (a) = a is a semigroup isomorphism between (L, ◦),
where a ◦ b gives the least upper bound of a and b in (L, ≼), and (Lop , ⋆), where a ⋆ b gives the
greatest lower bound of a and b in (Lop , ≼op ).
6. Let U be a set and P(U ) the power set of U . Prove that (P(U ), ∩) is a monoid but not a group.
7. Let U be a set and P(U ) the power set of U . Prove that (P(U ), ∪) is a monoid but not a group.
8. Let U be a set and consider the function f : P(U ) → P(U ) defined by f (A) = A^c, the set complement of A.
Prove that f is a monoid isomorphism from (P(U ), ∩) to (P(U ), ∪).
9. Let F be the set of functions from {1, 2, 3, 4} to itself and consider the monoid (F, ◦) with the operation
of function composition. Let S be the smallest subsemigroup of F that contains the functions
f1 = [1 2 3 4; 1 1 2 3],  f2 = [1 2 3 4; 4 2 2 3],  f3 = [1 2 3 4; 4 1 3 3],  f4 = [1 2 3 4; 4 1 2 4],
where the top row of each chart lists the inputs and the bottom row the corresponding outputs.
12. Call M the set of nonzero polynomials with real coefficients. The pair (M, ×) is a monoid with the
polynomial 1 as the identity. Decide which of the following subsets of M are submonoids and justify
your answer.
(a) Nonconstant polynomials.
(b) Polynomials whose constant coefficient is 1.
(c) Palindromic polynomials. [A polynomial an xn + · · · + a1 x + a0 is called palindromic if an−i = ai
for all 1 ≤ i ≤ n.]
(d) Polynomials with odd coefficients.
(e) Polynomials with an−1 = 0.
(f) Polynomials with no real roots.
13. Let ϕ : M → N be a monoid homomorphism. We define the kernel of the monoid homomorphism as
Ker ϕ = {m ∈ M | ϕ(m) = 1N }.
Show that Ker ϕ is a submonoid of M .
14. Let ϕ : M → N be a monoid homomorphism. We define the image of the monoid homomorphism as
Im ϕ = {n ∈ N | n = ϕ(m) for some m ∈ M }.
Show that Im ϕ is a submonoid of N .
15. We denote by Z[x] the set of polynomials with coefficients in Z. Consider the monoids (Z[x] − {0}, ×)
and (C, +). Define the function γm : Z[x] − {0} → C by
γm (p(x)) = r1^m + r2^m + · · · + rd^m ,
where deg p(x) = d and r1 , r2 , . . . , rd are the roots of p(x), listed with multiplicity. Note that γm is
only defined on nonzero polynomials and that if p(x) is a constant polynomial then γm (p(x)) = 0
because p(x) has no roots.
(a) Prove that γm is a monoid homomorphism.
(b) (*) Prove that the image of γm is the submonoid (Z, +).
16. Prove that C 0 (R, R), the set of continuous functions from R to R, is a submonoid of Fun(R, R)
(equipped with composition).
17. Consider the monoid M = Fun(R, R) and consider the function
f (x) = −x      if x < 0,
        0       if 0 ≤ x < 1,
        1 − x   if 1 ≤ x < 2,
        x       if 2 ≤ x.
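The homomorphism property of γm in Exercise 15 can be probed numerically, at least for m = 1, where Vieta's formulas give γ1 (p) = −a_{d−1}/a_d for a nonconstant polynomial with highest coefficient a_d. The following Python sketch (our illustration; `polymul` and `gamma1` are helper names of our choosing) checks additivity of γ1 under polynomial multiplication:

```python
from fractions import Fraction

def polymul(p, q):
    # multiply two polynomials given as coefficient lists, highest degree first
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def gamma1(p):
    # sum of the roots with multiplicity = -a_{d-1}/a_d by Vieta's formulas
    # (only valid for nonconstant polynomials, i.e. len(p) >= 2)
    return Fraction(-p[1], p[0])

p = [1, -3, 2]      # x^2 - 3x + 2, roots 1 and 2, so gamma_1(p) = 3
q = [2, -10]        # 2x - 10, root 5, so gamma_1(q) = 5
print(gamma1(polymul(p, q)) == gamma1(p) + gamma1(q))   # True
```

The same check can be repeated for many random pairs of polynomials; proving it in general (and for all m) is the content of the exercise.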
3.12
Projects
Project I. Rubik’s Cube. Let K be the group of operations on the Rubik’s Cube.
(1) Determine a set of generators and explore some of the relations among them.
(2) Is K naturally a subgroup of some Sn ? If so, how and for what n?
(3) What are the orders of some elements that are not generators?
(4) Can you determine the size of K either from the previous question or from other reasoning?
(5) Explore any other properties of the Rubik’s Cube group.
Project II. Matrix Groups. Consider the family of groups GLn (Fp ), where n is some positive
integer with n ≥ 2 and p is a prime number. Study some of them. Can you provide generators
for some of them? What is the center? What are some subgroups? Explore any related topics.
Maple has a package, accessible by typing
with(LinearAlgebra[Modular]);
that is optimized for working with linear algebra in modular arithmetic. Refer to the help files
to understand how to use the procedures in this package.
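When choosing which groups GLn (Fp ) are feasible to explore, the standard order formula |GLn (Fp )| = (p^n − 1)(p^n − p) · · · (p^n − p^{n−1}) is useful. The following Python sketch (our illustration; the project itself suggests Maple) computes it and checks it against a brute-force count for small p:

```python
from itertools import product

def gl_order(n, p):
    # |GL_n(F_p)| = (p^n - 1)(p^n - p) ... (p^n - p^(n-1)):
    # choose each row to be outside the span of the previous rows
    total = 1
    for i in range(n):
        total *= p**n - p**i
    return total

def det2(m, p):
    (a, b), (c, d) = m
    return (a * d - b * c) % p

def gl2_brute(p):
    # count invertible 2x2 matrices over F_p by direct enumeration
    return sum(1 for a, b, c, d in product(range(p), repeat=4)
               if det2(((a, b), (c, d)), p) != 0)

print(gl_order(2, 5))                   # 480
print(gl2_brute(3) == gl_order(2, 3))   # True (both are 48)
```

For example, GL2 (F5 ) has order 480, small enough to study exhaustively, while already GL3 (F5 ) has order 372,000.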
Project III. Shuffling Cards and S52 . A shuffle of a card deck is an operation that changes the
order of a deck and hence can be modeled by an element of S52 . In this project, study patterns
of shuffling cards. Two popular kinds of shuffles are the random riffle shuffle and the overhand
shuffle. A perfect riffle involves cutting the deck in half and interlacing one card from one half
with exactly one card from the other half. How would you model shuffling styles by certain
permutations in S52 ? Might patterns occur in shuffling?
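As a starting point for the project, a perfect riffle shuffle that keeps the top card on top (an "out-shuffle") can be modeled directly. The following Python sketch (our illustration) applies the shuffle to positions 0–51 and counts how many applications restore the deck, which is the order of this element of S52 :

```python
def out_shuffle(deck):
    # perfect riffle keeping the top card on top: cut in half, then interleave
    half = len(deck) // 2
    top, bottom = deck[:half], deck[half:]
    result = []
    for t, b in zip(top, bottom):
        result.extend([t, b])
    return result

deck = list(range(52))
current = out_shuffle(deck)
count = 1
while current != deck:
    current = out_shuffle(current)
    count += 1
print(count)   # 8: eight perfect out-shuffles restore a 52-card deck
```

The answer, 8, reflects that the out-shuffle sends position i to 2i mod 51, so its order is the multiplicative order of 2 modulo 51.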
Project IV. Sudoku and Group Theory. If you are not familiar with the Sudoku puzzle, visit
the Wikipedia website found at
http://en.wikipedia.org/wiki/Sudoku
for a description but do not consult other sources. Let S be the set of all possible solutions
to a Sudoku puzzle, i.e., all possible ways of filling out the grid according to the rules. There
exists a group of transformations G on specific numbers, rows, and columns such that given any
solution s and any g ∈ G, we can know for certain that g(s) is another solution. Determine and
describe concisely this group G. Can you describe it as a subgroup of some large permutation
group? Can you find the size of G?
We will call two fillings s1 and s2 of the Sudoku grid equivalent if s2 = g(s1 ) for some g ∈ G.
Explore whether there exist nonequivalent Sudoku fillings.
Project V. The 15 Puzzle. Visit
http://en.wikipedia.org/wiki/Fifteen_puzzle
to learn about the so-called 15-puzzle. Here is an interesting question about this puzzle.
Suppose that you start with tiles 1 through 15 going left to right, top row down to bottom
row, with the empty square in the lower right corner. Is it possible to obtain every theoretical
configuration of tiles on the board? (For reference, label the empty slot as number 16.) If it
is not, try to find the subgroup of S16 or S15 of transformations that you can perform on this
puzzle. Make sure what you work with is a group. Can you generalize?
160 CHAPTER 3. GROUPS
Project VI. Groups of Functions. Consider the set of functions of the form f (x) = (ax + b)/(cx + d),
where a, b, c, d ∈ Z. (Do not worry about domains of definition or, if you insist, just think of
the domains of these functions as R − Q.) Show that (with perhaps a restriction on what
a, b, c, d are) this set of functions is a group. Try to find generators and relations for this group
or any other way to describe the group.
Project VII. Group Theory in a CAS. Many computer algebra systems (CAS) have a package
for group theory. In Maple version 16 or below, the appropriate package is accessed with the
command with(group);. In Maple version 17 or higher, this package was deprecated in favor
of with(GroupTheory);. Your CAS should offer help files on the commands provided in the
package. By reading the help files, become familiar with as many commands as possible at the
level available to you. Demonstrate your ability by doing the following: define a subgroup of
Sn , define a group with generators and relations, calculate some group orders, and find 10
interesting nonisomorphic subgroups that have order 40 or greater (that are not symmetric or
dihedral groups).
Project VIII. Groups of Rigid Motions of Polyhedra. Let Π be a polyhedron. Call G(Π)
the group of rigid motions in R3 that map Π into Π. For example, if Π is the cone over a
pentagon, then G(Π) = Z5 . Does G(Π) consist of only transformations that are rotations
about an axis? Find G(Π) for the regular polyhedra. Find G(Π) for some irregular polyhedra
that do not have G(Π) = {1}. For a given group G, does there exist a polyhedron Π such that
G(Π) = G?
Project IX. Music and Group Theory. Read the article “Musical Actions of the Dihedral
Group” (see [18]). In 5 to 7 pages, summarize the main themes of this article in your own
words, making careful use of the group theory. Then, using the music theory described in the
article, offer an analysis of some musical pieces of your own choosing.
Project X. Permutations and Inversions. In Section 3.4.3, we introduced the notion of the
number of inversions of a permutation to discuss the parity of a permutation. Let n be a fixed
positive integer. Consider the function F : Sn → P(Tn ) such that F (σ) consists of the set of
pairs in Tn that are inverted by σ. Study as many properties about this function as you can.
Here are a few questions to motivate your investigations. Attempt to ask and answer other
questions. Is F injective or surjective? Given an element U in F (Sn ), can you give a concise
method to find all σ ∈ Sn such that F (σ) = U ? Is F (Sn ) closed under taking unions? If so,
can you describe how to calculate w from σ and τ where F (σ) ∪ F (τ ) = F (w)? Repeat the
same two questions with intersections and set complements.
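One concrete reading of F , with σ ∈ Sn acting on positions and Tn the set of pairs i < j, can be explored by machine. In the following Python sketch (our illustration, under that reading), F turns out to be injective but not surjective for n = 4:

```python
from itertools import permutations, combinations

def F(sigma):
    # the set of position pairs (i, j), i < j, inverted by sigma
    n = len(sigma)
    return frozenset((i, j) for i, j in combinations(range(n), 2)
                     if sigma[i] > sigma[j])

n = 4
images = {F(sigma) for sigma in permutations(range(1, n + 1))}
print(len(images) == 24)                        # True: F is injective on S4
# F is not surjective: not every subset of T_n is an inversion set
print(len(images) < 2 ** (n * (n - 1) // 2))    # True: 24 < 2^6 = 64
```

Characterizing exactly which subsets of Tn arise as F (σ) is one of the questions the project invites.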
Project XI. Diffie-Hellman and ElGamal. Program and document a computer program that
implements the ElGamal encryption scheme on a file using a key that is created via the Diffie-
Hellman procedure. Use a group G and a base g ∈ G whose order is larger than the number of
iterations current computers can run in a for loop in a reasonable amount of time. Feel free to
choose M as you think is effective and h : M → Fun(N∗ , G) as you wish.
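As a warm-up for the project, the following Python sketch (our illustration; the parameters are toy-sized and completely insecure, and the base g = 17 is an arbitrary choice of ours) runs Diffie-Hellman key agreement in (Z/pZ)∗ followed by ElGamal encryption and decryption of a single group element:

```python
import random

# Toy parameters, far too small for real security; the group is (Z/pZ)*
p = 2**13 - 1          # 8191, a Mersenne prime
g = 17                 # an arbitrary base (hypothetical choice)

# Diffie-Hellman: both parties derive the shared element g^(ab)
a = random.randrange(2, p - 1)   # Alice's secret exponent
b = random.randrange(2, p - 1)   # Bob's secret exponent
A = pow(g, a, p)                 # Alice publishes g^a
B = pow(g, b, p)                 # Bob publishes g^b
shared = pow(B, a, p)            # Alice computes (g^b)^a; Bob gets the same

# ElGamal encryption of m under Bob's public key B
m = 1234
k = random.randrange(2, p - 1)                 # ephemeral secret
cipher = (pow(g, k, p), m * pow(B, k, p) % p)  # (g^k, m * B^k)

# Bob decrypts with his secret b: divide by (g^k)^b using Fermat's little theorem
c1, c2 = cipher
recovered = c2 * pow(c1, p - 1 - b, p) % p
print(recovered == m)   # True
```

The project then amounts to replacing the toy group by one with a generator of very large order and wiring the scheme to file input and output.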
4. Quotient Groups
One of the most fascinating aspects of group theory is how much internal structure follows by virtue
of the three group axioms. Groups possess much more internal structure than we have yet seen in
Chapter 3. The internal properties will often permit us to create a “quotient group,” which is a
smaller group that retains some of the group information and conflates other information.
The process of describing this type of internal structure and creating a quotient group already
arose in Section 2.2 on modular arithmetic. Consequently, we review modular arithmetic as a
motivating example for defining quotient groups.
Fix a positive integer n greater than 1. Let G = (Z, +) and consider the subgroup H = nZ. We
defined the congruence relation as
a ≡ b ⇐⇒ n | (b − a) ⇐⇒ b − a ∈ H.
We proved that ≡ was an equivalence relation and we defined the congruence class ā as the set of
all elements that are congruent to a. Note that
ā = {. . . , a − 2n, a − n, a, a + n, a + 2n, . . .} = a + nZ = a + H.
We defined Z/nZ as the set of equivalence classes modulo n. Explicitly, Z/nZ = {0̄, 1̄, 2̄, . . . , n − 1}.
Furthermore, we showed that addition behaves well with respect to congruence, by which we mean
that a ≡ c and b ≡ d imply that a + b ≡ c + d. Hence, defining ā + b̄ to be the congruence
class of a + b gives a well-defined addition on Z/nZ. From this, we were able to create the
group (Z/nZ, +). We will soon say
that Z/nZ is the quotient group of Z by its subgroup nZ.
[Diagram: the quotient process collapses Z = {. . . , −2, −1, 0, 1, 2, . . .} to the five congruence classes 0̄, 1̄, 2̄, 3̄, 4̄ of Z/5Z.]
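The well-definedness claim can be checked by brute force for a specific modulus. The following Python sketch (our illustration) verifies, for n = 5, that the class of a + b depends only on the classes of a and b:

```python
# Brute-force check, for n = 5, that a ≡ c and b ≡ d imply a + b ≡ c + d,
# so the sum of two congruence classes does not depend on representatives.
n = 5
ok = all(
    (a + b) % n == (c + d) % n
    for a in range(-10, 10) for b in range(-10, 10)
    for c in range(-10, 10) for d in range(-10, 10)
    if (a - c) % n == 0 and (b - d) % n == 0
)
print(ok)   # True
```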
The construction of quotient groups will be similar to the construction of modular arithmetic.
However, Z has many properties (e.g., abelian, cyclic, infinite) that make the construction simpler
or do not illustrate some important consequences of the general construction of quotient groups.
Section 4.1 introduces the concept of cosets, which immediately leads to Lagrange’s Theorem,
a deep theorem about the internal structure of groups. Section 4.2 presents characterizations of
normal subgroups, which are necessary for the construction of quotient groups. Section 4.3 gives
the construction for quotient groups, provides many examples, and illustrates the connection to
equivalence classes on the group that behave well with respect to the group operation. Section 4.4
develops a number of theorems that illustrate how to understand the internal structure of a group
from knowing the structure of a quotient group. Finally, in Section 4.5, using the quotient process,
we prove a classification theorem for all abelian groups that are finitely generated.
4.1
Cosets and Lagrange’s Theorem
4.1.1 – Cosets
Following the guiding example of modular arithmetic provided at the beginning of the chapter, we
first consider what should play the role of ā = a + nZ in groups in general.
Definition 4.1.1
Let G be a group and let H be a subgroup. For g ∈ G, the set gH = {gh | h ∈ H}
(respectively Hg = {hg | h ∈ H}) is called the left (respectively right) coset of H by g.
Example 4.1.2. Consider G = D4 and consider the subgroups R = hri = {1, r, r2 , r3 } and H =
{1, s}. The left cosets of R are
1R = {1, r, r2 , r3 }, sR = {s, sr, sr2 , sr3 },
rR = {r, r2 , r3 , 1}, srR = {sr, sr2 , sr3 , s},
r2 R = {r2 , r3 , 1, r}, sr2 R = {sr2 , sr3 , s, sr},
r3 R = {r3 , 1, r, r2 }, sr3 R = {sr3 , s, sr, sr2 }.
Note that 1R = rR = r2 R = r3 R and sR = srR = sr2 R = sr3 R. Hence, there are only two distinct
left cosets of R, namely 1R = R, which consists of all the rotations in D4 , and sR, which consists
of all the reflections. The right cosets of R are
R = {1, r, r2 , r3 }, Rs = {s, rs, r2 s, r3 s} = {s, sr3 , sr2 , sr},
Rr = {r, r2 , r3 , 1}, Rsr = {sr, rsr, r2 sr, r3 sr} = {sr, s, sr3 , sr2 },
Rr2 = {r2 , r3 , 1, r}, Rsr2 = {sr2 , rsr2 , r2 sr2 , r3 sr2 } = {sr2 , sr, s, sr3 },
Rr3 = {r3 , 1, r, r2 }, Rsr3 = {sr3 , rsr3 , r2 sr3 , r3 sr3 } = {sr3 , sr2 , sr, s}.
Note that R = Rr = Rr2 = Rr3 and Rs = Rsr = Rsr2 = Rsr3 . Again, there are only two distinct
right cosets of R, namely R and Rs. We also note that for all g ∈ D4 , we have gR = Rg. This
resembles commutativity, but we must recall that D4 is not commutative. Before we are tempted
to think this happens for all subgroups (and we should be inclined to doubt that it would), let us
do the same calculations with the subgroup H.
The left cosets of H are
1H = {1, s}, sH = {s, 1} = {1, s},
rH = {r, rs} = {r, sr3 }, srH = {sr, srs} = {sr, r3 },
r2 H = {r2 , r2 s} = {r2 , sr2 }, sr2 H = {sr2 , sr2 s} = {sr2 , r2 },
r3 H = {r3 , r3 s} = {r3 , sr}, sr3 H = {sr3 , sr3 s} = {sr3 , r}.
There are four distinct left cosets: H = sH, rH = sr3 H, r2 H = sr2 H, and r3 H = srH. The right
cosets are
H = {1, s}, Hs = {s, 1} = {1, s},
Hr = {r, sr}, Hsr = {sr, r},
Hr2 = {r2 , sr2 }, Hsr2 = {sr2 , r2 },
Hr3 = {r3 , sr3 }, Hsr3 = {sr3 , r3 }.
Notice again that there are four distinct right cosets: H = Hs, Hr = Hsr, Hr2 = Hsr2 and
Hr3 = Hsr3 . However, with the subgroup H, it is not true that gH = Hg for all g ∈ G. For
example, Hr = {r, sr} while rH = {r, sr3 }. 4
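The coset computations in Example 4.1.2 can be replicated by machine. The following Python sketch (an illustration of ours; the text's own projects use Maple) realizes D4 as permutations of the square's vertices and recomputes the left and right cosets of R = hri and H = {1, s}:

```python
def compose(f, g):
    # (f ∘ g)(i) = f(g(i)); permutations stored as tuples of images
    return tuple(f[g[i]] for i in range(len(g)))

def closure(gens):
    # smallest set of permutations containing gens and closed under composition
    elems = set(gens)
    frontier = set(gens)
    while frontier:
        new = {compose(a, b) for a in elems for b in elems} - elems
        elems |= new
        frontier = new
    return elems

r = (1, 2, 3, 0)       # rotation of the square by 90 degrees
s = (0, 3, 2, 1)       # a reflection fixing two opposite vertices
D4 = closure({r, s})   # all 8 elements of the dihedral group
R = closure({r})       # the subgroup <r> = {1, r, r^2, r^3}
H = {(0, 1, 2, 3), s}  # the subgroup {1, s}

def left_cosets(G, K):
    return {frozenset(compose(g, k) for k in K) for g in G}

def right_cosets(G, K):
    return {frozenset(compose(k, g) for k in K) for g in G}

print(len(left_cosets(D4, R)))   # 2 distinct left cosets of <r>
print(len(left_cosets(D4, H)))   # 4 distinct left cosets of {1, s}
print(left_cosets(D4, R) == right_cosets(D4, R))   # True:  gR = Rg always
print(left_cosets(D4, H) == right_cosets(D4, H))   # False: gH ≠ Hg in general
```

The output matches the hand computation: two cosets of R on each side and equal collections, four cosets of H on each side but different collections.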
A key point to extract from the above example is that, in general, left cosets are not equal to
right cosets, but that depending on the subgroup, it may happen nonetheless. Of course, if G is an
abelian group then for all H ≤ G and for all g ∈ G, it is true that gH = Hg.
Proposition 4.1.3
Let H be a subgroup of a group G and let g ∈ G be arbitrary. Then there exists a bijection
between H and gH and between H and Hg. Furthermore, if H is a finite subgroup of G,
then |H| = |gH| = |Hg|.
Proof. Consider the function f : H → gH defined by f (x) = gx. This function is injective because
f (x1 ) = f (x2 ) implies that gx1 = gx2 so that x1 = x2 . Furthermore, the function is surjective by
definition of gH. Thus, f is a bijection between H and gH. If H is finite, then |H| = |gH|.
Similarly, the function ϕ : H → Hg defined by ϕ(x) = xg is a bijection between H and Hg.
Recall that in the motivating example of modular arithmetic, the cosets a + nZ corresponded
to the congruence classes modulo n. In general groups, cosets correspond to certain equivalence
relations. However, because groups are not commutative in general, we must consider two equivalence
relations.
Proposition 4.1.4
Let G be a group and let H be a subgroup. The relations ∼1 and ∼2 , defined respectively
as
a ∼1 b ⇐⇒ a−1 b ∈ H,
a ∼2 b ⇐⇒ ba−1 ∈ H,
are equivalence relations. Furthermore, the equivalence classes for ∼1 (resp. ∼2 ) are the
left (resp. right) cosets of H.
Proof. We prove the proposition for ∼1 . The proof for ∼2 is identical in form.
Let G be a group and let H be any subgroup. For all a ∈ G, a−1 a = 1 ∈ H, so ∼1 is reflexive.
Suppose that a ∼1 b. Then a−1 b ∈ H. Since H is closed under taking inverses, we know that
(a−1 b)−1 = b−1 a ∈ H, so b ∼1 a. Thus, ∼1 is symmetric. Now suppose that a ∼1 b and b ∼1 c. By
definition, a−1 b ∈ H and b−1 c ∈ H. Since H is a subgroup, (a−1 b)(b−1 c) = a−1 c ∈ H, which means
that a ∼1 c. Hence, ∼1 is transitive. We conclude that ∼1 is an equivalence relation.
The ∼1 equivalence class of a ∈ G consists of all elements g such that a ∼1 g, i.e., all elements g
such that there exists h ∈ H with a−1 g = h. Thus, g = ah so g ∈ aH. Conversely, if g ∈ aH, then
a−1 g ∈ H and thus a ∼1 g. This shows that the ∼1 equivalence class of a is aH.
The equivalence relations ∼1 and ∼2 are not necessarily distinct. The relations are identical if
and only if all the left cosets of H match up with right cosets of H.
If the group is abelian, left and right cosets are equal for all subgroups H and hence the relations
∼1 and ∼2 are equal. It is for this reason that we did not need to define two concepts of congruence
relations on Z. Indeed, for all a, b ∈ Z,
−a + b = b − a.
Proposition 4.1.5
Let H be a subgroup of a group G and let g1 , g2 ∈ G. Then g1 H = g2 H if and only if
g1−1 g2 ∈ H. Similarly, Hg1 = Hg2 if and only if g2 g1−1 ∈ H.
It is also possible to deduce Proposition 4.1.5 from the following reasoning. The equality
g1 H = g2 H holds if and only if H = g1−1 g2 H, which holds if and only if for all h ∈ H, there exists h′ ∈ H such
[Figure 4.1: the left cosets H, g2 H, g3 H, . . . , gn H and the right cosets H, Hg2 , Hg3 , . . . , Hgn each partition G.]
that g1−1 g2 h = h′. This implies that g1−1 g2 = h′h−1 ∈ H. Conversely, if g1−1 g2 ∈ H, then writing
g1−1 g2 = h′′, for all h in H, we have g1−1 g2 h = h′′h ∈ H, so g1−1 g2 H = H and thus g2 H = g1 H. A
similar reasoning holds with right cosets.
Because equivalence classes on a set partition that set, Proposition 4.1.4 leads immediately to
the following corollary.
Corollary 4.1.6
Let H be a subgroup of a group G. The set of left (respectively right) cosets form a partition
of G.
Figure 4.1 illustrates how left and right cosets of a subgroup H partition a group G. It is
important to note that the subgroup H is both a left and a right coset, and the remaining left and
right cosets partition G − H. In the figure, the left cosets are shown to overlap the right cosets. In
general, it is possible for each left coset to be a right coset or for only some left cosets to intersect
with some right cosets.
In the example of modular arithmetic, though the original group G = Z was infinite, for any
given modulus n there were exactly n (a finite number of) left cosets, 0̄, 1̄, . . . , n − 1. This may
happen in a general group setting.
Definition 4.1.7
Let H be a subgroup of a group G that is not necessarily finite. If the number of distinct
left cosets is finite then this number is denoted by |G : H| and is called the index of H in
G.
For any subgroup H of a group G, the set of inverse elements H −1 is equal to H. Consequently,
the inverse function f (x) = x−1 on the group G, maps the left coset gH to the right coset (gH)−1 =
H −1 g −1 = Hg −1 . Thus, the inverse function gives a bijection between the set of left cosets and the
set of right cosets of H. In particular, if |G : H| is finite, then it also counts the number of right
cosets.
Example 4.1.8. Let G = S5 and let H = {σ ∈ G | σ(5) = 5}. It is not hard to see that H ≅ S4 .
Hence, |G| = 120 and |H| = 24. (Ideally, if studying the cosets of H by hand, we would rather not
list out all the elements in each coset.) Note that the index of H in S5 is |S5 : H| = 5, so whether
we consider left or right cosets, we will find 5 of them. Obviously, H is both a left coset and a right
coset of H. We investigate a few other cosets of H.
Consider the left coset (1 2)H. Since (1 2) ∈ H, then (1 2)H = H. Furthermore, any transposition
(a b) in which a, b < 5 satisfies (a b)H = H. Another left coset (1 5)H ≠ H because (1 5) ∉ H. Now
consider a third coset (2 5)H. Since (2 5) ∉ H, we know that (2 5)H ≠ H. However, by Proposition
4.1.5, since (2 5)−1 (1 5) = (1 2 5) ∉ H, we have (2 5)H ≠ (1 5)H. In fact, for any a and b satisfying
cosets in R∗ . Let S be a complete set of distinct representatives of the partition formed by the cosets.
Then, since the cosets of Q∗ partition R∗ , every positive real number can be written uniquely as
sq for s ∈ S and q ∈ Q∗ . This creates a bijection between R∗ and S × Q∗ . By Exercise 1.2.7,
if we assume that S is countable, then S × Q∗ is countable. Since R∗ is uncountable, this is a
contradiction. 4
Proof. By Proposition 4.1.3, each left coset of H has the same cardinality as H. Since the set of
left cosets partitions G, then the sum of cardinalities of the distinct cosets is equal to |G|. But since
each coset has cardinality |H|, we have
|G| = |H| · |G : H|,
and the theorem follows.
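The counting in this proof can be watched in action. The following Python sketch (ours, not the author's) partitions S4 into left cosets of the cyclic subgroup generated by a 4-cycle and confirms |G| = |H| · |G : H|:

```python
from itertools import permutations

def compose(f, g):
    # (f ∘ g)(i) = f(g(i)); permutations stored as tuples of images
    return tuple(f[g[i]] for i in range(len(g)))

G = set(permutations(range(4)))   # S4, of order 24

c = (1, 2, 3, 0)                  # a 4-cycle on {0, 1, 2, 3}
H = {(0, 1, 2, 3), c, compose(c, c), compose(c, compose(c, c))}  # <c>, order 4

cosets = {frozenset(compose(g, h) for h in H) for g in G}
print(len(G), len(H), len(cosets))            # 24 4 6
print(sum(len(C) for C in cosets) == len(G))  # True: the cosets partition G
```

Each of the 6 cosets has exactly 4 elements, so 24 = 4 · 6, exactly as in the proof.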
In the language of posets, Lagrange’s Theorem can be rephrased by saying that if G is a finite
group, then the cardinality function from (Sub(G), ≤) to (N∗ , |) is monotonic.
A number of corollaries follow from Lagrange’s Theorem.
Corollary 4.1.11
For every element g in a finite group G, the order |g| divides |G|.
Proof. The order |g| is the order of the subgroup hgi. Hence, |g| divides |G| by Lagrange’s Theorem.
Corollary 4.1.12
For every element g in a finite group G, we have g^|G| = 1.
Proof. By Corollary 4.1.11, if |g| = k, then k divides |G|. So |G| = km for some m ∈ Z. Since
g^k = 1 by definition of order, then g^|G| = g^(km) = (g^k)^m = 1^m = 1.
Lagrange’s Theorem and its corollaries put considerable restrictions on possible subgroups of a
group G. For example, in Exercise 3.3.31 concerning the classification of groups of order 6, that G
does not contain elements of order 4 or of order 5 follows immediately from Lagrange’s Theorem.
As a simple application, knowing that the size of a subgroup can only be a divisor of the size of
the group may tell us whether or not we have found all the elements in a subgroup. In particular,
as the following example illustrates, if we know that a subgroup H has |H| strictly greater than the
largest strict divisor of |G|, then we can deduce that |H| = |G| and hence that H = G.
Example 4.1.13. Consider the group A4 and the subgroup H = h(1 2 3), (1 2 4)i. Since |A4 | = 12,
by Lagrange’s Theorem, the subgroups of A4 can only be of order 1, 2, 3, 4, 6, or 12. By taking
powers of the generators of H, we know that 1, (1 2 3), (1 3 2), (1 2 4), and (1 4 2)
are in H. Furthermore, (1 2 3)(1 4 2) = (1 4 3) is in H as must be its square (1 3 4). This shows that
H contains at least 7 elements. Since |H| is greater than 6, by Lagrange’s Theorem, |H| = 12 and
hence H = A4 . 4
Lagrange’s Theorem also leads immediately to the following important classification theorem.
Proposition 4.1.14
Let p be a prime number and suppose that G is a group such that |G| = p. Then G ≅ Zp .
Proof. Let g ∈ G be a nonidentity element. Then hgi is a subgroup of G that has at least 2 elements.
By Lagrange’s Theorem, |g| = |hgi| divides p. Hence, |g| = p and hgi = G. Therefore, G is cyclic.
The proposition follows from Proposition 3.7.25.
A deeper result, the First Sylow Theorem (Theorem 8.5.6), guarantees that if p is a prime divisor of |G|, then G has a subgroup of order p^n , where p^n is
the highest power of p that divides |G|.
Even without Cauchy’s Theorem, it is sometimes possible to determine whether a group G
contains elements of certain orders by virtue of Corollary 4.1.11. The following example illustrates
the reasoning.
Example 4.1.15. Let G be a group of order 35. We prove that G must have an element of order
5 and an element of order 7. If G contains an element z of order 35 (which would imply that G is
cyclic), then z 5 has order 7 and z 7 has order 5.
Assume that G has no elements of order 7. By Corollary 4.1.11, the only allowed orders of
elements would be 1 and 5. Obviously, the identity is the only element of order 1. But if two
elements a, b are of order 5 and not powers of each other, then hai ∩ hbi = {1}. Hence, each
nonidentity element would lie in exactly one cyclic subgroup of order 5, and each such subgroup contains 4 nonidentity elements.
If there are k such subgroups, then we would have 4k + 1 = 35. This is a contradiction. Hence, G
must have an element of order 7.
Similarly, assume that G has no elements of order 5. Again, by Corollary 4.1.11, the only allowed
orders of elements would be 1 and 7. Any element of order 7 generates a cyclic subgroup, containing
the identity element and 6 elements of order 7. If there are h such subgroups, then we would have
6h + 1 = 35. Again, this is a contradiction. Hence, G must contain an element of order 5. 4
For subgroups H, K ≤ G, consider the subset HK = {hk | h ∈ H, k ∈ K}. This subset is, in general, not a subgroup of G. It is, however, a union of certain cosets, in particular
a union of right cosets of H and also a union of left cosets of K via
HK = ⋃k∈K Hk = ⋃h∈H hK. (4.1)
In either of the above expressions, it is possible that many terms in the union are redundant as some
of the cosets may be equal. By an analysis of cosets of H and of H ∩ K, we can prove the following
proposition.
Proposition 4.1.16
If H and K are finite subgroups of a group G, then
|HK| = |H| |K| / |H ∩ K|.
Proof. Consider HK as the union of left cosets of K as given in (4.1). Each left coset of K has |K|
elements, so |HK| is a multiple of |K|. We simply need to count the number of distinct left cosets
of K in HK. By Proposition 4.1.5, h1 K = h2 K if and only if h2−1 h1 ∈ K. Since h2−1 h1 ∈ H, then
h2−1 h1 ∈ H ∩ K, which again by Proposition 4.1.5 is equivalent to h1 (H ∩ K) = h2 (H ∩ K).
However, H ∩ K ≤ H. By the above reasoning, the number of distinct left cosets of K in
HK is the number of distinct left cosets of H ∩ K in H. By Lagrange’s Theorem, this number is
|H|/|H ∩ K|. Thus,
|HK| = (|H|/|H ∩ K|) |K| = |H| |K| / |H ∩ K|.
Recall that the join hH ∪ Ki of H and K is the smallest (by inclusion) subgroup of G that
contains both H and K. Obviously, hH ∪ Ki must contain all products of the form hk with h ∈ H
and k ∈ K but perhaps much more. Hence, HK is a subset of the join of H and K. If HK happens
to be a subgroup of G, then HK = hH ∪ Ki. By Lagrange’s Theorem, we can deduce that
|H| |K| / |H ∩ K| ≤ |hH ∪ Ki| and |hH ∪ Ki| divides |G|. (4.2)
Note that when HK is not a subgroup of G, we cannot use Lagrange’s Theorem to deduce that
|HK| divides |hH ∪ Ki|.
Example 4.1.17. As an application of the result in (4.2), let G = S5 and let H be the subgroup
of G that fixes 4 and 5, while K is the subgroup of G that fixes 2 and 3. Note that H ≅ S3 and
K ≅ S3 , so |H| = |K| = 6. Furthermore, if σ ∈ H ∩ K, then σ fixes 2, 3, 4, and 5 and hence must
also fix 1. Thus, H ∩ K = {1}. By Proposition 4.1.16, |HK| = 36. Since 36 ∤ 120 = |S5 |, then by
Lagrange’s Theorem, we know that HK is not a subgroup of S5 . By (4.2), we can also deduce that
the order of the join of H and K is greater than 36 but a divisor of 120. Thus, given this information, we know
that |hH ∪ Ki| is 40, 60, or 120. 4
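The computations in Example 4.1.17 can be verified directly. The following Python sketch (our illustration) builds H, K, and HK inside S5 , with {1, 2, 3, 4, 5} represented as {0, 1, 2, 3, 4}:

```python
from itertools import permutations

def compose(f, g):
    # (f ∘ g)(i) = f(g(i)); permutations stored as tuples of images
    return tuple(f[g[i]] for i in range(len(g)))

S5 = list(permutations(range(5)))
# H fixes 4 and 5 (here indices 3 and 4); K fixes 2 and 3 (indices 1 and 2)
H = [p for p in S5 if p[3] == 3 and p[4] == 4]
K = [p for p in S5 if p[1] == 1 and p[2] == 2]
HK = {compose(h, k) for h in H for k in K}

print(len(H), len(K), len(HK))   # 6 6 36
print(120 % len(HK) == 0)        # False: 36 does not divide 120 = |S5|
```

The count |HK| = 36 = 6 · 6/1 agrees with Proposition 4.1.16, and since 36 does not divide 120, HK cannot be a subgroup.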
14. Show that a complete set of distinct representatives of ∼1 is not necessarily a complete set of distinct
representatives of ∼2 .
15. Let G be a group and do not assume it is finite. Prove that if H ≤ K ≤ G, then
|G : H| = |G : K| · |K : H|.
16. Let G be a group and let H and K be subgroups with |G : H| = m and |G : K| = n finite. Prove that
(a) |G : H ∩ K| ≤ mn;
(b) lcm(m, n) ≤ |G : H ∩ K|.
[Hint: Use Exercise 4.1.15.]
17. Let ϕ : G → H be a group homomorphism.
(a) Prove that the left cosets of Ker ϕ are the fibers of ϕ. [Recall that a fiber of a function f : A → B
is a subset of A of the form f −1 (b) for some b ∈ B. See (1.4).]
(b) Deduce that for all g ∈ G, the left coset g(Ker ϕ) is equal to the right coset (Ker ϕ)g.
18. Consider the Cayley graph for S4 given in Figure 3.12.
(a) Prove that the triangles (whose edges are double edges) correspond to right cosets of h(1 2 3)i.
(b) Prove that the squares with all single edges correspond to the right cosets of h(1 2 3 4)i.
(c) Prove that the squares with mixed edge styles are not the left or right cosets of any subgroup.
19. Suppose that a group G has order |G| = 105. List all possible orders of subgroups of G.
20. Suppose that a group G has order |G| = 48. List all possible orders of subgroups of G.
21. Prove that n − 1 ∈ U (n) for all integers n ≥ 3. Apply Lagrange’s Theorem to hn − 1i to deduce that
Euler’s totient function φ(n) is even for all n ≥ 3.
22. Prove or disprove that Z6 ⊕ Z10 has
(a) a subgroup of order 4;
(b) a subgroup isomorphic to Z4 .
23. Suppose that G is a group with |G| = pq, where p and q are primes, not necessarily distinct. Prove
that every proper subgroup of G is cyclic.
24. Let G = GL2 (F5 ).
(a) Use Lagrange’s Theorem to determine all the possible sizes of subgroups of G.
(b) Show that the orders of the following elements are respectively 3, 4, and 5:
A = ( 0 2 )   B = ( 3 0 )   C = ( 1 1 )
    ( 2 4 ),      ( 0 3 ),      ( 0 1 ).
(c) Determine the order of AB and BC without performing any matrix calculations.
25. Let G = GL2 (F5 ) and let H be the subgroup of upper triangular matrices. [See Exercise 3.5.22.] Prove
that |G : H| = 6 and find 6 different matrices g1 , g2 , . . . , g6 such that the cosets gi H for i = 1, 2, . . . , 6
are all the left cosets of H.
26. Let p be a prime number. Prove that the subgroups of Zp ⊕ Zp consist of {1}, Zp ⊕ Zp and p + 1
subgroups that are cyclic and of order p.
27. Let G be a group of order 21. (a) Prove that G must have an element of order 3. (b) By the strategy
of Example 4.1.15, can we determine whether G must have an element of order 7?
28. Let G be a group of order 3p, where p is a prime number. Prove that G has an element of order 3.
29. Let G be a group of order pq, where p and q are distinct odd primes such that (p − 1) - (q − 1) and
(q − 1) - (p − 1). Prove that G contains an element of order p and an element of order q.
30. Let G be a group and let H, K ≤ G. Prove that if gcd(|H|, |K|) = 1, then H ∩ K = {1}.
31. Let G be a group and let H be a subgroup with |G : H| = p, a prime number. Prove that if K is a
subgroup of G that strictly contains H, then K = G.
32. Let G be a group of order pqr, where p, q, r are distinct primes. Let A be a subgroup of order pq and
let B be a subgroup of order qr. Prove that AB = G and that |A ∩ B| = q.
33. Use Lagrange’s Theorem applied to U (n) to prove Euler’s Theorem (a generalization of Fermat’s Little
Theorem), which states that if gcd(a, n) = 1 then a^φ(n) ≡ 1 (mod n).
34. Show that there exists a subgroup of order d for each divisor d of |S4 | = 24 and give an example of a
subgroup for each divisor.
35. Classification of Groups of Order 2p. Let p be a prime number. This exercise guides the proof that a
group of order 2p is isomorphic to Z2p or Dp . Let G be an arbitrary group with |G| = 2p.
(a) Without using Cauchy’s Theorem, prove that G contains an element a of order 2 and an element
b of order p.
(b) Prove that if ab = ba, then G ≅ Z2p .
(c) Prove that if a and b do not commute, then aba = b−1 . Deduce in this case that G ≅ Dp .
36. Let G be a group and let H, K ≤ G. Prove that HK ≤ G if and only if HK = KH as sets.
4.2
Conjugacy and Normal Subgroups
In the previous section, we discussed left cosets and right cosets in a group G. By considering simple
examples, we found that generally a left coset of a subgroup H is not necessarily equal to a right
coset. However, in Example 4.1.2 we saw that every left coset of hri in D4 is a right coset. This
property of subgroups, called normality, turns out to play a vital role in the construction of
quotient groups.
This section studies normal subgroups. Some of the constructions or criteria developed in this
section may at first pass seem unnecessarily complicated if we are simply trying to generalize the
construction employed to create modular arithmetic. The important difference between general
groups and (Z, +) is that the latter is abelian, while groups in general are not. All the difficulty in
constructing quotient groups stems from the possible noncommutativity of a group. As we shall see in
Section 4.3, normality of a subgroup is a necessary property to generalize the modular arithmetic
construction.
Definition 4.2.1
Let G be a group. A subgroup N ≤ G is called normal if gN = N g for all g ∈ G. If N is
a normal subgroup of G, we write N E G.
In Example 4.1.2, we saw that while hri E D4 , in contrast hsi is not a normal subgroup of D4 .
In notation, we write hsi ⋬ D4 .
The criterion for a normal subgroup is equivalent to a variety of other conditions on the subgroup.
Before we list these conditions in Theorem 4.2.4, we mention a few results that are immediate from
the definition. The first observation is that every group G has at least two normal subgroups: the
trivial group {1} and itself G. The next proposition gives another sufficient condition for a subgroup
to be normal.
Proposition 4.2.2
Let G be a group (not necessarily finite). If H is a subgroup such that |G : H| = 2, then
H E G.
Proof. By definition, if |G : H| = 2, then H has two left cosets, just as it has two right cosets. Now,
H = 1H is a left coset. Since the collection of left cosets forms a partition of G, G − H is the
other left coset. Similarly, H = H1 is a right coset and, by the same reason as before, G − H is the
other right coset. Hence, every left coset is equal to a right coset, and thus H is a normal subgroup
of G.
Example 4.2.3. From Proposition 4.2.2, we immediately see that hri, hr2 , si, and hr2 , rsi are nor-
mal subgroups of D4 , simply because each of those subgroups has order 4 and |D4 | = 8. 4
Theorem 4.2.4
Let N be a subgroup of G. The following are equivalent:
(1) N E G.
(2) gN g −1 = N for all g ∈ G.
(3) NG (N ) = G.
(4) For all g ∈ G and all n ∈ N , gng −1 ∈ N .
(5) ∼1 and ∼2 as defined in Proposition 4.1.4 are equal relations.
(1)⇐⇒(5): The condition that N is normal in G means that every left coset is a right coset.
By Proposition 4.1.4, this is equivalent to saying that every ∼1 -equivalence class is equal to a ∼2 -
equivalence class. By Proposition 1.3.14, the equivalence relations ∼1 and ∼2 are equal.
We underscore that the condition gN g −1 = N does not imply gng −1 = n for all g ∈ G and all
n ∈ N . It merely implies that the process of operating on the left by g and right by g −1 produces a
bijection on N . We explore this issue more in Section 4.2.2.
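Criterion (2) of Theorem 4.2.4 lends itself to a brute-force check in a small group. The following sketch (an illustration, not part of the text) encodes D4 by pairs (a, e) standing for r^a s^e, with the relation s r = r^{-1} s, and tests gN g−1 = N directly:

```python
# Illustrative encoding of D4: (a, e) stands for r^a s^e, with s r = r^{-1} s.
n = 4

def mul(x, y):
    (a, e), (b, f) = x, y
    # (r^a s^e)(r^b s^f) = r^(a + (-1)^e b) s^(e+f)
    return ((a + (b if e == 0 else -b)) % n, (e + f) % 2)

def inv(x):
    a, e = x
    return ((-a) % n, 0) if e == 0 else x  # reflections are involutions

G = [(a, e) for a in range(n) for e in range(2)]

def is_normal(N):
    # Theorem 4.2.4(2): g N g^{-1} = N for all g, tested elementwise
    return all(mul(mul(g, x), inv(g)) in N for g in G for x in N)

N_r = {(a, 0) for a in range(n)}   # the subgroup <r>
N_s = {(0, 0), (0, 1)}             # the subgroup <s>
print(is_normal(N_r), is_normal(N_s))  # True False
```

The check agrees with Example 4.1.2: hri is normal in D4 while hsi is not.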
In practice, as we explore properties of normal subgroups, the various criteria of Theorem 4.2.4
may be more useful in some contexts than in others. Proposition 4.2.2 illustrated a situation where
the original definition is sufficiently convenient to establish the result. The following two propositions, which are important in themselves, illustrate situations where a different criterion is more
immediately useful in the proof.
Proposition 4.2.5
If G is abelian, then every subgroup H ≤ G is normal.
This proposition hints at why generalizing the construction of modular arithmetic to all groups
poses some subtleties that were not apparent in modular arithmetic: (Z, +) is abelian. By a similar
reason, we can also conclude the more general proposition.
Proposition 4.2.6
Let G be a group. Any subgroup H in the center Z(G) is a normal subgroup H E G.
Proposition 4.2.7
Let ϕ : G → H be a homomorphism between groups. Then Ker ϕ E G.
This proposition, though easy to prove, leads to some natural and profound consequences. For
example, if n ≥ 3, there exists no homomorphism ϕ : Dn → G, where G is any group, such that
Ker ϕ = hsi. Proposition 4.2.7 establishes some profound restrictions on how homomorphisms can
map from one group to another. In particular, a fiber of a homomorphism ϕ : G → H, i.e., a set
ϕ−1 ({h}), is either the empty set or a coset of Ker ϕ. If ϕ(g) = h, then ϕ−1 ({h}) = g(Ker ϕ). Note
that if G is finite, then all nonempty fibers have the same cardinality as | Ker ϕ|.
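The fiber statement can be illustrated concretely. In the sketch below (an illustrative homomorphism chosen for the purpose, not an example from the text), ϕ : Z12 → Z4 is reduction mod 4; every nonempty fiber is a coset of Ker ϕ and has | Ker ϕ| = 3 elements:

```python
# Reduction mod 4 as a homomorphism phi: Z12 -> Z4 (illustrative choice).
G = range(12)
phi = lambda x: x % 4

kernel = {x for x in G if phi(x) == 0}            # Ker(phi) = {0, 4, 8}
for h in range(4):
    fiber = {x for x in G if phi(x) == h}         # phi^{-1}({h})
    g = min(fiber)                                # any representative works
    coset = {(g + k) % 12 for k in kernel}        # g + Ker(phi)
    assert fiber == coset and len(fiber) == len(kernel)
print(sorted(kernel))  # [0, 4, 8]
```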
When attempting to determine whether a subgroup of a group G is a normal subgroup, using the original definition of a normal subgroup or Theorem 4.2.4(4) requires a possibly large number of calculations. For example, using the latter criterion, we would need to compute |G| · |N | conjugates gng −1 to determine whether N E G. However, if a finite group and its subgroup are both presented by generators, the following theorem provides a quick shortcut.
Theorem 4.2.8
Let G be a finite group generated by a subset T . Let N = hSi be the subgroup generated
by the subset S. Then N E G if and only if for all t ∈ T and all s ∈ S, tst−1 ∈ N .
Proof. If N E G, then gN g −1 = N for all g ∈ G; in particular, tst−1 ∈ N for all t ∈ T and all s ∈ S.
Suppose now that we only know that for all t ∈ T and all s ∈ S, tst−1 ∈ N . Note that since G is finite, the inverse of any t ∈ T is tn−1 , where |t| = n, so every element g ∈ G can be written as a product g = t1 t2 · · · tl , for ti ∈ T , possibly with repetitions. Every element of N is a product of elements of S and their inverses, and t(s−1 )t−1 = (tst−1 )−1 ∈ N , so tSt−1 ⊆ N implies tN t−1 ⊆ N . Since G is finite and |tN t−1 | = |N |, we conclude tN t−1 = N , so t ∈ NG (N ). Thus T ⊆ NG (N ). But NG (N ) is a subgroup of G containing the generating set T , so NG (N ) = G, and N E G by criterion (3) of Theorem 4.2.4.
Example 4.2.9. Consider the group D8 and the subgroup H = hr4 , si. We test whether H is a normal subgroup of D8 . Notice first that H = {1, r4 , s, sr4 }. We only need to perform four calculations:
r(r4 )r−1 = r4 ,    s(r4 )s−1 = r4 ,    rsr−1 = sr6 ,    s(s)s−1 = s.
By Theorem 4.2.8, the third calculation rsr−1 = sr6 6∈ H shows that H is not a normal subgroup. 4
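Theorem 4.2.8 reduces the normality test to generators, and the four conjugates of Example 4.2.9 can be computed mechanically. The sketch below (with an illustrative (a, e) encoding for r^a s^e, not the book's notation) reproduces the failing conjugate:

```python
# Illustrative encoding of D8: (a, e) stands for r^a s^e, with s r = r^{-1} s.
n = 8

def mul(x, y):
    (a, e), (b, f) = x, y
    return ((a + (b if e == 0 else -b)) % n, (e + f) % 2)

def inv(x):
    a, e = x
    return ((-a) % n, 0) if e == 0 else x

r, s = (1, 0), (0, 1)
H = {(0, 0), (4, 0), (0, 1), (4, 1)}   # H = {1, r^4, s, s r^4}

# Conjugate each generator of H by each generator of D8 (Theorem 4.2.8).
tests = {(t, h): mul(mul(t, h), inv(t)) in H
         for t in (r, s) for h in ((4, 0), (0, 1))}
print(tests[(r, (0, 1))])        # False: r s r^{-1} lies outside H
print(mul(mul(r, s), inv(r)))    # (2, 1), i.e., r^2 s = s r^6
```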
It is important to remark that, in contrast to the subgroup relation ≤ on the set of subgroups
of G, the relation of normal subgroup E is not transitive, and hence is not a partial order on
Sub(G). The easiest illustration comes from the dihedral group D4 . By Proposition 4.2.2, we see
that hr2 , si E D4 and hsi E hr2 , si. However, hsi is not a normal subgroup of D4 . Therefore,
K E H E G 6=⇒ K E G.
One intuitive reason for this behavior of the normal subgroup relation is that even if hKh−1 ⊆ K for all h ∈ H, the condition gKg −1 ⊆ K for all g ∈ G is a stronger condition and need not hold.
We can now put into context the terminology of “normalizer” of a subgroup. Recall that
NG (H) = {g ∈ G | gHg −1 = H}.
Consequently, NG (H) is the largest subgroup K of G such that H E K. The normalizer gives some
way of measuring how far a subgroup H is from being normal. For all H ≤ G, we have
H ≤ NG (H) ≤ G,
with NG (H) = G if and only if H E G. Intuitively speaking, we can say that H is farthest from
being normal when NG (H) = H.
Exercise 4.1.36 establishes that if H, K ≤ G are two subgroups, then HK is a subgroup of G if
and only if HK = KH as sets.
174 CHAPTER 4. QUOTIENT GROUPS
Corollary 4.2.10
If H ≤ NG (K), then HK is a subgroup of G. In particular, if K E G, then HK is a
subgroup of G for all H ≤ G.
4.2.2 – Conjugacy
The expression in criteria (4) of Theorem 4.2.4 is not new. We encountered it before when discussing
centralizers and normalizers. (See Definitions 3.5.15 and 3.5.17.) We remind the reader of a definition
given in Section 3.5.
Definition 4.2.11
Let G be a group and let g ∈ G.
• If x ∈ G, then the element gxg −1 is called the conjugate of x by g.
• If S ⊆ G is any subset, then the subset gSg −1 is also called the conjugate of S by g.
We have seen the conjugate of a group element in other contexts previously. In Example 3.9.10,
we found a presentation for the frieze group of a certain pattern. In that example, we noted that rotation by π about Q3 is equal to trt−1 , where t corresponds to translation along the vector from Q to Q3 and r is rotation by π about Q. In linear algebra, one encounters the change of basis formula. If A is the n × n matrix associated to a linear transformation T : Rn → Rn with respect to a basis B1 , and if B is the matrix associated to the same linear transformation with respect to another basis B2 , then
B = M AM −1 ,
where M is the coordinate transition matrix from B1 -coordinates to B2 -coordinates. In this latter example, A and B need not be invertible, but, if T is an isomorphism, then A and B are invertible and the conjugation B = M AM −1 occurs entirely in the group GLn (R).
The above examples give us the intuitive sense that conjugation corresponds to a change of origin,
a change of basis, or some change of perspective more generally. Consider the conjugation rsr−1 in
the dihedral group Dn . Explicitly, rsr−1 = sr−2 , which is the reflection through the line L0 that is
related to the s-reflection line L by a rotation by r. (See Figure 4.3.)
[Figure 4.3: the line L of the reflection s and the line L0 = r(L) of the reflection rsr−1 .]
Proposition 4.2.12
Let G be a group and define the relation ∼c on G by x ∼c y if y = gxg −1 for some g ∈ G.
Then ∼c is an equivalence relation. The relation ∼c is called the conjugacy relation.
The conjugacy class of an element x ∈ G is the set [x] = {gxg −1 | g ∈ G}. By Proposition 4.2.12
and properties of equivalence relations, the conjugacy classes in G partition G.
Example 4.2.13 (Conjugacy Classes in Sn ). In order to determine all the conjugacy classes in
Sn , we first prove the following claim. For all m-cycles (a1 a2 · · · am ) and for all permutations
σ ∈ Sn ,
σ(a1 a2 · · · am )σ −1 = (σ(a1 ) σ(a2 ) · · · σ(am )). (4.3)
Write τ = σ(a1 a2 · · · am )σ −1 and let i = 1, 2, . . . , m − 1. The permutation τ applied to σ(ai ) gives
τ (σ(ai )) = σ(a1 a2 · · · am )σ −1 (σ(ai )) = σ((a1 a2 · · · am )(ai )) = σ(ai+1 ).
If i = m, then similarly τ applied to σ(am ) gives σ(a1 ). We have seen that τ permutes the set {σ(a1 ), σ(a2 ), . . . , σ(am )}. If b ∈ {1, 2, . . . , n}, but b ∈/ {σ(a1 ), σ(a2 ), . . . , σ(am )}, then there exists c ∈ {1, 2, . . . , n} − {a1 , a2 , . . . , am } such that b = σ(c). Then
τ (b) = σ(a1 a2 · · · am )σ −1 (σ(c)) = σ((a1 a2 · · · am )(c)) = σ(c) = b.
We have calculated where τ sends all the elements and found that τ is given by (4.3).
Now every element ω ∈ Sn is a product of disjoint cycles, ω = τ1 τ2 · · · τm . Furthermore, we can
write
σωσ −1 = (στ1 σ −1 )(στ2 σ −1 ) · · · (στm σ −1 ), (4.4)
where each στi σ −1 is calculated from (4.3).
As a numerical example, using only (4.3) and (4.4), with σ = (1 2 3) and ω = (1 2)(3 4) in S4 , we determine that
σωσ −1 = (σ(1) σ(2))(σ(3) σ(4)) = (2 3)(1 4).
Consequently, we can see that if a permutation ω has a given cycle type, then for all σ ∈ Sn ,
σωσ −1 will have the same cycle type. Conversely, if two permutations ω1 , ω2 ∈ Sn have the same
cycle type, then by using (4.3) and (4.4), we can find a σ ∈ Sn such that ω2 = σω1 σ −1 . Thus,
the conjugacy classes in Sn are precisely the sets of permutations that have the same cycle type.
Therefore, the table in Example 3.4.5 gives the conjugacy classes in S6 and gives the cardinality of
each. 4
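The correspondence between conjugacy classes and cycle types is small enough to verify exhaustively in S4 . A brute-force sketch (an illustration; permutations are encoded as tuples of images of 0, 1, 2, 3):

```python
from itertools import permutations

def compose(p, q):                     # (p o q)(i) = p[q[i]]
    return tuple(p[q[i]] for i in range(4))

def inverse(p):
    out = [0] * 4
    for i, pi in enumerate(p):
        out[pi] = i
    return tuple(out)

def cycle_type(p):                     # sorted cycle lengths of p
    seen, lengths = set(), []
    for i in range(4):
        if i not in seen:
            j, c = i, 0
            while j not in seen:
                seen.add(j)
                j = p[j]
                c += 1
            lengths.append(c)
    return tuple(sorted(lengths))

S4 = list(permutations(range(4)))
for w in S4:
    cls = {compose(compose(s, w), inverse(s)) for s in S4}   # class of w
    assert cls == {x for x in S4 if cycle_type(x) == cycle_type(w)}
print(len({cycle_type(p) for p in S4}))   # 5 conjugacy classes in S4
```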
Conjugacy classes and normal subgroups are closely related in the following way. If N is a normal
subgroup of a group G and x ∈ N , then by Theorem 4.2.4, gxg −1 ∈ N for all g ∈ G. Consequently,
the conjugacy class of x is contained in N . This leads to the following proposition.
Proposition 4.2.14
A subgroup H ≤ G is normal if and only if it is the union of some conjugacy classes.
Proposition 4.2.15
Let H be any subgroup of a group G. Then for all g ∈ NG (H), the function ψg : H → H
defined by ψg (h) = ghg −1 is an automorphism of H. Furthermore, the association Ψ :
NG (H) → Aut(H) defined by Ψ(g) = ψg is a homomorphism.
Proof. Let g ∈ NG (H). Since gHg −1 = H, the function ψg does map H to H. For all h1 , h2 ∈ H,
ψg (h1 h2 ) = gh1 h2 g −1 = (gh1 g −1 )(gh2 g −1 ) = ψg (h1 )ψg (h2 ).
This proves that ψg is a homomorphism. It is easy to check that (ψg )−1 = ψg−1 , so for all g ∈ NG (H), the function ψg is a bijection and, hence, an automorphism of H.
Now let a, b ∈ NG (H) be arbitrary. Then for all h ∈ H,
ψab (h) = (ab)h(ab)−1 = a(bhb−1 )a−1 = ψa (ψb (h)).
Hence, Ψ(ab) = Ψ(a) ◦ Ψ(b), which establishes that Ψ : NG (H) → Aut(H) is a homomorphism.
Definition 4.2.16
A group G is called simple if it contains no normal subgroups besides {1} and itself.
We discuss simple groups in more detail in Section 9.1. By what we said above, Zp is a simple
group whenever p is a prime number. Determining if a group is simple is not always a “simple”
task. We have encountered one other family of simple groups, namely An with n ≥ 5.
Exercise 4.2.27 guides the reader to prove that A5 is simple. The proof that An is simple for n ≥ 6
is more challenging, but we will see a proof in Theorem 9.2.7.
(a) Prove that SLn (F ) E GLn (F ).
(b) Conclude that the subgroup of Tn (F ) consisting of matrices with 1s down the diagonal is a
normal subgroup.
9. Let n be a positive integer and let G = GLn (R).
(a) Prove that H = GLn (Q) is a subgroup that is not a normal subgroup.
(b) Define the subset K as
K = {A ∈ GLn (R) | det(A) ∈ Q}.
Prove that K is a normal subgroup that contains H.
10. Prove Proposition 4.2.12.
11. Prove Corollary 4.2.10.
12. Let G be a group. Prove that if H ≤ G is the unique subgroup of a given order n, then H E G.
13. Let ϕ : G → H be a group homomorphism and let N E H. Prove that ϕ−1 (N ) E G. [Note that this
generalizes the fact that kernels of homomorphisms are normal subgroups of the domain group.]
14. Let G be a group, H ≤ G and N E G. Prove that H ∩ N E H.
15. Prove that the intersection of two normal subgroups N1 , N2 of a group G is again a normal subgroup
N1 ∩ N2 E G.
16. Let N1 and N2 be normal subgroups in G. Prove that N1 N2 is the join of N1 and N2 and that it is a normal subgroup in G.
17. Let {Ni }i∈I be a collection of normal subgroups of G. Prove that the intersection ∩i∈I Ni is a normal subgroup of G.
4.3 Quotient Groups
We began Chapter 4 by proposing to generalize to all groups the construction that led from (Z, +)
to addition in modular arithmetic, (Z/nZ, +). Our discussion sent us far afield, but we never
constructed something analogous to modular arithmetic. As promised in Section 4.2, we are in a
position to generalize the modular arithmetic construction to general groups.
Let ∼ be an equivalence relation on a group G that behaves well with respect to the operation, in the sense that
g1 ∼ g2 and h1 ∼ h2 =⇒ g1 h1 ∼ g2 h2 . (4.5)
Then on the quotient set G/ ∼, i.e., the set of ∼-equivalence classes, we can define the operation · by
[x] · [y] def = [xy]. (4.6)
By virtue of condition (4.5), this operation is well-defined. We leave it as an exercise for the reader to show that (G/ ∼, ·) is a group. (See Exercise 4.3.13.)
Proposition 4.3.1
Suppose that ∼ is an equivalence relation on G that behaves well with respect to the
operation. Then the equivalence class of 1 is a normal subgroup N . Furthermore, all
equivalence classes of ∼ are of the form gN .
This proposition establishes that an equivalence relation that behaves well with respect to the group operation defines a normal subgroup. The converse is also true.
Proposition 4.3.2
Let N be a normal subgroup of a group G. The left cosets of N form a partition of G,
which corresponds to an equivalence relation ∼ that behaves well with respect to the group
operation.
4.3. QUOTIENT GROUPS 179
Proof. We already know that the left cosets of N (which are also right cosets because N is normal)
partition G and that a partition defines a unique equivalence relation on G. (See Proposition 1.3.14.)
Let g1 , g2 be in the same left coset of N and let h1 , h2 be in the same left coset of N . Then g2−1 g1 ∈ N and h2−1 h1 ∈ N . Then
h1−1 g2−1 g1 h1 ∈ N    because g2−1 g1 ∈ N and N E G
=⇒ (h2−1 h1 )(h1−1 g2−1 g1 h1 ) ∈ N    because h2−1 h1 ∈ N
=⇒ h2−1 g2−1 g1 h1 ∈ N
=⇒ (g2 h2 )−1 (g1 h1 ) ∈ N
and we conclude that g2 h2 and g1 h1 are in the same left coset of N . Consequently, the equivalence
relation defined by the partition of left cosets of N behaves well with respect to the group operation.
Proposition 4.3.2 can be restated in the following way.
Corollary 4.3.3
Let G be a group and N a normal subgroup. The set of left cosets of N with the operation
(xN )(yN ) def = (xy)N (4.7)
has the structure of a group with identity N and inverses given by (xN )−1 = x−1 N .
Proof. The expression (4.7) is precisely the property (4.6) of an equivalence relation that is well
behaved with respect to the group operation. Associativity follows immediately from associativity
in G and (4.7). It is obvious that (gN )(N ) = (gN )(1N ) = gN , which shows that N is the identity.
Finally, for all g ∈ G, (4.7) gives (gN )(g −1 N ) = (gg −1 )N = N so that gN has an inverse (gN )−1 =
g −1 N .
Definition 4.3.4
Let G be a group and let N be a normal subgroup. The group defined as the set of left
cosets (which are the same as right cosets since N E G) with the operation defined in
Corollary 4.3.3, is called the quotient group of G by N , and is denoted by G/N .
The importance of Proposition 4.3.1 is that a quotient group G/N , where N E G, is the only
manner in which any quotient set G/ ∼ of G can be made into a group with an operation inherited
from G via (4.6).
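The well-definedness hidden in (4.7) can be exhibited computationally: in the sketch below (again with an illustrative (a, e) encoding of D4 , (a, e) standing for r^a s^e), the product of two cosets of hri is independent of the chosen representatives:

```python
# Illustrative encoding of D4: (a, e) stands for r^a s^e, with s r = r^{-1} s.
n = 4

def mul(x, y):
    (a, e), (b, f) = x, y
    return ((a + (b if e == 0 else -b)) % n, (e + f) % 2)

G = [(a, e) for a in range(n) for e in range(2)]
N = frozenset((a, 0) for a in range(n))        # <r>, normal of index 2

def coset(g):
    return frozenset(mul(g, x) for x in N)     # the coset gN

cosets = {coset(g) for g in G}
for X in cosets:
    for Y in cosets:
        # (xN)(yN) = (xy)N must not depend on the representatives x, y
        assert len({coset(mul(x, y)) for x in X for y in Y}) == 1
print(len(cosets))  # 2
```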
The Cayley table of D4 , with the elements grouped by coset of hri, and the resulting table of the quotient group D4 /hri:

·    | 1    r    r2   r3   | s    sr   sr2  sr3
1    | 1    r    r2   r3   | s    sr   sr2  sr3
r    | r    r2   r3   1    | sr3  s    sr   sr2
r2   | r2   r3   1    r    | sr2  sr3  s    sr
r3   | r3   1    r    r2   | sr   sr2  sr3  s
s    | s    sr   sr2  sr3  | 1    r    r2   r3
sr   | sr   sr2  sr3  s    | r3   1    r    r2
sr2  | sr2  sr3  s    sr   | r2   r3   1    r
sr3  | sr3  s    sr   sr2  | r    r2   r3   1

D4 /hri:    ·  | 1̄  s̄
            1̄  | 1̄  s̄
            s̄  | s̄  1̄

Labeling the two cosets by whether their elements are rotations or reflections, the quotient table reads:

·          | rotation   reflection
rotation   | rotation   reflection
reflection | reflection rotation
So every rotation composed with a rotation is a rotation; every rotation composed with a reflec-
tion is a reflection, and so forth. 4
As illustrated in the previous two examples, it is not uncommon to mimic the notation used in
modular arithmetic and denote a coset gN in the quotient group G/N by ḡ. In modular arithmetic,
the modulus is understood by context. Similarly, when we use this notation ḡ, the normal subgroup
is understood by context.
Example 4.3.7. As another example, consider the subgroup N = h−1i in Q8 . By Proposition 4.2.6,
since N is the center of Q8 , it is a normal subgroup. The elements in the quotient group Q8 /N are
{1̄, ī, j̄, k̄}. It is easy to see that ī2 = −1 = 1̄, since −1 ∈ N , and similarly for j̄ and k̄. Hence, all the nonidentity elements have order 2. Consequently, we can conclude that Q8 /N ∼= Z2 ⊕ Z2 . 4
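The coset computation in Q8 /N can be carried out directly with quaternion arithmetic. In the sketch below (an illustration; elements are 4-tuples (a, b, c, d) standing for a + bi + cj + dk under the Hamilton product), each of ī, j̄, k̄ squares to the identity coset:

```python
def qmul(x, y):
    # Hamilton product of quaternions given as 4-tuples (a, b, c, d).
    a1, b1, c1, d1 = x
    a2, b2, c2, d2 = y
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

one, i, j, k = (1,0,0,0), (0,1,0,0), (0,0,1,0), (0,0,0,1)
N = {one, (-1,0,0,0)}                     # <-1>, the center of Q8

def coset(q):
    return frozenset(qmul(q, x) for x in N)

one_bar = coset(one)
print(all(coset(qmul(q, q)) == one_bar for q in (i, j, k)))  # True
```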
G/hyi = {1, x, x2 } ∼= Z3 . 4
Definition 4.3.9
Let N E G. The function π : G → G/N defined by π(g) = gN is called the canonical
projection of G onto G/N .
By the definition of the operation on cosets in (4.7), the canonical projection is a homomorphism.
In Proposition 4.2.7, we saw that kernels of homomorphisms are normal subgroups; the converse is
in fact true, which leads to the following proposition.
Proposition 4.3.10
A subgroup N of G is normal if and only if it is the kernel of some homomorphism.
Proof. This follows from Proposition 4.2.7 and the fact that N is the kernel of the canonical projec-
tion π : G → G/N .
By the result of Exercise 3.7.10, this proposition implies that for all g ∈ G, the order of gN in
G/N divides the order of g in G. We can prove this same result in another fashion and obtain more
precise information. By Exercise 4.3.14,
|gN | = |hgiN |/|N |.
By Proposition 4.1.16, in G we have
|hgiN | = |g| |N |/|hgi ∩ N |,
which implies that
|g| = |hgi ∩ N | · |hgiN |/|N | = |hgi ∩ N | · |gN |.
The subgroup order |hgi ∩ N | is the divisibility factor between |gN | and |g|.
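The relation |g| = |hgi ∩ N | · |gN | is easy to spot-check. The sketch below (an illustrative computation, not from the text) verifies it for every element of the additive group Z12 with N = h4i = {0, 4, 8}:

```python
from math import gcd

n, N = 12, {0, 4, 8}                               # G = Z12, N = <4>
for g in range(1, n):
    order_g = n // gcd(n, g)                       # |g| in Z12
    gen = {(g * t) % n for t in range(order_g)}    # the subgroup <g>
    # |gN| = number of distinct cosets x + N met by <g>
    order_gN = len({frozenset((x + y) % n for y in N) for x in gen})
    assert order_g == len(gen & N) * order_gN
print("|g| = |<g> ∩ N| · |gN| holds for all g in Z12")
```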
As a final set of examples of quotient groups, by Proposition 4.2.6, Z(G) E G for all groups G. In
Exercise 4.3.21, the reader is asked to prove the important result that if G/Z(G) is cyclic, then G is
abelian. In some intuitive sense, the quotient group G/Z(G) “removes” any elements that commute
with everything else. This intuitive manner of thinking may be misleading because the center of
the quotient group G/Z(G) is not necessarily trivial (Exercise 4.3.20). Nonetheless, given a group
G, the quotient group G/Z(G) often tells us something important about the group G. As specific examples, we mention the so-called projective linear groups.
Example 4.3.11 (Projective Linear Groups). Suppose that F is C, R, Q or Fp , where p is
prime. In Example 3.5.14, we proved that the center of GLn (F ) consists of matrices of the form
aI, where a ∈ F × = F − {0}. A similar result still holds for the center of SLn (F ), except not all
diagonal matrices of the form aI are in SLn (F ).
The projective general linear group of order n is PGLn (F ) = GLn (F )/Z(GLn (F )).
G̃i = {(1, 1, . . . , 1, gi , 1, . . . , 1) | gi ∈ Gi } ⊆ G1 ⊕ G2 ⊕ · · · ⊕ Gn
is isomorphic to Gi and is a normal subgroup of G = G1 ⊕ G2 ⊕ · · · ⊕ Gn . By the result of Exercise 4.2.16, the product set of normal subgroups is the join of the subgroups. In this situation, as a join of subgroups, G = G̃1 G̃2 · · · G̃n . Furthermore,
(G1 ⊕ G2 ⊕ · · · ⊕ Gn )/G̃i ∼= G1 ⊕ · · · ⊕ Gi−1 ⊕ Gbi ⊕ Gi+1 ⊕ · · · ⊕ Gn ,
where the Gbi notation indicates that the corresponding term is omitted.
In the above discussion, we assumed that we started with a collection of groups, constructed
the direct sum group, and studied some of its properties. In contrast, suppose that we encounter a
group G either from some natural context, as a quotient group, as a presentation or by any other
means. It is possible that G is isomorphic to a direct sum of groups G1 ⊕ G2 ⊕ · · · ⊕ Gn . If it is,
then these groups Gi would be isomorphic to subgroups of G and possess the properties mentioned
above. The following theorem states when and how a group G may be isomorphic to a direct sum
of its own subgroups.
Then G ∼= N1 ⊕ N2 ⊕ · · · ⊕ Nk .
Proof. First, we show that for any indices i 6= j, the elements in Ni commute with the elements in Nj . Let ni ∈ Ni and nj ∈ Nj and consider the element ni−1 nj−1 ni nj . This element is called the commutator of ni and nj and is denoted by [ni , nj ]. (See Exercise 4.3.24.) Since nj−1 ∈ Nj E G, then ni−1 nj−1 ni ∈ Nj and [ni , nj ] ∈ Nj . Similarly, Ni E G, so nj−1 ni nj ∈ Ni , so again [ni , nj ] ∈ Ni . Thus, [ni , nj ] ∈ Ni ∩ Nj . However, by Condition 2, Ni ∩ Nj = {1}, so
ni−1 nj−1 ni nj = 1 =⇒ nj−1 ni nj = ni =⇒ ni nj = nj ni .
Since ni−1 n0i ∈ Ni , by our previous remark, we deduce that ni−1 n0i = 1 and hence n0i = ni for all i = 1, 2, . . . , k. Hence, g can be expressed uniquely as a product of elements in the subgroups N1 , N2 , . . . , Nk .
Finally, consider the function ψ : G → N1 ⊕ N2 ⊕ · · · ⊕ Nk defined by
ψ(n1 n2 · · · nk ) = (n1 , n2 , . . . , nk ).
This is well-defined precisely because g can be expressed uniquely as a product of elements in the
Ni . Let g = n1 n2 · · · nk and h = n01 n02 · · · n0k be arbitrary elements in G, where ni , n0i ∈ Ni . From
the first claim in this proof, we have
gh = n1 n2 · · · nk n01 n02 · · · n0k = (n1 n01 )(n2 n02 ) · · · (nk n0k ).
It is important to note that gh is still not necessarily hg because ni and n0i need not commute in Ni . Then
ψ(gh) = ψ((n1 n01 )(n2 n02 ) · · · (nk n0k )) = (n1 n01 , n2 n02 , . . . , nk n0k ) = ψ(g)ψ(h).
Hence, ψ is a homomorphism. It is surjective since for all (n1 , n2 , . . . , nk ) ∈ N1 ⊕ N2 ⊕ · · · ⊕ Nk , we
have ψ(n1 n2 · · · nk ) = (n1 , n2 , . . . , nk ). It is also injective, since Ker ψ = {1}. Consequently, ψ is an
isomorphism and the theorem follows.
Example 4.3.13. As an example of the Direct Sum Decomposition Theorem, consider the group
U (35). This is an abelian group with φ(35) = 24 elements. Again, recall that in any abelian group,
all subgroups are normal. Consider the subgroups N1 = h6i and N2 = h2i. For elements, we have |6| = 2 and |2| = 12, with h6i ∩ h2i = {1} and h6ih2i = U (35). Hence,
U (35) ∼= h6i ⊕ h2i ∼= Z2 ⊕ Z12 .
The group U (35) can be decomposed even further. Consider the three subgroups h6i, h8i, and
h16i. We leave it to the reader (Exercise 4.3.26) to verify that these subgroups satisfy the conditions
of the Direct Sum Decomposition Theorem. Then, we conclude that
U (35) ∼= h6i ⊕ h16i ⊕ h8i ∼= Z2 ⊕ Z3 ⊕ Z4 .
Note that since gcd(3, 4) = 1, then Z12 ∼= Z4 ⊕ Z3 , so the above two decompositions are equivalent.
In this latter decomposition, depicted visually in Figure 4.5, we could write
U (35) = {6a 16b 8c | a = 0, 1; b = 0, 1, 2; c = 0, 1, 2, 3}. 4
[Figure 4.5: the 24 elements of U (35) arranged on a 2 × 3 × 4 grid along the directions generated by 6, 16, and 8.]
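Example 4.3.13 can be confirmed numerically; the following sketch (an illustration, not from the text) checks the element orders and that the products 6a 16b 8c exhaust U (35) without repetition:

```python
from math import gcd

U = {x for x in range(1, 35) if gcd(x, 35) == 1}
assert len(U) == 24                      # phi(35) = 24

def order(g):                            # multiplicative order of g mod 35
    k, x = 1, g
    while x != 1:
        x = (x * g) % 35
        k += 1
    return k

print(order(6), order(16), order(8))     # 2 3 4
products = {pow(6, a, 35) * pow(16, b, 35) * pow(8, c, 35) % 35
            for a in range(2) for b in range(3) for c in range(4)}
assert products == U                     # 24 distinct products: unique expression
```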
11. Prove that if G is generated by a subset {g1 , g2 , . . . , gn }, then the quotient group G/N is generated
by {ḡ1 , ḡ2 , . . . , ḡn }.
12. Consider the dihedral group Dn and let d be a divisor of n.
(a) Prove that hrd i E Dn .
(b) Show that Dn /hrd i ∼= Dd .
(c) Give a geometric interpretation of this last result. (What information is conflated when taking
the quotient group?)
13. Let G be a group and let ∼ be an equivalence relation on G that behaves well with respect to the
operation. Prove that (G/ ∼, ·) is a group.
14. Let N be a normal subgroup of a group G and write g for the coset gN in the quotient group G/N .
(a) Show that for all g ∈ G, if the order of g is finite, then |g| is the least positive integer k such
that g k ∈ N .
(b) Deduce that the element order |g| is equal to |hgiN |/|N |.
15. Consider the group G given in terms of generators and relations as
G = hx, y | x4 = y 3 = 1, x−1 yx = y −1 i.
16. Consider the group G given in terms of generators and relations as
G = hx, y | x4 = y 5 = 1, x−1 yx = y 2 i.
17. Consider the group G given in terms of generators and relations as
G = hx, y | x2 = y 8 = 1, yx = xy 5 i.
has order 12. Determine, with proof, to which group in the table of Section A.2.1 it is isomorphic.
19. Consider the group GL2 (F3 ), i.e. the general linear group of 2 by 2 invertible matrices with elements
in modular arithmetic modulo 3. (By the result of Exercise 3.2.32, this group has 48 elements.) We
consider the group G = PGL2 (F3 ), the projective general linear group of order 2 on F3 .
(a) Prove that |G| = 24 and show that G and S4 have the same number of elements of any given
order.
(b) Show explicitly that G ∼= S4 . [Showing that they have the same number of elements of a given order is evidence but not sufficient for a proof of the isomorphism.]
20. Find an example of a group G in which the center of G/Z(G) is not trivial.
21. Prove that if G/Z(G) is cyclic, then G is abelian. Give an example to show that G is not necessarily
abelian if we only assume that G/Z(G) is abelian.
22. Prove that if |G| = pq, where p and q are two primes, not necessarily distinct, then G is either abelian
or Z(G) = {1}. [Hint: Use Exercise 4.3.21.]
23. Let N be a normal subgroup of a finite group G and let g ∈ G. Prove that if gcd(|g|, |G/N |) = 1, then
g ∈ N.
24. Let G be a group. The commutator subgroup, denoted G0 , is defined as the subgroup generated by all
products x−1 y −1 xy, for any x, y ∈ G. In other words,
G0 = hx−1 y −1 xy | x, y ∈ Gi.
25. Let A and B be two groups and let G = A ⊕ B. The subgroup A × {1} ≤ G is isomorphic to A. Prove that A × {1} E G and that G/(A × {1}) ∼= B.
26. In Example 4.3.13, prove that the subgroups h6i, h8i, and h16i satisfy the conditions of the Direct
Sum Decomposition Theorem.
27. Use the direct sum decomposition to show that U (100) ∼= Z20 ⊕ Z2 ∼= Z5 ⊕ Z4 ⊕ Z2 .
28. Let p be a prime number and let k be a positive integer. Prove that Zpk is not isomorphic to the
direct product of any other groups.
29. Prove that Q8 is not isomorphic to the direct product of any other groups.
4.4 Isomorphism Theorems
As we saw in the previous section, properties of quotient groups of a group G may imply relationships
between certain subgroups of G. Much more can be said, however. In this section, we discuss some
theorems, the four Isomorphism Theorems, that describe further structure within a group.
ϕ : G/ Ker ϕ −→ Im ϕ = ϕ(G)
g(Ker ϕ) 7−→ ϕ(g).
For any g ∈ G, the element g is only one of many possible representatives of the coset g Ker ϕ. Thus, in order to verify that this is even a function, we first need to check that the stated rule gives the same output for every representative of g(Ker ϕ). Suppose gh, with h ∈ Ker ϕ, is any element in the coset g Ker ϕ. Then ϕ(gh) = ϕ(g)ϕ(h) = ϕ(g), since ϕ(h) = 1. Thus, the choice of representative has no effect on the stated output ϕ(g). This simply means that ϕ is well-defined as a function.
It is easy to check that ϕ is a homomorphism. Furthermore, ϕ is surjective since every element ϕ(g) in ϕ(G) is obtained as the output ϕ(g Ker ϕ). To prove injectivity, let g1 , g2 ∈ G. Then
ϕ(g1 Ker ϕ) = ϕ(g2 Ker ϕ) =⇒ ϕ(g1 ) = ϕ(g2 ) =⇒ ϕ(g2−1 g1 ) = 1 =⇒ g2−1 g1 ∈ Ker ϕ,
so g1 Ker ϕ = g2 Ker ϕ. This proves injectivity of ϕ. Consequently, ϕ is bijective and thus an isomorphism and the theorem follows.
4.4. ISOMORPHISM THEOREMS 187
The First Isomorphism Theorem shows that the image of a homomorphism, as a subgroup of the codomain, already exists within the structure of the domain group, not as a subgroup but as a quotient group. This theorem also shows how any homomorphism ϕ : G → H can be factored
into the surjective (canonical projection) map π : G → G/ Ker ϕ and an injective homomorphism
ϕ : G/ Ker ϕ → H so that ϕ = ϕ ◦ π. We often depict this relationship by the following commutative
diagram.
G −−π−→ G/ Ker ϕ
  ϕ ↘       ↓ ϕ
           H
The First Isomorphism Theorem leads to many consequences about groups, some elementary
and some more profound. One implication is that if ϕ : G → H is an injective homomorphism, then
Ker ϕ = {1} and so G/ Ker ϕ = G ∼= ϕ(G). In this situation, we sometimes say that G is embedded
in H or that ϕ is an embedding of G into H because ϕ maps G into an exact copy of itself as a
subgroup of H.
As another example, suppose that G is a simple group. By definition, it contains no normal
subgroups besides {1} and itself. Hence, by the First Isomorphism Theorem, any homomorphism ϕ
from G is either injective (Ker ϕ = {1}) or trivial (ϕ(G) = {1}). Thus, under any homomorphism, a simple group either embeds into the codomain or has trivial image.
Combining the First Isomorphism Theorem with Lagrange’s Theorem, we are able to deduce the following nonobvious corollary.
Corollary 4.4.2
Let G and H be finite groups with gcd(|G|, |H|) = 1. Then the only homomorphism
ϕ : G → H is the trivial homomorphism, ϕ(g) = 1H .
Proof. Since ϕ(G) ≤ H, then by Lagrange’s Theorem |ϕ(G)| divides |H|. By the First Isomor-
phism Theorem, |ϕ(G)| = |G|/| Ker ϕ|. Consequently |ϕ(G)| divides |G|. Hence, |ϕ(G)| divides
gcd(|G|, |H|) = 1, so |ϕ(G)| = 1. The only subgroup of H that has only 1 element is {1H }.
The First Isomorphism Theorem leads to many other more subtle results in group theory. The
following Normalizer-Centralizer Theorem is an immediate consequence but is important in its own
right. In future sections, we will see how this theorem implies more subtle constraints on the internal
structure of a group, leading to consequences for the classification of groups.
Proof. By Proposition 4.2.15, the function Ψ : NG (H) → Aut(H), defined by Ψ(g) = ψg , where ψg : H → H with ψg (h) = ghg −1 , is a homomorphism. The image subgroup Ψ(NG (H)) ≤ Aut(H) could be strictly contained in Aut(H).
Now the kernel of Ψ is precisely
Ker Ψ = {g ∈ NG (H) | ghg −1 = h for all h ∈ H} = CG (H).
By the First Isomorphism Theorem, we deduce that NG (H)/CG (H) is isomorphic to Ψ(NG (H)), which is a subgroup of Aut(H).
As a general hint to the reader, if someone ever asks for a proof that G/N ∼= H, where G and H are groups with N E G, then there must exist a surjective homomorphism ϕ : G → H such that
Ker ϕ = N . Hence, when attempting to prove such results, the First Isomorphism Theorem offers
the strategy of looking for an appropriate homomorphism.
AB/B ∼= A/(A ∩ B).
Proof. Since A ≤ NG (B), then A normalizes B and AB, which is also a subgroup of G, normalizes
B. Thus, B E AB.
Define the function φ : A → AB/B by φ(a) = aB. This is a homomorphism precisely because the group operation in AB/B is well-defined:
φ(a1 a2 ) = (a1 a2 )B = (a1 B)(a2 B) = φ(a1 )φ(a2 ).
Clearly φ is surjective, so φ(A) = AB/B, but we would like to determine the kernel. Now φ(a) = 1B if and only if aB = 1B if and only if a ∈ B. This means that Ker φ = A ∩ B. Thus, A ∩ B E A and by the First Isomorphism Theorem
A/(A ∩ B) ∼= AB/B.
The Second Isomorphism Theorem is also called the Diamond Isomorphism Theorem because it
concerns the relative sides of particular “diamonds” inside the lattice structure of a group.
         AB
       //    /
      A       B
        /   //
        A ∩ B
          |
         {1}
In the above diagram, assuming A ≤ NG (B), then the opposite /-sides not only have the same index, |AB : B| = |A : A ∩ B|, but correspond to normal subgroups and satisfy AB/B ∼= A/(A ∩ B).
On the other hand, if B ≤ NG (A), then the opposite //-sides satisfy the same property. Finally,
in the special case that A and B are both normal subgroups of G, then the Second Isomorphism
Theorem applies to both pairs of opposite sides of the diamond.
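The equality of indices across opposite sides of the diamond can be checked in a small abelian example (where every subgroup is normal and the hypothesis A ≤ NG (B) is automatic). The sketch below (illustrative, not from the text) takes A = h2i and B = h3i inside Z12 :

```python
A = {0, 2, 4, 6, 8, 10}                    # <2> in Z12
B = {0, 3, 6, 9}                           # <3> in Z12
AB = {(a + b) % 12 for a in A for b in B}  # A + B (written additively)
AiB = A & B                                # A ∩ B = {0, 6}

print(len(AB) // len(B), len(A) // len(AiB))  # 3 3, i.e., |AB : B| = |A : A ∩ B|
```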
[Figure 4.6: three symmetries of a cube with vertices labeled 1–8: two reflections through planes and a rotation by 120◦ about a maximal diagonal.]
Proof. Consider the mapping ϕ : G/H → G/K with ϕ(gH) = gK. We first need to show that it is a well-defined function. Suppose that g1 H = g2 H. Then g2−1 g1 ∈ H. But since H ≤ K, then g2−1 g1 ∈ H implies that g2−1 g1 ∈ K. Thus, g1 K = g2 K. Hence, ϕ is well-defined. By properties of cosets, ϕ is also a surjective homomorphism. The kernel of ϕ is
Ker ϕ = {gH ∈ G/H | gK = K} = {gH ∈ G/H | g ∈ K} = K/H.
Now by the First Isomorphism Theorem, we deduce that K/H E G/H and that
(G/H)/(K/H) ∼= G/K.
Example 4.4.6. A simple example of the Third Isomorphism Theorem concerns subgroups of G =
Z. Let H = 48Z and K = 8Z. We have H ≤ K and, since G is abelian, both H and K are normal subgroups. We have G/H = Z/48Z and K/H = 8Z/48Z = h8̄i, where 8̄ denotes the class of 8 in Z/48Z. The Third Isomorphism Theorem gives
(Z/48Z)/(8Z/48Z) ∼= Z/8Z. 4
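Example 4.4.6 can be replayed computationally: the sketch below (an illustration) forms the cosets of 8Z/48Z inside Z/48Z and checks that the quotient is cyclic of order 8:

```python
G_over_H = range(48)                                      # Z/48Z
K_over_H = frozenset(x for x in G_over_H if x % 8 == 0)   # 8Z/48Z, 6 elements

cosets = {frozenset((x + y) % 48 for y in K_over_H) for x in G_over_H}
generated = {frozenset((t + y) % 48 for y in K_over_H) for t in range(8)}

print(len(K_over_H), len(cosets))   # 6 8
assert generated == cosets          # the class of 1 generates: quotient is Z/8Z
```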
Example 4.4.7. Let G be the group of symmetries in R3 on the vertices of the cube that preserve
the cube structure. This group is similar to the dihedral group on the square but more complicated
because the rotations in the plane correspond to rigid motions of the cube. Furthermore, this group
is strictly larger than the group of rigid motions of a cube since it also includes reflections through
planes.
We can see that |G| = 48 by reasoning in the following way. Under a symmetry σ of the cube, the vertex 1 may be mapped to any of the eight vertices. Then, under a cube symmetry, the
three edges that are incident with vertex 1 may be mapped in any way to the three edges that are
incident with σ(1). There are 3! = 6 possibilities for this mapping of incident edges. Once we know
where 1 goes and where its incident edges go, the rest of the mapping of the cube is completely
determined. Hence, |G| = 8 × 6 = 48.
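The count |G| = 48 matches a standard linear-algebra description (an aside, not from the text): with the cube centered at the origin and edges parallel to the axes, its symmetries correspond to the 3 × 3 signed permutation matrices, and the rigid motions are those of determinant +1:

```python
from itertools import permutations, product

def det(perm, signs):
    # determinant of the signed permutation matrix = sign(perm) * prod(signs)
    inversions = sum(1 for i in range(3) for j in range(i + 1, 3)
                     if perm[i] > perm[j])
    sign = -1 if inversions % 2 else 1
    return sign * signs[0] * signs[1] * signs[2]

mats = [(p, s) for p in permutations(range(3))
               for s in product((1, -1), repeat=3)]
rotations = [m for m in mats if det(*m) == 1]
print(len(mats), len(rotations))   # 48 24
```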
By Exercise 3.7.36, the group of rigid motions R, which includes no reflections through planes,
is isomorphic to S4 as the permutation group on the 4 maximal diagonals. The group R of rigid
motions of a cube is only a subgroup of G. Since |G : R| = 2, then R E G. Figure 4.6 illustrates three
symmetries of a cube, two reflections through a plane, and one rotation about a maximal diagonal.
The reflections through planes are not rigid motions. The rotation about a maximal diagonal is a rigid motion, but there are many other rigid motions as well.
Consider also the subgroup H of G generated by the rotations by multiples of 120◦ about the
maximal diagonals. It is not hard to see that all symmetries of the cube are generated by reflections
through planes. If f is a reflection through a plane and r is a rotation by 2π/3 about a maximal diagonal L, then f rf⁻¹ is the rotation by 2π/3 about the maximal diagonal L′ = f(L), i.e., the diagonal obtained by reflecting L via f. From Theorem 4.2.8, we conclude that H E G. Note that if we view R as S4
via how it permutes the maximal diagonals of the cube, H is generated by 3-cycles and hence H
corresponds to the subgroup A4 in R.
This example gives a situation in which H ≤ R ≤ G and both H and R are normal subgroups of G.
We can interpret the quotient group G/R ≅ Z2 as carrying the information of whether a symmetry is a rigid motion (orientation preserving) or a reflected rigid motion (orientation reversing). The quotient group R/H ≅ Z2 carries information about whether a rigid motion is odd or even in the identification of R with S4. Finally, G/H ≅ Z2 ⊕ Z2 contains information about both even
versus odd and orientation-preserving versus orientation-reversing. Intuitively speaking, the Third
Isomorphism Theorem, which states that
(G/H)/(R/H) ≅ G/R,
says that this information about orientation is contained without loss of structure in G/H. △
Proof. (Parts (2) through (5) are left as exercises for the reader. See Exercise 4.4.13.)
For part (1), suppose first that A ≤ B. Then for all gN ∈ A/N , we have g ∈ A ⊆ B and
hence gN ∈ B/N. Conversely, suppose that A/N ≤ B/N. Let a ∈ A. Then by the hypothesis, aN ∈ B/N. If aN = bN for some b ∈ B, then b⁻¹a ∈ N. But N ≤ B, so b⁻¹a = b′ for some b′ ∈ B, and hence a = bb′ ∈ B. Thus, a ∈ B. Since a was arbitrary, A ≤ B.
The Fourth Isomorphism Theorem is also called the Lattice Isomorphism Theorem because it
states that the lattice of a quotient group G/N can be found from the lattice of G by ignoring all
vertices and edges that are not above N in the lattice of G. Furthermore, part 2 indicates that if we
labeled each edge in the lattice of subgroups with the index between groups, then even these indices
are preserved when passing to the quotient group.
Example 4.4.9 (Quaternion Group). Consider the quaternion group Q8. Note that Z(Q8) = ⟨−1⟩ and hence this is a normal subgroup. The following lattice diagram of Q8 depicts with double edges all parts of the diagram above ⟨−1⟩. Hence, according to the Fourth Isomorphism Theorem, the lattice of Q8/⟨−1⟩ is the sublattice involving double edges.
[Lattice of subgroups of Q8, from Q8 at the top down through ⟨−1⟩ to {1}, with the edges above ⟨−1⟩ drawn doubled.] △
The Fourth Isomorphism Theorem parallels how lattices interact with subgroups. The lattice of
a subgroup H of a group G can be found from the lattice of G by ignoring all vertices and edges
that are not below H in the lattice of G. If this similarity tempts someone to guess that a group
might be completely determined from knowing G/N and N , the reader should remain aware that
this is not the case. We do not need to look further than G = D3 with N = ⟨r⟩ as compared to Z6 with N = ⟨z²⟩ for an example. In both cases G/N ≅ Z2 and N ≅ Z3.
Theorem 4.4.10
Every group G that is generated by n elements is isomorphic to a quotient group of the free group F(x1, x2, . . . , xn).
Proof. Suppose that G can be generated by the elements g1 , g2 , . . . , gn ∈ G. Consider the free group
on n symbols F (x1 , x2 , . . . , xn ). Since the symbols x1 , x2 , . . . , xn have no relations among them, by
Theorem 3.8.8, there exists a unique homomorphism ϕ : F (x1 , x2 , . . . , xn ) → G such that ϕ(xi ) = gi
for all i = 1, 2, . . . , n.
The homomorphism ϕ is surjective, since Im ϕ = ⟨g1, g2, . . . , gn⟩ = G. Hence, by the First Isomorphism Theorem,
G ≅ F(x1, x2, . . . , xn)/ Ker ϕ, (4.8)
and the theorem follows.
More can be said about Ker ϕ in the above theorem and its connection to a presentation of G.
Suppose that G can be presented by
G = ⟨g1, g2, . . . , gn | w1 = 1, w2 = 1, . . . , wm = 1⟩ (4.9)
where each wj is a word in the generators of G. (It is always possible to write the relations in a
presentation in this way. For example, the relation rs = sr⁻¹ in a dihedral group can be written as rsrs = 1.)
Let H be another group generated by elements {h1 , h2 , . . . , hn }. Suppose that there exists a
homomorphism ψ : G → H such that ψ(gi ) = hi . Since H is generated by {h1 , h2 , . . . , hn },
then ψ is surjective. Furthermore, the generators hi satisfy the relations given by ψ(wj) = 1 for j = 1, 2, . . . , m. By the First Isomorphism Theorem, H ≅ G/ Ker ψ. If Ker ψ is trivial, then G ≅ H.
If Ker ψ is not trivial, then each element u ∈ Ker ψ, expressed as a word in the generators
u = u1^α1 u2^α2 · · · uℓ^αℓ with ui ∈ {g1, g2, . . . , gn},
leads to a relation ψ(u1)^α1 ψ(u2)^α2 · · · ψ(uℓ)^αℓ = 1 in the generators of H. By definition of a presentation, this relation on the generators of H does not follow from the relations wj = 1 for j = 1, 2, . . . , m.
Consequently, we can now understand a group defined by the presentation (4.9) as the largest group
G generated by n elements satisfying the relations wj = 1 for j = 1, 2, . . . , m, where by “largest”
we mean that for any other group H generated by n elements, which also satisfy the corresponding
relations, there exists a surjective homomorphism G → H.
Return now to the proof of Theorem 4.4.10. Let w̃j be the equivalent word in F(x1, x2, . . . , xn) where in each wj the symbol gi is replaced with xi. Because of the isomorphism in (4.8) and wj = 1 in G, we have w̃j ∈ Ker ϕ for all j. Hence,
⟨w̃1, w̃2, . . . , w̃m⟩ ≤ Ker ϕ.
However, Ker ϕ is a normal subgroup of F(x1, x2, . . . , xn) whereas, in general, ⟨w̃1, w̃2, . . . , w̃m⟩ is not a normal subgroup of F(x1, x2, . . . , xn).
Now suppose that there exists a normal subgroup N E F(x1, x2, . . . , xn) such that ⟨w̃1, w̃2, . . . , w̃m⟩ ≤ N ≤ Ker ϕ. Then the natural projection
π : F(x1, x2, . . . , xn)/N → F(x1, x2, . . . , xn)/ Ker ϕ ≅ G
is a surjective homomorphism with kernel (Ker ϕ)/N. In F(x1, x2, . . . , xn)/N, the generators xiN satisfy the same relations as the generators of G. Consequently, by our remark above, there also exists
a surjective homomorphism ψ : G → F (x1 , x2 , . . . , xn )/N . Since π and ψ are both surjective
and π ◦ ψ is the identity on G, both homomorphisms are bijections and we deduce that G ≅ F(x1, x2, . . . , xn)/N, so N = Ker ϕ. We conclude that Ker ϕ is the smallest (by inclusion) normal subgroup of F(x1, x2, . . . , xn) that contains ⟨w̃1, w̃2, . . . , w̃m⟩. We say that Ker ϕ is the normal closure of ⟨w̃1, w̃2, . . . , w̃m⟩ in F(x1, x2, . . . , xn).
Consequently, a group given by a presentation is a quotient group of F(x1, x2, . . . , xn) by the normal closure of the subgroup ⟨w̃1, w̃2, . . . , w̃m⟩ generated by the corresponding relations in F(x1, x2, . . . , xn).
9. Let F = Q, R, C, or Fp where p is prime. Let T2(F) be the set of 2 × 2 upper triangular matrices with a nonzero determinant. We consider T2(F) as a group with the operation of matrix multiplication.
(a) Prove that U2(F) = { ( 1 b ; 0 1 ) | b ∈ F } is a normal subgroup that is isomorphic to (F, +).
(b) Prove that the corresponding quotient group satisfies T2(F)/U2(F) ≅ U(F) ⊕ U(F), where U(F) is the group of elements with multiplicative inverses in F equipped with multiplication.
10. Let G be a group and let N E G such that |G| and |Aut(N)| are relatively prime. Prove that N ≤ Z(G). [Hint: Use Proposition 4.2.15.]
11. Suppose that H and K are distinct subgroups of G, each of index 2. Prove that H ∩ K is a normal
subgroup of G and that G/(H ∩ K) ≅ Z2 ⊕ Z2.
12. Let p be a prime number and suppose that ordp(|G|) = a. Assume P ≤ G has order p^a and let N E G with ordp(|N|) = b. Prove that |P ∩ N| = p^b and |PN/N| = p^(a−b).
13. Prove all parts of the Fourth Isomorphism Theorem.
14. Let G be the group of isometries of Rn . Prove that the subgroup D of direct isometries is a normal
subgroup and that G/D ≅ Z2. [See Definitions 3.9.5 and 3.9.6.]
4.5 Fundamental Theorem of Finitely Generated Abelian Groups
Early on in the study of groups, we discussed classification theorems, which are theorems that
list all possible groups with some specific property. The First Isomorphism Theorem leads to the
Fundamental Theorem of Finitely Generated Abelian Groups (abbreviated by FTFGAG), which,
among other things, provides a complete classification of all finite abelian groups. The proof begins
with a study of free abelian groups.
Definition 4.5.1
A subset X of an abelian group G is called linearly independent if for every finite subset {x1, x2, . . . , xr} ⊆ X, linear combinations satisfy
c1x1 + c2x2 + · · · + crxr = 0 =⇒ c1 = c2 = · · · = cr = 0. (4.10)
A linearly independent subset that also generates G is called a basis of G.
The cyclic group Z has {1} as a basis. The direct sum Z ⊕ Z has {(1, 0), (0, 1)} as a basis since every element (m, n) can be written as m(1, 0) + n(0, 1). However, {(3, 1), (1, 0)} is a basis as well because (3, 1) − 3(1, 0) = (0, 1), so again ⟨(3, 1), (1, 0)⟩ = Z ⊕ Z, and the set is also linearly independent. In contrast, Z/10Z does not have a basis since for all x ∈ Z/10Z, we have 10x = 0 and 10 ≠ 0 in Z. Note that a basis cannot contain the identity 0 since then 1 · 0 = 0 is a nontrivial linear combination of the basis elements that gives 0.
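The claim that {(3, 1), (1, 0)} is a basis of Z ⊕ Z can be verified mechanically; the Python sketch below (illustrative only, not part of the text) checks spanning and independence using the explicit coefficients n and m − 3n.

```python
# Check that {(3, 1), (1, 0)} is a basis of Z ⊕ Z.
# Spanning: (m, n) = n*(3, 1) + (m - 3n)*(1, 0) for all integers m, n.
# Independence: c1*(3, 1) + c2*(1, 0) = (0, 0) forces c1 = c2 = 0.

def combo(c1, c2):
    """Integer combination c1*(3,1) + c2*(1,0)."""
    return (3 * c1 + c2, c1)

# Spanning, verified on a grid of test points.
for m in range(-10, 11):
    for n in range(-10, 11):
        assert combo(n, m - 3 * n) == (m, n)

# Independence: the second coordinate of combo(c1, c2) is c1, so c1 = 0,
# and then the first coordinate gives c2 = 0.
assert combo(0, 0) == (0, 0)
print("basis check passed")
```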
Definition 4.5.2
An abelian group (G, +) is called a free abelian group if it has a basis.
In particular, Z, Z ⊕ Z, and more generally Zr for a positive integer r are free abelian groups. A
free abelian group could have an infinite basis. For example, Z[x] is a free abelian group with basis
{1, x, x2 , . . .} because every polynomial in Z[x] is a (finite) linear combination of the powers of the
variable x. If a free abelian group has an infinite basis, every element must still be a finite linear
combination of basis elements.
Proposition 4.5.3
Let (G, +) be a free abelian group with a basis X. Every element g ∈ G can be expressed
uniquely as a linear combination of elements in X.
Proof. Suppose that
g = c1x1 + c2x2 + · · · + crxr and g = c′1x′1 + c′2x′2 + · · · + c′sx′s
are two linear combination expressions of the element g. By allowing some ci and some c′j to be 0, and by taking the union {x1, x2, . . . , xr, x′1, x′2, . . . , x′s}, we can assume that r = s and that xi = x′i for i = 1, 2, . . . , r. Then subtracting these two expressions gives
(c1 − c′1)x1 + (c2 − c′2)x2 + · · · + (cr − c′r)xr = 0.
Since the set of elements {x1, x2, . . . , xr} is linearly independent, ci − c′i = 0, so ci = c′i for all i. Hence, there is a unique expression for g as a linear combination of elements in X.
Example 4.5.4. The group (Q, +) is not a free abelian group. Assume that Q does have a basis X. First assume that X contains at least two nonzero elements a/b and c/d. Then
(−bc)(a/b) + (da)(c/d) = −ac + ac = 0.
This contradicts the condition of linear independence. Now assume that X contains only one element a/b ≠ 0. Then X does not generate Q because the element a/(2b) ∉ ⟨X⟩. We conclude by contradiction that Q does not have a basis. △
Theorem 4.5.5
Let G be a nonzero free abelian group with a basis of r elements. Then G is isomorphic to
Z ⊕ Z ⊕ · · · ⊕ Z = Zr .
Proof. Let X = {x1, x2, . . . , xr} be a basis of G and define ϕ : Z^r → G by
ϕ(c1, c2, . . . , cr) = c1x1 + c2x2 + · · · + crxr.
This function is a homomorphism, and it is surjective because X generates G. Its kernel is
Ker ϕ = {(c1, c2, . . . , cr) ∈ Z^r | c1x1 + c2x2 + · · · + crxr = 0} = {(0, 0, . . . , 0)}
by the linear independence of X. Hence, ϕ is an isomorphism.
Proposition 4.5.6
Let G be a finitely generated free abelian group. Then every basis of G has the same
number of elements.
Proof. Suppose that G has a basis with r elements. Then G is isomorphic to Z^r. The subgroup 2G = {g + g | g ∈ G} is isomorphic to (2Z)^r, so by a generalization of Exercise 4.4.7,
G/2G ≅ Z^r/(2Z)^r ≅ (Z/2Z)^r.
Thus, |G/2G| = 2^r. Assume that G also has a finite basis with s ≠ r elements. Then |G/2G| = 2^s ≠ 2^r, a contradiction.
We must also prove that G cannot have an infinite basis. Assume that G does have an infinite basis X. Let x1, x2 ∈ X be distinct. If x̄1 = x̄2 in G/2G, then x1 − x2 ∈ 2G, so x1 − x2 is a finite linear combination of elements in X with even coefficients. In particular, X is not a linearly independent set, which contradicts that X is a basis. Thus, in the quotient group G/2G, the elements {x̄ | x ∈ X} are all distinct and hence G/2G is an infinite group. This contradicts the fact that |G/2G| = 2^r. Consequently, if G has a basis of r elements, then every other basis is finite and has r elements.
Definition 4.5.7
If G is a finitely generated free abelian group, then the common number r of elements in a basis is called the rank. The rank is also called the Betti number of G and is denoted by β(G).
In our efforts to classify all finitely generated abelian groups, we introduced free abelian groups as a stepping stone to the general classification theorem (Theorem 4.5.11). We are still faced with a few questions. As remarked earlier, a free abelian group has no elements of finite order; however, our theorems to this point do not establish the converse, namely whether an abelian group with no elements of finite order is free. Furthermore, Theorem 4.5.5 does not tell us much about the possible subgroups of a free abelian group, and this will become a key ingredient in what follows.
Lemma 4.5.8
Let X = {x1, x2, . . . , xr} be a basis of a free abelian group G. Let i and j be indices with 1 ≤ i, j ≤ r and i ≠ j, and let t ∈ Z. Then
{x1, . . . , xj−1, xj + txi, xj+1, . . . , xr}
is also a basis of G.
Theorem 4.5.9
Let G be a nonzero free abelian group of finite rank s and let H ≤ G be a nontrivial
subgroup. Then H is a free abelian group of rank t ≤ s. There exists a basis {x1 , x2 , . . . , xs }
for G and positive integers n1 , n2 , . . . , nt , where ni divides ni+1 for all 1 ≤ i ≤ t − 1 such
that {n1 x1 , n2 x2 , . . . , nt xt } is a basis of H.
Proof. We prove the theorem by starting from a basis of G and repeatedly adjusting it using
Lemma 4.5.8.
By the well-ordering of the integers, there exists a minimum value n1 in the set
{c1 ∈ N* | c1y1 + c2y2 + · · · + csys ∈ H for some basis {y1, y2, . . . , ys} of G and some c2, . . . , cs ∈ Z}.
We emphasize that, in the above set, we consider all possible bases {y1, y2, . . . , ys} of G. We write
the element that gives this minimum value as z1 = n1 y1 + c2 y2 + · · · + cs ys . By integer division,
for all i ≥ 2, we can write ci = n1 qi + ri with 0 ≤ ri < n1 . Set x1 = y1 + q2 y2 + · · · + qs ys . By
Lemma 4.5.8, {x1 , y2 , . . . , ys } is a basis of G. Furthermore,
z1 = n1 x1 + r2 y2 + · · · + rs ys .
However, since n1 is the least positive coefficient that occurs in any linear combination over any
basis of G and, since 0 ≤ ri < n1 , we have r2 = · · · = rs = 0. So, in fact z1 = n1 x1 .
If {n1 x1 } generates H then we are done and {n1 x1 } is a basis of H. If not, then there is a least
positive value of c2 for linear combinations
z2 = a1 x1 + c2 y2 + · · · + cs ys ∈ H
where y2 , . . . , ys ∈ G such that {x1 , y2 , . . . , ys } is a basis of G. Call this least positive integer n2 .
Note that we must have n1 |a1 , because otherwise, since n1 x1 ∈ H by subtracting a suitable multiple
of n1 x1 from a1 x1 + c2 y2 + · · · + cs ys we would obtain a linear combination of {x1 , y2 , . . . , ys } that
is in H and has a lesser positive coefficient for x1 , which would contradict the minimality of n1 .
Again, if we take the integer division of ci by n2 , ci = n2 qi + ri with 0 ≤ ri < n2 for all i ≥ 3, then
z2 = a1 x1 + n2 (y2 + q3 y3 + · · · + qs ys ) + r3 y3 + · · · + rs ys .
We denote x2 = y2 + q3 y3 + · · · + qs ys . Also, all the ri = 0 because if any ri > 0, it would contradict
the minimality of n2 . Thus, z2 = a1 x1 + n2 x2 ∈ H and also n2 x2 = z2 − (a1 /n1 )n1 x1 ∈ H. By
Lemma 4.5.8, {x1 , x2 , y3 , . . . , ys } is a basis of G. Furthermore, if we consider the integer division of
n2 by n1 , written as n2 = n1 q + r with 0 ≤ r < n1 , then
n1 x1 + n2 x2 = n1 (x1 + qx2 ) + rx2 ∈ H.
But then, since r < n1 , by the minimal positive condition on n1 , we must have r = 0. Hence, n1 | n2 .
If {n1x1, n2x2} generates H, then we are done: the set is linearly independent since by construction {x1, x2, y3, . . . , ys} is a basis of G, so {x1, x2} is linearly independent, and {n1x1, n2x2} is then a basis of H. The pattern continues and only terminates when it results in a basis
{x1 , . . . , xt , yt+1 , . . . , ys }
of G such that {n1 x1 , n2 x2 , . . . , nt xt } is a basis of H for some positive integers ni such that ni | ni+1
for 1 ≤ i ≤ t − 1.
This proof is not constructive since it does not provide a procedure to find the ni , which is
necessary to construct the x1 , x2 , and so on. We merely know the ni exist by the well-ordering of
integers. In some instances it is easy to find a basis of the subgroup as in the following example.
Example 4.5.10. Consider the free abelian group G = Z^3 and the subgroup H = {(x, y, z) ∈ Z^3 | x + 2y + 3z = 0}. If we considered the equation x + 2y + 3z = 0 in Q^3, then the Gauss-Jordan elimination algorithm gives H as Span({(−2, 1, 0), (−3, 0, 1)}). Taking only integer multiples of these two vectors does give all points (x, y, z) ∈ H, with y and z taking on every pair of integers. Hence, {(−2, 1, 0), (−3, 0, 1)} is a basis of H and we see clearly that H is a free abelian group of rank 2. △
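The basis found in this example can be confirmed computationally; the following Python sketch (not part of the text) checks that every solution of x + 2y + 3z = 0 on a grid is an integer combination of (−2, 1, 0) and (−3, 0, 1).

```python
# H = {(x, y, z) in Z^3 | x + 2y + 3z = 0} with claimed basis b1, b2.
# Any (x, y, z) in H satisfies x = -2y - 3z, and then
# (x, y, z) = y*b1 + z*b2.

b1, b2 = (-2, 1, 0), (-3, 0, 1)

def combo(c1, c2):
    """Integer combination c1*b1 + c2*b2, componentwise."""
    return tuple(c1 * u + c2 * v for u, v in zip(b1, b2))

# Spanning: every element of H on a small grid is a combination of b1, b2.
for y in range(-5, 6):
    for z in range(-5, 6):
        x = -2 * y - 3 * z
        assert x + 2 * y + 3 * z == 0       # (x, y, z) really lies in H
        assert combo(y, z) == (x, y, z)

# Independence: the second and third coordinates of combo(c1, c2)
# are c1 and c2, so only the trivial combination gives (0, 0, 0).
assert combo(0, 0) == (0, 0, 0)
print("H is free of rank 2 on b1, b2")
```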
Theorem 4.5.11
Let G be a finitely generated abelian group. Then
G ≅ Z^r ⊕ Zd1 ⊕ Zd2 ⊕ · · · ⊕ Zdk (4.11)
for some nonnegative integers r, d1, d2, . . . , dk satisfying di ≥ 2 for all i and di+1 | di for 1 ≤ i ≤ k − 1.
Proof. Since G is finitely generated, there is a finite subset {g1 , g2 , . . . , gs } that generates G. Define
the function h : Zs → G by
h(n1 , n2 , . . . , ns ) = n1 g1 + n2 g2 + · · · + ns gs .
By the same reasoning as the proof of Theorem 4.5.5, h is a surjective homomorphism. Then Ker h
is a subgroup of Zs and by the First Isomorphism Theorem, since h is surjective, Zs /(Ker h) ∼ = G.
By Theorem 4.5.9, there exists a basis {x1 , x2 , . . . , xs } for Zs and positive integers n1 , n2 , . . . , nt ,
where ni divides ni+1 for all 1 ≤ i ≤ t − 1 such that {n1 x1 , n2 x2 , . . . , nt xt } is a basis of Ker h. Then
G ≅ (Z ⊕ Z ⊕ · · · ⊕ Z)/(n1Z ⊕ · · · ⊕ ntZ ⊕ {0} ⊕ · · · ⊕ {0})
  ≅ Zn1 ⊕ Zn2 ⊕ · · · ⊕ Znt ⊕ Z ⊕ · · · ⊕ Z.
Discarding the summands with ni = 1 and relabeling the remaining ni in decreasing order as d1, d2, . . . , dk gives the decomposition (4.11).
Definition 4.5.12
As with free abelian groups, the integer r is called the rank or the Betti number of G. It is sometimes denoted by β(G). The integers d1, d2, . . . , dk are called the invariant factors of G and the expression (4.11) is called the invariant factors decomposition of G.
It is interesting to note that the proof of Theorem 4.5.11 is not constructive in the sense that it
does not provide a method to find specific elements in G whose orders are the invariant factors of
G. The invariant factors exist by virtue of the well-ordering principle of the integers.
Theorem 4.5.11 applies to any finitely generated abelian group. However, applied to finite groups,
which are obviously finitely generated, it gives us an effective way to describe all abelian groups of
a given order n. If G is finite, the rank of G is 0. Then we must find all finite sequences of integers
d1 , d2 , . . . , dk such that
• di ≥ 2 for 1 ≤ i ≤ k;
• di+1 | di for 1 ≤ i ≤ k − 1;
• n = d1 d2 · · · dk .
The first two conditions are explicit in the above theorem. The last condition follows from the fact that di = |xi|, where {x1, x2, . . . , xk} is a list of corresponding generators of G. Then every element in G can be written uniquely as
g = α1x1 + α2x2 + · · · + αkxk with 0 ≤ αi < di,
so |G| = d1d2 · · · dk.
Lemma 4.5.16
Let q be a prime number and let G be an abelian group of order q^m. Then G is isomorphic to exactly one group of the form
Zq^α1 ⊕ Zq^α2 ⊕ · · · ⊕ Zq^αk
with α1 ≥ α2 ≥ · · · ≥ αk ≥ 1 and α1 + α2 + · · · + αk = m.
Proof. (This follows as a corollary to Theorem 4.5.11 so we leave the proof as an exercise for the reader. See Exercise 4.5.13.)
Definition 4.5.17
Let m be a positive integer. Any decreasing sequence of the form α1 ≥ α2 ≥ · · · ≥ αk ≥ 1
such that α1 + α2 + · · · + αk = m is called a partition of m. The partition function,
sometimes denoted by p(m), is the number of partitions of m. The partition function is
often extended to nonnegative integers by assigning p(0) = 1.
According to Lemma 4.5.16, if G is an abelian group of order q m for some prime q, then there
are p(m) possibilities for G, each corresponding to a partition of m.
Theorem 4.5.18
Let G be a finite abelian group of order n > 1 and let n = q1^β1 q2^β2 · · · qt^βt be the prime factorization of n. Then G can be written in a unique way as
G ≅ A1 ⊕ A2 ⊕ · · · ⊕ At with |Ai| = qi^βi,
where each group A = Ai of order q^m (with q prime) can be written as
A ≅ Zq^α1 ⊕ Zq^α2 ⊕ · · · ⊕ Zq^αl
with α1 ≥ α2 ≥ · · · ≥ αl ≥ 1 and α1 + α2 + · · · + αl = m.
Proof. Suppose that
G ≅ Zd1 ⊕ Zd2 ⊕ · · · ⊕ Zdk
is the invariant factors decomposition of G and suppose that
di = q1^αi1 q2^αi2 · · · qt^αit
is the prime factorization of di, where by αij = 0 we mean that qj is not a prime factor of di. The condition di+1 | di implies that for each j, the exponents on qj satisfy αi+1,j ≤ αij. The condition that n = d1d2 · · · dk implies that for each j,
βj = α1j + α2j + · · · + αkj.
Note that gcd(qj^a, qj′^b) = 1 for any nonnegative integers a and b if j ≠ j′. Therefore,
Zdi ≅ Zq1^αi1 ⊕ Zq2^αi2 ⊕ · · · ⊕ Zqt^αit.
Collecting, for each j, the summands that are powers of qj gives the decomposition stated in the theorem.
Definition 4.5.19
The prime powers qj^αij that arise in the expression of G described in the above theorem are called the elementary divisors of G. The expression in Theorem 4.5.18 is called the elementary divisors decomposition.
We use the terminology “elementary divisors” because a cyclic group Zp^α, where p is a prime number, is not isomorphic to a direct sum of smaller cyclic groups.
As a first example, notice that since 16 is a prime power, Lemma 4.5.16 implies that the list of groups given in Example 4.5.13 provides both the elementary divisor decompositions and the invariant factor decompositions of all 5 abelian groups of order 16.
Example 4.5.20. Let n = 2160 = 2^4 · 3^3 · 5. We find all abelian groups of order 2160. We remark that 4 has five partitions, namely 4, 3 + 1, 2 + 2, 2 + 1 + 1, and 1 + 1 + 1 + 1, while 3 has three partitions, namely 3, 2 + 1, and 1 + 1 + 1. Hence there are 5 · 3 · 1 = 15 abelian groups of order 2160, listed here with the elementary divisors decomposition on the left and the invariant factors decomposition on the right.
Z16 ⊕ Z27 ⊕ Z5 ≅ Z2160
Z16 ⊕ Z9 ⊕ Z3 ⊕ Z5 ≅ Z720 ⊕ Z3
Z16 ⊕ Z3 ⊕ Z3 ⊕ Z3 ⊕ Z5 ≅ Z240 ⊕ Z3 ⊕ Z3
Z8 ⊕ Z2 ⊕ Z27 ⊕ Z5 ≅ Z1080 ⊕ Z2
Z8 ⊕ Z2 ⊕ Z9 ⊕ Z3 ⊕ Z5 ≅ Z360 ⊕ Z6
Z8 ⊕ Z2 ⊕ Z3 ⊕ Z3 ⊕ Z3 ⊕ Z5 ≅ Z120 ⊕ Z6 ⊕ Z3
Z4 ⊕ Z4 ⊕ Z27 ⊕ Z5 ≅ Z540 ⊕ Z4
Z4 ⊕ Z4 ⊕ Z9 ⊕ Z3 ⊕ Z5 ≅ Z180 ⊕ Z12
Z4 ⊕ Z4 ⊕ Z3 ⊕ Z3 ⊕ Z3 ⊕ Z5 ≅ Z60 ⊕ Z12 ⊕ Z3
Z4 ⊕ Z2 ⊕ Z2 ⊕ Z27 ⊕ Z5 ≅ Z540 ⊕ Z2 ⊕ Z2
Z4 ⊕ Z2 ⊕ Z2 ⊕ Z9 ⊕ Z3 ⊕ Z5 ≅ Z180 ⊕ Z6 ⊕ Z2
Z4 ⊕ Z2 ⊕ Z2 ⊕ Z3 ⊕ Z3 ⊕ Z3 ⊕ Z5 ≅ Z60 ⊕ Z6 ⊕ Z6
Z2 ⊕ Z2 ⊕ Z2 ⊕ Z2 ⊕ Z27 ⊕ Z5 ≅ Z270 ⊕ Z2 ⊕ Z2 ⊕ Z2
Z2 ⊕ Z2 ⊕ Z2 ⊕ Z2 ⊕ Z9 ⊕ Z3 ⊕ Z5 ≅ Z90 ⊕ Z6 ⊕ Z2 ⊕ Z2
Z2 ⊕ Z2 ⊕ Z2 ⊕ Z2 ⊕ Z3 ⊕ Z3 ⊕ Z3 ⊕ Z5 ≅ Z30 ⊕ Z6 ⊕ Z6 ⊕ Z2 △
When listing out each possible isomorphism type for a group of given order, each group listed
according to the invariant factors decomposition corresponds to a unique group in the list according
to elementary divisors. The proof of Theorem 4.5.18 describes how to go from the invariant factors
decomposition to the corresponding elementary divisors decomposition. To go in the opposite di-
rection, collect the highest prime powers corresponding to each prime to get Zd1 ; collect the second
highest prime powers corresponding to each prime to get Zd2 ; and so forth.
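The conversion just described is easy to automate. The Python sketch below (an illustration; the function name and input format are our own) collects the highest remaining prime power of each prime at every step to produce the invariant factors.

```python
# Convert an elementary divisors decomposition (a list of prime powers)
# into the invariant factors d1, d2, ... with d_{i+1} | d_i.

from collections import defaultdict

def invariant_factors(prime_powers):
    """prime_powers: list of pairs (q, a), each meaning a summand Z_{q^a}."""
    by_prime = defaultdict(list)
    for q, a in prime_powers:
        by_prime[q].append(q ** a)
    for q in by_prime:
        by_prime[q].sort(reverse=True)          # highest powers first
    factors = []
    k = max(len(v) for v in by_prime.values())  # number of invariant factors
    for i in range(k):
        d = 1
        for q in by_prime:
            if i < len(by_prime[q]):
                d *= by_prime[q][i]             # i-th highest power of q
        factors.append(d)
    return factors                              # d1 >= d2 >= ..., d_{i+1} | d_i

# Z8 ⊕ Z2 ⊕ Z27 ⊕ Z5 from Example 4.5.20:
print(invariant_factors([(2, 3), (2, 1), (3, 3), (5, 1)]))  # → [1080, 2]
```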
We often refer to Theorems 4.5.11 and 4.5.18 collectively as the Fundamental Theorem of Finitely
Generated Abelian Groups. The two theorems simply provide alternative ways to uniquely describe
the torsion part of the group.
The power of the Fundamental Theorem of Finitely Generated Abelian Groups (FTFGAG), and
of classification theorems in general, is that in many applications of group theory, we encounter
abelian groups for which we naturally know the order. Then the FTFGAG gives us a list of possible
isomorphism types.
Example 4.5.21. Consider the group U (20) of units in modular arithmetic modulo 20. We know
that this multiplicative group is abelian and that it has order φ(20) = φ(4)φ(5) = 8. According to FTFGAG, U(20) may be isomorphic to Z8, Z4 ⊕ Z2, or Z2 ⊕ Z2 ⊕ Z2. To determine which of these possibilities it may be, consider powers of some elements. First, let's consider 3:
3^1 = 3, 3^2 = 9, 3^3 = 7, 3^4 = 1.
This actually gives us enough information to determine the isomorphism type. The element 3 has order 4, so U(20) ≇ Z2 ⊕ Z2 ⊕ Z2. Furthermore, both 9 and 19 = −1 have order 2. In Z8, there is only one element of order 2. Hence, U(20) ≇ Z8. By FTFGAG and elimination of possibilities, U(20) ≅ Z4 ⊕ Z2. △
Example 4.5.22. Consider the group U(46). This is an abelian group of order φ(46) = φ(2)φ(23) = 22. Using the elementary divisors decomposition of FTFGAG, we see that the only abelian group of order 22 is Z22. Hence, U(46) is cyclic. △
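Both of these examples can be checked by direct computation of element orders; the following Python sketch (illustrative only, not part of the text) does so for U(20) and U(46).

```python
# Compute the orders of all elements of U(n), the units of Z/nZ.

from math import gcd

def unit_orders(n):
    """Sorted list of multiplicative orders of the units modulo n."""
    orders = []
    for a in range(1, n):
        if gcd(a, n) == 1:
            k, x = 1, a % n
            while x != 1:
                x = (x * a) % n
                k += 1
            orders.append(k)
    return sorted(orders)

# U(20): one identity, three elements of order 2, four of order 4,
# exactly the order profile of Z4 ⊕ Z2 (and not of Z8 or Z2 ⊕ Z2 ⊕ Z2).
assert unit_orders(20) == [1, 2, 2, 2, 4, 4, 4, 4]

# U(46) has 22 elements and contains an element of order 22, so it is cyclic.
assert len(unit_orders(46)) == 22
assert max(unit_orders(46)) == 22
print("U(20) has order profile of Z4 ⊕ Z2; U(46) is cyclic of order 22")
```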
Corollary 4.5.23
Let n > 1 be an integer and let n = q1^β1 q2^β2 · · · qt^βt be the prime factorization of n. There are
p(β1) p(β2) · · · p(βt)
abelian groups of order n, where p is the partition function.
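Corollary 4.5.23 can be explored computationally; the sketch below (our own helper functions, not from the text) computes p(m) by a standard recurrence and multiplies the values over the prime factorization.

```python
# Count the abelian groups of order n as p(β1)···p(βt), where
# n = q1^β1 ··· qt^βt is the prime factorization of n.

from functools import lru_cache

@lru_cache(maxsize=None)
def partitions(m, largest=None):
    """Number of partitions of m into parts of size at most `largest`."""
    if largest is None:
        largest = m
    if m == 0:
        return 1
    # Sum over the choice of the largest part.
    return sum(partitions(m - part, part)
               for part in range(1, min(m, largest) + 1))

def count_abelian_groups(n):
    """Apply Corollary 4.5.23 via trial-division factorization."""
    count, q = 1, 2
    while q * q <= n:
        beta = 0
        while n % q == 0:
            n //= q
            beta += 1
        if beta:
            count *= partitions(beta)
        q += 1
    if n > 1:                       # one leftover prime with exponent 1
        count *= partitions(1)
    return count

print(count_abelian_groups(2160))  # 2160 = 2^4 · 3^3 · 5 → p(4)p(3)p(1) = 15
```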
A partition α = (α1, α2, . . . , αk) can be depicted by its Young diagram, a left-justified array of boxes with αi boxes in the ith row. The main diagonal of a Young diagram consists of the boxes descending regularly starting with the top left corner of the diagram. For the partition (5, 2, 2, 1), the main diagonal has only two boxes.
The conjugate of α is the partition α′ obtained by reflecting the Young diagram of α through its main diagonal. Algebraically, the values of the conjugate partition are
α′j = |{1 ≤ i ≤ k | αi ≥ j}|.
With α = (5, 2, 2, 1), the conjugate partition is α′ = (4, 3, 1, 1, 1). The intuition of the Young diagram makes it clear that |α′| = |α|, so if α is a partition of n, then so is α′.
[Young diagrams of α = (5, 2, 2, 1) and of its conjugate α′ = (4, 3, 1, 1, 1).]
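The conjugation rule above translates directly into code; the following Python sketch (illustrative only, not part of the text) computes α′ from α and checks that conjugation is an involution.

```python
# Conjugate a partition by the rule α'_j = #{ i : α_i >= j }, which
# reflects the Young diagram of α across its main diagonal.

def conjugate(alpha):
    """alpha: tuple of parts in decreasing order; returns the conjugate."""
    return tuple(sum(1 for a in alpha if a >= j)
                 for j in range(1, max(alpha) + 1))

alpha = (5, 2, 2, 1)
print(conjugate(alpha))                     # → (4, 3, 1, 1, 1)
assert sum(conjugate(alpha)) == sum(alpha)  # |α'| = |α|
assert conjugate(conjugate(alpha)) == alpha # conjugation is an involution
```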
We have already seen a place where partitions of integers arise naturally. Recall that the conju-
gacy classes in Sn correspond to all permutations with a given cycle type. Each cycle type corre-
sponds uniquely to a partition of n. For example, in S6 , the cycle type (a b c)(d e) corresponds to the
partition 3 + 2 + 1. More generally, each partition α with |α| = n corresponds to the conjugacy class
of permutations which, when written in cycle notation form, have cycles of lengths α1, α2, . . . , αk.
18. Let G = ⟨x, y | x^12 = y^18 = 1, xy = yx⟩. Consider the subgroup H = ⟨x^4 y^−6⟩. Find the isomorphism type of G/H.
19. Let G = (Z/12Z)2 and let H = {(a, b) | 2a + 3b = 0}. Determine the isomorphism type of G/H.
20. Let G = (Z/12Z)2 and let H = {(a, b) | a + 5b = 0}. Determine the isomorphism type of G/H.
21. Prove Cauchy’s Theorem for abelian groups. In other words, let G be an abelian group and prove
that if p is a prime number dividing |G| then G contains an element of order p.
22. Prove Corollary 4.5.23.
23. Let p be a prime number. Prove that Aut(Zp ⊕ Zp) ≅ GL2(Fp).
24. Let G be an abelian group. Prove that Aut(G) is abelian if and only if G is cyclic.
25. What is the first integer n such that there exist two distinct partitions α and β of n such that α′ = α and β′ = β?
26. Let p(n) be the partition function of n. Prove that as power series,
∑_{n=0}^{∞} p(n) x^n = ∏_{k=1}^{∞} 1/(1 − x^k).
[Hint: 1/(1 − y) = 1 + y + y^2 + y^3 + · · · .]
27. Find the error in the proof of the following erroneous proposition.
⇒ g2⁻¹g1 a2⁻¹a1 = 1
⇒ g2⁻¹g1 ∈ A
⇒ g2A = g1A.
28. Prove that the set of functions from N to Z, denoted Fun(N, Z), equipped with function addition + is
not a free abelian group.
29. Consider the subgroup
4.6 Projects
Project I. Fermat's Theorem on Matrix Groups. Consider the family of groups G = GLn(Fp), where n is some positive integer with n ≥ 2 and p is a prime number. Lagrange's Theorem ensures that for all matrices A ∈ G, we have A^|G| = I. Is |G| the smallest integer k such that A^k = I for all A ∈ G? If not, try to find the smallest such integer k.
[Use a computer algebra system to assist with the calculations. For example, Maple has the
package LinearAlgebra[Modular] for working with matrices with coefficients in Z/nZ.]
Project II. Orders of Elements in Abelian p-Groups. Let α be a partition of an integer of length r and denote by G2,α the group
Z2^α1 ⊕ Z2^α2 ⊕ · · · ⊕ Z2^αr.
Try to find a formula for how many elements G2,α contains of order 2^k for all integers k. Also
study the same question for similar groups of the form Gp,α , where p is prime.
Project III. Escher and Symmetry. Consider some of M. C. Escher’s artwork that depicts
interesting tessellations of the plane. For each of these, consider their corresponding wallpaper
group E.
(1) Describe a set of natural (simple) generators for E.
(2) Write down relations between the generators of E and thus give a presentation of E.
(3) Find some normal subgroups N of E, determine E/N , and explain geometrically why N
is normal and what information is contained in E/N .
(4) Describe some subgroups of E that are not normal and explain geometrically why they
are not normal.
(5) Find an Escher tessellation of the Poincaré disk and find generators and relations for that
group.
[For this project, you should feel free to look up Escher art and information about the Poincaré
disk but nothing else beyond that.]
Project IV. The Special Projective Group PSL2 (F5 ). In Exercise 4.2.27, we proved that A5
is simple and we know that it has order 60. In Example 4.3.11, we introduced the projective
linear groups. Intuitively speaking, PSLn (Fp ), where p is prime, “removes” from GLn (Fp )
normal subgroups that are obvious. A quick calculation (that you should do) shows that
PSL2 (F5 ) has order 60. For this project, consider the following questions. Is PSL2 (F5 ) simple?
What are conjugacy classes in PSL2 (F5 )? Is PSL2 (F5 ) isomorphic to A5 ? Can you generalize
any of these investigations to other n and other p?
[Results about projective linear groups are well-known. The value of the project consists not
in doing some Internet search and reporting on the work of other people but attempting to
discover/prove things yourself.]
Project V. Quotient Groups and Cayley Graphs. Let G be a group and let N be a normal subgroup.
Suppose that G has a presentation with a set of generators {g1 , g2 , . . . , gs }. Then we know
that {ḡ1, ḡ2, . . . , ḡs}, where ḡi = giN, is a generating set of G/N. Interpret geometrically how the Cayley graph of G/N with generators {ḡ1, ḡ2, . . . , ḡs} is related to the Cayley graph of G
with generators {g1 , g2 , . . . , gs }. Illustrate this relationship with examples that can be realized
as polyhedra in R3 .
Project VI. Embeddings of Sn . It is not hard to show (and you should do it) that for every
integer n, there is an embedding of Sn in GLn (Fp ) for all primes p. However, the fact that
GL2(F2) ≅ S3 gives an example where Sn is embedded in (in this case actually isomorphic to)
a group GLk (Fp ), where k < n. Investigate for what k < n and what p it may be possible or
is impossible to embed Sn into GLk (Fp ). Illustrate some embeddings with specific examples.
Project VII. A Visualization of a Group. This project describes a way (useful or not is up
to you to decide) to visualize some properties of a group with two generators. Suppose that
G = ha, bi and that there may be relations between a and b. We use the first quadrant of the
xy-plane. Start with the identity 1 at the origin. Then, translating a point by (1, 0) corresponds
to operating on the left by a and translating a point by (0, 1) corresponds to operating on the
left by b. Study this visualization and its usefulness for groups (or subgroups) generated by
two elements. In particular, will every element arise as some point in the first quadrant? Can
we “see” subgroups or normal subgroups in this visualization? Are cyclic subgroups visible?
Are quotient groups visible?
Project VIII. Pascal’s Triangle for Groups. The entries of Pascal’s (usual) Triangle are
nonnegative integers. If we consider Pascal’s Triangle in modular arithmetic Z/nZ, then all
the operations in the triangle occur in the group (Z/nZ, +). We generalize Pascal’s Modular
Triangle to a Pascal’s Triangle for a group in the following way. Let a and b be elements in
a group G. Start with a on the first row. On a diagonal going down from 1 to the left put
a’s. On a diagonal going down and to the right put b’s. Then, in rows below, similar to the
constructing of Pascal’s Triangle, we fill in the rows by operating the element above to the left
with the element above and to the right. The following diagram shows the first few rows.
a
a b
a ab b
a a²b ab² b
a a³b a²bab² ab³ b
One way to visualize this Pascal’s Triangle for a group is to color code boxes based on the
group element. Write a program that draws the color-coded Pascal Triangle for a small group
and certain elements a and b up to any number of rows you specify. Then, explore any patterns
that emerge in Pascal’s Triangle for the group. (Try, for example, cyclic groups, a dihedral
group, or Z2 ⊕ Z2 .) Are there any patterns related to subgroup structure or quotient group
structure?
[This project was inspired by the article [7].]
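As a starting point for the programming part of this project, here is a minimal sketch in Python (the names `group_pascal` and `op` are our own; the group operation is passed in as a function) that builds the rows of the triangle:

```python
def group_pascal(op, a, b, nrows):
    """Build the rows of Pascal's Triangle for a group.

    Each row starts with a, ends with b, and each interior entry is the
    product (left parent) op (right parent) of the two entries above it.
    """
    rows = [[a]]
    for _ in range(nrows - 1):
        prev = rows[-1]
        interior = [op(prev[i], prev[i + 1]) for i in range(len(prev) - 1)]
        rows.append([a] + interior + [b])
    return rows

# Example: Z2 + Z2 with componentwise addition mod 2, a = (1,0), b = (0,1)
op = lambda x, y: ((x[0] + y[0]) % 2, (x[1] + y[1]) % 2)
rows = group_pascal(op, (1, 0), (0, 1), 5)
```

Color-coding the entries of `rows` (with any plotting library) then makes the patterns the project asks about visible.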
5. Rings
Depending on the order of chapters covered, the reader will have encountered the following algebraic
structures so far: sets, posets, real vector spaces, groups, monoids, semigroups, and representations
of a group G. We now turn to the study of another algebraic structure, that of rings.
As with groups, rings possess a surprising number of properties and considerable internal struc-
ture. Furthermore, they arise in many natural contexts within mathematics. However, rings possess
some properties and applications with no equivalents in group theory while some issues that are im-
portant in group theory have no equivalent or are no longer interesting in ring theory. For example,
in ring theory, it is common to define various classes of rings in order to expedite the statement of
important theorems. We will again follow the outline provided in the preface to approach algebraic
structures.
Section 5.1 defines rings, gives a few initial examples, and points out some novel issues to consider
in rings. As a place to gather many more examples, Section 5.2 explores a few common classes of
rings. Section 5.3 specifically considers the examples of matrix rings. Section 5.4 introduces the
concept of ring homomorphism as functions that preserve the ring structure.
Section 5.5 defines ideals of a ring, discusses convenient ways to describe ideals, and presents
operations on ideals. Section 5.6 discusses quotient rings and presents two important applications
of quotient rings: the Chinese Remainder Theorem and the isomorphism theorems. Section 5.7
explores the concepts of maximal and prime ideals, two generalizations to commutative rings of the
notion of primeness in the integers.
5.1 Introduction to Rings
The arithmetic on the integers carries more than just the structure of a group. The arithmetic of
integers involves both addition and multiplication. (Subtraction and division simply involve the
inverses of addition and multiplication, so we do not need to distinguish them.) Rings have two
operations that satisfy a certain set of axioms inspired by some properties of integers. However, the
ring axioms are loose enough to include many other algebraic contexts that possess both an addition
and a multiplication and that differ considerably from the integers.
Definition 5.1.1
A ring is a triple (R, +, ×) where R is a set and + and × are binary operations on R that
satisfy the following axioms:
(1) (R, +) is an abelian group.
(2) × is associative.
(3) × is distributive over +, i.e., for all a, b, c ∈ R:
(a) (a + b) × c = a × c + b × c (right-distributivity);
(b) a × (b + c) = a × b + a × c (left-distributivity).
207
208 CHAPTER 5. RINGS
As in group theory, we will often simply refer to the ring by R if there is no confusion about
what the operations might be. In an abstract ring, we will denote the additive identity by 0 and
refer to it as the “zero” of the ring. The additive inverse of a is denoted by −a. As with typical
algebra over the reals, we will often write the multiplication as ab instead of a × b. Furthermore, if
n ∈ N and a ∈ R, then n · a represents a added to itself n times:
n · a := a + a + · · · + a   (n summands).
We extend this notation to all integers by defining 0 · a = 0 and, if n > 0, then (−n) · a = −(n · a).
Definition 5.1.2
• A ring (R, +, ×) is said to have an identity, denoted by 1, if there is an element 1 ∈ R
such that 1 × a = a × 1 = a for all a ∈ R.
• A ring is said to be commutative if × is commutative.
Since a ring always possesses an additive identity, when one simply says “a ring with identity,”
one refers to the multiplicative identity, the existence of which is not required by the axioms.
It is possible for the multiplicative identity 1 of a ring to be equal to the additive identity 0.
However, according to part (1) in the following proposition, the ring would then consist of just one
element, namely 0. This case serves as an exception to many theorems we would like to state about
rings with identity. Consequently, since this is not a particularly interesting case, we will often refer
to “a ring with identity 1 ≠ 0” to denote a ring with identity but excluding the case in which 1 = 0.
We will soon introduce a number of elementary examples of rings. However, before we do so,
we prove the following proposition that holds for all rings. Many of these properties, as applied to
integers and real numbers, are rules that elementary school children learn early on.
Proposition 5.1.3
Let R be a ring.
(1) ∀a ∈ R, 0a = a0 = 0.
(2) ∀a, b ∈ R, (−a)b = a(−b) = −(ab).
(3) ∀a, b ∈ R, (−a)(−b) = ab.
(4) If R has an identity 1, then the identity is unique and (−1)a = −a for all a ∈ R.
Proof. For (1), note that 0 + 0 = 0 since it is the additive identity. By distributivity,
a0 = a(0 + 0) = a0 + a0.
Adding −(a0) to both sides of this equation, we deduce that 0 = a0. A similar reasoning holds for
0a = 0.
For (2), note that 0 = 0b = (a + (−a))b = ab + (−a)b. Adding −(ab) to both sides, we deduce
that (−a)b = −(ab). A similar reasoning holds for a(−b) = −(ab).
For (3), an application of (2) twice gives (−a)(−b) = −(a(−b)) = −(−(ab)) = ab.
Finally, for part (4), suppose that e1 and e2 satisfy the axioms of an identity. Then e1 = e1 e2
since e2 is an identity, but e2 = e1 e2 since e1 is an identity. Thus, e1 = e1 e2 = e2 . Furthermore,
1a = a by definition, so
0 = 0a = (1 + (−1))a = a + (−1)a.
Adding −a to both sides gives (−1)a = −a.
In subsequent sections, we encounter many examples of rings and see a number of methods to
define new rings from old ones. In this first section, however, we just introduce a few basic examples
of rings.
Example 5.1.4. The triple (Z, +, ×) is a commutative ring. Note that (Z − {0}, ×) is not a group.
In fact, the only elements in Z that have multiplicative inverses are 1 and −1. 4
Example 5.1.5. The sets Q, R, and C with their usual operations of + and × form commutative
rings. In each of these, all nonzero elements have multiplicative inverses. 4
Example 5.1.6 (Modular Arithmetic). For every integer n ≥ 2, the context of modular arith-
metic, namely the triple (Z/nZ, +, ×) forms a ring. 4
Definition 5.1.7
Let R be a ring with identity 1 ≠ 0. The characteristic of R is the smallest positive integer
n such that n · 1 = 0. If such a positive integer does not exist, the characteristic of R is
said to be 0. The characteristic of a ring is often denoted by char(R).
Example 5.1.8 (Quaternions). Hamilton defined the quaternions as the set
H = {a + bi + cj + dk | a, b, c, d ∈ R},
where elements add like vectors with {1, i, j, k} acting as a basis. He defined multiplication × on
H where arbitrary elements must satisfy distributivity and the elements 1, i, j, k multiply together
as they do in the quaternion group Q8 . The triple (H, +, ×) is a ring. (We have not checked
associativity but we leave this as an exercise for the reader. See Exercise 5.1.14.) It is obvious that
H is not commutative since ij = k, whereas ji = −k.
To illustrate a few simple calculations, consider for example the elements α = 2 − 3i + 2k and
β = 5 + 4i − 4j. We have
α + β = 7 + i − 4j + 2k,
αβ = (2 − 3i + 2k)(5 + 4i − 4j)
= 10 + 8i − 8j − 15i − 12i2 + 12ij + 10k + 8ki − 8kj
= 10 + 8i − 8j − 15i − 12(−1) + 12k + 10k + 8j − 8(−i)
= 22 + i + 22k,
βα = (5 + 4i − 4j)(2 − 3i + 2k)
= 10 − 15i + 10k + 8i − 12i2 + 8ik − 8j + 12ji − 8jk
= 10 − 15i + 10k + 8i − 12(−1) + 8(−j) − 8j + 12(−k) − 8i
= 22 − 15i − 16j − 2k.
It is vital that we do not change the order of the quaternion basis elements in any product, in
particular when applying distributivity. Hence, (4i)(2k) = 8ik = −8j, while (2k)(4i) = 8ki = 8j. 4
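These calculations can be checked mechanically. The following sketch (our own encoding of a quaternion as a 4-tuple (a, b, c, d) of coefficients of 1, i, j, k) implements the Hamilton product:

```python
def qmul(p, q):
    """Product of quaternions a + bi + cj + dk, encoded as tuples (a, b, c, d)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,   # real part
            a1*b2 + b1*a2 + c1*d2 - d1*c2,   # i part
            a1*c2 - b1*d2 + c1*a2 + d1*b2,   # j part
            a1*d2 + b1*c2 - c1*b2 + d1*a2)   # k part

alpha = (2, -3, 0, 2)   # 2 - 3i + 2k
beta = (5, 4, -4, 0)    # 5 + 4i - 4j
```

Computing `qmul(alpha, beta)` and `qmul(beta, alpha)` reproduces the products αβ = 22 + i + 22k and βα = 22 − 15i − 16j − 2k above, and shows concretely that the two differ.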
Example 5.1.9 (Ring of Functions). Let I be an interval of real numbers and let Fun(I, R) be
the set of all functions from I to R. Equipped with the usual addition and multiplication of functions,
Fun(I, R) is a commutative ring. The properties of a ring are inherited from R. In contrast, Fun(I, R)
is not a ring when equipped with addition and composition because the axiom of distributivity fails.
For example, consider the three functions f (x) = x + 1, g(x) = x2 , and h(x) = x3 . Then
(f ◦ (g + h))(x) = f (x² + x³) = x³ + x² + 1, while
(f ◦ g)(x) + (f ◦ h)(x) = x² + 1 + x³ + 1 = x³ + x² + 2. 4
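The failure of distributivity here is easy to confirm numerically; a quick sketch evaluating both sides at a sample point:

```python
f = lambda x: x + 1
g = lambda x: x**2
h = lambda x: x**3

# compare f o (g + h) with (f o g) + (f o h)
lhs = lambda x: f(g(x) + h(x))          # x^3 + x^2 + 1
rhs = lambda x: f(g(x)) + f(h(x))       # x^3 + x^2 + 2
```

At x = 2, for instance, the left side gives 13 while the right side gives 14, so composition does not distribute over addition.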
Example 5.1.10 (Direct Sum). Let (R1 , +1 , ×1 ) and (R2 , +2 , ×2 ) be two rings. The direct sum R1 ⊕ R2 is the set R1 × R2 equipped with the componentwise operations
(a, b) + (c, d) = (a +1 c, b +2 d),
(a, b) × (c, d) = (a ×1 c, b ×2 d).
It is easy to check that R1 ⊕ R2 is again a ring. 4
Definition 5.1.11
Let R be a ring.
• A nonzero element r ∈ R is called a zero divisor if there exists s ∈ R − {0} such that
rs = 0 or sr = 0.
• Assume R has an identity 1 ≠ 0. An element u ∈ R is called a unit if it has a
multiplicative inverse, i.e., ∃v ∈ R such that uv = vu = 1. The element v is often
denoted u⁻¹. The set of units in R is denoted by U (R).
Note that the identity 1 is itself a unit, but that the 0 element is not a zero divisor. This lack of
symmetry in the definitions may seem unappealing but this distinction turns out to be useful in all
theorems that discuss units and zero divisors.
The notation U (R) is reminiscent of the notation U (n) as the set of units in Z/nZ. In the ring
(Z/nZ, +, ×), every nonzero element is either a unit or a zero divisor. Proposition 2.2.9 established
that the units in Z/nZ are elements a such that gcd(a, n) = 1. Now if gcd(a, n) = d ≠ 1, then
k = n/d is a nonzero element of Z/nZ and ak = (a/d)n is a multiple of n. Then in Z/nZ we have
ak = 0, so a is a zero divisor.
In an arbitrary ring, it is not in general true that every nonzero element is either a unit or a zero
divisor. We need look no further than the integers. The units in the integers are U (Z) = {−1, 1},
and all the elements greater than 1 in absolute value are neither units nor zero divisors. On the
other hand, as the following proposition shows, no element can be both.
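The dichotomy in Z/nZ can be checked by brute force (a sketch; `classify` is our own name):

```python
def classify(n):
    """Split the nonzero elements of Z/nZ into units and zero divisors."""
    units = {a for a in range(1, n)
             if any(a * b % n == 1 for b in range(1, n))}
    zero_divisors = {a for a in range(1, n)
                     if any(a * b % n == 0 for b in range(1, n))}
    return units, zero_divisors

units, zero_divisors = classify(6)
```

For n = 6 this gives units {1, 5} and zero divisors {2, 3, 4}: the two sets are disjoint and together exhaust the nonzero elements, as claimed for Z/nZ.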
Proposition 5.1.12
Let R be a ring with identity 1 ≠ 0. The set of units and the set of zero divisors in a ring
are mutually exclusive.
Proof. Assume that a is both a unit and a zero divisor. Then there exists b ∈ R − {0} such that
ba = 0 or ab = 0. Assume without loss of generality that ba = 0. There also exists c ∈ R such that
ac = 1. Then
b = b(ac) = (ba)c = 0c = 0.
As mentioned above, the set of units contains the multiplicative identity. Furthermore, by
definition, every element in U (R) has a multiplicative inverse. This leads to the simple remark that
we phrase as a proposition.
Proposition 5.1.13
Let R be a ring with identity 1 ≠ 0. Then U (R) is a group under multiplication.
Proposition 5.1.14
Let R1 and R2 be rings, each with an identity 1 ≠ 0. Then R1 ⊕ R2 has the identity (1, 1)
and, as an isomorphism of groups,
U (R1 ⊕ R2 ) ≅ U (R1 ) ⊕ U (R2 ).
The following definitions refer to elements with specific properties related to their powers. Prop-
erties of such ring elements are studied in the exercises.
Definition 5.1.15
Let R be a ring. An element a ∈ R is called nilpotent if there exists a positive integer k
such that ak = 0. The subset of nilpotent elements in R is denoted by N (R).
Definition 5.1.16
Let R be a ring. An element a ∈ R is called idempotent if a2 = a.
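A short search illustrates both definitions in a concrete ring, say Z/12Z (a sketch; we search exponents up to the modulus, which suffices to detect nilpotency in this finite ring):

```python
n = 12
# a is nilpotent if some power of a is 0 in Z/nZ
nilpotents = {a for a in range(n)
              if any(pow(a, k, n) == 0 for k in range(1, n + 1))}
# a is idempotent if a^2 = a in Z/nZ
idempotents = {a for a in range(n) if a * a % n == a}
```

This finds N(Z/12Z) = {0, 6} and the idempotents {0, 1, 4, 9}.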
Definition 5.1.17
A ring R is called an integral domain if it is commutative, contains an identity 1 ≠ 0, and
contains no zero divisors.
The terminology of integral domain evokes the fact that integral domains resemble the algebra
of the integers. However, we will encounter many other integral domains besides the integers.
Example 5.1.18. The ring Z ⊕ Z is not an integral domain because, in particular, (1, 0) · (0, 1) =
(0, 0) so (1, 0) and (0, 1) are zero divisors. 4
Definition 5.1.20
A ring R with identity 1 ≠ 0 is called a division ring if every nonzero element in R is a
unit.
Example 5.1.21. The ring of quaternions H is a division ring. A simple calculation gives
(a + bi + cj + dk)(a − bi − cj − dk)
= a² − abi − acj − adk + abi − b²(−1) − bck − bd(−j)
+ acj − bc(−k) − c²(−1) − cdi + adk − bdj − cd(−i) − d²(−1)
= a² + b² + c² + d².
By changing the signs on b, c, and d, we get the same result with the product in reverse order. For
all quaternions α = a + bi + cj + dk ≠ 0, the sum of squares a² + b² + c² + d² ≠ 0 and so the inverse
of α is
α⁻¹ = (a − bi − cj − dk)/(a² + b² + c² + d²).
The quaternions H are an example of a noncommutative division ring. Because of the importance
of the above calculation, if α = a + bi + cj + dk ∈ H, we define the notation ᾱ = a − bi − cj − dk and
we call
N(α) := αᾱ = a² + b² + c² + d²
the norm of α. 4
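The conjugate-norm formula for the inverse can be verified with exact rational arithmetic (a sketch reusing the 4-tuple encoding of a quaternion; `qmul`, `qconj`, `qnorm`, and `qinv` are our own names):

```python
from fractions import Fraction

def qmul(p, q):
    """Hamilton product of quaternions encoded as tuples (a, b, c, d)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

def qconj(q):
    """Conjugate: a + bi + cj + dk -> a - bi - cj - dk."""
    a, b, c, d = q
    return (a, -b, -c, -d)

def qnorm(q):
    """Norm N(alpha) = alpha * conj(alpha) = a^2 + b^2 + c^2 + d^2."""
    return sum(x * x for x in q)

def qinv(q):
    """Inverse via conj(alpha) / N(alpha), as exact fractions."""
    n = qnorm(q)
    return tuple(Fraction(x, n) for x in qconj(q))
```

Multiplying any nonzero quaternion by its computed inverse returns the identity (1, 0, 0, 0), confirming that H is a division ring on these examples.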
Definition 5.1.22
A commutative division ring is called a field .
5.1.4 Subrings
As in every algebraic structure, we define the concept of substructure.
Definition 5.1.23
Let (R, +, ×) be a ring. A subset S is called a subring of R if (S, +) is a subgroup of (R, +)
and if S is closed under ×. If R is a ring with an identity 1R and if S is a subring with an
identity 1S = 1R , then S is called a unital subring of R.
Using the One-Step Subgroup Criterion from group theory, in order to prove that S is a subring
of a ring R, we simply need to prove that S is closed under subtraction, with the usual definition
a − b := a + (−b),
and closed under multiplication.
As an example, if R = Z, then for any integer n, consider the subset of multiples nZ. We know
that the subset nZ is a subgroup with +. Furthermore, for all na, nb ∈ nZ we have (na)(nb) =
n(nab) ∈ nZ, so nZ is closed under multiplication. Hence, nZ is a subring of Z.
This first example illustrates that the definition of a subring makes no assumption that if R
contains an identity 1 ≠ 0 then a subring S does also. In contrast, it is possible that a subring S
contains an identity 1S that is different from the identity 1R . For example, consider the subring
S = {0, 2, 4} of the ring R = Z/6Z. Obviously, 1R = 1 but it is easy to check that the identity of S
is 1S = 4. With the above terminology, S is a subring of R but not a unital subring of R.
Example 5.1.24 (Ring of Continuous Functions). Consider the set C 0 ([a, b], R) of continuous
real-valued functions from the interval [a, b]. This is a subset of the ring of functions Fun([a, b], R)
from the interval [a, b] to R. The reader should recall that some theorems, usually introduced in a
first calculus course and proven in an analysis course, establish that subtraction and × are binary
operations on C 0 ([a, b], R). In particular, the proof that the product of two continuous functions is
continuous is not trivial. Consequently, C 0 ([a, b], R) is a subring of Fun([a, b], R). 4
Some properties of a ring are preserved in subrings. For example, any subring of a commutative
ring is again a commutative ring. Furthermore, if R is an integral domain, then any subring of R
that contains 1 is also an integral domain. On the other hand, a subring of a field need not be a
field even if it contains the identity. The integers Z as a subring in Q gives an example of this.
As an abstract example of subrings we discuss the center of a ring.
Definition 5.1.25
Let R be a ring. The center of R, denoted C(R), consists of all elements that commute
with every other element under multiplication. In other words,
C(R) = {z ∈ R | zr = rz for all r ∈ R}.
Proposition 5.1.26
Let R be a ring. The center C(R) is a subring of R.
Proof. Let z1 , z2 ∈ C(R) and let r ∈ R be arbitrary. Then
r(z1 + (−z2 )) = rz1 + r(−z2 ) = rz1 + (−(rz2 )) = z1 r + (−(z2 r)) = z1 r + (−z2 )r = (z1 + (−z2 ))r.
Hence, z1 + (−z2 ) ∈ C(R) and thus (C(R), +) is a subgroup of (R, +). Furthermore, for all r ∈ R,
(z1 z2 )r = z1 (z2 r) = z1 (rz2 ) = (z1 r)z2 = (rz1 )z2 = r(z1 z2 ),
so z1 z2 ∈ C(R). Thus C(R) is closed under multiplication and is a subring of R.
In Exercises 5.1.1 through 5.1.8, decide whether the given set R along with the stated addition and multipli-
cation form a ring. If it is, prove it and decide whether it is commutative and whether it has an identity. If
it is not, decide which axioms fail. You should always check that the symbols are in fact binary operations on the given set.
1. Let R = R>0 , with the addition x ⊕ y = xy, and the multiplication x ⊗ y = x^{ln y} .
10. Let R be a ring, let r, s ∈ R, and let m, n ∈ Z. Prove the following formulas with the · notation.
(a) m · (r + s) = (m · r) + (m · s)
(b) (m + n) · r = (m · r) + (n · r)
11. Prove that in C 0 ([a, b], R) the composition operation ◦ is right-distributive over +.
12. Let I be an interval of real numbers. Prove that the zero divisors in Fun(I, R) are nonzero functions
f (x) such that there exists x0 ∈ I such that f (x0 ) = 0. Prove that all the elements in Fun(I, R) are
either 0, a zero divisor, or a unit.
13. Prove (carefully) that the nonzero elements in (C⁰([a, b], R), +, ×) that are neither zero divisors nor
units are the functions f for which there exist x0 ∈ [a, b] and ε > 0 such that f (x0 ) = 0 and
f (x) ≠ 0 for all x with 0 < |x − x0 | < ε.
14. Prove that multiplication in H is associative.
15. Let α = 1 + 2i + 3j + 4k and β = 2 − 3i + k in H. Calculate the following operations: (a) α + β; (b)
αβ; (c) βα; (d) αβ −1 ; (e) β 2 .
16. Let α, β ∈ H be arbitrary. Decide whether any of the operations αβ −1 , β −1 α, βα−1 , or α−1 β are
equal.
17. Let R = {a+bi+cj +dk ∈ H | a, b, c, d ∈ Z}. Prove that R is a subring of H and prove that U (R) = Q8 ,
the quaternion group.
18. Fix an integer n ≥ 2. Let R(n) be the set of symbols a + ib where a, b ∈ Z/nZ. Define + and × on R
like addition and multiplication in C.
19. Define Hom(V, W ) as the set of linear transformations from a real vector space V to another real
vector space W . Prove that Hom(V, V ), equipped with + and ◦ (composition), is a ring.
20. Let R1 and R2 be rings with nonzero identity elements. Prove that U (R1 ⊕ R2 ) ≅ U (R1 ) ⊕ U (R2 ).
Prove the equivalent result for a finite number of rings R1 , R2 , . . . , Rn .
21. Prove that the characteristic char(R) of an integral domain R is either 0 or a prime number. [Hint:
By contradiction.]
22. Consider the ring Z ⊕ Z and consider the subset R = {(x, y) | x − y = 0}. Prove that R is a subring.
Decide if R is an integral domain.
23. Prove that a finite integral domain is a field.
24. Let R1 and R2 be rings. Prove that R1 ⊕ R2 is an integral domain if and only if R1 is an integral
domain and R2 = {0}, or vice versa.
25. (Binomial Formula) Let R be a ring and suppose that x and y commute in R. Prove that for all
positive integers n,
(x + y)^n = Σ_{i=0}^{n} \binom{n}{i} x^{n−i} y^i.
26. Let R be a ring and suppose that x and y commute in R. Prove that for all positive integers n,
44. Let C^n ([a, b], R) be the set of real-valued functions on [a, b] whose first n derivatives exist and are
continuous. Prove that C^{n+1} ([a, b], R) is a proper subring of C^n ([a, b], R).
45. Let R be any ring and let {Si }_{i∈I} be a collection of subrings (not necessarily finite or countable).
Prove that the intersection ∩_{i∈I} Si is a subring of R.
46. Let R be a ring and let R1 and R2 be subrings. Show by a counterexample that R1 ∪ R2 is in general
not a subring.
47. Let R be a ring and let a be a fixed element of R. Define C(a) = {r ∈ R | ra = ar}. Prove that C(a)
is a subring of R.
48. Let R be a ring and let a be a fixed element of R.
(a) Prove that the set {x ∈ R | ax = 0} is a subring of R.
(b) With R = Z/100Z, and a = 5, find the subring defined in part (a).
5.2 Rings Generated by Elements
Following the general outline presented in the preface, this section first introduces a particular
method to efficiently describe certain types of subrings. Motivated by the notation, we introduce
two important families of rings that build new rings from old ones.
Since the subring Z[1/2] is closed under multiplication, for all integers k and n, the fraction k/2ⁿ is an element in
Z[1/2]. It is not hard to show that the set
{k/2ⁿ | k, n ∈ Z}
is in fact all of Z[1/2].
It is not uncommon for the ring A to be implied by the elements in the set S. The following two
examples illustrate this notational convention.
Example 5.2.2 (Gaussian Integers). Consider the ring Z[i]. It is understood that i is the imag-
inary number that satisfies i2 = −1. This notation assumes that the superset ring A is the ring
C. The ring Z[i] contains all the integers and, since it is closed under multiplication, it contains all
integer multiples of i. Since Z[i] is closed under addition, it must contain the subset
{a + bi ∈ C | a, b ∈ Z}.
However, this subset is closed under subtraction and under multiplication, since
(a + bi)(c + di) = (ac − bd) + (ad + bc)i.
Hence, this subset is the smallest subring in C containing both Z and the element i and so it is
precisely Z[i]. In the usual manner of depicting a complex number a + bi as a point in the plane,
the subring Z[i] consists of the points with integer coordinates. (See Figure 5.1.)
The ring Z[i] is called the ring of Gaussian integers and is important in elementary number
theory.
In C, the multiplicative inverse of an element is
(a + bi)⁻¹ = (a − bi)/(a² + b²).
The group of units U (Z[i]) consists of elements a + bi ∈ Z[i] such that
a/(a² + b²) ∈ Z and b/(a² + b²) ∈ Z.
If |a| ≥ 2, then a² > |a|, in which case a² + b² > |a| and hence a² + b² cannot divide a. A
symmetric result holds for b. Consequently, if a + bi ∈ U (Z[i]), then |a| ≤ 1 and |b| ≤ 1. However,
if |a| = 1 and |b| = 1, then a² + b² = 2, while a = ±1 and so a/(a² + b²) ∉ Z. Thus, we see that the
only units in Z[i] have |a| = 1 and b = 0, or a = 0 and |b| = 1. Hence,
U (Z[i]) = {1, −1, i, −i}.
[Figure 5.2: A representation of Z[√2]. Elements such as −7 + 3√2 = −7~ı + 3~u and 5 − √2 = 5~ı − ~u are depicted on the real line R in terms of the vectors ~ı = 1 and ~u = √2.]
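The determination of the units of Z[i] above can be double-checked by brute force over a small box of Gaussian integers (a sketch; `gmul` encodes a + bi as the pair (a, b)):

```python
def gmul(u, v):
    """(a + bi)(c + di) = (ac - bd) + (ad + bc)i, with pairs (a, b)."""
    a, b = u
    c, d = v
    return (a * c - b * d, a * d + b * c)

box = range(-3, 4)
# an element is a unit if some element of the box multiplies with it to give 1
units = {(a, b) for a in box for b in box
         if any(gmul((a, b), (c, d)) == (1, 0) for c in box for d in box)}
```

The search confirms U(Z[i]) = {±1, ±i}. (The box is large enough: since the complex norm is multiplicative, a unit and its inverse both have norm 1 and so lie well inside the box.)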
The following proposition is the main point of this subsection. The result may feel intuitively
obvious but we provide the proof to illustrate that the details are not so obvious.
Proposition 5.2.4
Let R be a commutative ring. Then with the operations of addition and multiplication
defined as above, R[x] is a commutative ring that contains R as a subring.
Proof. Let a(x), b(x), c(x) be polynomials in R[x]. Let n = max{deg a, deg b, deg c}. If k > deg p(x)
for any polynomial p(x) below, we set p_k = 0. Then
a(x) + (b(x) + c(x)) = a(x) + ((bn + cn )xn + · · · + (b1 + c1 )x + (b0 + c0 ))
= (an + (bn + cn )) xn + · · · + (a1 + (b1 + c1 )) x + (a0 + (b0 + c0 ))
= ((an + bn ) + cn ) xn + · · · + ((a1 + b1 ) + c1 ) x + ((a0 + b0 ) + c0 )
= ((an + bn )xn + · · · + (a1 + b1 )x + (a0 + b0 )) + c(x)
= (a(x) + b(x)) + c(x).
So + is associative on R[x].
The additive identity is the 0 polynomial. The additive inverse of a polynomial a(x) is −an xn −
· · · − a1 x − a0 . The addition is commutative so (R[x], +) is an abelian group.
To show that polynomial multiplication is associative, we use (5.1). If deg a(x) = m, deg b(x) = n,
and deg c(x) = ℓ, then

(a(x)b(x)) c(x) = (Σ_{q=0}^{m+n} (Σ_{i+j=q} a_i b_j) x^q) c(x)
= Σ_{h=0}^{m+n+ℓ} (Σ_{q+k=h} Σ_{i+j=q} a_i b_j c_k) x^h
= Σ_{h=0}^{m+n+ℓ} (Σ_{i+j+k=h} a_i b_j c_k) x^h
= Σ_{h=0}^{m+n+ℓ} (Σ_{i+r=h} Σ_{j+k=r} a_i b_j c_k) x^h
= a(x) (Σ_{r=0}^{n+ℓ} (Σ_{j+k=r} b_j c_k) x^r)
= a(x) (b(x)c(x)).

Since R is commutative, we have

a(x)b(x) = Σ_{k=0}^{m+n} (Σ_{i+j=k} a_i b_j) x^k = Σ_{k=0}^{m+n} (Σ_{i+j=k} b_j a_i) x^k = b(x)a(x),

so multiplication in R[x] is commutative. Finally, distributivity in R[x] follows from distributivity in R:

a(x)(b(x) + c(x)) = Σ_{k} (Σ_{i+j=k} a_i (b_j + c_j)) x^k = Σ_{k} (Σ_{i+j=k} (a_i b_j + a_i c_j)) x^k
= a(x)b(x) + a(x)c(x).

Thus R[x] is a commutative ring, and the constant polynomials form a subring equal to R.
In elementary algebra, we regularly work in the context of Z[x], Q[x], or R[x], which are poly-
nomial rings with integer, rational, or real coefficients respectively. However, consider the following
example where the ring of coefficients is a finite ring.
Example 5.2.5. Consider the polynomial ring S = (Z/3Z)[x]. As examples of operations in this
ring, let p(x) = x² + 2x + 1 and q(x) = x² + x + 1 be two polynomials in S. (For brevity, we omit
the bar and write 2 instead of 2̄.) We calculate the addition and multiplication:
p(x) + q(x) = 2x² + 3x + 2 = 2x² + 2,
p(x)q(x) = x⁴ + 3x³ + 4x² + 3x + 1 = x⁴ + x² + 1. 4
Polynomial rings are important families of rings and find applications in countless areas. We will
use them for many examples of properties of rings and we will study properties of polynomial rings
at length.
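Arithmetic in (Z/nZ)[x], as in Example 5.2.5, is straightforward to implement with coefficient lists (a sketch; polynomials are stored in ascending order of degree, so [1, 2, 1] means 1 + 2x + x²):

```python
def poly_add(p, q, n):
    """Add two polynomials with coefficients in Z/nZ (ascending coefficients)."""
    length = max(len(p), len(q))
    p = p + [0] * (length - len(p))
    q = q + [0] * (length - len(q))
    return [(a + b) % n for a, b in zip(p, q)]

def poly_mul(p, q, n):
    """Multiply two polynomials with coefficients in Z/nZ."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] = (out[i + j] + a * b) % n
    return out

p = [1, 2, 1]  # x^2 + 2x + 1
q = [1, 1, 1]  # x^2 + x + 1
```

Over Z/3Z this reproduces p + q = 2 + 2x² and pq = 1 + x² + x⁴ from the example.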
Proposition 5.2.6
Let R be an integral domain.
(1) Let a(x), b(x) be nonzero polynomials in R[x]. Then deg a(x)b(x) = deg a(x) +
deg b(x).
(2) The units of R[x] are the units of R. In other words, U (R[x]) = U (R).
(3) R[x] is an integral domain.
Proof. If deg a(x) = m and deg b(x) = n, then a(x)b(x) has no terms of degree higher than m + n.
However, since a_m ≠ 0 and b_n ≠ 0, the product contains the term a_m b_n x^{m+n} as long as a_m b_n ≠ 0.
Since R contains no zero divisors, a_m b_n ≠ 0. Hence, deg a(x)b(x) = m + n = deg a(x) + deg b(x).
The multiplicative identity in R[x] is the degree 0 polynomial 1. Suppose a(x) ∈ U (R[x]) and let
b(x) ∈ R[x] be its inverse, so a(x)b(x) = 1. Then deg a(x) + deg b(x) = deg(a(x)b(x)) = 0. Since the
degree of a polynomial is a nonnegative integer, deg a(x) = deg b(x) = 0. Hence, a(x), b(x) ∈ R with
a(x)b(x) = 1, so a(x) ∈ U (R) and part (2) follows.
Since R is an integral domain, it contains an identity 1 ≠ 0. This is also the multiplicative
identity for R[x]. Let a(x), b(x) be nonzero polynomials. Let a_m x^m be the leading term of a(x)
and let b_n x^n be the leading term of b(x). Then a_m b_n x^{m+n} is the leading term of their product.
Since the coefficients a_m and b_n are nonzero and R contains no zero divisors, a_m b_n ≠ 0 and hence
a(x)b(x) ≠ 0. Thus, R[x] is an integral domain.
Proposition 5.2.7
Let R be a commutative ring. A polynomial a(x) ∈ R[x] is a zero divisor if and only if
∃r ∈ R − {0} such that r · a(x) = 0.
Proof. (⇐=). This direction is obvious, since r can be viewed as a polynomial (of degree 0).
(=⇒). Suppose that a(x) is a zero divisor in R[x]. This means that there exists a polynomial
b(x) ∈ R[x] such that a(x)b(x) = 0. We write
a(x) = am xm + · · · + a1 x + a0 ,
b(x) = bn xn + · · · + b1 x + b0 .
We will show that r = b_0^{m+1} satisfies r · a(x) = 0. More precisely, we show by (strong) induction
that a_i b_0^{i+1} = 0 for all 0 ≤ i ≤ m.
The term of degree 0 in the product a(x)b(x) has the coefficient a_0 b_0. We must have a_0 b_0 = 0
since a(x)b(x) = 0. This gives the basis step of our proof by induction. Now suppose that a_i b_0^{i+1} = 0
for all 0 ≤ i ≤ k. The term of degree k + 1 in a(x)b(x) = 0 is
0 = a_{k+1} b_0 + a_k b_1 + · · · + a_1 b_k + a_0 b_{k+1} = Σ_{i=0}^{k+1} a_i b_{k+1−i}.
Multiplying this equation by b_0^{k+1} gives
0 = a_{k+1} b_0^{k+2} + a_k b_0^{k+1} b_1 + · · · + a_1 b_0^{k+1} b_k + a_0 b_0^{k+1} b_{k+1}.
By the induction hypothesis, a_i b_0^{i+1} = 0 for all i ≤ k, and since i + 1 ≤ k + 1, each term
a_i b_0^{k+1} b_{k+1−i} contains the factor a_i b_0^{i+1} and hence vanishes. Therefore 0 = a_{k+1} b_0^{k+2}. By induction,
a_i b_0^{i+1} = 0 for all coefficients a_i in the polynomial a(x). Since deg a(x) = m, for every i we have
b_0^{m+1} a_i = a_i b_0^{i+1} b_0^{m−i} = 0, and hence b_0^{m+1} a(x) = 0.
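Proposition 5.2.7 can be illustrated in (Z/4Z)[x] (a sketch; coefficients are stored in ascending order, so [2, 2] means 2 + 2x):

```python
def poly_mul_mod(p, q, n):
    """Multiply polynomials with coefficients in Z/nZ (ascending coefficients)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] = (out[i + j] + a * b) % n
    return out

a = [2, 2]   # a(x) = 2x + 2 in (Z/4Z)[x]
r = [2]      # the constant 2
```

Multiplying gives r · a(x) = 4x + 4 = 0 in (Z/4Z)[x], so a(x) is a zero divisor annihilated by a nonzero constant, exactly the behavior the proposition describes.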
Having described the construction of a polynomial ring in one variable, the construction extends
naturally to polynomial rings in more than one variable. If R is a commutative ring, then R[x] is
another commutative ring and R[x][y] is then a polynomial ring in the two variables x and y. We
typically write R[x, y] instead of R[x][y] for the polynomial ring with coefficients in R and with the
two variables x and y. More generally, if x1 , x2 , . . . , xn are symbols for variables, then we inductively
define the polynomial ring with coefficients in R in these variables by
R[x1 , x2 , . . . , xn ] = R[x1 , x2 , . . . , xn−1 ][xn ].
Because of this inductive definition, it is not particularly appropriate to use the terminology of
“multivariable polynomial ring” because R[x1 , x2 , . . . , xn ] can be viewed as a polynomial ring in one
variable but with coefficients in R[x1 , x2 , . . . , xn−1 ].
Polynomial rings offer many examples of properties of rings, in particular commutative rings. In
upcoming sections, we will study properties of rings F [x], where F is a field, or other polynomial
rings R[x] where R has specific properties. Polynomial rings F [x1 , x2 , . . . , xn ], where F is a field, are
more challenging to study than F [x]. Chapter 12 studies such rings, and more generally, Noetherian
rings, in detail.
Let R be a commutative ring and let G = {g1 , g2 , . . . , gn } be a finite group. Consider formal sums
a1 g1 + a2 g2 + · · · + an gn ,
where ai ∈ R. Note that if g1 is the group identity, we usually write a1 g1 as just the term a1 . As with
polynomials, we call any summand ai gi a term of the formal sum.
Addition of formal sums is done component-wise:
Σ_{i=1}^{n} a_i g_i + Σ_{i=1}^{n} b_i g_i = Σ_{i=1}^{n} (a_i + b_i) g_i .
We define the multiplication · of formal sums by distributing · over + and then rewriting each resulting term as
(a_i g_i) · (b_j g_j) = (a_i b_j) g_k ,
where the product a_i b_j occurs in R and the operation g_i g_j = g_k corresponds to the group operation
in G. Then, just as with polynomials, one gathers like terms.
We illustrate the operations defined on formal sums with a few examples.
We illustrate the operations defined on formal sums with a few examples.
Example 5.2.8. Let G = D5 and consider the set of formal sums Z[D5 ]. Let
α = r2 + 2r3 − s and β = −r2 + 7sr.
Then α + β = 2r3 − s + 7sr and
αβ = (r2 + 2r3 − s)(−r2 + 7sr)
= −r4 + 7r2 sr − 2r5 + 14r3 sr + sr2 − 7ssr
= −r4 + 7sr4 − 2 + 14sr3 + sr2 − 7r
= −2 − 7r − r4 + sr2 + 14sr3 + 7sr4 . 4
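Group ring arithmetic like this can be automated once the group operation is encoded. A sketch for Z[D5], representing the group element s^e r^i as the pair (e, i) and a group ring element as a dictionary from group elements to integer coefficients (our own encoding, using the dihedral relation r^i s = s r^{-i}):

```python
def d5_mul(g1, g2):
    """Multiply s^e1 r^i1 by s^e2 r^i2 in D5, using r^i s = s r^(-i)."""
    e1, i1 = g1
    e2, i2 = g2
    return ((e1 + e2) % 2, ((-1) ** e2 * i1 + i2) % 5)

def group_ring_mul(u, v):
    """Multiply two elements of Z[D5] given as {group element: coefficient}."""
    out = {}
    for g1, a in u.items():
        for g2, b in v.items():
            g = d5_mul(g1, g2)
            out[g] = out.get(g, 0) + a * b
    return {g: c for g, c in out.items() if c != 0}

alpha = {(0, 2): 1, (0, 3): 2, (1, 0): -1}   # r^2 + 2r^3 - s
beta = {(0, 2): -1, (1, 1): 7}               # -r^2 + 7sr
```

Computing `group_ring_mul(alpha, beta)` reproduces the product αβ = −2 − 7r − r⁴ + sr² + 14sr³ + 7sr⁴ from Example 5.2.8.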
It is not uncommon to use the group itself as the indexing set for the coefficients of the terms.
Hence, we often denote a generic group ring element as
α = Σ_{g∈G} a_g g.
In the proof of the following proposition, establishing associativity is the most challenging part.
We have used the above notation combined with iterated sums. The notation (x, y) : xy = g stands
for all pairs x, y ∈ G such that xy = g.
Proposition 5.2.9
Let R be a commutative ring and let G be a finite group. The set R[G], equipped with
addition and multiplication as defined above, is a ring and is called the group ring of R
and G. Furthermore, R is a subring of R[G].
Proof. If |G| = n, then the group (R[G], +) is isomorphic as a group to the direct sum of (R, +)
with itself n times. Hence, (R[G], +) is an abelian group. We need to prove that multiplication is
associative and that multiplication is distributive over the addition.
Let α = Σ_{g∈G} a_g g, β = Σ_{g∈G} b_g g, and γ = Σ_{g∈G} c_g g be three elements in R[G]. Then

(αβ)γ = (Σ_{g∈G} (Σ_{(x,y): xy=g} a_x b_y) g) γ = Σ_{h∈G} (Σ_{(g,z): gz=h} Σ_{(x,y): xy=g} a_x b_y c_z) h
= Σ_{h∈G} (Σ_{(x,y,z): (xy)z=h} a_x b_y c_z) h = Σ_{h∈G} (Σ_{(x,y,z): x(yz)=h} a_x b_y c_z) h
= Σ_{h∈G} (Σ_{(x,g′): xg′=h} a_x (Σ_{(y,z): yz=g′} b_y c_z)) h = α (Σ_{g′∈G} (Σ_{(y,z): yz=g′} b_y c_z) g′)
= α(βγ).
This proves associativity of multiplication. Also
α(β + γ) = Σ_{g∈G} (Σ_{(x,y): xy=g} a_x (b_y + c_y)) g = Σ_{g∈G} (Σ_{(x,y): xy=g} (a_x b_y + a_x c_y)) g
= Σ_{g∈G} (Σ_{(x,y): xy=g} a_x b_y + Σ_{(x,y): xy=g} a_x c_y) g
= Σ_{g∈G} (Σ_{(x,y): xy=g} a_x b_y) g + Σ_{g∈G} (Σ_{(x,y): xy=g} a_x c_y) g
= αβ + αγ.
This establishes left-distributivity. Right-distributivity is similar and establishes that R[G] is a ring.
The subset {r · 1 ∈ R[G] | r ∈ R} is a subring that is equal to R.
Note that even if R is commutative, R[G] is not necessarily commutative. Most of the examples
of rings introduced so far in the text have been commutative rings. The construction of group rings
gives a wealth of examples of noncommutative rings.
Example 5.2.10. Consider the group ring (Z/3Z)[S3 ]. The elements in (Z/3Z)[S3 ] are formal sums
α = a1 + a(12) (12) + a(13) (13) + a(23) (23) + a(123) (123) + a(132) (132),
where each aσ ∈ Z/3Z. Since there are 3 options for each coefficient, there are 3⁶ = 729 elements in this group
ring. As a simple illustration of some properties of elements, we point out that (Z/3Z)[S3 ] contains
zero divisors. For example,
(1 − (123))(1 + (123) + (132)) = 0.
The ring has the identity element 1, which really is 1 · 1, and it contains units that are not the identity
since 1(12) · 1(12) = 1 as an operation in the group ring. 4
Proposition 5.2.11
Let R be a commutative ring with an identity 1 ≠ 0 and let G be a group. Then the
element 1 · 1G is the identity in R[G] and G is a subgroup of U (R[G]).
Proposition 5.2.11 along with Proposition 5.2.9 together show that the group ring R[G] is a ring
that includes the ring R as a subring and the group G as a subgroup of U (R[G]). This observation
shows in what sense R[G] is a ring generated by R and G.
Proposition 5.2.12
Let G be a finite group with |G| > 1 and R a commutative ring with more than one element.
Then R[G] always has a zero divisor.
Proof. Let r ∈ R − {0} and suppose that the element g ∈ G has order m > 1. Then
(r − rg)(r + rg + rg 2 + · · · + rg m−1 )
= r2 + r2 g + r2 g 2 + · · · + r2 g m−1 − (r2 g + r2 g 2 + r2 g 3 + · · · + r2 g m )
= r2 − r2 g m = r2 − r2 = 0.
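The zero divisor constructed in the proof is easy to exhibit concretely. In Z[G] with G cyclic of order 4 and generator g, a group ring element can be stored as a list of coefficients indexed by the exponent of g (a sketch):

```python
def cyclic_group_ring_mul(u, v, n):
    """Multiply elements of R[Z/nZ]; u[i] is the coefficient of g^i."""
    out = [0] * n
    for i, a in enumerate(u):
        for j, b in enumerate(v):
            out[(i + j) % n] += a * b
    return out

u = [1, -1, 0, 0]   # 1 - g          (taking r = 1, g of order m = 4)
v = [1, 1, 1, 1]    # 1 + g + g^2 + g^3
```

The product is the zero element, matching (r − rg)(r + rg + · · · + rg^{m−1}) = 0 from the proof.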
Among the examples of rings we have encountered so far, group rings are likely the most abstract.
They do not, in general, correspond to certain number sets, modular arithmetic, polynomials, func-
tions, matrices, or any other mathematical object naturally encountered so far. However, as with
any other mathematical object, one develops an intuition for it as one uses it and finds applications.
In Section 5.4.3, we will introduce a more general construction that simultaneously subsumes
polynomial rings and group rings. This leads to a definition for a group ring R[G] with an arbitrary
ring R and a group G that are not necessarily finite.
224 CHAPTER 5. RINGS
with coefficients in R. In R[[x]], we do not worry about issues of convergence. Addition of power
series is performed term by term and multiplication is given by
(∑_{n=0}^∞ a_n x^n)(∑_{n=0}^∞ b_n x^n) = ∑_{n=0}^∞ c_n x^n,   where   c_n = ∑_{k=0}^n a_k b_{n−k} = ∑_{i+j=n} a_i b_j.
(a) Prove that R[[x]] with the addition and the multiplication defined above is a commutative ring.
(b) Suppose that R has an identity 1 6= 0. Prove that 1 − x is a unit.
(c) Prove that a power series ∑_{n=0}^∞ a_n x^n is a unit if and only if a_0 is a unit.
20. Consider the power series ring Q[[x]]. (See Exercise 5.2.19.)
(a) Suppose that c_0 is a nonzero square element. Prove that there exists a power series
∑_{n=0}^∞ a_n x^n such that
(∑_{n=0}^∞ a_n x^n)^2 = ∑_{n=0}^∞ c_n x^n.
21. Let R be a ring and let X be a set. Prove that the set Fun(X, R) of functions from X to R is a ring
with the addition and multiplication of functions defined by
(f_1 + f_2)(x) := f_1(x) + f_2(x)   and   (f_1 f_2)(x) := f_1(x) f_2(x).
22. Let R be a ring and let X be a set. The support of a function f ∈ Fun(X, R) is the subset
Supp(f) = {x ∈ X | f(x) ≠ 0}.
Consider the subset Fun_fs(X, R) of functions in Fun(X, R) that are of finite support, i.e., that are
0 except on a finite subset of X. Prove that Fun_fs(X, R) is a subring of Fun(X, R) as defined in
Exercise 5.2.21.
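A sketch of how Exercise 19(c) can be made concrete (helper names are illustrative, not from the text): truncating series to N coefficients over Q, the inverse of a series with invertible constant term is computed by the recursion b_0 = a_0^{−1} and b_n = −a_0^{−1} ∑_{k=1}^n a_k b_{n−k}.

```python
from fractions import Fraction

N = 8  # number of coefficients kept in each truncated series

def mul(a, b):
    """Cauchy product: c_n = sum over k of a_k b_{n-k}."""
    return [sum(a[k] * b[n - k] for k in range(n + 1)) for n in range(N)]

def inverse(a):
    """Inverse of a truncated series whose constant term a[0] is a unit."""
    inv0 = 1 / a[0]
    b = [inv0] + [Fraction(0)] * (N - 1)
    for n in range(1, N):
        b[n] = -inv0 * sum(a[k] * b[n - k] for k in range(1, n + 1))
    return b

one = [Fraction(1)] + [Fraction(0)] * (N - 1)
a = [Fraction(1), Fraction(-1)] + [Fraction(0)] * (N - 2)   # the series 1 - x

print(inverse(a) == [Fraction(1)] * N)   # True: (1-x)^{-1} = 1 + x + x^2 + ...
print(mul(a, inverse(a)) == one)         # True
```

The geometric-series identity here is exactly part (b) of the exercise: 1 − x is a unit with inverse 1 + x + x² + ⋯.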
5.3
Matrix Rings
This section introduces an important family of examples of noncommutative rings, that of matrix
rings.
In linear algebra courses, students usually encounter matrices as representing linear transforma-
tions with respect to certain bases. The product of two matrices is defined as the matrix representing
the composition of the linear transformations. Since the composition of functions is always asso-
ciative (see Proposition 1.1.15), it follows that the product of matrices over the real (or complex)
numbers is associative. In order to prove that the multiplication in Mn (R) is associative, we can
only use (5.2) as the definition.
Let R be any ring. Let A = (aij ), B = (bij ), and C = (cij ) be matrices in Mn (R). Then the
(i, j)th entry of (AB)C is
∑_{ℓ=1}^n (∑_{k=1}^n a_{ik} b_{kℓ}) c_{ℓj} = ∑_{ℓ=1}^n ∑_{k=1}^n a_{ik} b_{kℓ} c_{ℓj} = ∑_{k=1}^n ∑_{ℓ=1}^n a_{ik} b_{kℓ} c_{ℓj}
= ∑_{k=1}^n a_{ik} (∑_{ℓ=1}^n b_{kℓ} c_{ℓj}).
This is the (i, j) entry of A(BC). Hence, (AB)C = A(BC) and matrix multiplication is associative.
The (i, j)th entry of A(B + C) is
∑_{k=1}^n a_{ik}(b_{kj} + c_{kj}) = ∑_{k=1}^n (a_{ik} b_{kj} + a_{ik} c_{kj}) = ∑_{k=1}^n a_{ik} b_{kj} + ∑_{k=1}^n a_{ik} c_{kj}.
This is the (i, j)th entry of AB + AC so A(B + C) = AB + AC. This shows that matrix multiplication
is left-distributive over addition. Right-distributivity is proved in a similar way. We have proven
the key theorem of this section.
Proposition 5.3.1
The set Mn (R) equipped with the operations of matrix addition and matrix multiplication
is a ring.
In a first linear algebra course, students encounter matrices with real or complex coefficients. As
one observes already with real matrices, the multiplication in Mn (R) is not commutative even if R is
(for n ≥ 2). With a little creativity, we can think of all manner of matrix rings. Consider, for example,
M2 (Z/2Z); Mn (Z[x]); Mn (Z); Mn (C 0 ([0, 1], R)); or Mn (H).
Example 5.3.2. As an example of a matrix product in Mn (R) where R is not commutative, consider
the following product in M2 (H):
[ i       1 + 2j ] [ i + j   k      ]   [ i(i + j) + (1 + 2j)(2 + i)    ik + (1 + 2j)(2i − j)   ]
[ i − k   3k     ] [ 2 + i   2i − j ] = [ (i − k)(i + j) + 3k(2 + i)    (i − k)k + 3k(2i − j)   ]

                                      = [ 1 + i + 4j − k       2 + 2i − 2j − 4k ]
                                        [ −1 + i + 2j + 7k     1 + 3i + 5j      ].   4
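The product above can be checked mechanically. In this sketch (a representation chosen here, not in the text), a quaternion a + bi + cj + dk is a 4-tuple and the Hamilton product is written out explicitly:

```python
def qmul(p, q):
    """Hamilton product of quaternions stored as (a, b, c, d) = a+bi+cj+dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

def qadd(p, q):
    return tuple(x + y for x, y in zip(p, q))

def matmul(A, B):
    """2x2 matrix product with quaternion entries; the order of factors matters."""
    return [[qadd(qmul(A[i][0], B[0][j]), qmul(A[i][1], B[1][j]))
             for j in range(2)] for i in range(2)]

A = [[(0, 1, 0, 0),  (1, 0, 2, 0)],    # [ i     1+2j ]
     [(0, 1, 0, -1), (0, 0, 0, 3)]]    # [ i-k   3k   ]
B = [[(0, 1, 1, 0),  (0, 0, 0, 1)],    # [ i+j   k    ]
     [(2, 1, 0, 0),  (0, 2, -1, 0)]]   # [ 2+i   2i-j ]

P = matmul(A, B)
print(P[0])   # [(1, 1, 4, -1), (2, 2, -2, -4)] :  1+i+4j-k,  2+2i-2j-4k
print(P[1])   # [(-1, 1, 2, 7), (1, 3, 5, 0)]  : -1+i+2j+7k, 1+3i+5j
```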
Rings of square matrices Mn (R) naturally contain many subrings. We mention a few here but
leave the proofs as exercises. Given any ring R, the following are subrings of Mn (R):
• Mn (S), where S is a subring of R;
• the set of upper triangular matrices;
• the set of lower triangular matrices;
• the set of diagonal matrices.
In Section 10.2, we will review general vector spaces over any field. However, we already point out
that most of the algorithms introduced in linear algebra—Gauss-Jordan elimination, various matrix
factorizations, and so on—can be applied without any modification to Mn (F ), where F is a field.
However, when R is a general ring, because nonzero elements might not be invertible and elements
might not commute, some algorithms are no longer guaranteed to work and some definitions no
longer make sense.
Definition 5.3.3
Let R be a ring. We denote by GLn (R) the group of units U (Mn (R)) and call it the general
linear group of index n on the ring R.
This definition gives meaning to groups such as GLn (Z/kZ) or GLn (Z) but also general linear
groups over noncommutative rings, such as GLn (H).
5.3.3 – Determinants
If a ring R is commutative it is possible to define the determinant and recover some of the properties
of determinants we encounter in linear algebra. The propositions are well-known but the proofs
given in linear algebra sometimes rely on the ring of coefficients being in a field. For completeness,
we give proofs for the context of arbitrary rings.
Note, throughout this discussion on determinants, we assume that the ring R is commutative.
Definition 5.3.4
If R is a commutative ring, then we define the determinant as a function det : Mn (R) → R
defined on a matrix A = (aij ) ∈ Mn (R) by
det A = ∑_{σ∈S_n} (sign σ) a_{1σ(1)} a_{2σ(2)} · · · a_{nσ(n)}.   (5.3)
Example 5.3.5. Consider the matrix
A = [ 2 3 5 ]
    [ 1 0 3 ]
    [ 4 2 1 ]
with entries in Z/6Z. Then
det A = 2 × 0 × 1 + 3 × 3 × 4 + 5 × 1 × 2 − 3 × 1 × 1 − 5 × 0 × 4 − 2 × 3 × 2
= 0 + 0 + 4 − 3 − 0 − 0 = 1.
In the above calculation, the products correspond (in order) to the following permutations: 1, (1 2 3),
(1 3 2), (1 2), (1 3), and (2 3). 4
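A direct check of the Leibniz formula on this example (Python sketch, not from the text; the matrix below is the one whose entries produce the six products listed above, with coefficients read in Z/6Z):

```python
from itertools import permutations

def sign(perm):
    """Sign of a permutation, computed by sorting it with transpositions."""
    s, p = 1, list(perm)
    for i in range(len(p)):
        while p[i] != i:
            j = p[i]
            p[i], p[j] = p[j], p[i]
            s = -s
    return s

def det_leibniz(A, mod=None):
    """Formula (5.3): the sum over all permutations of signed products."""
    n, total = len(A), 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for i in range(n):
            term *= A[i][perm[i]]
        total += term
    return total % mod if mod is not None else total

A = [[2, 3, 5], [1, 0, 3], [4, 2, 1]]
print(det_leibniz(A))          # 31 over Z
print(det_leibniz(A, mod=6))   # 1 in Z/6Z, matching the example
```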
Definition 5.3.4 is called the Leibniz formula for the determinant. Many courses on linear algebra
first introduce the determinant via the Laplace expansion. As we will see shortly, the two defini-
tions are equivalent. The Leibniz definition for the determinant leads immediately to the following
important properties of determinants.
Proposition 5.3.6
Let τ ∈ S_n be a permutation. If A′ is the matrix obtained from A by permuting the rows
(respectively the columns) of A according to the permutation τ, then det A′ = (sign τ)(det A).
Proof. We prove the proposition first for permutations of the rows. By the Leibniz definition,
det A′ = ∑_{σ∈S_n} (sign σ) a_{τ(1)σ(1)} a_{τ(2)σ(2)} · · · a_{τ(n)σ(n)}.
Since R is commutative, we can permute the factors in each product so that the row indices are in
increasing order. This amounts to reordering the product according to the permutation τ^{−1}. Hence,
det A′ = ∑_{σ∈S_n} (sign σ) a_{1σ(τ^{−1}(1))} a_{2σ(τ^{−1}(2))} · · · a_{nσ(τ^{−1}(n))}.
For any fixed τ, as σ runs through all permutations, so does στ^{−1}. Hence, writing σ′ = στ^{−1} and
using sign σ = (sign(στ^{−1}))(sign τ), we have
det A′ = ∑_{σ∈S_n} (sign(στ^{−1}))(sign τ) a_{1σ(τ^{−1}(1))} a_{2σ(τ^{−1}(2))} · · · a_{nσ(τ^{−1}(n))}
= (sign τ) ∑_{σ′∈S_n} (sign σ′) a_{1σ′(1)} a_{2σ′(2)} · · · a_{nσ′(n)} = (sign τ)(det A).
By a similar reasoning for the columns, it again follows that det A′ = (sign τ)(det A).
Proposition 5.3.7
For all A ∈ Mn (R), if A> denotes the transpose of A, then det(A> ) = det(A).
Proof. By definition,
det(A⊤) = ∑_{σ∈S_n} (sign σ) a_{σ(1)1} a_{σ(2)2} · · · a_{σ(n)n}.
We recall that sign(σ −1 ) = sign σ. Since R is commutative, by permuting the coefficients in each
product so that the row index is listed in sequential order, we have
det(A⊤) = ∑_{σ∈S_n} (sign σ) a_{1σ^{−1}(1)} a_{2σ^{−1}(2)} · · · a_{nσ^{−1}(n)}
= ∑_{σ∈S_n} (sign σ^{−1}) a_{1σ^{−1}(1)} a_{2σ^{−1}(2)} · · · a_{nσ^{−1}(n)}.
However, the inverse function on group elements is a bijection Sn → Sn so as σ runs through all the
permutations in Sn , the inverses σ −1 also run through all the permutations. Hence,
det(A⊤) = ∑_{σ′∈S_n} (sign σ′) a_{1σ′(1)} a_{2σ′(2)} · · · a_{nσ′(n)} = det(A).
Note that neither Proposition 5.3.6 nor 5.3.7 would hold if R were not commutative. Commu-
tativity is also required for the following theorem. Though the Leibniz formula could be used for a
definition of the determinant for a matrix with coefficients in a noncommutative ring, many if not
most of the usual properties we expect for determinants would not hold. This is why we typically
only consider the determinant function on matrix rings over a commutative ring of coefficients.
Theorem 5.3.8
Let R be a commutative ring, n a positive integer, and let A ∈ M_n(R). Denote by A_{ij} the
submatrix of A obtained by deleting the ith row and the jth column of A. For each fixed i,
det A = ∑_{j=1}^{n} (−1)^{i+j} a_{ij} det(A_{ij}),   (5.4)
and for each fixed j,
det A = ∑_{i=1}^{n} (−1)^{i+j} a_{ij} det(A_{ij}).   (5.5)
Formula (5.4) is called the Laplace expansion about row i and (5.5) is called the Laplace
expansion about column j.
Proof. Fix an integer i with 1 ≤ i ≤ n. Break the sum in (5.3) by factoring out each matrix entry
with a row index of i. Then (5.3) becomes
det A = ∑_{j=1}^{n} a_{ij} ∑_{σ∈S_n, σ(i)=j} (sign σ) a_{1σ(1)} · · · â_{iσ(i)} · · · a_{nσ(n)},
where the hat indicates that the factor a_{iσ(i)} = a_{ij} has been removed from each product.
In the product inside the nested summation, all terms with row index i and with column index j
have been removed. Consequently, the inside summation resembles the Leibniz formula (5.3) of the
submatrix Aij , though we do not know if the sign of the permutation σ corresponds to that required
by (5.3).
Let σ ∈ S_n with σ(i) = j. Then the permutation
σ_{ij} = (j, j+1, . . . , n)^{−1} σ (i, i+1, . . . , n)
leaves n fixed but has the same number of inversions as σ does if we remove i from the domain
{1, 2, . . . , n} of σ and remove j from the codomain. Since the sign of an m-cycle is (−1)^{m−1}, then
sign σ_{ij} = (−1)^{n−i}(sign σ)(−1)^{n−j} = (−1)^{2n−i−j}(sign σ) = (−1)^{i+j}(sign σ).
Hence, the inner summation equals (−1)^{i+j} det(A_{ij}), which establishes (5.4). Applying (5.4) to A⊤
and using Proposition 5.3.7 gives the expansion (5.5) about column j.
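The inversion count used in the proof gives a convenient way to compute sign σ. The following sketch (illustrative only, not from the text) checks that the Leibniz formula (5.3) and the Laplace expansion (5.4) about the first row give the same answer on a sample matrix:

```python
from itertools import permutations

def det_leibniz(A):
    """Leibniz formula (5.3); sign sigma = (-1)^(number of inversions)."""
    n, total = len(A), 0
    for p in permutations(range(n)):
        inv = sum(p[i] > p[j] for i in range(n) for j in range(i + 1, n))
        term = (-1) ** inv
        for i in range(n):
            term *= A[i][p[i]]
        total += term
    return total

def det_laplace(A):
    """Laplace expansion about the first row: formula (5.4) with i = 1."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j]
               * det_laplace([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

A = [[2, 3, 5], [1, 0, 3], [4, 2, 1]]
print(det_leibniz(A), det_laplace(A))   # 31 31
```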
Another property that follows readily from the Leibniz formula is that the determinant is linear
by row and, by virtue of Proposition 5.3.7, linear by column as well. (See Exercise 5.3.11.) This
property inspires us to consider other functions F : Mn (R) → R that are linear in every row and
every column.
Proposition 5.3.9
A function F : Mn (R) → R is linear in every row and in every column if and only if there
exists a function f : Sn → R such that
F(A) = ∑_{σ∈S_n} f(σ) a_{1σ(1)} a_{2σ(2)} · · · a_{nσ(n)}.   (5.6)
Proof. Suppose first that F is linear in each row. By properties of linear transformations, if F is
linear in row 1, then
F(A) = ∑_{j_1=1}^{n} c_{j_1} a_{1j_1}
for some elements cj1 whose value may depend on the other rows. Furthermore, by picking appro-
priate values for the first row of A, we see that the functions cj1 must be linear in all the other rows.
Since each cj1 is linear in row 2, then
F(A) = ∑_{j_1=1}^{n} ∑_{j_2=1}^{n} d_{j_1,j_2} a_{2j_2} a_{1j_1} = ∑_{1≤j_1,j_2≤n} d_{j_1,j_2} a_{1j_1} a_{2j_2},
where dj1 ,j2 are ring elements that depend on the elements in rows 3 through n. Continuing until
row n, we deduce that if F is linear in every row, then there exists a function f : {1, 2, . . . , n}n → R
such that
F(A) = ∑_{1≤j_1,j_2,...,j_n≤n} f(j_1, j_2, . . . , j_n) a_{1j_1} a_{2j_2} · · · a_{nj_n}.
Now if F is also linear in each column, then f(j_1, j_2, . . . , j_n) must be 0 any time two of the indices
j_1, j_2, . . . , j_n are equal, because otherwise F(A) would contain a term of degree two in the entries of a
single column, contradicting linearity in that column. This proves that there exists a function f : S_n → R
that satisfies (5.6).
Conversely, regardless of the function f : Sn → R, the function F defined as in (5.6) is linear in
every row and column.
Proposition 5.3.10
Suppose that F : Mn (R) → R is a function that is linear in every row, is linear in every col-
umn, and satisfies the alternating property that if A0 is obtained from the matrix A by per-
muting the rows (or columns) according to the permutation τ , then F (A0 ) = (sign τ )F (A).
Then there exists a constant c ∈ R such that F (A) = c det(A).
Proof. According to Proposition 5.3.9, there exists a function f : Sn → R such that (5.6) holds. Call
c = f(1). According to (5.6), F(I) = f(1) = c. Consider the permutation matrix E_σ, for σ ∈ S_n,
whose entries are
e_{ij} = 1 if j = σ(i), and e_{ij} = 0 otherwise.
Then by the alternating property f(σ) = F(E_σ) = (sign σ)F(I) = c(sign σ). Thus,
F(A) = ∑_{σ∈S_n} c(sign σ) a_{1σ(1)} a_{2σ(2)} · · · a_{nσ(n)} = c det A.
The property described in Proposition 5.3.10 characterizes determinants. Indeed, the determinant
is the unique function Mn (R) → R that is linear in the rows, linear in the columns, satisfies the
alternating condition, and is 1 on the identity matrix. This characterization of the determinant
leads to the following important theorem about determinants.
Proposition 5.3.11
Let R be a commutative ring. Then for any matrices A, B ∈ M_n(R),
det(AB) = det(A) det(B).
Proof. Given the matrix B, consider the function F : M_n(R) → R defined by F(A) = det(AB). For
a fixed i, suppose that we can write the ith row of the matrix A as a_{ij} = r a′_{ij} + s a″_{ij} with 1 ≤ j ≤ n.
We denote by A′ the matrix A but with the ith row replaced with the row (a′_{ij})_{j=1}^{n} and denote
by A″ the matrix A but with the ith row replaced with the row (a″_{ij})_{j=1}^{n}. Set C = AB,
C′ = A′B, and C″ = A″B. Then the ith row of C can be written as
c_{ij} = ∑_{k=1}^{n} a_{ik} b_{kj} = ∑_{k=1}^{n} (r a′_{ik} b_{kj} + s a″_{ik} b_{kj})
= r (∑_{k=1}^{n} a′_{ik} b_{kj}) + s (∑_{k=1}^{n} a″_{ik} b_{kj})
= r c′_{ij} + s c″_{ij}.
Since the determinant is linear in each row, then det C = r(det C 0 ) + s(det C 00 ). Thus, F (A) =
rF (A0 ) + sF (A00 ). Hence, F is linear in each row. By a similar reasoning, F is linear in each column.
We leave it as an exercise (Exercise 5.3.19) to prove that a function F : Mn (R) → R that
is linear in each row and linear in each column satisfies the alternating property (described in
Proposition 5.3.10) if and only if F (A) = 0 for every matrix A that has a repeated row or a repeated
column. By the definition of matrix multiplication, if A has two equal rows, then AB also has
two equal rows, and hence F(A) = det(AB) = 0. Hence, F satisfies the alternating property.
Consequently, by Proposition 5.3.10, F (A) = c det(A) and F (I) = c = det(B). Thus, det(AB) =
det(A) det(B).
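A numerical spot check of Proposition 5.3.11 (a sketch, not from the text; random 2 × 2 matrices over Z/6Z, a commutative ring that even has zero divisors):

```python
import random

M = 6  # coefficient ring Z/6Z

def det2(A):
    """Determinant of a 2x2 matrix with entries in Z/6Z."""
    return (A[0][0] * A[1][1] - A[0][1] * A[1][0]) % M

def mul2(A, B):
    """Product in M2(Z/6Z)."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) % M
             for j in range(2)] for i in range(2)]

random.seed(0)
rand_mat = lambda: [[random.randrange(M) for _ in range(2)] for _ in range(2)]
pairs = [(rand_mat(), rand_mat()) for _ in range(1000)]
print(all(det2(mul2(A, B)) == det2(A) * det2(B) % M for A, B in pairs))  # True
```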
Finally, if the ring R has an identity 1 ≠ 0, the determinant gives a characterization of invertible
matrices.
Proposition 5.3.12
Let R be a commutative ring with an identity 1 ≠ 0. A matrix A ∈ M_n(R) is a unit if and
only if det A ∈ U(R). Furthermore, the (i, j)th entry of the inverse matrix A^{−1} is
(det A)^{−1} (−1)^{i+j} det(A_{ji}).
Proposition 5.3.12 generalizes the definition in Example 3.7.13 that discussed general linear
groups over fields. Proposition 5.3.11 is equivalent to saying that the determinant function det :
GLn (R) → U (R) is a group homomorphism. We define the kernel of the homomorphism as the
special linear group
SLn (R) = {A ∈ Mn (R) | det A = 1}.
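Assuming the classical adjugate description of the inverse — the (i, j)th entry of A⁻¹ is (det A)⁻¹(−1)^{i+j} det(A_{ji}) — the following sketch (not from the text) inverts a matrix in M₂(Z/9Z) and confirms Proposition 5.3.12 on one example:

```python
from math import gcd

M = 9  # work in Z/9Z

def mul2(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) % M
             for j in range(2)] for i in range(2)]

def inv2(A):
    """Inverse via the adjugate: A^{-1} = (det A)^{-1} adj(A) when det A is a unit."""
    d = (A[0][0] * A[1][1] - A[0][1] * A[1][0]) % M
    if gcd(d, M) != 1:
        raise ValueError("det A is not a unit in Z/9Z, so A is not invertible")
    d_inv = pow(d, -1, M)                     # (det A)^{-1} mod 9
    return [[ d_inv * A[1][1] % M, -d_inv * A[0][1] % M],
            [-d_inv * A[1][0] % M,  d_inv * A[0][0] % M]]

A = [[2, 5], [1, 8]]   # det A = 11, which is 2 mod 9, a unit
B = inv2(A)
print(B)               # [[4, 2], [4, 1]]
print(mul2(A, B))      # [[1, 0], [0, 1]]
```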
Perform the following calculations, if they are defined: (a) A + BC; (b) ABC; (c) B n for all n ∈ N;
(d) C −1 ; (e) A−1 B.
12. Let R be a commutative ring with an identity 1 ≠ 0. Let σ ∈ S_n and define the matrix E_σ as the
n × n matrix with entries (e_{ij}) such that
e_{ij} = 1 if j = σ(i), and e_{ij} = 0 otherwise.
5.4
Ring Homomorphisms
In our study of groups, we emphasized that we do not typically concern ourselves with arbitrary functions
between objects with a given algebraic structure but only with functions that preserve the structure. In
the context of rings, such functions are called ring homomorphisms.
Definition 5.4.1
Let R and S be two rings. A ring homomorphism is a function ϕ : R → S satisfying
ϕ(a + b) = ϕ(a) + ϕ(b)   and   ϕ(ab) = ϕ(a)ϕ(b)   for all a, b ∈ R.
Example 5.4.2. The function ϕ : Z → Z/nZ defined simply by ϕ(a) = ā is a ring homomorphism.
This statement is simply a rephrasing of the definition of the usual operations in modular arithmetic,
namely that for all a, b ∈ Z, the class of a + b is ā + b̄ and the class of ab is ā b̄. 4
Example 5.4.4. Let R be a commutative ring. Fix an element r ∈ R. We define the evaluation
map evr : R[x] → R by
evr (an xn + · · · + a1 x + a0 ) = an rn + · · · + a1 r + a0 .
The function evr evaluates the polynomial at r. This function is a ring homomorphism. 4
Example 5.4.5. Let R be any ring. Let U_2(R) be the ring of upper triangular 2 × 2 matrices over
a ring R. Consider the function ϕ : U_2(R) → R ⊕ R defined by
ϕ( [ a b ; 0 d ] ) = (a, d).
Then
ϕ( [ a_1 b_1 ; 0 d_1 ] + [ a_2 b_2 ; 0 d_2 ] ) = ϕ( [ a_1+a_2  b_1+b_2 ; 0  d_1+d_2 ] ) = (a_1 + a_2, d_1 + d_2)
= ϕ( [ a_1 b_1 ; 0 d_1 ] ) + ϕ( [ a_2 b_2 ; 0 d_2 ] ).
Furthermore,
ϕ( [ a_1 b_1 ; 0 d_1 ] [ a_2 b_2 ; 0 d_2 ] ) = ϕ( [ a_1 a_2  a_1 b_2 + b_1 d_2 ; 0  d_1 d_2 ] ) = (a_1 a_2, d_1 d_2)
= ϕ( [ a_1 b_1 ; 0 d_1 ] ) ϕ( [ a_2 b_2 ; 0 d_2 ] ).
Hence, ϕ is a ring homomorphism. 4
Example 5.4.6. Consider the operator D : C 1 ([a, b], R) → C 0 ([a, b], R) defined by taking the
derivative D(f ) = f 0 . This is not a ring homomorphism. It is true that D(f + g) = D(f ) + D(g)
but the product rule is D(f g) = D(f )g + f D(g), which in general is not equal to D(f )D(g). 4
Consider the reduction homomorphism π : Z[x] → (Z/nZ)[x] that reduces the coefficients modulo n:
π(a_m x^m + · · · + a_1 x + a_0) = ā_m x^m + · · · + ā_1 x + ā_0.
For any a ∈ Z, evaluation commutes with reduction:
ev_ā ∘ π = π ∘ ev_a   as functions Z[x] → Z/nZ,
where the π on the right denotes the reduction Z → Z/nZ. Consequently, if q(x) ∈ Z[x] is a polynomial such that π(q(x))
has no roots as a polynomial in (Z/nZ)[x], then q(x) cannot have a root in Z. But to check that
π(q(x)) has no roots in (Z/nZ)[x] we simply need to test all the congruence classes in Z/nZ. 4
Now the multiplication in C is (a + bi)(c + di) = (ac − bd) + (ad + bc)i, while
[ a −b ] [ c −d ]   [ ac − bd   −ad − bc ]
[ b  a ] [ d  c ] = [ ad + bc    ac − bd ].
Since a ring homomorphism ϕ : R → S is a group homomorphism between (R, +) and (S, +),
then by Proposition 3.7.9,
• ϕ(0_R) = 0_S;
• ϕ(−a) = −ϕ(a) for all a ∈ R.
It is also true that for all a ∈ R and all n ∈ N∗ , ϕ(an ) = ϕ(a)n . However, it is not necessarily true
that ϕ(1R ) = 1S , even if R and S both have identities. For example, if R and S are any rings, the
function ϕ : R → S such that ϕ(r) = 0 is a homomorphism. As a nontrivial example, consider the
function f : Z → Z/6Z defined by f (a) = 3̄ā. It is not hard to check that f is a ring homomorphism
but that the image is Im f = {0̄, 3̄}.
Following terminology introduced for groups (Definition 3.7.26), we call a homomorphism of a
ring R into itself an endomorphism on R and an isomorphism of a ring onto itself an automorphism
on R.
Definition 5.4.9
Let ϕ : R → S be a ring homomorphism. The kernel of ϕ, denoted Ker ϕ, is the set of
elements of R that get mapped to 0, namely
Ker ϕ = {r ∈ R | ϕ(r) = 0}.
The image of ϕ, denoted Im ϕ, is
Im ϕ = {s ∈ S | ∃r ∈ R, ϕ(r) = s}.
Proposition 5.4.10
Let ϕ : R → S be a ring homomorphism.
(1) The image Im ϕ is a subring of S.
(2) The kernel Ker ϕ is a subring of R that is closed under left and right multiplication by any element of R.
Proof. For part (1), let x, y ∈ Im ϕ. Then there exist a, b ∈ R such that x = ϕ(a) and y = ϕ(b).
Hence, x − y = ϕ(a) − ϕ(b) = ϕ(a − b) ∈ Im ϕ. Furthermore, xy = ϕ(a)ϕ(b) = ϕ(ab) ∈ Im ϕ. Since
Im ϕ is closed under subtraction and multiplication, it is a subring of S.
For part (2), let x, y ∈ Ker ϕ and let r ∈ R. Then ϕ(x) = ϕ(y) = 0. Consequently, ϕ(x − y) =
ϕ(x) − ϕ(y) = 0 − 0 = 0 so x − y ∈ Ker ϕ and Ker ϕ is closed under subtraction. Furthermore,
ϕ(rx) = ϕ(r)ϕ(x) = ϕ(r)0 = 0 and also ϕ(xr) = ϕ(x)ϕ(r) = 0ϕ(r) = 0 so Ker ϕ is closed under
multiplication within Ker ϕ but also closed under multiplication by any element in R.
Example 5.4.11. Consider Example 5.4.4. Let s ∈ R be any element. Then evr (x + (s − r)) = s
so evr is surjective and hence the image of evr is all of R. The kernel Ker evr , however, is precisely
the polynomials that have r as a root. 4
Given a ring R, a semigroup (S, ·), and functions f_1, f_2 ∈ Fun(S, R), we would like to define a product by
(f_1 ∗ f_2)(s) = ∑_{s_1 · s_2 = s} f_1(s_1) f_2(s_2),   (5.7)
which makes sense provided the sum involves only a finite number of terms for all s ∈ S. We call this condition the convolution condition
and call the operation on functions the convolution product between f_1 and f_2.
Proposition 5.4.12
Let R be a ring, let (S, ·) be a semigroup, and let F be a subring of Fun(S, R) that satisfies
the convolution condition. Then (F, +, ∗) is a ring.
Proof. Since (F, +) is an abelian group by virtue of (F, +, ×) being a ring, we only need to check
associativity of ∗ and the distributivity of ∗ over +.
Let α, β, γ ∈ F. Then for all s ∈ S, we have
(α ∗ (β ∗ γ))(s) = ∑_{s_1·s_2=s} α(s_1) (∑_{t_1·t_2=s_2} β(t_1)γ(t_2)) = ∑_{s_1·(t_1·t_2)=s} α(s_1)β(t_1)γ(t_2)
= ∑_{(s_1·t_1)·t_2=s} α(s_1)β(t_1)γ(t_2) = ∑_{q·t_2=s} (∑_{s_1·t_1=q} α(s_1)β(t_1)) γ(t_2)
= ((α ∗ β) ∗ γ)(s).
Hence, α ∗ (β ∗ γ) = (α ∗ β) ∗ γ.
Also for all s ∈ S,
(α ∗ (β + γ))(s) = ∑_{s_1·s_2=s} α(s_1)(β(s_2) + γ(s_2))
= ∑_{s_1·s_2=s} (α(s_1)β(s_2) + α(s_1)γ(s_2))
= ∑_{s_1·s_2=s} α(s_1)β(s_2) + ∑_{s_1·s_2=s} α(s_1)γ(s_2)
= (α ∗ β)(s) + (α ∗ γ)(s).
Hence, α ∗ (β + γ) = α ∗ β + α ∗ γ. This proves left-distributivity of ∗ over +. The proof for
right-distributivity is similar and follows from right-distributivity of × over + in R.
Definition 5.4.13
We call the ring (F, +, ∗) a convolution ring from (S, ·) to R.
There are a few common situations in which a subring of Fun(S, R) satisfies the convolution
condition. If the semigroup S is finite, then the condition is satisfied trivially. The semigroup (N, +)
is such that for all n ∈ N there are only finitely many pairs (a, b) ∈ N × N with a + b = n. Hence, for any
ring R, any subring of Fun(N, R) satisfies the convolution condition.
As a third general example, we consider functions of finite support. For any function f ∈
Fun(S, R), we define the support as
Supp(f) = {s ∈ S | f(s) ≠ 0}.
In Exercise 5.2.22 we proved that the set Fun_fs(S, R) of functions from S to R of finite support, i.e.,
functions f ∈ Fun(S, R) such that Supp(f) is a finite set, is a subring of Fun(S, R). Furthermore,
Fun_fs(S, R) satisfies the convolution condition because all the terms in the summation (5.7) are 0
except possibly for pairs (s_1, s_2) ∈ Supp(f_1) × Supp(f_2), which is a finite set.
We now give some specific examples of convolution rings that we have already encountered.
Example 5.4.14 (Polynomial Rings). Let R be a commutative ring and consider the semigroup
(N, +). Note that (N, +) is isomorphic as a monoid to ({1, x, x2 , . . .}, ×). Coefficients of polynomials
are 0 except for a finite number of terms, so we consider the function ψ : R[x] → Fun_fs(N, R),
where ψ(a(x)) is the function that to each integer n associates the coefficient a_n. The function ψ is
obviously injective. Furthermore, given any f ∈ Fun_fs(N, R), if n = max{i | f(i) ≠ 0}, then
f = ψ(f(n)x^n + · · · + f(1)x + f(0)).
Hence, ψ is a bijection.
Suppose that ψ(a(x)) = f, so f(i) = a_i, and that ψ(b(x)) = g, so g(i) = b_i. It is clear that
ψ(a(x) + b(x)) = f + g. According to (5.1),
ψ(a(x)b(x)) = (k ↦ ∑_{i+j=k} a_i b_j) = (k ↦ ∑_{i+j=k} f(i)g(j)) = f ∗ g = ψ(a(x)) ∗ ψ(b(x)).
Hence, we have shown that R[x] is ring isomorphic to the convolution ring (Funf s (N, R), +, ∗). 4
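The isomorphism of Example 5.4.14 is easy to test numerically: the convolution product of finitely supported functions N → Z (encoded below as dictionaries; a sketch, not from the text) reproduces polynomial multiplication.

```python
def convolve(f, g):
    """(f * g)(k) = sum over i+j=k of f(i) g(j); dicts encode finite support on N."""
    out = {}
    for i, a in f.items():
        for j, b in g.items():
            out[i + j] = out.get(i + j, 0) + a * b
    return {k: v for k, v in out.items() if v}

# psi(1 + 2x + 3x^2) = f and psi(4 + 5x) = g:
f = {0: 1, 1: 2, 2: 3}
g = {0: 4, 1: 5}
# (1 + 2x + 3x^2)(4 + 5x) = 4 + 13x + 22x^2 + 15x^3:
print(convolve(f, g))   # {0: 4, 1: 13, 2: 22, 3: 15}
```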
Example 5.4.15 (Group Rings). Let G be a finite group and let R be a commutative ring. If
α = a_1 g_1 + a_2 g_2 + · · · + a_n g_n   and   β = b_1 g_1 + b_2 g_2 + · · · + b_n g_n
are elements of R[G], identify α with the function g_i ↦ a_i and β with the function g_i ↦ b_i in
Fun(G, R). Then the coefficient of g ∈ G in the product αβ is
∑_{h·k=g} α(h)β(k).
This is precisely the convolution product defined in (5.7). Consequently, as rings, R[G] is isomorphic
to the convolution ring (Fun(G, R), +, ∗). 4
The above example shows that (Fun(G, R), +, ∗) gives precisely the group ring that we defined
in Section 5.2.3 for a commutative ring R and a finite group G. However, this construction can be
generalized to infinite groups and noncommutative rings. For any group G and any ring R, we define
the group ring R[G] to be the convolution ring (Fun_fs(G, R), +, ∗).
is a homomorphism.
8. Frobenius homomorphism. Let R be a commutative ring of prime characteristic p.
(a) Prove that p divides the binomial coefficient (p choose k) for all integers 1 ≤ k ≤ p − 1.
(b) Prove that the function f : R → R given by f (x) = xp is a homomorphism. In other words,
prove that
(a + b)^p = a^p + b^p   and   (ab)^p = a^p b^p.
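A numerical sketch for this exercise (with p = 7; note that on Z/pZ itself the Frobenius map is the identity by Fermat's little theorem, so the binomial divisibility in part (a) is the substantive point for general rings of characteristic p):

```python
from math import comb

p = 7  # any prime

# (a) p divides (p choose k) for 0 < k < p:
print(all(comb(p, k) % p == 0 for k in range(1, p)))       # True

# (b) The Frobenius map x -> x^p is additive and multiplicative in Z/pZ:
frob = lambda x: pow(x, p, p)
print(all(frob((a + b) % p) == (frob(a) + frob(b)) % p
          for a in range(p) for b in range(p)))            # True
print(all(frob(a * b % p) == frob(a) * frob(b) % p
          for a in range(p) for b in range(p)))            # True
```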
9. Given any set S, the triple (P(S), △, ∩) has the structure of a ring, where △ denotes the symmetric
difference. Let S′ be any subset of S. Show that ϕ(A) = A ∩ S′ is a ring homomorphism from P(S) to P(S′).
10. Prove that (5Z, +) is group-isomorphic to (7Z, +) but that (5Z, +, ×) is not ring-isomorphic to
(7Z, +, ×).
11. Let U be a set. Show that (Fun(U, Z/2Z), +, ×) is isomorphic to (P(U), △, ∩).
12. Show that (Z ⊕ Z)[x] is not isomorphic to Z[x] ⊕ Z[x].
13. Consider the ring (R, +, ×), where R = Z/2Z × Z/2Z as a set, where + is the component-wise addition
but the multiplication is done according to the following table:
Prove that (R, +, ×) is a ring. Also prove that (R, +, ×) is not isomorphic to Z/2Z ⊕ Z/2Z.
14. Prove that Mm (Mn (R)) is isomorphic to Mmn (R).
15. Show that the function ϕ : H → M_2(C) defined by
ϕ(a + bi + cj + dk) = [ a + bi   −c − di ]
                      [ c − di    a − bi ]
is an injective ring homomorphism.
18. Let A be a matrix in M_2(R). Define the function ϕ : R[x] → M_2(R) by
ϕ(a_m x^m + · · · + a_1 x + a_0) = a_m A^m + · · · + a_1 A + a_0 I.
This looks like plugging A into the polynomial except that the constant term becomes the diagonal
matrix a_0 I.
(a) Only for this part, take A = [ 1 2 ; 3 4 ]. Calculate ϕ(x^2 + x + 1) and ϕ(3x^3 − 2x).
(b) Show that ϕ is a ring homomorphism and that Im ϕ is a commutative subring of M2 (R).
(c) For any A ∈ M2 (R), show that the characteristic polynomial fA (x) of A is in Ker ϕ.
19. Suppose that R is a ring with identity 1 ≠ 0. Let ϕ : R → S be a nontrivial ring homomorphism (i.e.,
ϕ is not identically 0).
(a) Suppose that ϕ(1) is not the identity element in S (in particular, if S does not contain an identity
element). Prove that ϕ(1) is idempotent.
(b) Prove that whether or not S has an identity, Im ϕ has an identity, namely 1Im ϕ = ϕ(1).
(c) Suppose that S contains an identity 1S and that ϕ(1) 6= 1S . Prove that ϕ(1) is a zero divisor.
(d) Deduce that if S is an integral domain, then ϕ(1) = 1S .
(e) Suppose that R and S are integral domains. Prove that a nontrivial ring homomorphism ϕ :
R → S induces a group homomorphism ϕ× : (U (R), ×) → (U (S), ×).
20. Show that the function f : Z/21Z → Z/21Z defined by f (ā) = 7̄ā is an endomorphism on Z/21Z.
21. Let R be a commutative ring and let R[x, y] be the polynomial ring on two variables x and y. Con-
sider the function f : R[x, y] → R[x, y] such that f (p(x, y)) = p(y, x). Prove that f is a nontrivial
automorphism on R[x, y].
22. Denote by End(R) the set of endomorphisms on R (homomorphisms from R to itself).
(a) Show that End(R) is closed under the operation of function composition.
(b) Show that End(R) is closed neither under function addition nor under function multiplication.
23. Augmentation map. Let R be a commutative ring with identity 1 6= 0 and let G be a finite group.
Define the function ψ : R[G] → R by
ψ(a1 g1 + a2 g2 + · · · + an gn ) = a1 + a2 + · · · + an .
Prove that ψ is a ring homomorphism. This function is called the augmentation map of the group
ring R[G].
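The augmentation map is easy to test numerically: since ψ(αβ) collects each product a_i b_j exactly once, ψ(αβ) = ψ(α)ψ(β). A sketch in Z[S3] (permutations of {0, 1, 2} as tuples; not from the text):

```python
from itertools import permutations
import random

S3 = list(permutations(range(3)))
compose = lambda s, t: tuple(s[t[i]] for i in range(3))

def gr_mul(alpha, beta):
    """Product in the group ring Z[S3] (dicts: permutation -> coefficient)."""
    out = {}
    for s, a in alpha.items():
        for t, b in beta.items():
            st = compose(s, t)
            out[st] = out.get(st, 0) + a * b
    return out

psi = lambda alpha: sum(alpha.values())   # the augmentation map

random.seed(1)
ok = True
for _ in range(100):
    a = {g: random.randrange(-5, 6) for g in S3}
    b = {g: random.randrange(-5, 6) for g in S3}
    ok = ok and psi(gr_mul(a, b)) == psi(a) * psi(b)
print(ok)   # True
```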
24. The augmentation map for a group ring generalizes in the following way. Let R be a commutative
ring with identity 1 6= 0 and let G be a finite group. Let f : G → U (R) be a group homomorphism.
Prove that the function ψ : R[G] → R defined by
ψ(a_1 g_1 + a_2 g_2 + · · · + a_n g_n) = a_1 f(g_1) + a_2 f(g_2) + · · · + a_n f(g_n)
is a ring homomorphism.
26. Let R be a ring with identity 1 ≠ 0 and let (S, ·) be a monoid (a semigroup with an identity e). Prove
that the identity for a convolution ring (F, +, ∗) of functions from S to R is the function i : S → R
defined by
i(s) = 1 if s = e, and i(s) = 0 if s ≠ e.
27. Let R be a ring and consider the monoid S = ({1, −1}, ×). Prove that as a set Fun(S, R) is in bijection
with R × R but that the convolution ring (Fun(S, R), +, ∗) is not isomorphic to R ⊕ R.
28. Prove that the ring of formal power series (see Exercise 5.2.19) over a commutative ring R is a convolution
ring.
29. Let R be a commutative ring and consider the semigroup (Z, +). Prove that
F = {f ∈ Fun(Z, R) | there exists N ∈ Z such that f(n) = 0 for all n < N}
is a subring of Fun(Z, R) that satisfies the convolution condition. [Compare to Exercise 5.4.28. If we
view Z as isomorphic to the semigroup ({x^n | n ∈ Z}, ×), then the convolution ring (F, +, ∗) is called
the ring of formal Laurent series over R and is denoted by R((x)).]
5.5
Ideals
Ideals are an important class of subrings in a ring. We first encounter their importance in the next
section in reference to the construction of quotient rings, where ideals play the role in ring theory
that normal subgroups play in group theory. However, ideals possess many important properties
independent of their role in creating quotient rings.
The concept of an ideal first arose in number theory in the context of studying properties of
integer extensions, i.e., certain subrings of C that contain Z. Gaussian integers and rings like Z[∛2]
are some examples. In integer extensions, numbers do not always have certain desired divisibility
properties but ideals do. (This is a result of Dedekind’s Theorem from algebraic number theory,
a topic beyond the scope of this book but easily accessible with the preparation that this book
provides.) This motivated the term “ideal.” It also turns out that ideals play a pivotal role in
algebraic geometry, a branch of mathematics where the tools of abstract algebra are brought to bear
on the study of geometry. (See Section 12.8.)
5.5.1 – Ideals
Recall that given any subset S in a ring R, if r ∈ R, then the notation rS denotes the set rS =
{rs | s ∈ S} and the notation Sr denotes the set Sr = {sr | s ∈ S}.
Definition 5.5.1
Let R be a ring and let I be any subset of R.
(1) I is a left ideal of R if I is a subgroup of (R, +) and rI ⊆ I for all r ∈ R.
(2) I is a right ideal of R if I is a subgroup of (R, +) and Ir ⊆ I for all r ∈ R.
(3) I is a two-sided ideal (or simply an ideal) of R if it is both a left ideal and a right ideal.
A ring always contains at least two ideals, the subset {0} and itself. The ideal {0} is called the
trivial ideal. Any ideal I ( R is called a proper ideal.
In light of the One-Step Subgroup Criterion, the definition of an ideal can be restated to say that
an ideal of a ring R is a nonempty subset I ⊆ R that is closed under subtraction and closed under
multiplication by any element in R (i.e., ra and ar are in I for all a ∈ I and r ∈ R).
If R is commutative, then left and right ideals are equivalent and hence are also two-sided ideals.
Hence, the distinction between left and right ideals only occurs in noncommutative rings.
Though this does not illustrate the full scope of possible properties for ideals, it is important to
keep as a baseline reference what ideals are in Z.
Example 5.5.2 (Ideals in Z). We claim that all ideals in Z are of the form nZ, where n is a
nonnegative integer. The subset {0} = 0Z is an ideal. Otherwise, let I be an ideal in Z and let n be
the least positive integer in I. This exists by virtue of the Well-Ordering Principle of Z.
Now let m be any integer in I. Integer division of m by n gives m = nq + r for some integer q
and some remainder 0 ≤ r < n. However, since I is closed under multiplication by any element in
the ring, then nq ∈ I and since I is closed under subtraction, r = m − nq ∈ I. Since n is the least
positive element in I and since r is nonnegative with r < n, then r = 0. We conclude that m = qn
and so every element in I is a multiple of n. Consequently, I = nZ. 4
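Example 5.5.2, combined with the standard fact that the subgroup of (Z, +) generated by integers a_1, …, a_k is gcd(a_1, …, a_k)Z, implies that the ideal generated by a finite set of integers is dZ for d their gcd. A brute-force sketch (not from the text; it closes {0} under adding and subtracting generators, restricted to a finite window):

```python
from math import gcd
from functools import reduce

def ideal_elements(gens, bound):
    """Closure of {0} under adding/subtracting generators, within [-bound, bound]."""
    elems, frontier = {0}, {0}
    while frontier:
        new = set()
        for x in frontier:
            for g in gens:
                for y in (x + g, x - g):
                    if abs(y) <= bound and y not in elems:
                        new.add(y)
        elems |= new
        frontier = new
    return elems

gens = [12, 18]
d = reduce(gcd, gens)                     # 6
window = ideal_elements(gens, 60)
print(window == {k for k in range(-60, 61) if k % d == 0})   # True
```

The window must be taken large enough relative to the generators for the closure to fill in all multiples of d; here 60 suffices.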
The style of proof in the above example (namely referring to an element that is minimal in some
way) is not uncommon in ring theory. However, it will not apply in all rings. Indeed, the notion
of minimality derives from the partial order ≤ on Z. In contrast, many rings do not possess such a
partial order.
We now give an example where left and right ideals are not necessarily equal.
Example 5.5.3. Let R = M_2(Z) and consider the subset
I = { [ a a ; c c ] | a, c ∈ Z }
of matrices whose two columns are equal. If A, B ∈ I, then the columns of A − B are again equal,
so A − B ∈ I. Furthermore, for any C ∈ M_2(Z) and A ∈ I,
CA = [ c_11 c_12 ] [ a_11 a_11 ]   [ c_11 a_11 + c_12 a_21   c_11 a_11 + c_12 a_21 ]
     [ c_21 c_22 ] [ a_21 a_21 ] = [ c_21 a_11 + c_22 a_21   c_21 a_11 + c_22 a_21 ],
which again has equal columns, so CA ∈ I. Hence, I is a left ideal of M_2(Z). However, I is not a
right ideal: for example, with A = [ 1 1 ; 0 0 ] ∈ I and C = [ 1 0 ; 0 0 ], the product AC = [ 1 0 ; 0 0 ]
does not have equal columns, so AC ∉ I. 4
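A quick check of this example's claims (a sketch, not from the text; `in_I` tests the equal-columns condition defining I):

```python
def mat_mul(X, Y):
    """Product in M2(Z)."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

in_I = lambda M: M[0][0] == M[0][1] and M[1][0] == M[1][1]  # equal columns

# Left multiplication by any C keeps the columns equal:
A = [[5, 5], [7, 7]]            # an element of I
C = [[2, 3], [4, 1]]            # arbitrary
print(in_I(mat_mul(C, A)))      # True: consistent with I being a left ideal

# But right multiplication can break it:
A = [[1, 1], [0, 0]]
C = [[1, 0], [0, 0]]
print(mat_mul(A, C), in_I(mat_mul(A, C)))   # [[1, 0], [0, 0]] False
```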
Example 5.5.4. We consider a few ideals and nonideals in the polynomial ring R = R[x].
Let I1 be the set of polynomials whose nonzero term of lowest degree has degree 3 or greater.
Let a(x), b(x) be two polynomials. If ai = bi = 0 for i = 0, 1, 2, then (ai − bi ) = 0 for i = 0, 1, 2 so
a(x), b(x) ∈ I_1 implies that a(x) − b(x) ∈ I_1. Now suppose that a(x) ∈ I_1 and b(x) is arbitrary;
then the first few coefficients of the terms in a(x)b(x) are
a_0 b_0,   a_0 b_1 + a_1 b_0,   a_0 b_2 + a_1 b_1 + a_2 b_0.
Since a_i = 0 for i = 0, 1, 2, the terms shown above are 0 and hence a(x)b(x) ∈ I_1. Hence, I_1 is
a right ideal and hence is an ideal since R[x] is commutative.
Let I2 be the set of polynomials p(x) such that p(2) = 0 and p0 (2) = 0. Let a(x), b(x) ∈ I2 . Then
a(2) − b(2) = 0 and
(d/dx)(a(x) − b(x)) |_{x=2} = a′(2) − b′(2) = 0.
Thus, I2 is closed under subtraction. Now let a(x) ∈ I2 and let p(x) be an arbitrary real polynomial.
Then a(2)p(2) = 0p(2) = 0 and
(d/dx)(a(x)p(x)) |_{x=2} = a′(2)p(2) + a(2)p′(2) = 0 · p(2) + 0 · p′(2) = 0.
Hence, a(x)p(x) ∈ I2 and we conclude that I2 is an ideal. The ideal I2 corresponds to polynomials
that have a double root at 2. Figure 5.3 shows the graphs of just a few such polynomials.
Figure 5.3: Some polynomials in the ideal {p(x) ∈ R[x] | p(2) = p0 (2) = 0}
In contrast, let S1 be the subset of polynomials whose degree is 4 or less. It is true that S1 is
closed under subtraction but it is not closed under multiplication (e.g., x × x^4 = x^5 ∉ S_1) so it is
not a subring and hence is not an ideal.
As another nonexample, let S2 be the subset of polynomials whose terms of odd degree are 0. S2 is closed under subtraction and under multiplication, and hence S2 is a subring. However, with x ∈ R[x] and x² ∈ S2 , the product x · x² = x³ ∉ S2 , so S2 is not an ideal. 4
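The closure computations in this example are easy to check numerically. The sketch below (our own helper functions, not from the text; polynomials are coefficient lists) verifies that products with an element of I2 stay in I2:

```python
def poly_eval(coeffs, x):
    # coeffs[i] is the coefficient of x**i
    return sum(c * x**i for i, c in enumerate(coeffs))

def poly_deriv(coeffs):
    # formal derivative: d/dx of sum c_i x^i is sum i*c_i x^(i-1)
    return [i * c for i, c in enumerate(coeffs)][1:]

def poly_mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def in_I2(p):
    # membership in I2 = {p in R[x] : p(2) = 0 and p'(2) = 0}
    return poly_eval(p, 2) == 0 and poly_eval(poly_deriv(p), 2) == 0

double_root = poly_mul([-2, 1], [-2, 1])   # (x - 2)^2 has a double root at 2
assert in_I2(double_root)
assert not in_I2([-2, 1])                  # (x - 2) alone: p(2) = 0 but p'(2) = 1
r = [7, 0, 3, 1]                           # arbitrary polynomial 7 + 3x^2 + x^3
assert in_I2(poly_mul(double_root, r))     # absorption: a(x) r(x) stays in I2
```

Of course a finite check is no proof; the algebraic argument above is what establishes that I2 is an ideal.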
Proposition 5.4.10 already provided an important class of ideals but we restate the result here.
Proposition 5.5.5
Let ϕ : R → S be a ring homomorphism. Then Ker ϕ is an ideal of R.
Example 5.5.6. Let R be a commutative ring with an identity 1 ≠ 0 and let G be a finite group with
G = {g1 , g2 , . . . , gn }. Consider the group ring R[G] and consider also the subset
I = {a1 g1 + a2 g2 + · · · + an gn | a1 + a2 + · · · + an = 0}.
This subset I is an ideal by virtue of the fact that it is the kernel of the augmentation map defined
in Exercise 5.4.23. 4
Definition 5.5.7
Let A be a subset of a ring R.
(1) By the notation (A) we denote the smallest (by inclusion) ideal in R that contains
the set A. We say that the ideal (A) is generated by A.
(2) An ideal that can be generated by a single element set is called a principal ideal.
(3) An ideal that is generated by a finite set A is called a finitely generated ideal.
5.5. IDEALS 243
The student should note that when we refer to the subset (A) of R it is by definition an ideal
and hence there is no need to prove that it is.
The above notation, however, is not explicit since it does not directly offer a means of determining
all elements in (A). There is an explicit way to create an ideal from a subset A ⊂ R. In order to
describe it, we define the following sets.
Definition 5.5.8
Given a ring R and a subset A we can create some ideals from A.
(1) RA denotes the subset of finite left R-linear combinations of elements in A, i.e.,
RA = {r1 a1 + r2 a2 + · · · + rn an | ri ∈ R and ai ∈ A}.
(2) AR denotes the subset of finite right R-linear combinations of elements in A, i.e.,
AR = {a1 r1 + a2 r2 + · · · + an rn | ri ∈ R and ai ∈ A}.
(3) RAR denotes the subset of finite dual R-linear combinations of elements in A, i.e.,
RAR = {r1 a1 s1 + r2 a2 s2 + · · · + rn an sn | ri , si ∈ R and ai ∈ A}.
Proposition 5.5.9

Let R be a ring and let A be any subset.
(1) RA is a left ideal, AR is a right ideal, and RAR is a two-sided ideal.
(2) If R has an identity 1 ≠ 0, then RAR = (A).
(3) If R is commutative with an identity 1 ≠ 0, then RA = AR = RAR = (A).

Proof. For part (1), let r1 a1 + r2 a2 + · · · + rm am and r1′ a1′ + r2′ a2′ + · · · + rn′ an′ be two elements of RA. Then their difference

(r1 a1 + · · · + rm am ) − (r1′ a1′ + · · · + rn′ an′ ) = r1 a1 + · · · + rm am + (−r1′ )a1′ + · · · + (−rn′ )an′

is again a finite left R-linear combination of elements of A, so RA is closed under subtraction. Moreover, for any r ∈ R,

r(r1 a1 + r2 a2 + · · · + rm am ) = (rr1 )a1 + (rr2 )a2 + · · · + (rrm )am ∈ RA,

so RA is a left ideal. The proofs that AR is a right ideal and that RAR is a two-sided ideal are similar.
For part (3), when R is commutative, every element of RAR satisfies

r1 a1 s1 + r2 a2 s2 + · · · + rm am sm = r1 s1 a1 + r2 s2 a2 + · · · + rm sm am ,

so RAR ⊆ RA. However, since R has an identity, by setting si = 1 for i = 1, 2, . . . , m, we obtain all elements in RA as elements in RAR. Hence, RA ⊆ RAR and thus we have RA = AR = RAR.
Example 5.5.10. Let m, n ∈ Z and consider the subset of Z given by

I = {sm + tn | s, t ∈ Z}.

By Proposition 2.1.12, I = (d) where d = gcd(m, n). Using this result, an induction argument shows that every finitely generated ideal in Z can be generated by a single element. However, in arbitrary rings there do exist ideals that are not finitely generated. The above induction argument would not be sufficient to conclude that all ideals in Z are generated by a single element. On the other hand, Example 5.5.2 gave a different argument that did prove that all ideals in Z are principal. 4
Example 5.5.11. Consider the ideal I = (5, x² − x − 2) in Z[x]. This ideal consists of polynomial linear combinations

5p(x) + (x² − x − 2)q(x) with p(x), q(x) ∈ Z[x]. (5.8)
As in Example 5.5.10, just because the ideal is expressed using two generators does not mean that two generators are necessary. Assume that I = (a(x)) for some polynomial a(x). Then since 5 ∈ I, we have a(x)r(x) = 5 for some r(x) ∈ Z[x]. Hence, by degree considerations, deg a(x) = 0 and thus a(x) must be ±1 or ±5. Now x² − x − 2 has coefficients that are not multiples of 5, so since x² − x − 2 ∈ I, we cannot have I = (5). Hence, if I is a principal ideal, then I = (1) = Z[x]. However, we notice that every polynomial r(x) in the form of (5.8) has the property that r(2) is a multiple of 5. This is not the case for all polynomials in Z[x]. Hence, I ≠ Z[x]. The assumption that I is principal leads to a contradiction, so I does require two generators. 4
The following examples illustrate how the conditions in the various parts of Proposition 5.5.9 are
required for the result to hold.
Example 5.5.12. Let R be the ring 2Z and consider the subset A = {4}. The ideal (4) = 4Z
whereas RA = AR = 8Z and RAR = 16Z. 4
Example 5.5.13. Consider the ring R = M2 (Z), which is a ring with an identity 1 ≠ 0 but a noncommutative ring. Let a be the matrix

a = [ 0 1 ; 0 0 ].
The notation Ra means R{a} and consists of just left multiples ba of a. (The linear combinations collapse to just one multiple of a.) Also, aR consists of just right multiples of a. However, by properties of ranks of matrices, since rank a = 1, the rank of any multiple of a (left or right) can be at most 1. Hence, Ra and aR are strict subsets of R. From Proposition 5.5.9(2), we
know that RaR = (a). However, it can be shown (see Exercise 5.5.9) that (a) = M2 (Z), so that Ra and aR are strictly contained in (a) = RaR. 4
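The shape of the one-sided multiples of a can be confirmed with a small brute-force computation (a sketch, not from the text; matrices are nested lists and the window of entries is arbitrary):

```python
import itertools

def matmul2(A, B):
    # product of two 2x2 integer matrices given as nested lists
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

a = [[0, 1], [0, 0]]
for e in itertools.product(range(-2, 3), repeat=4):
    b = [[e[0], e[1]], [e[2], e[3]]]
    ba = matmul2(b, a)
    assert ba[0][0] == 0 and ba[1][0] == 0   # every left multiple b·a has zero first column
    ab = matmul2(a, b)
    assert ab[1] == [0, 0]                   # every right multiple a·b has zero second row
```

In particular Ra and aR miss, for example, the identity matrix, consistent with the rank argument that both are strict subsets of M2 (Z).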
Proposition 5.5.14
Let R be a ring with an identity 1 ≠ 0. An ideal I is equal to R if and only if I contains a
unit.
Proof. Suppose that I = R. Then I contains 1, which is a unit. Conversely, suppose that I contains a unit u and let v be its inverse. Now let r ∈ R be any element. Then r = r(vu) = (rv)u ∈ I. Thus, R ⊆ I and hence R = I.
Proposition 5.5.15
Let R be a commutative ring with an identity 1 ≠ 0. Then R is a field if and only if its only ideals are (0) and (1).
Definition 5.5.16
A principal ideal domain is an integral domain R in which every ideal is principal. We
often abbreviate the name and call such a ring a PID.
Definition 5.5.17
Let I, J be ideals in a ring R. We define the following operations on ideals:
(1) The sum, I + J = {a + b | a ∈ I and b ∈ J}.
(2) The product, IJ consists of finite sums of elements ai bi with ai ∈ I and bi ∈ J. Thus,
IJ = {a1 b1 + a2 b2 + · · · + an bn | ai ∈ I, bi ∈ J}.
Proposition 5.5.18
Let I, J be ideals in a ring R. Then I + J, IJ, and I ∩ J are ideals of R. Furthermore, the
ideals satisfy the following containment relations.
IJ ⊆ I ∩ J ⊆ I ⊆ I + J and IJ ⊆ I ∩ J ⊆ J ⊆ I + J.
Proof. Let a1 b1 + a2 b2 + · · · + am bm and a1′ b1′ + a2′ b2′ + · · · + an′ bn′ be two elements in IJ. Then their difference

(a1 b1 + · · · + am bm ) − (a1′ b1′ + · · · + an′ bn′ ) = a1 b1 + · · · + am bm + (−a1′ )b1′ + · · · + (−an′ )bn′

is again a finite sum of products of an element of I with an element of J, so IJ is closed under subtraction. Now let t ∈ R. Then

t(a1 b1 + · · · + am bm ) = (ta1 )b1 + · · · + (tam )bm and (a1 b1 + · · · + am bm )t = a1 (b1 t) + · · · + am (bm t).
But for all i = 1, 2, . . . , m we have tai ∈ I since I is an ideal and bi t ∈ J since J is an ideal. Hence,
multiplying any element in IJ by any element in R produces an element in IJ. Thus, IJ is an ideal
of R.
(We leave the proof that I + J and that I ∩ J are ideals as an exercise. See Exercise 5.5.25.)
Some containments are obvious, namely, I ∩ J ⊆ I and I ∩ J ⊆ J. Since 0 is an element of
every ideal, I ⊆ I + J and J ⊆ I + J. Finally, let a ∈ I and b ∈ J. Then ab ∈ I because I is an
ideal and a ∈ I but also ab ∈ J because J is an ideal and b ∈ J. Thus, in a linear combination
a1 b1 + a2 b2 + · · · + an bn every product ai bi ∈ I ∩ J and hence the full linear combination is in I ∩ J.
We conclude that IJ ⊆ I ∩ J.
Example 5.5.19. Let R = Z and let I = 12Z and J = 45Z. The ideal operations listed in Definition 5.5.17 are

I + J = 3Z,   IJ = 540Z,   I ∩ J = 180Z.
We observe that IJ = (12 × 45), I ∩ J = (lcm(12, 45)), and I + J = (gcd(12, 45)). Consequently, the
operations on ideals directly generalize the notions of product, least common multiple, and greatest
common divisor. 4
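These identities are easy to probe computationally. The sketch below (our own code, assuming Python 3.9+ for math.lcm) recovers the generators of I + J and I ∩ J by brute force over a finite window:

```python
from math import gcd, lcm   # math.lcm requires Python 3.9+

# smallest positive element of I + J = {12 s + 45 t | s, t in Z}, over a window
window = {12 * s + 45 * t for s in range(-50, 51) for t in range(-50, 51)}
assert min(x for x in window if x > 0) == gcd(12, 45) == 3

# smallest positive element of I ∩ J = common positive multiples of 12 and 45
assert min(x for x in range(1, 12 * 45 + 1)
           if x % 12 == 0 and x % 45 == 0) == lcm(12, 45) == 180

# the product ideal IJ is generated by 12 · 45
assert 12 * 45 == 540
```

The finite window suffices here because 3 = 12·4 + 45·(−1) already appears in it.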
Since ideals generalize the notion of least common multiple and greatest common divisor, we
should have an equivalent concept for relatively prime. Recall that a, b ∈ Z are called relatively
prime if gcd(a, b) = 1. In ring theory, the generalized notion is similar.
Definition 5.5.20
Let R be a ring with an identity 1 ≠ 0. Two ideals I and J of R are called comaximal if I + J = R.
Note that this definition only applies when a ring has an identity.
For example, (2) and (3) are comaximal in Z. As another example, consider the ideals (x²) and (x + 1) in Z[x]. We have (x²) + (x + 1) = (x², x + 1). But (1 − x)(x + 1) + x² = 1 is in (x², x + 1) and hence, (x², x + 1) = Z[x]. Thus, (x²) and (x + 1) are comaximal ideals.
It is easy to give generating sets of I + J and IJ from generating sets of I and J. The proofs of
these claims are left as exercises but we mention them here because of their importance. Suppose
that I and J are generated by certain finite sets of elements, say I = (a1 , a2 , . . . , am ) and J =
(b1 , b2 , . . . , bn ). Then I + J = (a1 , a2 , . . . , am , b1 , b2 , . . . , bn ) and the set IJ is generated by the set
{ai bj | i = 1, 2, . . . , m, j = 1, 2, . . . , n}.
We conclude the section by mentioning two other operations on ideals, which, in the context of
commutative rings, produce other ideals. The exercises explore some questions and examples related
to these operations.
Definition 5.5.21
Let I be an ideal in a commutative ring R. The radical of I, denoted by √I, is

√I = {r ∈ R | rⁿ ∈ I for some n ∈ N*}.
Definition 5.5.22
Let I and J be ideals in a commutative ring R. The fraction ideal of I by J, denoted (I : J), is the subset

(I : J) = {r ∈ R | rJ ⊆ I}.
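In Z the fraction ideal can be computed directly: ((a) : (b)) consists of the r with a | rb, which is the ideal generated by a/gcd(a, b). A short sketch (our own function name, anticipating the computations requested in Exercise 38):

```python
from math import gcd

def colon_generator(a, b):
    # positive generator of ((a) : (b)) = {r in Z : r * (b) ⊆ (a)} in Z;
    # a | r*b for all multiples of b exactly when (a / gcd(a, b)) | r
    return a // gcd(a, b)

assert colon_generator(2, 0) == 1     # ((2) : (0)) = (1) = Z
assert colon_generator(24, 4) == 6    # ((24) : (4)) = (6)
assert colon_generator(17, 15) == 17  # ((17) : (15)) = (17)
```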
17. Show that the ideal (2, x) in Z[x] is not a principal ideal. Conclude that Z[x] is not a PID.
18. Consider the ideal I = (13x + 16y, 11x + 13y) in the ring Z[x, y].
(a) Prove that I = (x − 2y, 3x + y). [Hint: By mutual inclusion.]
(b) Prove that (7x, 7y) ⊆ I and show that this inclusion is strict.
19. Prove that Q[x, y] is not a PID.
20. In the ring R[x, y], let I = (ax + by − c, dx + ey − f ) where a, b, c, d, e, f ∈ R.
(a) Prove that if the lines ax + by = c and dx + ey = f intersect in a single point (r, s), then I = (x − r, y − s).
(b) Prove that if the lines ax + by = c and dx + ey = f are parallel, then I = R[x, y].
(c) Prove that if the lines ax + by = c and dx + ey = f are the same, then I = (ax + by − c).
21. Consider the ideal I = (x, y) in the ring R[x, y]. Find a generating set for I k and show that I k requires
a minimum of k + 1 generators.
22. Let ϕ : R → S be a ring homomorphism.
(a) Show that if J is an ideal in S, then ϕ−1 (J) is an ideal in R.
(b) Show that if I is an ideal in R, then ϕ(I) is not necessarily an ideal in S.
(c) Show that if ϕ is surjective and I is an ideal of R, then ϕ(I) is an ideal of S.
23. Suppose that R is a commutative ring with 1 6= 0. Prove that R is a field if and only if its only ideals
are (0) and (1).
24. Suppose that R is a commutative ring with 1 6= 0. Prove that a principal ideal (a) = R if and only
if a is a unit. [Exercise 5.5.9 gives an example of a noncommutative ring where this result does not
hold.]
25. Let I and J be ideals of a ring R. Prove that: (a) I + J is an ideal of R and (b) I ∩ J is an ideal of R.
26. Let C be an arbitrary (not necessarily finite) collection of ideals of a ring R. Prove that ⋂_{I ∈ C} I is an ideal of R.
27. Suppose that I and J are ideals in a ring R that are generated by certain finite sets of elements, say
I = (a1 , a2 , . . . , am ) and J = (b1 , b2 , . . . , bn ).
(a) Prove that I + J = (a1 , a2 , . . . , am , b1 , b2 , . . . , bn ).
(b) Prove that IJ is generated by the set of mn elements, {ai bj | i = 1, 2, . . . , m, j = 1, 2, . . . , n}.
28. Let I and J be ideals of a ring R. Prove that I ∪ J is not necessarily an ideal of R.
29. Let I, J, K be ideals of a ring R. Prove that:
(a) I(J + K) = IJ + IK;
(b) I(JK) = (IJ)K.
30. Let I1 ⊆ I2 ⊆ · · · ⊆ Ik ⊆ · · · be a chain (in the partial order of inclusion) of ideals in a ring R. Prove that ⋃_{k=1}^∞ Ik is an ideal in R.
31. Let R be a commutative ring and let I be an ideal of R. Prove that the subset

{an xⁿ + · · · + a1 x + a0 ∈ R[x] | ak ∈ I^k}

is a subring of R[x].
I1 I2 · · · In = I1 ∩ I2 ∩ · · · ∩ In .
34. Let R be a commutative ring and let I be an ideal in R. Prove that the radical defined in Defini-
tion 5.5.21 is an ideal.
35. Let R = Z. Calculate the following radical ideals: (a) √(72); (b) √(105); (c) √(243).
36. Let I be an ideal in a commutative ring R. Prove that the radical of √I is again √I.
37. Let R be a commutative ring. Show that the set NR of nilpotent elements is equal to the ideal √(0). [For this reason, the subring of nilpotent elements in a commutative ring is often called the nilradical of R.]
38. In the ring Z, prove the following fraction ideal equalities.
(a) ((2) : (0)) = Z
(b) ((24) : (4)) = (6)
(c) ((17) : (15)) = (17)
5.6 Quotient Rings
The introduction to Chapter 4 motivated the construction of quotient groups with modular arith-
metic. However, modular arithmetic has a ring structure and (Z/nZ, +, ×) is an example of a
quotient ring. This section parallels the discussion for quotient groups and introduces the quotient
object construction in the algebraic structure of rings.
Let R be a ring and suppose that ∼ is an equivalence relation on R that behaves well with respect to +. According to Proposition 4.3.2, the equivalence classes of ∼ are then the cosets of a normal subgroup (A, +) of (R, +). Since (R, +) is an abelian group, every subgroup A ≤ R is normal. However, not all subgroups (A, +) of the additive group (R, +) are such that the equivalence classes associated to the cosets of A behave well with respect to ×.
Proposition 5.6.1
Let (R, +, ×) be a ring. An equivalence relation ∼ on R behaves well with respect to +
and × if and only if the equivalence classes of ∼ are the cosets of an ideal I.
Proof. We already know that ∼ behaves well with respect to + if and only if the equivalence classes
of ∼ are the cosets of some subgroup I of (R, +). Consequently, r1 ∼ r2 if and only if r2 − r1 ∈ I.
Suppose that ∼ behaves well with respect to + and ×. Let r1 ∼ r2 so that r2 − r1 = a ∈ I and let s1 ∈ R be arbitrary. Since s1 ∼ s1 , then r1 s1 ∼ r2 s1 so

r2 s1 − r1 s1 = (r2 − r1 )s1 = as1 ∈ I.

Thus, I is closed by multiplication on the right by any element in R. Similarly, let s1 ∼ s2 so that
s2 − s1 = a0 ∈ I and let r1 ∈ R be arbitrary. Since r1 ∼ r1 , then r1 s1 ∼ r1 s2 so
r1 s2 − r1 s1 = r1 (s2 − s1 ) = r1 a0 ∈ I.
Thus, I is closed by multiplication on the left by any element in R. We have shown that I must be
an ideal of R.
Conversely, suppose that ∼ is the equivalence relation on R whose equivalence classes are the
cosets r + I of some ideal I in R. We already know that ∼ behaves well with respect to +. Let
r1 , r2 , s1 , s2 ∈ R with r1 + I = r2 + I and s1 + I = s2 + I. Then r2 − r1 = a ∈ I and s2 − s1 = a0 ∈ I.
So,
r2 s2 − r1 s1 = (r1 + a)(s1 + a0 ) − r1 s1 = r1 a0 + as1 + aa0 .
By the properties of ideals, since a, a0 ∈ I, then r1 a0 , as1 , aa0 ∈ I and hence r1 a0 + as1 + aa0 ∈ I.
Thus, ∼ behaves well with respect to ×.
Proposition 5.6.2
The cosets in the additive quotient R/I form a ring with + and × defined by

(a + I) + (b + I) = (a + b) + I and (a + I) × (b + I) = (ab) + I. (5.9)
Proof. From the quotient group construction in group theory, we already know that (R/I, +) is an
abelian group.
The main part of the proof is to check that (a + I) × (b + I) = (ab) + I is well-defined. Suppose that r1 + I = r2 + I and s1 + I = s2 + I. Since × in R behaves well with respect to the equivalence relation created by the additive cosets of I, we deduce from Proposition 5.6.1 that (r1 s1 ) + I = (r2 s2 ) + I.
Now that we know that (a + I) × (b + I) = (ab) + I is well-defined, then using the right-hand
side of this expression, it follows easily that in R/I, the operation × is associative and distributive
over +.
Proposition 5.6.1 establishes not only that cosets of an ideal I in a ring R determine an equivalence
relation that behaves well with respect to the operations of R, but more importantly that this
is the only type of equivalence relation on R that behaves well with respect to the operations.
Proposition 5.6.2 now justifies the central definition of this section.
Definition 5.6.3
Let R be a ring and let I be a (two-sided) ideal in R. The ring R/I with addition and multiplication defined in (5.9) is the quotient ring of R with respect to I.
We point out one minor technicality in the notation. As subsets of R, the set (a + I) × (b + I)
is not necessarily equal to the subset (ab) + I in R. As subsets of R, we can only conclude that
(a + I) × (b + I) ⊆ (ab) + I and this inclusion is often proper. For example, let R = Z, let I = 11Z and consider the cosets 2 + 11Z and 6 + 11Z. In the quotient ring,

(2 + 11Z) × (6 + 11Z) = 12 + 11Z = 1 + 11Z.

As subsets in Z,

(2 + 11Z)(6 + 11Z) = {(2 + 11k)(6 + 11ℓ) | k, ℓ ∈ Z}.

But (2 + 11k)(6 + 11ℓ) = 12 + 22ℓ + 66k + 121kℓ. Though 23 ∈ 1 + 11Z, we can show that there exist no integers k, ℓ ∈ Z such that 23 = 12 + 22ℓ + 66k + 121kℓ. Assume that there did. Then 11 = 22ℓ + 66k + 121kℓ so 1 = 2ℓ + 6k + 11kℓ. We deduce that 1 − 2ℓ = k(6 + 11ℓ), so 6 + 11ℓ divides 2ℓ − 1. If ℓ > 0, then 0 < 2ℓ − 1 < 6 + 11ℓ so 6 + 11ℓ does not divide 2ℓ − 1. Obviously, if ℓ = 0, then 6 + 11ℓ does not divide 2ℓ − 1. If ℓ < 0, then

7 < 9(−ℓ) =⇒ 7 + 2(−ℓ) < 11(−ℓ) =⇒ 1 + 2(−ℓ) < −6 + 11(−ℓ) =⇒ |1 − 2ℓ| < |6 + 11ℓ|

so again it is not possible for 6 + 11ℓ to divide 2ℓ − 1. Hence, this contradicts our assumption that 23 ∈ (2 + 11Z)(6 + 11Z).
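A finite search is consistent with this argument (it is not a proof, since it only inspects a window of k and ℓ; the sketch and its window size are our own):

```python
# elements (2 + 11k)(6 + 11l) for k, l in a finite window
products = {(2 + 11 * k) * (6 + 11 * l)
            for k in range(-200, 201) for l in range(-200, 201)}

assert 12 in products                        # k = l = 0
assert 23 not in products                    # 23 lies in 1 + 11Z but not here
assert all(p % 11 == 1 for p in products)    # the set does sit inside 1 + 11Z
```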
Partly because of this technicality, it is even more common in ring theory to borrow the notation
from modular arithmetic and denote a coset r + I in R/I by r. As always, with this notation, the
ideal I must be clear from context. Also inspired from modular arithmetic, we may say that r and
s are congruent modulo I whenever r + I = s + I. Note that the operations in (5.9) are such that
the correspondence r 7→ r̄ from R to R/I is a homomorphism.
Definition 5.6.4
Given a ring R and an ideal I ⊆ R, the homomorphism π : R → R/I defined by π(r) =
r̄ = r + I is called the natural projection of R onto R/I.
Example 5.6.5. The first and most fundamental example comes from the integers R = Z. The
subring I = nZ is an ideal. The quotient ring is R/I = Z/nZ, the usual ring of modular arithmetic.
The reader can now see that our notation for modular arithmetic in fact comes from our notation
for quotient rings. 4
Example 5.6.6. Example 5.5.4 presented two ideals I1 and I2 in R[x]. We propose to describe the
corresponding quotient rings.
Let I1 be the ideal of polynomials whose nonzero term of lowest degree has degree 3 or greater.
Using the generating subset notation, this ideal can be written as I1 = (x³). Write p̄ for the coset p(x) + I1 in R/I1 . Then ā = b̄ if and only if b(x) − a(x) ∈ I1 , that is, b(x) − a(x) = x³p(x) for some polynomial p(x). Thus, ā = b̄ if and only if a0 = b0 , a1 = b1 , and a2 = b2 . Hence, for every polynomial a(x) ∈ R[x], we have ā = b̄ for a unique polynomial b(x) of degree at most 2, namely b(x) = a2 x² + a1 x + a0 .
Because addition and multiplication behave well with respect to the quotient ring process, we can expand any element of R/I1 as a2 x̄² + a1 x̄ + ā0 . Now for any two real numbers a and b, the product ā b̄ is the coset of ab. However, the element x̄ in the quotient ring has the particular property that x̄³ = x³ = 0̄. In particular, any power of x̄ greater than 2 gives 0̄. As a sample calculation in R/I1 ,

(x̄² + 3x̄ + 7)(2x̄² − x̄ + 3) = 3x̄² − 3x̄² + 9x̄ + 14x̄² − 7x̄ + 21 = 14x̄² + 2x̄ + 21,

where all terms of degree 3 and higher have been discarded.
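This sample calculation can be mechanized: multiplication in R[x]/(x³) is ordinary polynomial multiplication with every term of degree 3 or higher discarded. A sketch (coefficient lists, our own helper name):

```python
def mul_mod_x3(p, q):
    # p, q are coefficient triples [a0, a1, a2] representing cosets in R[x]/(x^3)
    out = [0, 0, 0]
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j < 3:          # x^3 and all higher powers vanish in the quotient
                out[i + j] += a * b
    return out

# (x^2 + 3x + 7)(2x^2 - x + 3) reduces to 14x^2 + 2x + 21
assert mul_mod_x3([7, 3, 1], [3, -1, 2]) == [21, 2, 14]
```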
Now consider the ideal I2 defined as the set of polynomials p(x) such that p(2) = 0 and p′(2) = 0. In Section 6.3.3 we will offer a characterization of I2 with generators. However, we can understand R/I2 without it. Every polynomial p(x) ∈ R[x] satisfies

p(x) ≡ p(2) + p′(2)(x − 2) (mod I2 ),

since the difference p(x) − p(2) − p′(2)(x − 2) and its derivative both vanish at 2. Hence, every polynomial is congruent modulo I2 to the 0 polynomial or a unique polynomial of degree 1 or less. Though we could write such polynomials as a + bx, it is more convenient to write the polynomials in R/I2 as a + b(x − 2). Addition in R/I2 is performed component-wise and the multiplication is

(a + b(x − 2))(c + d(x − 2)) = ac + (ad + bc)(x − 2) + bd(x − 2)² = ac + (ad + bc)(x − 2), (5.10)
where we replaced the product polynomial p(x) of (possibly) degree 2 with the polynomial p(2) + p′(2)(x − 2).
Since p̄(x) × q̄(x) is the coset of p(x)q(x), the product in (5.10) shows that multiplication in R/I2 tracks precisely the value and the first derivative at 2: the constant coefficients multiply as values, ac, while the (x − 2) coefficients follow the product rule, ad + bc.
The ideal I1 in Example 5.6.6 illustrates a common situation with quotient rings in polynomial
rings. In the exercises, the reader is guided to prove the following results. (See Exercise 5.6.9.) Let
R be a commutative ring with an identity 1 ≠ 0. Suppose that a(x) = an xⁿ + · · · + a1 x + a0 ∈ R[x] with an ∈ U (R). In the quotient ring R[x]/(a(x)), for every polynomial p(x) there exists a unique q(x) ∈ R[x] with deg q(x) < n such that p̄(x) = q̄(x). Furthermore, in the quotient ring R[x]/(a(x)), the element x̄ satisfies

x̄ⁿ = −an⁻¹(an−1 x̄ⁿ⁻¹ + · · · + a1 x̄ + a0 ).

Repeated application of this identity governs the multiplication operation in R[x]/(a(x)).
Example 5.6.7. Recall Example 5.5.11, which discussed the ideal I = (5, x² − x − 2) in Z[x]. We propose to describe Z[x]/I. Since 5 ∈ I, any element a0 ∈ Z satisfies ā0 = r̄, where r ∈ {0, 1, 2, 3, 4} is the remainder of a0 upon division by 5. Consequently,

Z[x]/I ≅ (Z/5Z)[x]/(x² − x − 2) ≅ (Z/5Z)[x]/(x² + 4x + 3).
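A normal form for Z[x]/I can be computed by rewriting x² as x + 2 (legitimate since x² − x − 2 ∈ I) and then reducing coefficients mod 5. The sketch below (our own helper) returns the representative c0 + c1 x with c0 , c1 ∈ {0, . . . , 4}:

```python
def reduce_mod_I(coeffs):
    # reduce a polynomial in Z[x] modulo I = (5, x^2 - x - 2);
    # coeffs[i] is the coefficient of x**i
    c = list(coeffs)
    while len(c) > 2:
        top = c.pop()        # x^n = x^(n-1) + 2 x^(n-2) because x^2 = x + 2
        c[-1] += top
        c[-2] += 2 * top
    c += [0] * (2 - len(c))
    return [a % 5 for a in c]

assert reduce_mod_I([5]) == [0, 0]           # 5 ∈ I
assert reduce_mod_I([-2, -1, 1]) == [0, 0]   # x^2 - x - 2 ∈ I
assert reduce_mod_I([0, 0, 1]) == [2, 1]     # x^2 ≡ x + 2
```

One can check that a polynomial lies in I exactly when its normal form is [0, 0], which recovers the observation from Example 5.5.11 that every element of I takes a value divisible by 5 at x = 2.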
The construction of quotient rings allows us to define many rings, the nature of whose elements
seem less and less removed from something familiar. However, from an algebraic perspective, their
ontology is no less strange than any other ring encountered “naturally.”
When mathematicians first explored complex numbers, they dubbed the elements bi, where b ∈ R,
as imaginary because they considered such numbers so far removed from reality. With the formalism
of quotient rings, complex numbers arise naturally as the quotient ring R[x]/(x² + 1). Indeed, every element in R[x]/(x² + 1) can be written uniquely as a + bx̄ with a, b ∈ R and, since x̄² + 1̄ = 0̄, the variable satisfies x̄² = −1̄. Hence, x̄ has exactly the algebraic properties of the unit imaginary number i.
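This identification is easy to test: multiplying cosets a + bx̄ with the rule x̄² = −1 reproduces complex multiplication. A sketch (pairs (a, b) stand for a + bx̄; the function name is ours):

```python
def mul_mod_x2_plus_1(p, q):
    # (a + b x)(c + d x) = ac + (ad + bc) x + bd x^2 = (ac - bd) + (ad + bc) x,
    # using x^2 = -1 in the quotient R[x]/(x^2 + 1)
    a, b = p
    c, d = q
    return (a * c - b * d, a * d + b * c)

z = mul_mod_x2_plus_1((1, 2), (3, 4))
w = (1 + 2j) * (3 + 4j)            # the same product computed with Python's complex type
assert z == (w.real, w.imag) == (-5, 10)
```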
Theorem 5.6.8 (First Isomorphism Theorem)

Let ϕ : R → S be a ring homomorphism. Then Ker ϕ is an ideal of R and R/(Ker ϕ) ≅ Im ϕ.

Proof. We already saw that the kernel Ker ϕ is an ideal. The proof of the First Isomorphism Theorem of groups shows that the association Φ : R/(Ker ϕ) → S given by Φ(r + (Ker ϕ)) = ϕ(r) is a well-defined function that is injective and that satisfies the homomorphism criteria for addition.
To establish this theorem, we only need to prove that Φ satisfies the homomorphism criteria for the multiplication. Let r1 + (Ker ϕ), r2 + (Ker ϕ) ∈ R/(Ker ϕ). Then

Φ((r1 + Ker ϕ)(r2 + Ker ϕ)) = Φ(r1 r2 + Ker ϕ) = ϕ(r1 r2 ) = ϕ(r1 )ϕ(r2 ) = Φ(r1 + Ker ϕ)Φ(r2 + Ker ϕ).
Thus, Φ is an injective ring homomorphism and hence establishes a ring isomorphism between
R/(Ker ϕ) and the subring Im ϕ in S.
Example 5.6.9. For any ring R, the subset {0} is an ideal of R and R/{0} is isomorphic to R. To see this using the First Isomorphism Theorem, we use the identity homomorphism i : R → R. It is obviously surjective and the kernel is Ker i = {0}. Hence, {0} is an ideal with R/{0} ≅ R. 4
Example 5.6.10 (Augmentation Map Kernel). Let R be a commutative ring and let G be a finite group with G = {g1 , g2 , . . . , gn }. Let

I = {a1 g1 + a2 g2 + · · · + an gn ∈ R[G] | a1 + a2 + · · · + an = 0}.

We saw that this is an ideal by virtue of being the kernel of the augmentation map. Since the augmentation map is surjective, by the First Isomorphism Theorem, R[G]/I ≅ R. 4
An important application of the First Isomorphism Theorem involves the so-called reduction
homomorphism in polynomial rings. Let R be a commutative ring and let I be an ideal in R. The
reduction homomorphism ϕ : R[x] → (R/I)[x] is defined by

ϕ(an xⁿ + · · · + a1 x + a0 ) = ān xⁿ + · · · + ā1 x + ā0 ,

where ā is the coset of a in R/I. Because taking cosets into R/I behaves well with respect to + and ×, it is easy to show that ϕ is in fact a homomorphism as claimed. The kernel of ϕ is Ker ϕ = I[x].
The First Isomorphism Theorem leads to the following result.
Proposition 5.6.11

Let R be a ring and let I be an ideal. Then the subring I[x] of R[x] is an ideal and R[x]/I[x] ≅ (R/I)[x].
Listed below are the Second, Third, and Fourth isomorphism theorems for rings. We list them
without proof and request the reader to prove them in the exercises. The proofs are very similar to
the proofs for the corresponding group isomorphism theorems.
In particular, the Third Isomorphism Theorem states that if I ⊆ J are ideals of a ring R, then J/I is an ideal of R/I and

(R/I)/(J/I) ≅ R/J.
Example 5.6.15. A simple application of the Third Isomorphism Theorem for rings arises in modular arithmetic. Let R = Z and I = (12) = 12Z. Now J = (4) = 4Z is also an ideal of Z containing I. The Third Isomorphism Theorem for this situation is

(Z/12Z)/(4Z/12Z) ≅ Z/4Z. 4
Theorem 5.6.16

Let n1 , n2 , . . . , nk be integers greater than 1 which are pairwise relatively prime, i.e., gcd(ni , nj ) = 1 for i ≠ j. For any integers a1 , a2 , . . . , ak ∈ Z, the system of congruences

x ≡ a1 (mod n1 )
x ≡ a2 (mod n2 )
...
x ≡ ak (mod nk )

has a solution x ∈ Z, and this solution is unique modulo n = n1 n2 · · · nk .
Proof. For each i, set ni′ = n/ni . Now ni and ni′ are relatively prime so there exist integers si and ti such that

si ni + ti ni′ = 1.

Then for all i, we have ti ni′ ≡ 1 (mod ni ) so ai ti ni′ ≡ ai (mod ni ). Consider the integer given by

x = a1 t1 n1′ + a2 t2 n2′ + · · · + ak tk nk′ .

Then, since nj′ ≡ 0 (mod ni ) if i ≠ j, we have x ≡ ai (mod ni ) for all i. Hence, x satisfies all of the congruence relations.
If another integer y satisfies all of the congruence conditions, then x − y ≡ 0 (mod ni ) for all i, and hence n | (x − y), establishing uniqueness of the solution modulo n.
Example 5.6.17. Find an x such that x ≡ 3 (mod 8), x ≡ 2 (mod 5), and x ≡ 7 (mod 13). The theorem confirms that there exists a unique solution for x modulo 520. The proof provides a method to find this solution. We have

n = 8 · 5 · 13 = 520 and n1′ = 65, n2′ = 104, n3′ = 40.

We need to calculate ti as the inverse of ni′ modulo ni . Though we could use group-theoretic methods or the Extended Euclidean Algorithm to do this, in this example the integers ni are small enough that trial and error suffices. We find

t1 = 1, t2 = 4, t3 = 1.

Thus, according to the above proof, the solution to the system of congruence equations is

x = 3 · 1 · 65 + 2 · 4 · 104 + 7 · 1 · 40 = 195 + 832 + 280 = 1307 ≡ 267 (mod 520).
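The construction in the proof translates directly into code. The sketch below (our own function; the three-argument pow for modular inverses needs Python 3.8+) recomputes the solution:

```python
from math import prod

def crt(residues, moduli):
    # solve x ≡ a_i (mod n_i) for pairwise coprime moduli, following the proof:
    # x = sum of a_i t_i n'_i, with n'_i = n/n_i and t_i the inverse of n'_i mod n_i
    n = prod(moduli)
    x = 0
    for a_i, n_i in zip(residues, moduli):
        n_prime = n // n_i
        t_i = pow(n_prime, -1, n_i)   # modular inverse (Python 3.8+)
        x += a_i * t_i * n_prime
    return x % n

x = crt([3, 2, 7], [8, 5, 13])
assert x == 267                       # the representative of 1307 modulo 520
assert x % 8 == 3 and x % 5 == 2 and x % 13 == 7
```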
Theorem 5.6.18

Let A1 , A2 , . . . , Ak be ideals in a commutative ring R with an identity 1 ≠ 0. The map

ϕ : R → (R/A1 ) ⊕ (R/A2 ) ⊕ · · · ⊕ (R/Ak ) defined by ϕ(r) = (r + A1 , r + A2 , . . . , r + Ak )

is a ring homomorphism with kernel A1 ∩ A2 ∩ · · · ∩ Ak . If the ideals are pairwise comaximal, then this map is surjective, A1 A2 · · · Ak = A1 ∩ A2 ∩ · · · ∩ Ak , and

R/(A1 A2 · · · Ak ) ≅ (R/A1 ) ⊕ (R/A2 ) ⊕ · · · ⊕ (R/Ak )

as a ring isomorphism.
Proof. We prove the result first for k = 2 and then extend by induction.
Let A1 and A2 be two comaximal ideals and consider ϕ : R → R/A1 ⊕ R/A2 defined as in the
statement of the theorem. Since A1 and A2 are comaximal, there exist a1 ∈ A1 and a2 ∈ A2 such
that a1 + a2 = 1. Let r, s ∈ R be arbitrary and let x = ra2 + sa1 . Then

x − r = ra2 + sa1 − r(a1 + a2 ) = (s − r)a1 ∈ A1 and x − s = ra2 + sa1 − s(a1 + a2 ) = (r − s)a2 ∈ A2 ,

so ϕ(x) = (r + A1 , s + A2 ), which shows that ϕ is surjective.
Hence, the ideals A1 A2 · · · Ak and Ak+1 are comaximal. Since the theorem holds for k = 2, then
R/(A1 A2 · · · Ak+1 ) ≅ (R/(A1 A2 · · · Ak )) ⊕ (R/Ak+1 ).

By the induction hypothesis, we deduce that

R/(A1 A2 · · · Ak+1 ) ≅ (R/A1 ) ⊕ (R/A2 ) ⊕ · · · ⊕ (R/Ak+1 ).
This theorem leads to the decomposition for the group of units in modular arithmetic.
Corollary 5.6.19

Let n be a positive integer with prime factorization n = p1^α1 p2^α2 · · · pk^αk. Then

Z/nZ ≅ (Z/p1^α1 Z) ⊕ (Z/p2^α2 Z) ⊕ · · · ⊕ (Z/pk^αk Z).

Furthermore, for the groups of units,

U (n) ≅ U (p1^α1) ⊕ U (p2^α2) ⊕ · · · ⊕ U (pk^αk).
It is still a number theory problem to determine U (pα ) for various primes p and powers α.
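The corollary is easy to probe numerically. A brute-force sketch (our own helper) checks, for n = 360 = 2³ · 3² · 5, both the count of units and the injectivity of the map a ↦ (a mod 8, a mod 9, a mod 5) on them:

```python
from math import gcd, prod

def units(n):
    # representatives of U(n), the group of units of Z/nZ
    return [a for a in range(1, n) if gcd(a, n) == 1]

n = 360
prime_powers = [8, 9, 5]        # 360 = 2^3 * 3^2 * 5
assert prod(prime_powers) == n

# |U(360)| = |U(8)| * |U(9)| * |U(5)| = 4 * 6 * 4 = 96
assert len(units(n)) == prod(len(units(q)) for q in prime_powers) == 96

# the isomorphism sends a to (a mod 8, a mod 9, a mod 5); check it is injective
images = {tuple(a % q for q in prime_powers) for a in units(n)}
assert len(images) == len(units(n))
```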
(a) In this quotient ring, calculate and simplify as much as possible the sum and product of
4x2 − 5x + 2 and 2x + 7.
(b) Show that x is a unit.
(a) Prove that the quotient ring F5 [x]/(f (x)) has 25 elements.
(b) Prove that F5 [x]/(f (x)) contains no zero divisor.
(c) Deduce that F5 [x]/(f (x)) is a field. [Hint: See Exercise 5.1.23.]
9. Let R be an integral domain and let a(x) ∈ R[x] with deg a(x) = n > 0 and an ∈ U (R).
(a) Prove that, in the quotient ring R[x]/(a(x)), the element x̄ satisfies x̄ⁿ = −an⁻¹(an−1 x̄ⁿ⁻¹ + · · · +
a1 x̄ + a0 ).
(b) Prove that for every polynomial p(x), there exists a polynomial q(x) that is either 0 or has
deg q(x) < n such that p(x) = q(x) in R[x]/(a(x)). [Hint: Use induction on the degree of p(x).]
(c) Prove that the polynomial q(x) described above is unique.
10. Consider the ring Z[i] and the ideal I = (2 + i). We study the quotient ring Z[i]/(2 + i).
(a) Consider the (quotient) ring R1 = (Z/2Z)[x]/(x3 + 1). List all 8 elements of this ring and
determine whether (and show how) they are units, zero divisors, or neither.
(b) Repeat the same question with the ring R2 = (Z/2Z)[x]/(x3 + x + 1).
(c) Consider also the ring R3 = Z/8Z of modular arithmetic modulo 8. The rings R1 , R2 , and R3
all have 8 elements. Show that none of them are isomorphic to each other.
17. Let R be a ring and let I be an ideal of R. Prove that Mn (I) is an ideal of Mn (R) and show that Mn (R)/Mn (I) ≅ Mn (R/I). [Hint: First Isomorphism Theorem.]
18. Let Un (R) be the set of upper triangular n × n matrices with coefficients in R. Prove that the subset
I = {A ∈ Un (R) | aii = 0 for all i = 1, 2, . . . , n}
is an ideal in Un (R) and determine Un (R)/I.
19. Prove the Second Isomorphism Theorem for rings. (See Theorem 5.6.12.)
20. Prove the Third Isomorphism Theorem for rings. (See Theorem 5.6.13.)
21. Prove the Fourth Isomorphism Theorem for rings. (See Theorem 5.6.14.)
22. Let R be a ring and let e be an idempotent element in the center C(R). Show that (e) = Re and (1 − e) = R(1 − e), and prove that R ≅ Re ⊕ R(1 − e).
23. Consider the group ring Z[S3 ]. Show that the set I of elements

α = a1 + a2 (1 2) + a3 (1 3) + a4 (2 3) + a5 (1 2 3) + a6 (1 3 2)

satisfying

a1 + a2 + a3 + a4 + a5 + a6 = 0 and a1 − a2 − a3 − a4 + a5 + a6 = 0

is an ideal and show that Z[S3 ]/I ≅ Z ⊕ Z. Prove that every element in Z[S3 ]/I can be written uniquely as a1 1 + a2 (1 2), with a1 , a2 ∈ Z.
24. Let R be a PID and let I be an ideal in R. Prove that every ideal of R/I is principal.
25. Consider the subset I = {(2m, 3n) | m, n ∈ Z} in the ring Z ⊕ Z. Prove that I is an ideal in Z ⊕ Z and that (Z ⊕ Z)/I ≅ Z/2Z ⊕ Z/3Z.
26. Let R and S be rings with identity 1 6= 0. Prove that every ideal of R ⊕ S is of the form I ⊕ J for
some ideals I ⊆ R and J ⊆ S.
27. Solve the following system of congruences in Z:

x ≡ 3 (mod 5)
x ≡ 7 (mod 11).
5.7 Maximal Ideals and Prime Ideals
We now turn to two different classes of ideals in rings. Though it will not be obvious at first, they
both attempt to generalize the notion and properties of prime numbers in Z.
Recall that in elementary number theory there are two equivalent definitions of prime numbers.
(1) An integer p > 1 is a prime number if and only if p is only divisible by 1 and itself. (The
definition of prime.)
(2) An integer p > 1 is a prime number if and only if whenever p|ab, then p|a or p|b. (Euclid’s
Lemma, Proposition 2.1.21.)
In ring theory in general, these two notions are no longer equivalent. The connection between
properties of ideals and integer arithmetic comes from the fact that a|b if and only if b ∈ (a), if and
only if (b) ⊆ (a).
Definition 5.7.1
An ideal I in a ring R is called maximal if I ≠ R and if the only ideals J such that I ⊆ J ⊆ R are J = I or J = R.
An arbitrary ring need not have maximal ideals. In many specific rings it is obvious or at least very simple to prove the existence of maximal ideals. However, the following general proposition proves the existence of maximal ideals under very few conditions. The proof relies on Zorn's Lemma, which is equivalent to the Axiom of Choice.
Proposition 5.7.2

Let R be a ring with an identity 1 ≠ 0. Every proper ideal of R is contained in a maximal ideal. In particular, R possesses maximal ideals.

Proof. Let R be a ring with an identity and let I be any proper ideal. Let S be the set of all proper ideals that contain I. Then S is a nonempty set (since it contains I), which is partially ordered by inclusion. Let C be any chain of ideals in S. We show that C has an upper bound.
Define the set J = ⋃_{A ∈ C} A.
In the integers, suppose that (n) is a maximal ideal in Z. Then any ideal I = (m) satisfying
(n) ⊆ (m) ⊆ R is either (n) = (m) or (m) = R. Expressing this in terms of divisibility, we deduce
that m|n implies m = ±1 or m = ±n. Supposing that m is positive, then m = 1 or m = n. This
corresponds to the definition of a prime number, listed above in criterion (1). Hence, the maximal
ideals in Z correspond to the ideals (p) where p is a prime number.
The next proposition offers a criterion for when an ideal is maximal.
Proposition 5.7.3
Let R be a commutative ring with an identity 1 ≠ 0. Then an ideal M is maximal if and only if R/M is a field.
Proof. By the Lattice Isomorphism Theorem for rings, there is a bijective correspondence between the ideals of R/M and the ideals of R that contain M . But M is a maximal ideal if and only if the only ideals of R that contain M are M and R itself, that is, if and only if R/M contains precisely the two ideals (0) and (1). By Proposition 5.5.15, this holds if and only if R/M is a field.
The above proposition offers a strategy to prove that an ideal I is maximal: calculate the
quotient ring R/I, prove that R/I is a field, and then invoke the proposition. For the integers, we
had previously seen that all ideals are of the form (n) with n nonnegative and that Z/nZ is a field
if and only if n is a prime number. Proposition 5.7.3 thus shows that (n) is maximal if and only if n
is a prime number, which returns us to prime numbers as the motivation for the notion of a maximal ideal.
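The equivalence between maximality of (n) and primality of n is easy to confirm by machine. The following Python sketch (our own illustration, not part of the text) tests whether every nonzero class in Z/nZ is invertible, via gcd, and compares the result with primality:

```python
from math import gcd

def is_field(n):
    """Z/nZ is a field iff every nonzero class is invertible,
    i.e., iff gcd(a, n) = 1 for all 1 <= a < n."""
    return all(gcd(a, n) == 1 for a in range(1, n))

def is_prime(n):
    return n >= 2 and all(n % d != 0 for d in range(2, int(n**0.5) + 1))

# (n) is maximal in Z iff Z/nZ is a field iff n is prime
assert all(is_field(n) == is_prime(n) for n in range(2, 200))
```

The check exhausts all n below 200; of course, the proposition proves the equivalence for all n.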
Example 5.7.4. As a nonobvious example of Proposition 5.7.3, consider the ring of Gaussian inte-
gers Z[i] and the principal ideal (2 + i). Exercise 5.6.10 shows that Z[i]/(2 + i) is isomorphic to F5 ,
which is a field, so (2 + i) is a maximal ideal. △
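The isomorphism Z[i]/(2 + i) ≅ F5 can be probed numerically. In this sketch (our own, not the text's Exercise 5.6.10), we use that 2 + i ≡ 0 forces i ≡ −2 ≡ 3 (mod 5), so the reduction map sends a + bi to a + 3b mod 5; the congruence 3² ≡ −1 (mod 5) makes the map multiplicative:

```python
def phi(a, b):
    """Reduce a + bi in Z[i] modulo (2 + i): send i to 3, since 3^2 = 9 = -1 (mod 5)."""
    return (a + 3 * b) % 5

# phi respects Gaussian-integer multiplication: (a+bi)(c+di) = (ac-bd) + (ad+bc)i
rng = range(-4, 5)
assert all(phi(a*c - b*d, a*d + b*c) == phi(a, b) * phi(c, d) % 5
           for a in rng for b in rng for c in rng for d in rng)
assert phi(2, 1) == 0  # the generator 2 + i lies in the kernel
```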
Definition 5.7.5
Let R be any ring. An ideal P ≠ R is called a prime ideal if whenever two ideals A and B
satisfy AB ⊆ P , then A ⊆ P or B ⊆ P .
Many applications that involve prime ideals occur in the context of commutative rings. In
commutative rings, Definition 5.7.5 has an equivalent statement.
Proposition 5.7.6
Let R be a commutative ring with an identity 1 ≠ 0. An ideal P ≠ R is prime if and only
if ab ∈ P implies a ∈ P or b ∈ P .
Prime ideals possess an equivalent characterization via quotient rings similar to Proposition 5.7.3
for maximal ideals.
Proposition 5.7.7
An ideal P in a commutative ring R with an identity 1 ≠ 0 is prime if and only if R/P is an
integral domain.
Corollary 5.7.8
In a commutative ring with an identity 1 ≠ 0, every maximal ideal is a prime ideal.
Proof. Let R be a commutative ring and let M be a maximal ideal. Then by Proposition 5.7.3,
R/M is a field. Every field is an integral domain. So by Proposition 5.7.7, M is prime.
By Criterion 2 for primality in the integers, using either the definition or Proposition 5.7.7, we
see that an ideal (n) ⊆ Z is prime if and only if n is a prime number or n = 0. The zero ideal (0)
is the only ideal in Z that is prime but not maximal. However, as the following example illustrates,
this similarity between prime and maximal ideals does not hold in arbitrary rings.
Example 5.7.9. Let R = Z[x] and consider the ideal I1 = (x² − 2). The quotient ring R/I1 ≅ Z[√2]
is an integral domain but is not a field, so I1 is a prime ideal that is not maximal. △
Proposition 5.7.10
Let R be a PID. Then every nonzero prime ideal P is a maximal ideal.
Proof. Let P = (p) be a nonzero prime ideal in R and let I = (m) be an ideal containing P .
Since p ∈ (m), there exists r ∈ R such that mr = p. But then mr ∈ P , so either m ∈ P or
r ∈ P . If m ∈ P , then (m) ⊆ P , which implies that (m) = P since we already knew that P ⊆ (m).
Now suppose that r ∈ P , so that there exists some s ∈ R such that r = sp. Then from mr = p we
deduce that msp = p. Since R is an integral domain and p ≠ 0, the cancellation law holds, so
ms = 1. Thus, m is a unit and (m) = R. Hence, we have proved that either I = P or I = R, and
we conclude that P is maximal.
Example 5.7.11. In this example, we revisit Z[x] and show a few different ideals that are prime,
maximal, or neither. Consider the following chain of ideals
(x³ + x² − 2x − 2) ⊆ (x² − 2) ⊆ (x² − 2, 15) ⊆ (x² − 2, 5).
(It is not yet obvious that this is a chain, but we shall see that it is.) The ideal (x³ + x² − 2x − 2)
is not prime because (x + 1)(x² − 2) = x³ + x² − 2x − 2 is in this ideal whereas neither x + 1 nor
x² − 2 is in the ideal, since nonzero polynomials in (x³ + x² − 2x − 2) have degree 3 or higher.
The ideal (x² − 2) is prime but not maximal because Z[x]/(x² − 2) ≅ Z[√2] is an integral domain
but is not a field.
The ideal I3 = (x² − 2, x² + 13) is actually equal to (x² − 2, 15). We see that 15 ∈ I3 because
(x² + 13) − (x² − 2) = 15. Conversely, x² + 13 = (x² − 2) + 15 ∈ (x² − 2, 15), so I3 = (x² − 2, 15).
This is not a prime ideal because neither 3 nor 5 is in I3 whereas 3 · 5 = 15 ∈ I3 .
Finally, the ideal (x² − 2, 5) is maximal. We can see this because
Z[x]/(x² − 2, 5) ≅ (Z/5Z)[x]/(x² − 2) ≅ (Z/5Z)[x]/(x² + 3).
In this last quotient, a ≠ 0 in Z/5Z is not a zero divisor, and if a + bx were a zero divisor with
b ≠ 0, then b⁻¹a + x would be a zero divisor as well. Then there would exist x + c and x + d with
c, d ∈ Z/5Z such that (x + c)(x + d) = x² + 3 in (Z/5Z)[x]. Then c and d would need to solve
c + d = 0 and cd = 3. Hence, d = −c and −c² = 3. Checking the five cases, we find that −c² = 3
has no solutions in Z/5Z. Hence, (Z/5Z)[x]/(x² + 3) has no zero divisors. This quotient ring is
commutative and has an identity, so (Z/5Z)[x]/(x² + 3) is an integral domain. It is finite, so it is
a field, and we conclude that (x² − 2, 5) is a maximal ideal. △
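The exhaustive check over the five cases takes one line by machine; a quick Python sketch (ours, not part of the text):

```python
# a root c of x^2 + 3 over Z/5Z would give a factorization (x + c)(x + d);
# note x^2 + 3 = x^2 - 2 (mod 5)
roots = [c for c in range(5) if (c * c + 3) % 5 == 0]
assert roots == []                                # no root, hence no linear factor
assert all((-c * c) % 5 != 3 for c in range(5))   # -c^2 = 3 (mod 5) has no solution
```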
Example 5.7.12. Let R = C⁰([0, 1], R) be the ring of continuous real-valued functions on [0, 1]
and let a be a number in [0, 1]. Let Ma be the set of all functions f such that f (a) = 0. Note that
Ma = Ker(ev_a), so Ma is an ideal. Also, by the surjectivity of ev_a and the First Isomorphism
Theorem, we have R/Ma ≅ R, the field of real numbers. Since the latter is a field, by
Proposition 5.7.3, Ma is a maximal ideal.
Consider now the ideal
I = {f ∈ R | f (0) = f (1) = 0}.
This ideal is not prime because, for example, the functions g, h : [0, 1] → R given by g(x) = x and
h(x) = 1 − x are such that g, h ∉ I but gh ∈ I. Consequently, it is not maximal either. △
Prime ideals and maximal ideals possess many interesting properties. Properties of prime ideals
in a commutative ring are a central theme of commutative algebra, the branch of algebra that
studies the properties of commutative rings in depth. The section exercises present some questions
that ask the reader to determine whether a given ideal is prime or maximal, but they also investigate
many of these properties. The reader is encouraged to at least skim the statements of the exercises
to acquire some intuition about the properties of prime ideals.
6. Consider the ring Un (Z) of upper triangular n × n matrices with integer coefficients. Fix an integer k
with 1 ≤ k ≤ n. Prove that the set
Ik = {A ∈ Un (Z) | akk = 0}
is a prime ideal in Un (Z) that is not maximal. Explain how this differs from the previous exercise.
7. Let R = C⁰(R, R) and consider the set of functions of compact support,
I = {f ∈ R | f (x) = 0 for all x outside some bounded subset of R}.
[Recall that a subset S of R is bounded if there exists some c such that S ⊆ [−c, c].]
(a) Prove that I is an ideal.
(b) Prove that any maximal ideal that contains I is not equal to any of the ideals Ma described in
Example 5.7.12.
8. This exercise asks the reader to prove the following modification of Proposition 5.7.3. Let R be any
ring. An ideal M is maximal if and only if the quotient R/M is a simple ring. [A simple ring is a ring
that contains no ideals except the 0 ideal and the whole ring.]
9. In noncommutative rings, we call a left ideal a maximal left ideal (and similarly for right ideals) if it
is maximal in the poset of proper left ideals ordered by inclusion. Prove that maximal left ideals (and
maximal right ideals) exist under the same conditions as for Krull’s Theorem.
10. Prove that the set of matrices {A ∈ M2 (R) | a11 = a21 = 0} is a maximal left ideal of M2 (R). (See
Exercise 5.7.9.) Find a maximal right ideal in M2 (R).
11. Find all the prime ideals in Z ⊕ Z.
12. Prove that (y − x2 ) is a prime ideal in R[x, y]. Prove also that it is not maximal.
13. Prove that (y, x2 ) is not prime in R[x, y].
14. Prove that the principal ideal (x2 + y 2 ) is prime in R[x, y] but not in C[x, y].
15. Show that in C¹(R, R) the ideal I = {f | f (2) = f ′(2) = 0} is not a prime ideal.
16. Show by example that the intersection of two prime ideals is in general not another prime ideal.
17. Let R be any ring.
(a) Show that the intersection of two prime ideals P1 and P2 is prime if and only if P1 ⊆ P2 or
P2 ⊆ P1 .
(b) Conclude that the intersection of two distinct maximal ideals is never a prime ideal.
18. Let R be a commutative ring. Prove that the nilradical N (R) is a subset of every prime ideal.
19. Show that a commutative ring with an identity 1 ≠ 0 is an integral domain if and only if {0} is a
prime ideal.
20. Let ϕ : R → S be a ring homomorphism and let Q be a prime ideal in S. Prove that ϕ⁻¹(Q) is a
prime ideal in R.
21. Let ϕ : R → S be a ring homomorphism and let P be a prime ideal in R. Prove that the ideal
generated by ϕ(P ) is not necessarily a prime ideal in S.
22. Let R be a ring and let S be a subset of R. Prove that there exists an ideal that is maximal (by
inclusion) with respect to the property that it “avoids” (does not contain) the set S.
23. Prove that the nilradical of a commutative ring is equal to the intersection of all the prime ideals of
that ring. [Hint: Use Exercises 5.7.18 and 5.7.22.]
24. Let R be a commutative ring and let I be an ideal. Prove that the radical ideal √I is equal to the
intersection of all prime ideals that contain I. [Hint: Use the isomorphism theorems and Exercise 5.7.23.]
25. Consider the polynomial ring R[x1 , x2 , x3 , . . .] with real coefficients but a countably infinite number
of variables. Define I1 = (x1 ), I2 = (x1 , x2 ), and in general Ik = (x1 , x2 , . . . , xk ) for all positive
integers k. Prove that Ik is a prime ideal for all positive integers k and prove that
I1 ⊊ I2 ⊊ · · · ⊊ Ik ⊊ · · ·
is an infinite strictly increasing chain of ideals.
5.8 Projects
Project I. Roots of x2 + 1 in Z/nZ. Consider the ring of modular arithmetic R = Z/nZ.
The goal of this project is to find the number of solutions to the equation x2 + 1 = 0 in R.
Determine the number of solutions for a large number of different values of n. Try to make
(and if possible prove) a conjecture about the number of roots when n is prime, when n is a
power of 2, when n is the product of two primes, and when n is general.
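As a starting point for Project I, the root counts are easy to tabulate by brute force. The following Python sketch (the helper name is our own) generates data from which to form conjectures:

```python
def num_roots(n):
    """Count solutions of x^2 + 1 = 0 in Z/nZ by exhaustive search."""
    return sum(1 for x in range(n) if (x * x + 1) % n == 0)

# tabulate for small n; compare primes p = 1 (mod 4) against p = 3 (mod 4),
# powers of 2, and products of two primes
print([(n, num_roots(n)) for n in (2, 3, 5, 8, 13, 21, 65)])
```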
Project II. The Three-Sphere Group. Recall (see Appendix A.1) that the set {z ∈ C | |z| =
1} is a subgroup of (U (C), ×). Furthermore, this set is isomorphic to the circle S¹, where we
locate points via an angle and the group operation is addition of angles.
Consider now the subset of the quaternions
S = {α ∈ H | N (α) = 1},
where N (α) is the quaternion norm introduced in Example 5.1.21. This set S is a subgroup
of (U (H), ×). Study this group. Show that geometrically we can view this set as the three-
dimensional unit sphere S³ in R⁴. Study the group in comparison to (S¹, +). Are there
subgroups, normal subgroups, etc., in (S, ×)? If there are normal subgroups, identify the
corresponding quotient groups. Are there nontrivial homomorphisms with this group as their
domain? Offer your own investigations or generalizations about this group.
Project III. Inverting Matrices of Quaternions. Consider the ring M2 (H). Attempt to find a
criterion for when a matrix is invertible. Can you find a formula for the inverse of an invertible
matrix in M2 (H)? Does your criterion extend to Mn (H) for n ≥ 3?
Project IV. Matrices of Quaternions. Study the ring M2 (H). Decide if you can find nilpotent
elements, zero divisors, ideals, etc. Discuss solving systems of two equations in two variables
but with variables and coefficients taken from H.
First show that the function ϕ is a ring homomorphism and conclude that Im ϕ is a commuta-
tive subring of Mn×n (F ). The matrices in the image of ϕ commute with A, but these may not
account for all matrices that commute with A. Setting adA : Mn×n (F ) → Mn×n (F ) as the
linear transformation such that adA (X) = AX − XA, the kernel Ker(adA ) is the subset of Mn×n (F )
of matrices that commute with A. What can be said about the relationship between Ker adA
and Im ϕ? Are they always equal for any A? If not, what conditions on A make them equal?
The set Ker adA is a priori just a subspace of Mn×n (F ); is it a subring or even an ideal of
Mn×n (F )? (Consider examples with 2 × 2 matrices.)
Project VI. Subset Polynomial Ring. If S is a set, then the power set P(S) has the structure
of a ring when equipped with the operation △ (symmetric difference) for addition and ∩ for
multiplication. Consider the polynomial ring R = P(S)[x]. What are some properties of this
ring (units, zero divisors, what the ideals may be, what the maximal ideals are, whether it is
an integral domain, whether it is a PID, etc.)? Discuss the same question for a quotient ring
of R. For example, you could take S = {1, 2, 3, 4, 5} and study properties of a quotient ring of R.
Project VII. Application of FTFGAG. Consider the group of units in quotient rings of the
form Fp [x]/(n(x)), where p is a prime and n(x) is some polynomial in Fp [x]. Since the quotient
ring will be finite and commutative, the group of units will be finite and abelian, and hence is subject
to the Fundamental Theorem of Finitely Generated Abelian Groups. Try a few examples and
find out as much as you can about such groups. With examples, can you determine the
isomorphism type of U (F5 [x]/(x2 + x + 1)) or U (F2 [x]/(x3 + x + 1))?
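To experiment with the last question, one can implement arithmetic in F2[x]/(x³ + x + 1) with bitmasks. The sketch below (the helper names are our own) computes the multiplicative order of the class of x:

```python
MOD = 0b1011  # bitmask for x^3 + x + 1 over F2

def mulmod(a, b):
    """Multiply two F2-polynomials (stored as bitmasks, degree < 3) modulo x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:  # degree of a reached 3: reduce using x^3 = x + 1
            a ^= MOD
    return r

# order of the class of x (bitmask 0b010) in the group of units
order, p = 1, 0b010
while p != 1:
    p = mulmod(p, 0b010)
    order += 1
print(order)  # x^3 + x + 1 is irreducible over F2, so the unit group has 7 elements
```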
Project VIII. Quotient Rings and Calculus. Revisit Example 5.6.6 and in particular how
a product in a quotient ring recovers the product rule of differentiation. Generalize this
observation to higher derivatives or to other function rings such as C n ([a, b], R) or to even
more general function rings. Are there other rules of differentiation that emerge from doing
other operations in an appropriate quotient ring?
Project IX. Convolution Rings. The construction of convolution rings is a general process
with many natural examples subsumed under it. Explore properties of convolution rings of your
own construction. Consider using commutative or noncommutative rings, and finite or infinite
semigroups. Explore ring properties such as zero divisors, commutativity, units, ideals, and so
on.
6. Divisibility in Commutative Rings
Chapter 2 introduced a few basic properties of the integers that were essential for many of the
earlier topics. Section 2.1 emphasized first the well-ordering of the integers and then properties
following from the notion of divisibility. The well-ordering of the integers was a property of the total
(discrete) order ≤ on Z and implied the principle of mathematical induction on Z. The partial order
of divisibility on the integers led to the concept of primes, greatest common divisor, least common
multiple, modular arithmetic, and many other notions.
Arbitrary rings do not naturally possess a total order that leads to well-ordering or induction.
However, divisibility is an important notion in rings, in particular commutative rings. This chapter
develops the notion of divisibility for rings, with a view of generalizing many of the divisibility
properties of the integers to as general a context as possible. The presentation makes regular
reference to the topics introduced in Section 2.1.
Section 6.1 defines divisibility in rings and discusses how and in what context arbitrary rings
possess similar notions to divisibility, primes, and greatest common divisor. In Section 6.2, we
construct methods to force ring elements to be divisible by other elements, thereby creating rings
of fractions, modeled from the construction of the rational numbers from the integers. Section 6.3
discusses a generalization of the integer division algorithm, while Section 6.4 discusses a general
context in which something akin to the unique prime factorization occurs.
In Section 6.5, the first application section of the chapter, general concepts of divisibility are
brought to bear on polynomial rings and in particular polynomial rings with coefficients in a field.
As a technical application, Section 6.6 introduces the RSA protocol for public key cryptography.
Finally, Section 6.7 offers a brief introduction to algebraic number theory, a branch of mathematics
that studies questions of interest in classical number theory but in extensions of the integers.
6.1 Divisibility in Commutative Rings
6.1.1 – Divisors and Multiples
Definition 6.1.1
Let R be a commutative ring. We say that a nonzero element a divides an element b, and
write a|b, if there exists r ∈ R such that b = ar. The element a is called a divisor of b and
b is called a multiple of a.
The reader may (and should) wonder why this definition is phrased in the context of commutative
rings and not arbitrary rings. If a ring R is not commutative, then given elements a, b ∈ R it could
be possible that there exists r ∈ R such that b = ar but that there does not exist s ∈ R such that
b = sa. Consequently, we would need to introduce the notions of right-divisibility if ∃r ∈ R, b = ar
and left-divisibility if ∃r ∈ R, b = ra. In the theory of noncommutative rings, one must take care to
distinguish these relations and develop appropriate theorems. This section restricts the attention to
commutative rings, but the exercises investigate some properties that apply to arbitrary rings.
When we say that an integer a is divisible by an integer b, there is an assumption that the k ∈ Z
such that a = bk is unique. The uniqueness follows immediately from Definition 2.1.5 for divisibility
over the integers. Consider now an arbitrary commutative ring R and suppose that a = bk and
a = bk 0 in R. Then 0 = b(k − k 0 ). One way that 0 = b(k − k 0 ) could hold is if b = 0, which would
imply that a = 0. This is why Definition 6.1.1 required the divisor to be nonzero. However, if b is
a zero divisor, then there exist distinct k and k′ such that b(k − k′) = 0. Hence, uniqueness of the
factor k does not hold in rings with zero divisors.
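The failure of uniqueness is easy to exhibit in Z/6Z; this small sketch (our own example) finds all cofactors k with 2 = 2 · k:

```python
# In Z/6Z the element 2 is a zero divisor (2 * 3 = 0), so cofactors are not unique
n, a, b = 6, 2, 2
cofactors = [k for k in range(n) if (b * k) % n == a]
print(cofactors)  # both k = 1 and k = 4 satisfy 2 = 2 * k in Z/6Z
```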
In Section 2.1.2, we pointed out that (N∗ , |) is a partially ordered set. This remark inspires us
to investigate whether and in what sense divisibility is close to being a partial order on a ring.
Proposition 6.1.2
Let R be a commutative ring. Divisibility is a transitive relation on R. Furthermore,
divisibility is a reflexive relation on R if and only if R has an identity.
Proof. Suppose that a, b, c ∈ R with a and b nonzero and a|b and b|c. Then there exist r, s ∈ R with
b = ar and c = bs. Then by associativity, c = a(rs) so a|c.
For reflexivity, note that a|a for all a ∈ R if and only if there exists r ∈ R such that a = ar. This
is precisely the definition of a multiplicative identity.
A discussion about antisymmetry is more involved, but we try to follow the reasoning in Proposi-
tion 2.1.6(4) that led to the result that | is antisymmetric on N∗ . Suppose that a and b are elements
in a commutative ring R such that a|b and b|a. Then there exist r, s ∈ R with b = ar and a = bs.
Hence, a = ars. Without more information about the ring, there is not much else to be said. If
the ring R has an identity 1 ≠ 0, then a1 = ars, which implies that a(1 − rs) = 0. If the ring R
has zero divisors, then the cancellation law does not necessarily hold and it does not follow that
rs = 1. However, if R contains no zero divisors, then a|b and b|a is equivalent to a and b differing by
multiplication by a unit. We observe that in any ring with an identity, −1 is a unit, so divisibility is
never antisymmetric.
The above discussion shows that for the usual notion of divisibility, it is preferable that the
commutative ring have an identity 1 ≠ 0 and no zero divisors. These are precisely the
properties of an integral domain. Consequently, from now on, unless we explicitly say otherwise, we
discuss the notion of divisibility only in the context of integral domains.
Definition 6.1.3
Let R be an integral domain. Two elements a and b are called associates if there exists a
unit u such that a = bu.
It is not hard to prove that the relation of being associates is an equivalence relation on R. (See
Exercise 6.1.6.) In this text, we will consistently use the relation symbol
a ≃ b
on an integral domain to mean that a and b are associates. Notice that with this relation symbol,
r ≃ 1 if and only if r is in the group of units U (R).
We can now see in what sense divisibility is a partial order.
Proposition 6.1.4
Setting [a] | [b] in the quotient set (R − {0})/≃ if and only if a|b in R defines a relation on
(R − {0})/≃. Furthermore, | is a partial order on (R − {0})/≃.
Proof. We must first verify that setting [a] | [b] is well-defined. Let a and a′ be associates and let
b and b′ be associates, so that a = a′u and b = b′v for some u, v ∈ U (R). But a = br is equivalent to
a′ = b′(vru⁻¹), and a′ = b′r′ is equivalent to a = b(v⁻¹r′u). Hence, the choice of representatives
from [a] and [b] is irrelevant, and our definition of divisibility on the set of associate classes is
well-defined.
From Proposition 6.1.2, we already know that | is transitive and reflexive on (R − {0})/≃.
Furthermore, we saw that a|b and b|a in R if and only if a and b are associates. Hence, if [a] | [b]
and [b] | [a] in (R − {0})/≃, then [a] = [b]. Thus, | is antisymmetric on (R − {0})/≃.
Figure 6.1: The elements and associates in Z[(−1 + i√3)/2]
Example 6.1.5. Consider the polynomial equation x3 − 1 = 0. By the identity given in Exer-
cise 5.1.26, this equation is equivalent to
(x − 1)(x2 + x + 1) = 0.
The roots of x² + x + 1 = 0 are ω = (−1 + i√3)/2 and ω̄ = (−1 − i√3)/2.
Consider the ring Z[ω]. As a subring of C that includes the identity, it is an integral domain. It
is not hard to show that in C,
(a + bω)⁻¹ = (a + bω̄)/(a² − ab + b²),
since (a + bω)(a + bω̄) = a² − ab + b².
In order for a + bω to be a unit in Z[ω], it is not hard to show that we need a² − ab + b² = ±1. This
has six solutions, namely (a, b) = (±1, 0), (0, ±1), ±(1, 1), or in other words
1, (1 + i√3)/2, (−1 + i√3)/2, −1, (−1 − i√3)/2, (1 − i√3)/2.
In polar coordinates, the units are
cos(kπ/3) + i sin(kπ/3) for k = 0, 1, . . . , 5.
These correspond to the 6 distinct powers ζᵏ with ζ = (1 + i√3)/2.
In Figure 6.1, the dots represent the elements in Z[ω]. Each element is an associate of a unique
element in the sector defined in polar coordinates by r > 0 and 0 ≤ θ < π/3. So
according to Proposition 6.1.4, divisibility is a partial order on the elements in Z[ω] in this sector. △
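The six solutions of a² − ab + b² = 1 can be confirmed by a short search. Since a² − ab + b² = ½(a² + b²) + ½(a − b)², the value is never negative and equals 1 only when a² + b² ≤ 2, so a tiny range suffices; a Python sketch (our own):

```python
# units a + b*omega in Z[omega] correspond to solutions of a^2 - ab + b^2 = 1;
# the bound a^2 - ab + b^2 >= (a^2 + b^2)/2 confines solutions to a small box
units = [(a, b) for a in range(-2, 3) for b in range(-2, 3)
         if a * a - a * b + b * b == 1]
print(sorted(units))
assert len(units) == 6
```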
In general, a norm is merely a function on the ring and need not be compatible with
respect to the operations of the ring. However, for some integral domains, the concept of a norm
allows us to leverage properties of the integers to deduce results about the ring.
Definition 6.1.6
Let R be an integral domain. Any function N : R → N with N (0) = 0 is called a
norm on R.
Proposition 6.1.7
The element α = a + b√D ∈ Z[√D] is a unit if and only if N (α) = 1.
Proof. We already know that if α is a unit, then N (α) = 1. Conversely, if N (α) = 1, then
a² − Db² = ±1 and so
α⁻¹ = (a − b√D)/(a² − Db²) ∈ Z[√D].
Though divisibility is a partial order on an integral domain in the sense of Proposition 6.1.4, the
two criteria for primeness in Z are still not necessarily equivalent in integral domains. Hence, we
need two separate definitions.
Definition 6.1.8
Let R be an integral domain.
(1) Suppose that r is nonzero and not a unit. Then r is called irreducible if whenever
r = ab, either a or b is a unit. Otherwise, r is said to be reducible.
(2) Suppose that p is nonzero and not a unit. Then p is called prime if whenever p
divides ab, then p|a or p|b.
Note that p is a prime element if and only if the principal ideal (p) is a prime ideal. However,
keep in mind that not all prime ideals are necessarily principal.
Proposition 6.1.9
In an integral domain, a prime element is always irreducible.
Proof. Suppose that p is a prime element, so that (p) is a nonzero prime ideal, and that p = ab.
Then ab ∈ (p), so a or b is in (p). Without loss of generality, suppose that a ∈ (p), so that a = cp
for some c. Then p = pcb and hence, canceling p, 1 = cb. Thus, b is a unit in R.
It is easy to calculate that αβ = 10 = γδ. It is also easy to calculate, using the norm N in (6.1),
that
N (α) = 4, N (β) = 25, N (γ) = 10, N (δ) = 10.
However, we claim that there are no elements in Z[√10] of norm either 2 or 5. Assume there exists
an element a + b√10 of norm 2. Then a² − 10b² = ±2. Modulo 10, this gives a² ≡ ±2 (mod 10).
The squares modulo 10 are 0, 1, 4, 9, 6, 5. Hence, we arrive at a contradiction, and so we conclude
there exists no element of norm 2. Assume now that there exists an element a + b√10 of norm 5.
Then a² − 10b² = ±5. This implies that 5|a² and thus that 5|a. Writing a = 5c leads to the equation
5c² − 2b² = ±1. Modulo 5, this equation is 3b² ≡ ±1 (mod 5), which is equivalent to b² ≡ ±2
(mod 5). The squares in modular arithmetic modulo 5 are 0, 1, 4. Therefore, the assumption leads
to a contradiction, and thus there exists no element of norm 5.
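Both residue checks are quick to verify by machine; a Python sketch (our own):

```python
# squares modulo 10: a^2 = ±2 (mod 10) must fail for norm 2 to be impossible
squares_mod_10 = sorted({(a * a) % 10 for a in range(10)})
assert 2 not in squares_mod_10 and 8 not in squares_mod_10   # 8 = -2 (mod 10)

# squares modulo 5: b^2 = ±2 (mod 5) must fail for norm 5 to be impossible
squares_mod_5 = sorted({(b * b) % 5 for b in range(5)})
assert 2 not in squares_mod_5 and 3 not in squares_mod_5     # 3 = -2 (mod 5)
print(squares_mod_10, squares_mod_5)
```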
Suppose that α = ab. Then N (a)N (b) = N (α) = 4, so the pair (N (a), N (b)) is (1, 4), (2, 2), or
(4, 1). Since no element exists of norm 2, either N (a) = 1 or N (b) = 1. By Proposition 6.1.7,
either a or b is a unit. Hence, α is irreducible. By similar reasoning, we can establish that β, γ,
and δ are all irreducible elements. However, none of them are prime elements.
Notice that α divides 10 = γδ. But N (α) = 4 divides neither N (γ) = 10 nor N (δ) = 10, so α
divides neither γ nor δ. Hence, α is an element that is irreducible but not prime. With similar
reasoning, we can show that β, γ, and δ are not prime elements either.
We point out that we have made the above example more complicated than necessary to illustrate
a point. Note that 10 = 2 · 5 = √10 · √10, that 3 + √10 is a unit with (3 + √10)⁻¹ = −3 + √10, and
that 19 + 6√10 is a unit with (19 + 6√10)⁻¹ = 19 − 6√10. We chose α, β, γ, δ by
α = 2(3 + √10), β = 5(−3 + √10), γ = √10(19 + 6√10), δ = √10(19 − 6√10).
Hence, α, β, γ, and δ are respectively associates of the much simpler elements 2, 5, √10, and
√10. △
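The arithmetic in this example is mechanical once elements of Z[√10] are stored as pairs (a, b) ↔ a + b√10. The sketch below (helper names are our own) confirms the products and norms claimed above:

```python
def mul(x, y):
    """(a + b*sqrt(10)) * (c + d*sqrt(10)) = (ac + 10bd) + (ad + bc)*sqrt(10)."""
    (a, b), (c, d) = x, y
    return (a * c + 10 * b * d, a * d + b * c)

def norm(x):
    a, b = x
    return abs(a * a - 10 * b * b)

alpha = mul((2, 0), (3, 1))      # 2(3 + sqrt10)
beta  = mul((5, 0), (-3, 1))     # 5(-3 + sqrt10)
gamma = mul((0, 1), (19, 6))     # sqrt10 (19 + 6 sqrt10)
delta = mul((0, 1), (19, -6))    # sqrt10 (19 - 6 sqrt10)

assert mul(alpha, beta) == (10, 0) == mul(gamma, delta)      # alpha*beta = 10 = gamma*delta
assert (norm(alpha), norm(beta), norm(gamma), norm(delta)) == (4, 25, 10, 10)
assert norm((3, 1)) == norm((19, 6)) == 1                    # the two units have norm 1
```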
Definition 6.1.11
Let R be an integral domain. If a, b ∈ R with (a, b) ≠ (0, 0), a greatest common divisor of
a and b is an element d ∈ R such that:
• d|a and d|b (d is a common divisor);
• if d′|a and d′|b (d′ is another common divisor), then d′|d.
Proposition 6.1.12
If two elements a and b in an integral domain R have a greatest common divisor d, then
any other greatest common divisor d′ is an associate of d.
Proof. If both d and d′ are greatest common divisors of a and b, then d|d′ and d′|d. Hence, there
exist elements u, v ∈ R such that d′ = ud and d = vd′. Then d′ = uvd′ and, since R is an integral
domain, 1 = uv, which implies that u and v are units. Thus, d ≃ d′.
The hypothesis of Proposition 6.1.12 was careful to say, “if a and b have a greatest common
divisor.” In contrast to what happens in the ring of integers, two elements in arbitrary integral
domains need not possess a greatest common divisor.
Example 6.1.13. The ring Z[√10] again offers an example of the nonexistence of greatest common
divisors. Consider the elements a = 12 and b = 24 + 6√10. It is easy to see that 6 is a common
divisor of a and b, but so is 8 + 2√10 because
12 = (4 − √10)(8 + 2√10) and 24 + 6√10 = 3(8 + 2√10).
Since neither N (6) = 36 nor N (8 + 2√10) = 24 divides the other, neither 6 nor 8 + 2√10
divides the other. Hence, neither of them is a greatest common divisor of 12 and 24 + 6√10.
Assume there exists a greatest common divisor d of a and b. Then d is a multiple of 6 and of
8 + 2√10, while it is a divisor of 12 and of 24 + 6√10. Hence, N (6) = 36 divides N (d), which in turn
divides N (12) = 144. In Example 6.1.10, we showed that Z[√10] has no elements of norm 2, so
N (d)/N (6) = 1 or 4. If N (d) = N (6), then d is an associate of 6, which leads to a contradiction since
6 is not a multiple of 8 + 2√10. If N (d) = 144, then d is an associate of 12, which is a contradiction
since 12 is not a divisor of 24 + 6√10. Consequently, 12 and 24 + 6√10 do not have a greatest
common divisor. △
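The divisibility claims at the start of the example can be verified with the pair representation (a, b) ↔ a + b√10; a brief sketch (the helper mul is our own):

```python
def mul(x, y):
    """(a + b*sqrt(10)) * (c + d*sqrt(10)) in the pair representation (a, b)."""
    (a, b), (c, d) = x, y
    return (a * c + 10 * b * d, a * d + b * c)

assert mul((4, -1), (8, 2)) == (12, 0)   # (4 - sqrt10)(8 + 2 sqrt10) = 12
assert mul((3, 0), (8, 2)) == (24, 6)    # 3 (8 + 2 sqrt10) = 24 + 6 sqrt10
assert mul((2, 0), (6, 0)) == (12, 0)    # 6 divides 12
assert mul((4, 1), (6, 0)) == (24, 6)    # (4 + sqrt10) * 6 = 24 + 6 sqrt10
```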
We conclude the section with two definitions that adapt some more terminology of elementary
number theory to integral domains. Results related to these concepts are left in the exercises.
Definition 6.1.14
Two elements a and b in an integral domain are said to be relatively prime if the only
common divisors are units.
Definition 6.1.15
Let R be an integral domain. If a, b ∈ R with (a, b) ≠ (0, 0), a least common multiple of
a and b is an element m ∈ R such that:
• a|m and b|m (m is a common multiple);
• if a|m′ and b|m′ (m′ is another common multiple), then m|m′.
Proposition 2.1.16 for least common multiples in Z carries over with only minor adjustments to
any integral domain.
Proposition 6.1.16
Let R be an integral domain. Two nonzero elements a and b possess a greatest common
divisor if and only if they possess a least common multiple.
The reader may have noticed with some dissatisfaction that this section did not show how to
determine if certain elements in an integral domain are irreducible or prime, or how to find a greatest
common divisor of two elements if they exist. Such questions are much more difficult in arbitrary
integral domains than they are in Z. As subsequent sections attempt to generalize the results of
elementary number theory to commutative rings, we will introduce subclasses of integral domains
in which such questions are tractable. In the meantime, we give the following definition.
Definition 6.1.17
An integral domain in which every two nonzero elements have a greatest common divisor
is called a gcd-domain.
(b) Prove by a counterexample that if r is an irreducible element in R, then ϕ(r) is not necessarily
irreducible in S.
15. Prove that there are no elements α ∈ Z[√10] with N (α) = 3. Conclude that the elements 7 + 2√10
and 3 are irreducible elements in Z[√10].
16. Let R be an integral domain. Prove that, if it exists, a least common multiple of a, b ∈ R is a generator
for the (unique) largest principal ideal contained in (a) ∩ (b). Conclude that in a PID, if (a) ∩ (b) = (m),
then m is a least common multiple of a and b.
17. Prove that in a PID, every irreducible element is prime.
18. Let R be an integral domain and let a, b ∈ R. Prove that if a and b have a greatest common divisor d
with a = dk and b = dℓ, then k and ℓ are relatively prime.
19. Prove that Proposition 2.1.14 does not hold when we replace Z with Z[√10].
20. Let p1 , p2 , q1 , q2 be irreducible elements in an integral domain R such that none are associates to any
of the others and p1 p2 = q1 q2 . Prove that p1 q1 q2 and p1 p2 q1 do not have a greatest common divisor.
21. Prove that if a least common multiple m of a and b exists in an integral domain, then (m) is the
largest principal ideal contained in (a) ∩ (b).
22. Prove Proposition 6.1.16. [Hint: For one direction of the if and only if statement, see Proposi-
tion 2.1.16.]
In Exercises 6.1.23 through 6.1.28, the ring R is a gcd-domain. Furthermore, for two elements a, b ∈ R, we
define gcd(a, b) as a greatest common divisor, well-defined up to multiplication by a unit, and we also define
lcm(a, b) as a least common multiple, well-defined up to multiplication by a unit.
23. Prove that gcd(a, b) lcm(a, b) ≃ ab for all nonzero a, b ∈ R.
24. Prove that gcd(a, gcd(b, c)) ≃ gcd(gcd(a, b), c) and also that lcm(a, lcm(b, c)) ≃ lcm(lcm(a, b), c) for
all nonzero a, b, c ∈ R.
25. Prove that gcd(ac, bc) ≃ gcd(a, b)c for all nonzero a, b, c ∈ R. Prove also that lcm(ac, bc) ≃ lcm(a, b)c.
26. Prove that gcd(a, b) ≃ 1 and gcd(a, c) ≃ 1 if and only if gcd(a, bc) ≃ 1.
27. Prove that if gcd(a, b) ≃ 1 and a|bc, then a|c.
28. Prove that if gcd(a, b) ≃ 1, a|c, and b|c, then ab|c.
29. A Bézout domain is an integral domain in which the sum of any two principal ideals is again a principal
ideal. Prove that a Bézout domain is a gcd-domain.
6.2 Rings of Fractions
One way to deal with questions of divisibility in a ring is to force certain elements to be units. We
already encountered this process in the definition of the rational numbers in reference to the integers.
Most people first encounter fractions so early in their education that a precise construction was not
appropriate at that time. We give one here.
A fraction r = a/b consists of a pair of integers (a, b) ∈ Z × Z∗. However, some fractions are
considered equivalent. For example, since we use a fraction to represent a ratio, we must have
a/b = (ac)/(bc) for all nonzero integers c.
This is not yet a good definition for a relation since it does not give a criterion for when two arbitrary
pairs are in relation. A complete expression of the equivalence relation ∼ is
(a, b) ∼ (c, d) if and only if ad = bc.
One can see that this definition follows from the above requirement by a simple cross-multiplication.
Suppose now that we apply the same relation to pairs (a, b) ∈ Z × Z with (a, b) ≠ (0, 0), allowing a
zero second coordinate.
Then the set of equivalence classes consists of all fractions along with one more element, the equiva-
lence class of (1, 0), which contains every pair (a, 0). This is sometimes called the integral projective
line and instead of writing the equivalence class of (a, b) with the fraction notation, they are written
as (a : b). We can also think of this as the set of all integer ratios. Even though this set may be
interesting for certain applications, it does not carry a ring structure with the usual addition and
multiplication because multiplication is not defined for (1 : 0) × (0 : 1).
Consequently, any construction of fractions that allows a 0 in the denominator either reduces the
ring down to the trivial ring of one element or does not produce a ring structure. Despite this, it is
possible to define rings of fractions in which the denominators are zero divisors.
Because the cancellation law does not apply in rings with zero divisors, it is possible to have ad2 u = bd1 u
for some u without ad2 = bd1 . So the relation ∼ on R × D is defined symmetrically by

(a, d1 ) ∼ (b, d2 ) ⇐⇒ (ad2 − bd1 )u = 0 for some u ∈ D. (6.4)
Proposition 6.2.1
The relation ∼ given in (6.4) is an equivalence relation on R × D.
Proof. For all (a, d) ∈ R × D, ad − ad = 0 so there does exist a u ∈ D (any u in D) such that
(ad − ad)u = 0. Hence, ∼ is reflexive.
Suppose that (a, d1 ) ∼ (b, d2 ). Then (ad2 − bd1 )u = 0 for some u ∈ D, so (bd1 − ad2 )u = 0
as well, which shows (b, d2 ) ∼ (a, d1 ). Hence, ∼ is symmetric.
Now suppose that (a, d1 ) ∼ (b, d2 ) and (b, d2 ) ∼ (c, d3 ). Then (ad2 − bd1 )u = 0 and
(bd3 − cd2 )v = 0 for some u, v ∈ D. Multiplying the first equation by vd3 and the second by ud1 , we get (ad2 d3 − bd1 d3 )uv = 0
and (bd3 d1 − cd2 d1 )uv = 0. Adding these, the terms ±bd1 d3 uv cancel, and we deduce that
(ad3 − cd1 )d2 uv = 0. Since d2 uv ∈ D, we conclude that (a, d1 ) ∼ (c, d3 ), so ∼ is transitive.
Definition 6.2.2
Let R be a commutative ring. Let D be any nonempty subset of R that does not contain
0 and is closed under multiplication. The set of fractions of R with denominators in D
is the set of ∼-equivalence classes on R × D. We denote this set by D−1 R and write the
equivalence class of (a, d) as a/d.
Theorem 6.2.3
Let R be a commutative ring and D a nonempty multiplicatively closed set that does not
contain 0. The operations defined in (6.3) are well-defined on D−1 R (i.e., are independent
of choice of representative for a given equivalence class). Furthermore, these operations
give D−1 R the structure of a commutative ring with 1 ≠ 0.
Proof. Suppose that a/d1 = b/d2 and r/d3 = s/d4 are elements in D−1 R with (ad2 − bd1 )u = 0 and
(rd4 − sd3 )v = 0 for some u, v ∈ D. Adding the fractions gives

a/d1 + r/d3 = (ad3 + rd1 )/(d1 d3 ) and b/d2 + s/d4 = (bd4 + sd2 )/(d2 d4 ).

To see that these two fractions are equal, note that

((ad3 + rd1 )d2 d4 − (bd4 + sd2 )d1 d3 )uv
= (ad2 − bd1 )ud3 d4 v + (rd4 − sd3 )vd1 d2 u
= 0 · d3 d4 v + 0 · d1 d2 u = 0.
That addition and multiplication are commutative on D−1 R follows from the commutativity of
addition and multiplication on R. The addition of fractions is associative with
a/d1 + b/d2 + c/d3 = (ad2 d3 + bd1 d3 + cd1 d2 )/(d1 d2 d3 )
regardless of which + is performed first. Similarly, the multiplication of fractions is associative.
For any d ∈ D, an element of the form 0/d is the additive unit in D−1 R because

a/d′ + 0/d = (ad + 0d′ )/(d′ d) = ad/(d′ d) = a/d′ .
The additive inverse to a/d is just −a/d. Furthermore, any element of the form d/d satisfies

(a/d′ ) × (d/d) = ad/(d′ d) = a/d′

for all a/d′ ∈ D−1 R, so d/d is the multiplicative unit.
The only remaining axiom, distributivity of × over +, is left as an exercise for the reader. (See
Exercise 6.2.1.)
Definition 6.2.4
We call D−1 R, equipped with + and × as given in (6.3), the ring of fractions of R with
denominators in D.
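The equivalence relation behind this definition is easy to experiment with computationally. The following sketch is our own illustration, not from the text: it takes R = Z/6Z and D = {2, 4}, a multiplicatively closed set that avoids 0 but contains the zero divisor 2, and groups the pairs (a, d) into the classes that make up D−1 R.

```python
# A small computational check of relation (6.4), using our own choice
# R = Z/6Z and D = {2, 4}: D is closed under multiplication mod 6
# (2*2=4, 2*4=2, 4*4=4), does not contain 0, but 2 is a zero divisor.
N = 6
D = [2, 4]

def equiv(a, d1, b, d2):
    """(a, d1) ~ (b, d2) iff (a*d2 - b*d1)*u = 0 in Z/6Z for some u in D."""
    return any((a * d2 - b * d1) * u % N == 0 for u in D)

# 1/2 and 2/4 name the same fraction, as expected.
assert equiv(1, 2, 2, 4)
# Cancellation fails: 1/2 ~ 4/2 although 1 != 4, since (1*2 - 4*2)*2 = -12 = 0 mod 6.
assert equiv(1, 2, 4, 2)

# Group all pairs (a, d) into equivalence classes; transitivity is
# Proposition 6.2.1, so greedy grouping is safe.
classes = []
for pair in [(a, d) for a in range(N) for d in D]:
    for cls in classes:
        if equiv(*pair, *cls[0]):
            cls.append(pair)
            break
    else:
        classes.append([pair])

# D^{-1}(Z/6Z) collapses to three elements (it is isomorphic to Z/3Z).
assert len(classes) == 3
```

The three classes reflect the fact that inverting 2 in Z/6Z collapses the 2-torsion, leaving a copy of Z/3Z; in particular the natural map r ↦ rd/d is not injective here, consistent with the lemma below.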
We emphasize that the construction given in Theorem 6.2.3 directly generalizes the construction
of Q from Z where R = Z and D = Z∗ . It is not hard to show that if we had taken D = Z>0 , we
would get a ring of fractions that is isomorphic to Q.
There is always a natural homomorphism ϕ : R → D−1 R given by ϕ(r) = rd/d for some d ∈ D.
By the equivalence of fractions, rd/d = rd′ /d′ for any d, d′ ∈ D, so the choice of d ∈ D is irrelevant for the
definition of ϕ. In the case of Z and Q, the homomorphism is ϕ(n) = n/1. However, in the general
situation, we must define ϕ as above because D does not necessarily contain 1.
Though the natural homomorphism Z → Q is injective, the function ϕ is not necessarily injective
in the case of arbitrary commutative rings. For r, s ∈ R, ϕ(r) = ϕ(s) gives rd/d = sd′ /d′ for some
d, d′ ∈ D, which implies that

(rdd′ − sd′ d)u = 0 ⇐⇒ (r − s)dd′ u = 0

for some u ∈ D. If D contains no zero divisors, then ϕ(r) = ϕ(s) implies that r − s = 0, so r = s,
and hence ϕ is injective. Conversely, if D does contain a zero divisor d with bd = 0 and b ≠ 0, then

ϕ(b) = bd/d = 0/d = ϕ(0)

and ϕ is not injective. This discussion establishes the following lemma.
Lemma 6.2.5
The function ϕ : R → D−1 R defined by ϕ(r) = rd/d for some d ∈ D is injective if and only
if the multiplicatively closed subset D contains no zero divisors.
Note that when R is an integral domain, the condition in Lemma 6.2.5 is always satisfied.
Proposition 6.2.6
Let R be a commutative ring and D a multiplicatively closed subset that does not contain
0 or any zero divisors. Then D−1 R contains a subring isomorphic to R. Furthermore, in
this embedding of R in D−1 R, every element of D is a unit in D−1 R.
278 CHAPTER 6. DIVISIBILITY IN COMMUTATIVE RINGS
Proof. By Lemma 6.2.5, the function ϕ is injective, so by the First Isomorphism Theorem, R is
isomorphic to Im ϕ. Let d be any element in D. We can view d in D−1 R as the element d^2/d. But
D−1 R contains the element d/d^2 and it is easy to see that (d^2/d) × (d/d^2) = d^3/d^3 = 1 in D−1 R.
Example 6.2.7. Let R = Z and let D = {1, a, a^2, a^3, . . .} for some positive integer a. Then D−1 R
consists of all the fractions n/a^k with n ∈ Z and k ∈ N.
This is a ring that is also a subring of Q. Recall that we denote this ring by Z[1/a].
Example 6.2.8. Let R = Z[x] and let D = {(1 + x)^n | n ≥ 0}. The ring D−1 R consists of all rational
expressions of the form p(x)/(1 + x)^n where p(x) ∈ Z[x] and n ∈ N. The units in this ring are the
fractions of the form ±(1 + x)^k for k ∈ Z. The ring has no zero divisors.
Examples 6.2.7 and 6.2.8 are particular examples of a general construction. If R is a commutative
ring and a is an element that is not nilpotent, then R[1/a] is the ring of fractions D−1 R where
D = {a, a^2, a^3, . . .}.
If R is an integral domain then the set D = R − {0} is multiplicatively closed and does not
contain 0. In D−1 R, every nonzero element of R becomes a unit, so D−1 R is a field.
Definition 6.2.9
If R is an integral domain, and if D = R − {0}, then D−1 R is called the field of fractions of R.
For example, the field of fractions of the polynomial ring R[x] over an integral domain R is the field of rational expressions over R, usually denoted by R(x), in contrast to R[x].
Viewing an integral domain R as a subring of its field of fractions F , for a, b ∈ R with a ≠ 0, the
element a divides b if and only if b/a ∈ R.
Though some texts only discuss rings of fractions in the context of integral domains, the definitions
provided in this section only require R to be commutative. The following example presents
a ring of fractions in which D contains zero divisors. Note, however, that since D
must be multiplicatively closed and not contain 0, D cannot contain nilpotent elements.
Example 6.2.12. Let R be the quotient ring R = Z[x]/(x^2 − 1). (For simplicity, we omit the overline
in the a + bx notation.) We see that x − 1 and x + 1 are zero divisors. Consider the multiplicatively
closed set D = {(1 + x)^k | 0 ≤ k}. In R, we have x^2 − 1 = 0, so x^2 = 1. Then
(x − 1)/(x + 1)^n = (x^2 − 1)/(x + 1)^(n+1) = 0/(x + 1)^(n+1) = 0/1.
Define f : Z[1/2] → D−1 R by f(a/2^m) = a/(x + 1)^m. In R we have (x + 1)^2 = x^2 + 2x + 1 = 2 + 2x = 2(x + 1), so 2^n (x + 1) = (x + 1)^(n+1) for all n ≥ 0. The function f is additive since

f(a/2^m + b/2^n) = f((2^n a + 2^m b)/2^(m+n))
= (2^n a + 2^m b)/(x + 1)^(m+n)
= (2^n a(x + 1) + 2^m b(x + 1))/(x + 1)^(m+n+1)
= a(x + 1)^(n+1)/(x + 1)^(m+n+1) + b(x + 1)^(m+1)/(x + 1)^(m+n+1)
= a/(x + 1)^m + b/(x + 1)^n
= f(a/2^m) + f(b/2^n).

Furthermore,

f((a/2^m)(b/2^n)) = f(ab/2^(m+n)) = ab/(x + 1)^(m+n) = (a/(x + 1)^m)(b/(x + 1)^n) = f(a/2^m) f(b/2^n),

so f is a ring homomorphism. Since (a + bx)/(x + 1)^k = (a + b)/(x + 1)^k, f is surjective. However, the kernel of f
consists of all a/2^k such that a/(x + 1)^k = 0/1 in D−1 R, which means a = 0. Thus, Ker f = {0}, so
f is injective and thus an isomorphism. We have shown that D−1 R is isomorphic to Z[1/2].
Even though R contains zero divisors, the ring of fractions D−1 R = Z[1/2] is an integral domain.
By Proposition 5.1.12, zero divisors are not units in a ring. However, taking the ring of fractions
construction forced the zero divisor (x + 1) to become a unit. In the process, the element x − 1
became 0 under the usual function ϕ : R → D−1 R.
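The assertions in Example 6.2.12 can be confirmed mechanically. Below is a minimal sketch using our own encoding of R = Z[x]/(x^2 − 1): an element a + bx is stored as the pair (a, b), and multiplication uses x^2 = 1.

```python
# Elements of R = Z[x]/(x^2 - 1) stored as pairs (a, b) meaning a + b*x.
# Since x^2 = 1 in R, (a + bx)(c + dx) = (ac + bd) + (ad + bc)x.
def mul(p, q):
    a, b = p
    c, d = q
    return (a * c + b * d, a * d + b * c)

x_plus_1 = (1, 1)     # 1 + x
x_minus_1 = (-1, 1)   # -1 + x

# x + 1 and x - 1 are zero divisors: their product is x^2 - 1 = 0 in R.
assert mul(x_plus_1, x_minus_1) == (0, 0)

# Hence (x - 1)/(x + 1)^n equals 0/1 in D^{-1}R: the witness u = 1 + x
# in relation (6.4) kills x - 1.
assert mul(x_minus_1, x_plus_1) == (0, 0)

# The identity behind D^{-1}R being isomorphic to Z[1/2]:
# (1 + x)^2 = 2(1 + x), so inverting 1 + x amounts to inverting 2.
assert mul(x_plus_1, x_plus_1) == (2, 2)
```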
The above example illustrates that if D contains a zero divisor a with ab = 0 for some nonzero element
b, then in the usual homomorphism ϕ : R → D−1 R, ϕ(a) is a unit and ϕ(b) = 0.
The last example of the section is an important case of ring of fractions.
Example 6.2.13 (Localization). Let R be a commutative ring and let P be a prime ideal. The
subset D = R − P is multiplicatively closed. Indeed, since P is a prime ideal, ab ∈ P implies a ∈ P or
b ∈ P . Taking the contrapositive of this implication (being careful to use De Morgan's laws) gives:
if a ∉ P and b ∉ P , then ab ∉ P ; in other words, D is closed under multiplication.
D ≠ {1, 6, 6^2, . . .}.
3. Let D = {2^a 3^b 5^c | a, b, c ∈ N} as a subset of Z. Prove that D−1 Z is isomorphic to Z[1/30].
4. Let D be the subset in Z of all positive integers that are products of powers of primes of the form
4k + 1. Prove that D is multiplicatively closed. Prove also that D−1 Z is neither isomorphic to Z[1/n]
[Such series are called formal Laurent series and F ((x)) is called the field of formal Laurent series
over F . See also Exercise 5.4.29.]
6.3
Euclidean Domains
As mentioned in its introduction, this chapter gathers together topics related to divisibility. In the
previous section, we discussed rings of fractions, a construction that forces certain elements to be
units. In particular, if R is an integral domain, a nonzero element a divides b if and only if, in the
field of fractions, b/a is in the subring R. However, much of the theory of divisibility of integers does
not rely on the ability to take fractions of any nonzero elements.
This section introduces Euclidean domains, rings in which it is possible to perform something
akin to the integer division algorithm.
6.3.1 – Definition
Definition 6.3.1
Let R be an integral domain. A Euclidean function on R is any function d : R − {0} → N
such that
(1) For all a, b ∈ R with a ≠ 0, there exist q, r ∈ R such that

b = aq + r with r = 0 or d(r) < d(a); (6.5)

(2) for all nonzero a, b ∈ R, d(b) ≤ d(ab).

An integral domain that possesses a Euclidean function is called a Euclidean domain.
We call any expression of the form (6.5) a Euclidean division of b by a. It is not uncommon to
call q a quotient and r a remainder in the Euclidean division.
The Integer Division Theorem (Theorem 2.1.7) states that for a, b ∈ Z with a ≠ 0 there exist
unique q, r such that b = aq + r and 0 ≤ r < |a|. The above definition does not require uniqueness of q
and r, simply existence. The Integer Division Theorem establishes that the integers are a Euclidean
domain with d(x) = |x|. However, according to Definition 6.3.1, for any given a and b, either b = aq
exactly or there are two possibilities: b = aq + r with 0 < r < |a|, or b = aq′ + r′ with −|a| < r′ < 0. In the
latter two possibilities, r′ = r − |a| and q′ = q + sign(a).
Example 6.3.2 (Gaussian Integers). This example shows that the ring of Gaussian integers Z[i]
is a Euclidean domain with Euclidean function d(z) = |z|2 .
First note that for all nonzero α, β ∈ Z[i],
d(αβ) = |αβ|2 = |α|2 |β|2 = d(α)d(β).
Since d(α) ≥ 1 for all nonzero α, then d(β) ≤ d(αβ).
Let α, β ∈ Z[i] with α 6= 0. The ring Z[i] is a subring of the field C. When we divide β by α as
complex numbers, the result is of the form r + si where r, s ∈ Q. Let p and q be the closest integers
to r and s, respectively, so that |p − r| ≤ 1/2 and |q − s| ≤ 1/2. Then

β = (p + qi)α + ρ

where ρ ∈ Z[i]. Let θ = β/α − (p + qi) as an element of Q[i]. Then αθ = ρ and also d(θ) ≤ 1/4 + 1/4 = 1/2.
Hence, d(ρ) = d(αθ) = d(θ)d(α) ≤ (1/2)d(α). In particular, d(ρ) < d(α).
This establishes that d(z) = |z|2 is a Euclidean function and that Z[i] is a Euclidean domain.
We illustrate this Gaussian integer division with two explicit examples. Let β = 19 − 23i and
α = 5 + 3i. In Q[i], we have
β/α = 13/17 − (86/17)i.
The closest integers to the real and imaginary parts of this ratio are p = 1 and q = −5. Then

β = (1 − 5i)α + ρ with ρ = β − (1 − 5i)α = (19 − 23i) − (20 − 22i) = −1 − i,

so the Euclidean division has quotient 1 − 5i and remainder ρ = −1 − i with d(ρ) = 2 < d(α) = 34.
In the second example, the ratio β/α falls exactly halfway between Gaussian integers in both coordinates, so there are four equally close quotients. In the four possibilities, the different remainders are the four associates of 1 + 4i and have
Euclidean function value d(ρ) = 17 < d(α) = 34.
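The division described in Example 6.3.2 is entirely mechanical: round β/α to the nearest Gaussian integer. A short sketch, using Python's built-in complex numbers to stand in for Q[i], reproduces the worked division of 19 − 23i by 5 + 3i.

```python
def norm(z):
    """The Euclidean function d(z) = |z|^2 on Z[i]."""
    return round(z.real ** 2 + z.imag ** 2)

def gauss_divmod(beta, alpha):
    """Euclidean division in Z[i]: round beta/alpha to the nearest
    Gaussian integer q and return (q, rho) with beta = q*alpha + rho."""
    ratio = beta / alpha
    q = complex(round(ratio.real), round(ratio.imag))
    return q, beta - q * alpha

beta, alpha = 19 - 23j, 5 + 3j
q, rho = gauss_divmod(beta, alpha)

assert q == 1 - 5j and rho == -1 - 1j
assert beta == q * alpha + rho
assert norm(rho) < norm(alpha)     # d(rho) = 2 < d(alpha) = 34
```

In the ambiguous half-integer case there are several nearest Gaussian integers; any rounding choice yields a valid Euclidean division, matching the non-uniqueness remarked on above.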
The Euclidean function in a Euclidean domain R offers some connection between R and the ring
of integers. This connection, as loose as it is, is enough to establish some similar properties between
R and Z. The following proposition is one such example.
Proposition 6.3.3
Let I be an ideal in a Euclidean domain R. Then I = (a) where a is an element of minimum
Euclidean function value in the ideal. In particular, every Euclidean domain is a principal
ideal domain.
Proof. Let d be the Euclidean function of R and consider S = {d(c) | c ∈ I}. By the well-ordering
principle, since S is a subset of N, it contains a least element n. Let a ∈ I be an element such that
d(a) = n.
Clearly (a) ⊆ I since a ∈ I. Now let b be any element of I and consider the Euclidean division
of b by a. We have b = aq + r where r = 0 or d(r) < d(a). Since a, b ∈ I, then r = b − aq ∈ I so
by the minimality of d(a) in S it is not possible for d(r) < d(a). Hence, r = 0, which implies that
b = aq and hence b ∈ (a). This shows that I ⊆ (a) and thus I = (a).
This process must terminate because the sequence d(r1 ), d(r2 ), . . . is a
strictly decreasing sequence of nonnegative integers. It is possible that rn−2 = qn−1 rn−1 + rn with
d(rn ) = 0 but rn ≠ 0; in this case, the axioms of a Euclidean domain force rn+1 = 0.
Unlike the Euclidean Algorithm on the integers, there is no condition on the uniqueness of the
elements that are involved in the Euclidean divisions in (6.6). As with integers, the Euclidean
Algorithm leads to the following important theorem.
Theorem 6.3.4
Let R be a Euclidean domain and let a and b be nonzero elements of R. Let r be the last
nonzero remainder in the Euclidean algorithm. Then r is a greatest common divisor of a
and b. Furthermore, r can be written as r = ax + by where x, y ∈ R.
Proof. Let r = rn be the final nonzero remainder in the Euclidean Algorithm. Then from the final
step r|rn−1 . Suppose that r|rn−i and r|rn−(i+1) . Then since
rn−(i+2) = qn−(i+1) rn−(i+1) + rn−i ,
by Exercise 6.1.2, r divides rn−(i+2) . By induction, r divides rn−i for all i = 0, 1, . . . , n. Thus, r
divides both a and b.
Now let s be any common divisor of a and b. Repeating an induction argument but starting at
the beginning of the Euclidean Algorithm, it is easy to see that s divides rk for all k = 0, 1, . . . , n.
In particular, s divides r. This proves that r is a greatest common divisor of a and b.
Again using an induction argument, starting from the beginning of the Euclidean Algorithm, it
is easy to see that rk ∈ (a, b), the ideal generated by a and b. Hence, r ∈ (a, b) and thus r = ax + by
for some x, y ∈ R.
Example 2.1.13 illustrated the use of the Extended Euclidean Algorithm in Z to find x, y ∈ Z
such that r = ax + by. The process described there carries over identically to Euclidean domains.
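The Extended Euclidean Algorithm is straightforward to implement once a Euclidean division is supplied, and the same loop works verbatim in any Euclidean domain. Here is a sketch over Z, with Python's divmod playing the role of the Euclidean division; swapping in the polynomial division of Theorem 6.3.5 would give the F [x] version used in Exercises 16 and 17.

```python
def extended_gcd(a, b, division=divmod):
    """Return (r, x, y) with r a greatest common divisor of a and b
    and r = a*x + b*y, as in Theorem 6.3.4."""
    r0, r1 = a, b
    x0, x1 = 1, 0
    y0, y1 = 0, 1
    while r1 != 0:
        q, r = division(r0, r1)       # Euclidean division of r0 by r1
        r0, r1 = r1, r
        x0, x1 = x1, x0 - q * x1      # maintain ri = a*xi + b*yi
        y0, y1 = y1, y0 - q * y1
    return r0, x0, y0                 # r0 is the last nonzero remainder

r, x, y = extended_gcd(240, 46)
assert r == 2 and 240 * x + 46 * y == 2
```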
Theorem 6.3.5
Let F be a field. Then for all a(x), b(x) ∈ F [x] with a(x) ≠ 0, there exist unique polynomials
q(x) and r(x) in F [x] such that

b(x) = a(x)q(x) + r(x) with r(x) = 0 or deg r(x) < deg a(x). (6.7)
Proof. First note that if b(x) = 0 then q(x) = r(x) = 0 and we are done. We need to show
that this is the unique solution to (6.7). If q(x) ≠ 0, then a(x)q(x) ≠ 0. Since r(x) = 0 or
deg r(x) < deg a(x) ≤ deg(a(x)q(x)), adding r(x) to a(x)q(x) cannot cancel the leading
term of a(x)q(x). Hence, a(x)q(x) + r(x) ≠ 0. This is a contradiction, so we must have q(x) = 0
and thus r(x) = 0 also. From now on, assume that b(x) ≠ 0.
Suppose that deg b(x) < deg a(x). Obviously, the choice q(x) = 0 and r(x) = b(x) satisfies (6.7),
but we need to show that this solution is unique. Since deg(q(x)a(x)) = deg q(x) +
deg a(x) ≥ deg a(x), and since we impose the condition that r(x) = 0 or deg r(x) < deg a(x), then
for any polynomials q(x) and r(x) with q(x) ≠ 0, we have deg(q(x)a(x) + r(x)) ≥ deg a(x) > deg b(x).
Hence, if deg b(x) < deg a(x), then we must have q(x) = 0 and b(x) = r(x).
Suppose now that deg b(x) ≥ deg a(x). Consider polynomials of the form b(x) − a(x)q(x). Then
for distinct polynomials q1 (x) ≠ q2 (x) in F [x] we have
(b(x) − a(x)q1 (x)) − (b(x) − a(x)q2 (x)) = a(x)(q2 (x) − q1 (x)).
Hence, any two distinct polynomials of the form b(x) − a(x)q(x) differ by a polynomial with degree
at least deg a(x).
If a(x) | b(x), then there exists d(x) with b(x) = a(x)d(x), so q(x) = d(x) and r(x) = 0 is a
solution to (6.7). We need to show that it is unique. If we could write
b(x) = a(x)q(x) + r(x) in some other way, then
r(x) = (b(x) − a(x)q(x)) − (b(x) − a(x)d(x))
would be a polynomial of degree at least deg a(x). This contradicts the conditions in (6.7). Hence,
when a(x) | b(x) the solution for (6.7) is unique.
Now assume that deg b(x) ≥ deg a(x) and a(x) does not divide b(x). Consider the set of non-
negative integers {deg(b(x) − q(x)a(x)) | q(x) ∈ F [x]}. By the Well-Ordering Principle, this set has
a least element. Let r0 (x) = b(x) − a(x)q0 (x) be a polynomial of the form b(x) − q(x)a(x) of least
degree. Suppose that deg r0 (x) ≥ deg a(x). Then r0 (x) − (LT(r0 (x))/LT(a(x)))a(x) has degree lower
than r0 (x) because the subtraction cancels the leading term of r0 (x). Thus,

r2 (x) = b(x) − (q0 (x) + LT(r0 (x))/LT(a(x))) a(x)
has degree strictly lower than r0 (x), which contradicts the condition that r0 (x) has minimal degree
among polynomials of the form b(x) − a(x)q(x). Hence, we can conclude that the polynomial r0 (x)
has degree strictly lower than a(x).
Suppose that r1 (x) and r2 (x) are two distinct polynomials of minimal degree of the form b(x) −
a(x)q(x). We must have deg r1 (x) = deg r2 (x). Write ri (x) = b(x) − a(x)qi (x). Suppose that k is
the highest degree where the terms of r1 (x) and r2 (x) differ. Let ci be the coefficient of the kth
degree term of ri (x). Then
c2 (b(x) − a(x)q1 (x)) − c1 (b(x) − a(x)q2 (x)) = c2 r1 (x) − c1 r2 (x)

has degree strictly lower than deg r1 (x) = deg r2 (x). Then

b(x) − a(x) (c2 q1 (x) − c1 q2 (x))/(c2 − c1 ) = (c2 r1 (x) − c1 r2 (x))/(c2 − c1 )

exhibits a polynomial of the form b(x) − a(x)q(x) of degree strictly lower than the minimal degree, a contradiction. Hence, r1 (x) = r2 (x).
Thus, we have shown that there is a unique polynomial r0 (x) of the form b(x) − a(x)q(x) of least
degree. Furthermore, the corresponding q0 (x) = q(x) is unique also. Since any two polynomials of
the form b(x) − a(x)q(x) differ by a polynomial of degree at least deg a(x), this polynomial r0 (x)
is in fact the unique polynomial of the form b(x) − a(x)q(x) with degree strictly less than deg a(x).
The theorem follows.
Corollary 6.3.6
For any field F , the polynomial ring F [x] is a Euclidean domain with deg as the Euclidean
function.
Proof. The only thing that Theorem 6.3.5 did not establish is that for any two nonzero polynomials
a(x), b(x) ∈ F [x], deg b(x) ≤ deg(a(x)b(x)). However, this follows from the fact that the degree of a
nonzero polynomial is nonnegative and that in F [x],
deg(a(x)b(x)) = deg a(x) + deg b(x).
Proposition 6.3.3 establishes that the polynomial ring F [x] is also a principal ideal domain. In
contrast, note that in the ring Z[x], the ideal (2, x) is not principal, so also by Proposition 6.3.3, the
ring Z[x] is not a Euclidean domain.
An important consequence of F [x] being a principal ideal domain is the following characterization of irreducible polynomials.
Proposition 6.3.7
Let F be a field and p(x) ∈ F [x]. The polynomial p(x) is irreducible if and only if
F [x]/(p(x)) is a field.
The Euclidean division in F [x] is called polynomial division. The proof of Theorem 6.3.5 is
nonconstructive; the existence of q(x) and r(x) follows from the Well-Ordering Principle of the integers,
but the proof does not illustrate how to find q(x) and r(x). Polynomial division is sometimes taught
in high school algebra courses but without justification of why it works. We review polynomial
division here for completeness.
Constructive polynomial division relies on the following fact. If a(x), b(x) ∈ F [x] are such that
deg b(x) ≥ deg a(x), then LT(b(x))/LT(a(x)) is a monomial and

p1 (x) = b(x) − (LT(b(x))/LT(a(x))) a(x)
has a degree lower than b(x) because the leading term of b(x) cancels out. By repeating this process
on p1 (x) to obtain p2 (x) and so on, we ultimately obtain a polynomial that is 0 or is of degree less
than a(x). We illustrate the division with two examples.
Example 6.3.8. Consider the polynomials a(x) = 2x2 + 1 and b(x) = 3x4 − 2x + 7 in Q[x]. At the
first step, we find a monomial by which to multiply a(x) in order to obtain the leading term of b(x).
This is (3/2)x^2. As with long division in base 10, we put the monomial on top of the quotient bar,
multiply it by a(x), and subtract:

3x^4 − 2x + 7 − (3/2)x^2 (2x^2 + 1) = −(3/2)x^2 − 2x + 7.

The next term of the quotient is −3/4, and

−(3/2)x^2 − 2x + 7 − (−3/4)(2x^2 + 1) = −2x + 31/4.

This polynomial division algorithm ends at this stage since deg(−2x + 31/4) < deg(2x^2 + 1). This
work shows that

3x^4 − 2x + 7 = (2x^2 + 1)((3/2)x^2 − 3/4) + (−2x + 31/4)

is the polynomial division of b(x) by a(x).
Example 6.3.9. As another example, we perform the polynomial division of 4x^4 + x^3 + 2x^2 + 3 by
3x^2 + 4x + 1 in F5 [x]. The leading term of the quotient is 3x^2, and

4x^4 + x^3 + 2x^2 + 3 − 3x^2 (3x^2 + 4x + 1) = 4x^3 + 4x^2 + 3.

The next quotient term is 3x, and subtracting 3x(3x^2 + 4x + 1) = 4x^3 + 2x^2 + 3x leaves
2x^2 + 2x + 3. The final quotient term is 4, and subtracting 4(3x^2 + 4x + 1) = 2x^2 + x + 4 leaves
x + 4. We read this as: 4x^4 + x^3 + 2x^2 + 3 divided by 3x^2 + 4x + 1 in F5 [x] has quotient q(x) = 3x^2 + 3x + 4
and remainder r(x) = x + 4.
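The leading-term cancellation driving Examples 6.3.8 and 6.3.9 can be sketched in a few lines. The version below works over F_p with coefficient lists written lowest degree first (our own convention) and reproduces Example 6.3.9; over Q one would use exact fractions instead of modular inverses.

```python
def poly_divmod(b, a, p):
    """Return (q, r) with b = a*q + r and deg r < deg a, over F_p.
    Polynomials are coefficient lists, lowest degree first."""
    b = b[:]                         # working copy of the dividend
    inv_lead = pow(a[-1], -1, p)     # inverse of LT(a) in F_p (Python 3.8+)
    q = [0] * max(len(b) - len(a) + 1, 1)
    while len(b) >= len(a) and any(b):
        shift = len(b) - len(a)
        coef = b[-1] * inv_lead % p  # next quotient term: coef * x^shift
        q[shift] = coef
        for i, ai in enumerate(a):   # subtract coef * x^shift * a(x)
            b[i + shift] = (b[i + shift] - coef * ai) % p
        while len(b) > 1 and b[-1] == 0:
            b.pop()                  # drop the cancelled leading term
    return q, b

# 4x^4 + x^3 + 2x^2 + 3 divided by 3x^2 + 4x + 1 in F_5[x]:
q, r = poly_divmod([3, 0, 2, 1, 4], [1, 4, 3], 5)
assert q == [4, 3, 3]                # q(x) = 3x^2 + 3x + 4
assert r == [4, 1]                   # r(x) = x + 4
```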
I = (2x4 + 7x3 + 4x2 + 13x − 10, 3x4 + 5x3 − 16x2 + 14x − 4).
11. Prove that Z[√2] is a Euclidean domain with the norm N (a + b√2) = |a^2 − 2b^2| as the Euclidean
function.
12. Perform the Euclidean division of β = 10 + 13√2 by α = 2 + 3√2. (See Exercise 6.3.11.)
13. Perform the Euclidean division of β = 25 − 3√2 by α = −1 + 13√2. (See Exercise 6.3.11.)
14. Prove that Z[√−2] is a Euclidean domain with the norm N (a + b√−2) = a^2 + 2b^2 as the Euclidean
function.
15. Let R be an integral domain. Prove that R[x] is a Euclidean domain if and only if R is a field.
16. In Q[x], determine the monic greatest common divisor d(x) of a(x) = x^4 − 2x + 1 and b(x) = x^2 − 3x + 2.
Using the Extended Euclidean Algorithm, write d(x) as a Q[x]-linear combination of a(x) and b(x).
17. In F2 [x], determine the greatest common divisor d(x) of a(x) = x^4 + x^3 + x + 1 and b(x) = x^2 + 1.
Using the Extended Euclidean Algorithm, write d(x) as an F2 [x]-linear combination of a(x) and b(x).
18. Prove Proposition 6.3.7. [Hint: In a PID, an element is prime if and only if it is irreducible.] Conclude
that for all nonzero polynomials p(x) ∈ F [x], the quotient ring F [x]/(p(x)) is either a field or is not
an integral domain.
19. Let R be a Euclidean domain with Euclidean function d and let S = {d(r) | r ∈ R − {0}}. By
the well-ordering of Z, S has a least element n. Show that all elements s ∈ R such that d(s) = n
are units.
20. Least Common Multiples. Let R be a Euclidean domain with Euclidean function d.
(a) Prove that in R, any two nonzero elements a and b have a least common multiple.
(b) Prove that least common multiples of a and b have the form ab/d, where d is a greatest common
divisor of a and b.
6.4
Unique Factorization Domains
The Fundamental Theorem of Arithmetic states that any integer n ≥ 2 can be written as a product
of positive prime numbers and that any such product of primes is unique up to reordering.
This property can also be stated by saying that integers have unique prime factorizations. The
Fundamental Theorem of Arithmetic is taught early in a student’s education, as soon as students
know what prime numbers are. However, since students usually are not shown a proof of the
Fundamental Theorem of Arithmetic, it often comes as a surprise that unique factorization does not
hold in every integral domain.
As we did in Section 6.3, it is common in ring theory, and in algebra more generally, to
define a class of rings with specific properties and explore what further properties follow from the
defining characteristics. This section introduces unique factorization domains: rings that possess a
property like the Fundamental Theorem of Arithmetic.
Definition 6.4.1
A unique factorization domain (abbreviated as UFD) is an integral domain R in which
every nonzero, nonunit element r ∈ R has the following two properties:
• r can be written as a finite product of irreducible elements r = p1 p2 · · · pn (not
necessarily distinct);
• and for any other factorization into irreducibles, r = q1 q2 · · · qm , we have m = n and there is
a reordering of the qi such that each qi is an associate of pi .
Proposition 6.4.2
In any UFD, a nonzero element is prime if and only if it is irreducible.
Proof. By Proposition 6.1.9, every prime element is irreducible. We prove the converse in a UFD.
Let R be a UFD and let r ∈ R be an irreducible element. Suppose that a, b ∈ R and that r|ab.
Then by definition of divisibility, there exists c ∈ R such that rc = ab. Suppose that a, b, and c
have the following factorizations into irreducible elements
a = p1 p2 · · · pm , b = q1 q2 · · · qn , c = r1 r2 · · · rk .
By definition of a UFD, since ab = rc, we have k = m + n − 1 and there is a reordering of the list
(p1 , p2 , . . . , pm , q1 , q2 , . . . , qn )
into
(r, r1 , r2 , . . . , rk )
possibly up to multiplication by units. Thus, r | pi for some i = 1, . . . , m or r | qj for
some j = 1, . . . , n. If r divides some pi , then r | a, and if r divides some qj , then r | b. Hence, r is a
prime element.
Because of this proposition, in a UFD we call the factorization of an element r into irreducible
elements a prime factorization of r. Furthermore, the irreducible factors of an element are called
prime factors.
Example 6.4.3. Every field is trivially a UFD since every nonzero element is a unit.
Example 6.4.4. The Fundamental Theorem of Arithmetic (Theorem 2.1.22) is precisely the statement
that Z is a UFD.
Example 6.4.5. We will soon see that every Euclidean domain is a unique factorization domain.
Hence, Z[i] is a UFD. The norm function on rings of the form Z[√D], where D is square-free, helps
in determining the prime factorization of elements. Recall that in such rings: (1) γ is a unit if and
only if N (γ) = 1, and (2) γ is irreducible if N (γ) is prime. (Note that (2) is not an if-and-only-if
statement.) With this in mind, we propose to find a prime factorization of α = 6 + 5i and then of
β = 7 − 11i.
First of all, note that N (α) = 6^2 + 5^2 = 61 is prime, so α is irreducible and we are done.
For β, we have N (β) = 7^2 + 11^2 = 170 = 2 × 5 × 17. This alone does not determine whether β is
irreducible, but it does tell us that an irreducible factor must have a norm that is a divisor
of 170. We can try to find prime factors by trial and error. Note that N (1 + i) = 2, so 1 + i is
irreducible and could possibly be a factor. Dividing β by 1 + i gives
β = (1 + i)(−2 − 9i)
so 1 + i is indeed a prime factor. The norm N (−2 − 9i) = 65 gives some possibilities on the prime
factors of −2 − 9i. We observe that N (2 + i) = 5. However,
(−2 − 9i)/(2 + i) = −13/5 − (16/5)i,
so 2 + i is not a prime factor. On the other hand, 2 − i, which is not an associate of 2 + i, has norm
5 and we find that
(−2 − 9i)/(2 − i) = 1 − 4i.
Hence, 2 − i is a prime factor, as is 1 − 4i since N (1 − 4i) = 17. Hence, a prime factorization of β is
7 − 11i = (1 + i)(2 − i)(1 − 4i).
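The factorization just found is easy to double-check numerically: since the norm is multiplicative, the norms of the factors must multiply to N(β) = 170. Python complex literals stand in for the Gaussian integers.

```python
def N(z):
    """The norm N(a + bi) = a^2 + b^2 on Z[i]."""
    return round(z.real ** 2 + z.imag ** 2)

factors = [1 + 1j, 2 - 1j, 1 - 4j]
beta = 7 - 11j

product = 1
for f in factors:
    product *= f

assert product == beta                          # (1+i)(2-i)(1-4i) = 7 - 11i
assert [N(f) for f in factors] == [2, 5, 17]    # each factor has prime norm
assert N(beta) == 2 * 5 * 17
```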
Example 6.4.6. Example 6.1.10 shows that Z[√10] is not a UFD. The example established that
the elements α, β, γ, δ are all irreducible, not associates of each other, and that 10 = αβ = γδ.
Example 6.4.7. As a result of the main theorem (Theorem 6.5.5) in the next section, we will see
that F [x, y], where F is a field, is a UFD. However, it is rather easy to construct examples of subrings
of F [x, y] that are not UFDs. For example, consider the ring R = F [x^2, xy, y^2]. All the constants
in F − {0} are units in R and R contains no polynomials of degree 1. Consequently, x^2, xy, and y^2
are irreducible elements in R. However, x^2 y^2 can be factored into irreducible elements as

(x^2)(y^2) = x^2 y^2 = (xy)(xy).

Furthermore, xy is not an associate of either x^2 or y^2. (If it were, x or y would need to be a unit,
which is not the case.) Hence, this gives two nonequivalent factorizations of x^2 y^2.
Example 6.4.8. For an easier example of a ring that is not a UFD, consider R = Z[√−5]. Recall
that an element a + b√−5 ∈ R is a unit if and only if N (a + b√−5) = a^2 + 5b^2 = 1. It is not hard
to see that the only units in R are 1 and −1. It is also easy to see that there is no element γ ∈ R
such that N (γ) = 2 or N (γ) = 3. Now consider the following two factorizations of the element 6:

6 = 2 × 3 = (1 + √−5)(1 − √−5).

We have N (2) = 4, N (3) = 9, and N (1 ± √−5) = 6. If 2, 3, 1 + √−5, or 1 − √−5 were reducible,
then there would have to exist an element γ ∈ R of norm 2 or 3. Since that is not the case, all
four elements are irreducible. Furthermore, since the only units are 1 and −1, neither 2 nor 3 is an
associate of either 1 + √−5 or 1 − √−5. Hence, we have displayed two distinct factorizations of 6
and hence Z[√−5] is not a UFD.
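The norm argument in Example 6.4.8 can be verified in a few lines, using our own pair encoding (a, b) for a + b√−5.

```python
def mul(p, q):
    """Multiply a + b*sqrt(-5) pairs, using (sqrt(-5))^2 = -5."""
    a, b = p
    c, d = q
    return (a * c - 5 * b * d, a * d + b * c)

def norm(p):
    """N(a + b*sqrt(-5)) = a^2 + 5b^2."""
    a, b = p
    return a * a + 5 * b * b

# 6 = 2 * 3 = (1 + sqrt(-5)) * (1 - sqrt(-5)):
assert mul((2, 0), (3, 0)) == (6, 0)
assert mul((1, 1), (1, -1)) == (6, 0)

# Norms a^2 + 5b^2 attained up to 9: note that 2 and 3 never occur, so
# the four factors above (norms 4, 9, 6, 6) are all irreducible.
attained = {a * a + 5 * b * b for a in range(-3, 4) for b in range(-2, 3)}
small = {n for n in attained if n <= 9}
assert small == {0, 1, 4, 5, 6, 9}
assert 2 not in small and 3 not in small
```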
The ring Z[√−5] in the above example offers an opportunity to provide a visual interpretation
of the criteria of a UFD.
Let R be an integral domain and consider the equivalence classes of associate elements. By
Proposition 6.1.4, the set of equivalence classes (R − {0})/∼ becomes a partial order under the
relation of divisibility. Consider the Hasse diagram of this partial order and define ~v (a) as the
displacement vector (in R^2) from the location of 1 to the element a ∈ (R − {0})/∼. In particular,
~v (1) = ~0. Suppose all the irreducible elements are on a first level (a horizontal line) above the least
element 1.
If R is a UFD, then the Hasse diagram of the poset ((R − {0})/∼, | ) can be constructed in such
a way that if a ∈ (R − {0})/∼ has a factorization into irreducibles as a = r1 r2 · · · rm , then the
vector from 1 to a is the vector

~v (a) = ~v (r1 ) + ~v (r2 ) + · · · + ~v (rm ),
regardless of the choice of vectors ~v (r) for each irreducible r. This claim holds because (1) the
Hasse diagram is for classes of associates and (2) the reordering of irreducibles is tantamount to the
commutativity of vector addition. Then each element a ∈ R is located with respect to 1 by a vector
~v (a) that is an integral linear combination of vectors of the form ~v (r) where r is an irreducible.
If R is not a UFD, then two nonequivalent factorizations of an element a ∈ R lead to two different
integral linear combinations of the vectors corresponding to irreducible elements. Thus, the Hasse diagram
cannot be constructed as described above unless certain linear relations hold between the vectors
~v (ri ) corresponding to irreducible elements ri .
For example, Figure 6.2 shows a few of the irreducible elements in Z[√−5]. Furthermore, we
placed these irreducible elements "randomly." The edges are dashed and grayed to represent
multiplication by 2, 3, 1 + √−5, or 1 − √−5. The diagram has 6 located so that ~v (6) = ~v (2) + ~v (3).
However, 6 = (1 − √−5)(1 + √−5) is another factorization into irreducible elements, but in this
diagram

~v (6) ≠ ~v (1 − √−5) + ~v (1 + √−5).
This can be seen directly by the fact that the dashed edges do not form a parallelogram.
Lemma 6.4.9
If R is a PID, then every chain of ideals I1 ⊆ I2 ⊆ · · · ⊆ Ik ⊆ · · · eventually terminates,
i.e., there exists a k such that for all n ≥ k, In = Ik .
Proof. Let I1 ⊆ I2 ⊆ · · · be an ascending chain of ideals and consider the union I = I1 ∪ I2 ∪ · · · ,
which is an ideal of R. Since R is a PID, I = (a) for some a. However, a ∈ Ik for some k. But then
I = (a) ⊆ In for all n ≥ k, and since In ⊆ I, we must have In = (a) = I for all n ≥ k. Hence, the
chain terminates.
Proposition 6.4.10
Every PID is a UFD.
Proof. We first show by contradiction that every PID satisfies the first axiom for a UFD. Suppose
that r ∈ R cannot be written as a finite product of irreducibles. This tells us first that r is not
irreducible so we can write r = r1 b1 where neither factor is a unit. The assumption also implies that
at least one of these is not irreducible, say r1 . Then as ideals (r) ( (r1 ). Furthermore, we can write
r1 = r2 b2 and once again at least one of these is not an irreducible element, say r2 . Continuing with
this reasoning, we define an ascending chain of ideals

(r) ( (r1) ( (r2) ( · · ·

which never terminates. This is a contradiction of Lemma 6.4.9. Hence, we conclude that every
element in a PID can be written as a finite product of irreducible elements.
We now need to show the second axiom of UFD. One proceeds by induction on the minimum
number of elements in a decomposition of an element r. Suppose that r has a factorization into
irreducibles that has a single factor. Then r itself is an irreducible element. By the definition of
an irreducible element, if r = ab, then either a or b is a unit and hence the other element is an
associate of r. Hence, if there is a factorization with 1 element, all factorizations into irreducibles
have 1 element.
For the induction hypothesis, suppose that if r has a factorization into n irreducibles then all of
its factorizations into irreducibles involve n irreducibles and that the irreducibles can be rearranged
to be unique up to associates. Consider now an element that requires a minimum of n + 1 irreducible
factors for a factorization. Assume that we have two factorizations
r = p1 p2 · · · pn+1 = q1 q2 · · · qm ,
since we are in an integral domain. We apply the induction hypothesis on the factorizations
p2 · · · pn+1 = aq2 · · · qm and can conclude that the second criterion for UFD holds for n + 1. By
induction, the second axiom for UFDs hold for all n ∈ N∗ and the proposition follows.
Proposition 6.3.3 along with Proposition 6.4.10 show that some of the classes of integral domains
that we have introduced are subclasses of one another. We can summarize the containment of some
of the ring classes introduced so far with the diagram

fields ⊆ Euclidean domains ⊆ PIDs ⊆ UFDs ⊆ integral domains.
The propositions and examples provided so far have not shown strict containment between Euclidean
domains and PIDs or between PIDs and UFDs. Example 6.5.7 gives examples of UFDs that are not
PIDs.
As a consequence of Proposition 6.4.2, prime elements and irreducible elements are equivalent in
Euclidean domains and in PIDs. Furthermore, as particular examples, since every Euclidean domain
is a unique factorization domain, the ring of Gaussian integers Z[i] and the polynomial ring F [x]
over a field F are UFDs.
In the next section, we will establish a necessary and sufficient condition on R for R[x] to be a
UFD, but we can already give a characterization of when R[x] is a PID.
Proposition 6.4.11
If R is a commutative ring such that the polynomial ring R[x] is a PID, then R is necessarily
a field.
Proof. Suppose that R[x] is a PID. Then the subring R is an integral domain. By Exercise 5.7.2,
we see that (x) is a nonzero prime ideal. We can also see this by pointing out that two polynomials
a(x) and b(x) with nonzero constant terms a0 and b0 are such that their product a(x)b(x) has the
nonzero constant term a0 b0. Thus, a(x) ∉ (x) and b(x) ∉ (x) imply that a(x)b(x) ∉ (x). The
contrapositive of this last statement establishes that (x) is a prime ideal.
By Proposition 5.7.10, since R[x] is a PID then the nonzero prime ideal (x) is in fact a maximal
ideal. Therefore, R = R[x]/(x) is a field.
Definition 6.4.12
Let R be an integral domain and let r ∈ R be an irreducible element. The function
ordr : R − {0} → N is defined by ordr (a) = n, where n is the least nonnegative integer such
that r^(n+1) does not divide a. In other words, ordr (a) = n means that r^k | a for 0 ≤ k ≤ n
and r^k - a for k > n.
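In the familiar UFD Z, the function of Definition 6.4.12 is easy to compute. The following Python sketch (an illustration of ours, not part of the text) computes ordr by repeated division:

```python
def ord_r(r, a):
    """ord_r(a) in Z: the largest n with r**n | a (a nonzero, r irreducible)."""
    assert a != 0
    n = 0
    while a % r == 0:
        a //= r
        n += 1
    return n

# 40 = 2^3 * 5, so ord_2(40) = 3, ord_5(40) = 1, and ord_3(40) = 0.
print(ord_r(2, 40), ord_r(5, 40), ord_r(3, 40))  # 3 1 0
```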
Lemma 6.4.13
Let a and b be nonzero elements in a UFD R. Then a | b if and only if ordr (a) ≤ ordr (b)
for all irreducible elements r ∈ R.
Proof. First suppose that a | b. Let r be any irreducible element of R and let k = ordr (a). Then
r^k | a and by transitivity of divisibility, r^k | b. Hence, ordr (a) ≤ ordr (b).
Conversely, suppose that ordr (a) ≤ ordr (b) for all irreducible elements r. Let a = p1 p2 · · · pm
and b = q1 q2 · · · qn be factorizations into irreducible elements. Let S = {r1, r2, . . . , rℓ} be a (finite) set of irreducible
elements, none of which are associates to each other, such that each pi and each qj is an associate
of some element in S. Then we can write

a = u r1^α1 r2^α2 · · · rℓ^αℓ and b = v r1^β1 r2^β2 · · · rℓ^βℓ (6.9)

for some units u, v ∈ U (R), where αk = ordrk (a) and βk = ordrk (b). By hypothesis, αk ≤ βk for
all k, so

b = a (u^−1 v) r1^(β1−α1) r2^(β2−α2) · · · rℓ^(βℓ−αℓ)

and thus a | b.
Proposition 6.4.14
Let R be a unique factorization domain. For all a, b ∈ R, there exists a greatest common
divisor of a and b.
Proof. Write a and b as in (6.9). Then

d = r1^min(α1,β1) r2^min(α2,β2) · · · rℓ^min(αℓ,βℓ)

is a divisor of both a and b. Let d0 be any other common divisor of a and b. Each irreducible in a
factorization of d0 must be an associate to some pi in the factorization of a. Hence, we can write
d0 = w r1^γ1 · · · rℓ^γℓ for some unit w, where γi = ordri (d0). Since d0 | a and d0 | b, Lemma 6.4.13
gives γi ≤ min(αi, βi) for each i, so d0 | d. Thus d is a greatest common divisor of a and b.

By Proposition 6.1.16, any two elements a and b in a UFD have a least common multiple. In a
parallel fashion, using the expressions in (6.9), it is easy to prove that

m = r1^max(α1,β1) r2^max(α2,β2) · · · rℓ^max(αℓ,βℓ)

is a least common multiple of a and b.
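In Z these formulas for d and m amount to taking the min and max of the exponents in the prime factorizations. A Python sketch (our illustration; the helper names are not from the text):

```python
from collections import Counter

def factorization(n):
    """Exponents of the irreducible (prime) factors of n > 0."""
    f, p = Counter(), 2
    while p * p <= n:
        while n % p == 0:
            f[p] += 1
            n //= p
        p += 1
    if n > 1:
        f[n] += 1
    return f

def gcd_lcm(a, b):
    fa, fb = factorization(a), factorization(b)
    g = l = 1
    for r in set(fa) | set(fb):
        g *= r ** min(fa[r], fb[r])  # d uses min(alpha_k, beta_k)
        l *= r ** max(fa[r], fb[r])  # m uses max(alpha_k, beta_k)
    return g, l

# 40 = 2^3 * 5 and 12 = 2^2 * 3: gcd 4, lcm 120.
print(gcd_lcm(40, 12))  # (4, 120)
```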
First, note that since the complex conjugate of a product is the product of the conjugates, an
element in Z[i] is irreducible if and only if its complex conjugate is irreducible.
Lemma 6.4.15
If an element a + bi ∈ Z[i] with ab ≠ 0 is prime, then a^2 + b^2 is a prime number in Z.
Proof. The element a^2 + b^2 = (a + bi)(a − bi) is in the prime ideal (a + bi). Assume that a^2 + b^2 = mn
is a composite integer with factors m, n ≥ 2. Then since (a + bi) is a prime ideal, m ∈ (a + bi) or
n ∈ (a + bi). Without loss of generality, suppose that m ∈ (a + bi). The integer m cannot be an
associate of a + bi. Furthermore, there exists z ∈ Z[i] with (a + bi)z = m, so zn = a − bi. But n
is not a unit either, which makes a − bi reducible. This is a contradiction and hence the assumption
that a^2 + b^2 is a composite integer is false. The lemma follows.
Lemma 6.4.16
Let a + bi ∈ Z[i] with ab ≠ 0. Then a + bi is prime if and only if a^2 + b^2 = p is a prime
number with p ≡ 1 (mod 4) or p = 2.
Proof. Suppose that a + bi is prime in Z[i]. Using modular arithmetic modulo 4, it is easy to see that
for a, b ∈ Z, the sum a^2 + b^2 is never congruent to 3 modulo 4. (See Exercise 2.2.8.) Furthermore, if
n ≡ 0 (mod 4), then n is divisible by 4, and hence is composite. By Lemma 6.4.15, we deduce that
a^2 + b^2 is a prime integer congruent to 1 or 2 modulo 4. However, the only prime number that is
congruent to 2 modulo 4 is 2.
Conversely, suppose that a^2 + b^2 = p is a prime number. Then assume that a + bi = αβ for some
α, β ∈ Z[i]. Then p = N (αβ) = N (α)N (β) and hence either N (α) = 1 or N (β) = 1. Hence, by
Proposition 6.1.7, either α or β is a unit and we deduce that a + bi is irreducible.
Note that if a + bi is a prime element with a^2 + b^2 = 2, then a + bi is one of the four elements
±1 ± i.
Now we consider elements of the form a or bi. Obviously, bi is an associate to the integer b
so without loss of generality we only consider elements a + 0i ∈ Z[i]. Note that if an integer n is
composite in Z, it cannot be prime in Z[i] either. Hence, we restrict our attention to prime numbers.
Lemma 6.4.17
If p is a prime number in Z with p ≡ 3 (mod 4), then p is also prime in Z[i].
Proof. Assume p is not irreducible in Z[i]. Then p = αβ with neither α nor β a unit. We have
p^2 = N (p) = N (α)N (β), so N (α) = N (β) = p since neither N (α) nor N (β) can be 1. However,
N (α) is the sum of two squares, and (by Exercise 2.2.8) the sum of two squares cannot be congruent
to 3 modulo 4. Hence, the assumption leads to a contradiction and the lemma follows.
Lemma 6.4.18
In Z[i], the integer 2 factors into two irreducibles 2 = (1 + i)(1 − i).
The last lemma turns out to be the most difficult and requires a reference to a result in number
theory.
Lemma 6.4.19
If p is a prime number in Z with p ≡ 1 (mod 4), then p is not prime in Z[i] but has the
prime factorization of the form p = (a + bi)(a − bi).
Proof. In 1770, Lagrange proved that if p = 4n + 1 is a prime number in Z, then the congruence
equation x2 ≡ −1 (mod p) has a solution. In other words, there exists an integer m such that
m^2 + 1 is divisible by p ([53, Theorem 11.5]). Now consider such a p as an element in Z[i]. Note
that m^2 + 1 = (m + i)(m − i). The integer p cannot divide m + i or m − i, because otherwise p
would have to divide the imaginary part of m + i or m − i, which is ±1. Hence, since Z[i] is a unique
factorization domain, p cannot be prime in Z[i].
The norm of p is N (p) = p^2 and since p is not prime in Z[i], then p must factor into p = αβ with
N (α) = N (β) = p. Writing α and β in polar coordinates, α = √p e^(iθ) and β = √p e^(iφ). Since αβ = p,
we have θ + φ = 2πk, so φ = −θ up to a multiple of 2π. Thus, β is the complex conjugate of α.
The above five lemmas culminate in the following proposition.
Proposition 6.4.20
The prime (irreducible) elements in Z[i] are
• associates of integers of the form p + 0i, where p is prime with p ≡ 3 (mod 4);
• associates of 1 + i or 1 − i;
• associates of elements a + bi with ab ≠ 0 and a^2 + b^2 = p, where p is a prime number with p ≡ 1 (mod 4).
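Proposition 6.4.20 translates directly into a test for primality in Z[i]. The following Python sketch (our illustration; trial division is used for ordinary integer primality) applies the classification:

```python
def is_prime_int(n):
    """Trial-division primality test for ordinary integers."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def is_gaussian_prime(a, b):
    """Is a + bi prime in Z[i]?  (Sketch of Proposition 6.4.20.)"""
    if a == 0 or b == 0:
        # a + bi is an associate of an ordinary integer p >= 0
        p = abs(a + b)
        return is_prime_int(p) and p % 4 == 3
    # ab != 0: prime exactly when the norm a^2 + b^2 is a prime number
    return is_prime_int(a * a + b * b)

print(is_gaussian_prime(3, 0))  # True:  3 = 3 (mod 4)
print(is_gaussian_prime(5, 0))  # False: 5 = (2 + i)(2 - i)
print(is_gaussian_prime(1, 1))  # True:  norm 2
print(is_gaussian_prime(2, 1))  # True:  norm 5
```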
13. Let R be a UFD and let D be a multiplicatively closed set that does not contain 0. Prove that the
ring of fractions D−1 R is a UFD.
14. Let R be an integral domain and let r ∈ R be an irreducible element. Prove that ordr (ab) = ordr (a) +
ordr (b) for all a, b ∈ R − {0}.
15. Let R and r be as in Exercise 6.4.14. Let (R − {0})/ ' be the set of equivalence classes of associate
nonzero elements in R.
(a) Prove that ordr : (R − {0})/ ' → N given by ordr ([a]) := ordr (a) for all a ∈ R − {0} is a well-defined
function.
(b) Prove that ordr : (R − {0})/ ' → N is a monotonic function between the posets ((R − {0})/ ', |)
and (N, ≤).
16. Let R be a UFD and let a ∈ R − {0}. Let Sa be a set of irreducible elements that divide a and such
that no two elements in Sa are associates. Prove that Sa is a finite set and that

a ' ∏_{r∈Sa} r^ordr(a).
17. Let R be an integral domain. Suppose that for all a ∈ R − {0}, any set S of irreducible elements
that divide a in which no two elements are associates is finite. Prove that R is a unique factorization
domain if and only if for all a ∈ R − {0} and all such sets S,

a ' ∏_{r∈S} r^ordr(a).
6.5 Factorization of Polynomials
Early on, students of mathematics learn strategies to factor polynomials or to find roots of a poly-
nomial. Many of the theorems introduced in elementary algebra assume the coefficients are in Z,
Q or R. In this section we review many theorems concerning factorization or irreducibility in R[x]
where R is an integral domain.
Lemma 6.5.1
If R[x] is a UFD, then R is a UFD.
Proof. The ring R of coefficients is a subring of R[x]. By degree considerations and Proposition
5.2.6, if p(x) = c is a constant polynomial and if p(x) = a(x)b(x), then both a(x) and b(x) have
degree 0. Thus, if R[x] is a UFD, then every c ∈ R has a unique factorization into irreducible
elements of R[x] but each of these elements has to have degree 0. Thus, R is a UFD.
Because of this lemma, we henceforth let R be a unique factorization domain.
Theorem 6.5.2 (Gauss's Lemma)
Let R be a UFD, let F be its field of fractions, and let p(x) ∈ R[x]. If p(x) = A(x)B(x)
for some nonconstant polynomials A(x), B(x) ∈ F [x], then there exist nonconstant
polynomials a(x), b(x) ∈ R[x] such that p(x) = a(x)b(x), where a(x) is an F -multiple of
A(x) and b(x) is an F -multiple of B(x). In particular, if p(x) is reducible in F [x], then it
is reducible in R[x].
Proof. Consider the equation p(x) = A(x)B(x) where A(x) and B(x) are elements in F [x]. Let dA
be a least common multiple of all the denominators appearing in A(x) and similarly for dB . Set
a0 (x) = dA A(x) and b0 (x) = dB B(x), which are polynomials in R[x]. Then dA dB p(x) = a0 (x)b0 (x)
where now a0 (x), b0 (x) ∈ R[x]. Setting d = dA dB , we have dp(x) ∈ R[x].
If d is a unit in R, then we are done by taking a(x) = d^−1 a0 (x) and b(x) = b0 (x). This gives
p(x) = a(x)b(x) with a(x), b(x) ∈ R[x].
If d is not a unit, consider the prime factorization of d ∈ R, namely d = p1 p2 · · · pn. For each
i ∈ {1, 2, . . . , n}, the element pi is irreducible in R and therefore prime in the UFD R. By
Proposition 5.6.11, R[x]/(pi R[x]) ≅ (R/pi R)[x], which is an integral domain since pi R is a prime
ideal in R; hence pi R[x] is a prime ideal in R[x]. Considering the expression dp(x) = a0 (x)b0 (x) reduced in the
quotient ring, we have
0̄ = a0 (x) b0 (x).
Thus, one of these polynomials in the quotient ring is 0. Therefore, it is possible to partition
{1, 2, . . . , n} = Ia ∪ Ib into two subsets Ia and Ib such that if i ∈ Ia, then a0 (x) = 0 in (R/pi R)[x] and if
i ∈ Ib, then b0 (x) = 0 in (R/pi R)[x]. Then all the coefficients of a0 (x) are multiples of ∏_{i∈Ia} pi
and all the coefficients of b0 (x) are multiples of ∏_{i∈Ib} pi. Setting a(x) = (∏_{i∈Ia} pi)^−1 a0 (x)
and b(x) = (∏_{i∈Ib} pi)^−1 b0 (x) gives polynomials in R[x] with p(x) = a(x)b(x), as desired.
Corollary 6.5.3
Let R be a UFD and let F be its field of fractions. Let p(x) ∈ R[x] be such that its
coefficients have a greatest common divisor of 1. Then p(x) is irreducible in R[x] if and
only if it is irreducible in F [x]. In particular, a monic polynomial is irreducible in R[x] if
and only if it is irreducible in F [x].
Example 6.5.4. Consider the polynomial p(x) = 6x2 − x − 1 in Z[x]. If we consider p(x) as an
element of the bigger ring Q[x], it can be factored as

p(x) = (x − 1/2)(6x + 2).

This factorization is not in Z[x] but it can be changed to p(x) = (2x − 1)(3x + 1) in Z[x]. Note that
in Q[x], the unique factorization of p(x) is

p(x) = 6 (x − 1/2)(x + 1/3),

where 6 is a unit in Q. ♦
Theorem 6.5.5
R is a UFD if and only if R[x] is a UFD.
Proof. Lemma 6.5.1 already gave one direction of this proof. Gauss’ Lemma allows us to prove the
converse.
Suppose now that R is a UFD and let F be its field of fractions. Recall that F [x] is a Euclidean
domain so it is a UFD. Let p(x) ∈ R[x] and let d be the greatest common divisor of the coefficients
of p(x) so that p(x) = dp2 (x) where the coefficients of p2 (x) have a greatest common divisor of 1.
Since R is a UFD, and d can be factored uniquely into irreducibles in R, it suffices to prove that
p2 (x) can be factored uniquely into irreducibles in R[x].
Since F [x] is a UFD, p2 (x) can be factored uniquely into irreducibles in F [x] and by Gauss’
Lemma, there is a factorization of p2 (x) in R[x]. Since the greatest common divisor of coefficients
of p2 (x) is 1, then the greatest common divisor of each of the factors of p2 (x) in R[x] is 1. By
Corollary 6.5.3, the factors of p2 (x) in R[x] are irreducible. Thus, p(x) can be written as a product
of irreducible elements in R[x].
Since R[x] is a subring of F [x], then the factorization of p2 (x) in R[x] is a factorization into
irreducible elements in F [x], which is unique up to rearrangement and multiplication by units.
There exist fewer units in R than in F so the uniqueness of the factorization also holds in R[x].
Theorem 6.5.5 establishes that the algebraic context in which to discuss factorization of poly-
nomials is when the ring of coefficients is itself a UFD. The theorem also has consequences for
multivariable polynomial rings.
Corollary 6.5.6
If R is a UFD, then the polynomial ring R[x1 , x2 , . . . , xn ] with a finite number of variables
is a UFD.
Proof. Theorem 6.5.5 establishes the induction step from R[x1 , x2 , . . . , xn−1 ] to R[x1 , x2 , . . . , xn ] for
all n ≥ 1 and hence the corollary follows by induction on n.
Example 6.5.7. Note that Z[x], Z[x, y], etc. are UFDs by the above theorems. However, they are
not PIDs and thus give simple examples of rings that are UFDs but not PIDs. ♦
Definition 6.5.8
A polynomial p(x) ∈ R[x] is called primitive if the coefficients of p(x) are relatively prime.
In the language of unique factorization domains, a polynomial is primitive if and only if it does
not have irreducible factors of degree 0. Note that if F is a field, then every polynomial in F [x] is
primitive.
With polynomials in Z[x], the content of a polynomial p(x), denoted by c(p), is defined as the
greatest common divisor of the coefficients of p(x) multiplied by the sign of the leading coefficient.
Similarly, we may refer to the content of a polynomial p(x) ∈ R[x], though this is only well-defined up
to multiplication by a unit. Consequently, a polynomial p(x) ∈ R[x] can be written p(x) = c(p)p0 (x)
where p0 (x) is a primitive polynomial and is called the primitive part of p(x).
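For Z[x], the content and primitive part are simple to compute. A Python sketch (our illustration; coefficient lists store the x^i coefficient at index i):

```python
from math import gcd
from functools import reduce

def content(coeffs):
    """Content in Z[x]: gcd of the coefficients times the sign of the
    leading coefficient.  Assumes coeffs[-1] != 0."""
    c = reduce(gcd, (abs(a) for a in coeffs))
    return c if coeffs[-1] > 0 else -c

def primitive_part(coeffs):
    c = content(coeffs)
    return [a // c for a in coeffs]

# p(x) = -6x^2 + 4x - 2 = (-2)(3x^2 - 2x + 1).
p = [-2, 4, -6]
print(content(p), primitive_part(p))  # -2 [1, -2, 3]
```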
Proposition 6.5.9
Let p(x) ∈ R[x] and let F be the field of fractions of R. The polynomial p(x) has a
factor of degree 1 if and only if it has a root in F. Furthermore, if the root is r/s, then
p(x) = (sx − r)q(x) for some q(x) ∈ R[x].
Proof. If p(x) has a factor of degree 1, say (sx − r), then r/s is a root of p(x) when viewed as an
element in F [x].
For the converse, suppose that p(x) has a root α = r/s in F. Consider the polynomial division
of p(x) by d(x) = x − α in the Euclidean domain F [x]. Since the remainder must have degree less
than deg d(x) = 1, there exist unique q(x) ∈ F [x] and c ∈ F such that p(x) = (x − α)q(x) + c.
However, p(α) = 0, so 0 = 0 · q(α) + c and thus c = 0. Therefore, p(x) = (x − α)q(x) in F [x].
By Gauss's Lemma, p(x) = (sx − r)q(x) for some polynomial q(x) ∈ R[x].
Corollary 6.5.10
Let p(x) be a nonzero polynomial in R[x]. The number of distinct roots in the field of
fractions is less than or equal to the degree of p(x).
Proof. Let F be the field of fractions of R. If α1 , α2 , . . . , αm are distinct roots of p(x) in F with
αi = ri /si , then
p(x) = (s1 x − r1 )(s2 x − r2 ) · · · (sm x − rm )q(x)
for some q(x) ∈ R[x]. Hence, deg p(x) ≥ m.
The following proposition is sometimes presented in elementary algebra courses in the context of
polynomials with integer coefficients. The proposition provides a short list of all the possible linear
irreducible factors of a polynomial.
Proposition 6.5.11
Let R be a UFD, let F be its field of fractions, and let

p(x) = an x^n + · · · + a1 x + a0

be a polynomial in R[x]. Then every root of p(x) in F is of the form u (e/d), where u is a
unit, e divides a0, and d divides an.

Proof. Let r/s be a root of p(x) in F (with p(x) viewed as an element of F [x]) and suppose that r
and s have no common divisor. Then

an (r/s)^n + an−1 (r/s)^(n−1) + · · · + a1 (r/s) + a0 = 0,
which, after multiplying by s^n, gives

an r^n + an−1 r^(n−1) s + · · · + a1 r s^(n−1) + a0 s^n = 0. (6.10)

Then

s(an−1 r^(n−1) + · · · + a1 r s^(n−2) + a0 s^(n−1)) = −an r^n.

By unique factorization in R, all the prime factors of s divide an r^n, but since s and r are relatively
prime, all the prime factors of s must be associates to the prime factors of an. Thus, s | an. With an
identical argument applied to (6.10), we can show that r | a0. The result follows.
Proposition 6.5.11 also makes it simple to determine if a given quadratic or cubic polynomial is
irreducible.
Proposition 6.5.12
A primitive polynomial p(x) ∈ R[x] of degree 2 or 3 is reducible in R[x] if and only if it has
a root in the field of fractions F .
Proof. Suppose a primitive polynomial p(x) of degree 2 or 3 is reducible. Then p(x) = a(x)b(x) with
a(x), b(x) ∈ R[x], not units. Since deg a(x), deg b(x) ≥ 1 and since deg a(x)+deg b(x) = deg p(x) ≤ 3,
then deg a(x) = 1 or deg b(x) = 1. By Proposition 6.5.9, p(x) has a root in F .
Conversely, suppose that p(x) has a root r/s ∈ F. Then by Proposition 6.5.9, p(x) = (sx − r)q(x).
Since p(x) is of degree 2 or 3, deg q(x) is equal to 1 or 2, and hence q(x) is not a unit. Thus,
p(x) is reducible.
Example 6.5.13. We show that p(x) = 2x^3 − 7x + 3 is irreducible in Z[x]. By Gauss's Lemma, p(x)
can factor over Q if and only if it can factor over Z. We just need to check whether it has roots in Q to
determine whether it factors over Q. The only possible roots according to Proposition 6.5.11 are

±1, ±3, ±1/2, ±3/2.

It is easy to verify that none of these eight fractions are roots of p(x). Then by Proposition 6.5.12,
p(x) is irreducible in Z[x]. ♦
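The check in Example 6.5.13 can be automated. The following Python sketch (our illustration; it assumes a0 and an are nonzero) enumerates the candidates ±e/d of Proposition 6.5.11 and evaluates p at each using exact rational arithmetic:

```python
from fractions import Fraction

def divisors(n):
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]

def rational_root_candidates(coeffs):
    """Candidates +/- e/d with e | a0 and d | an (coeffs[i] is the x^i term)."""
    a0, an = coeffs[0], coeffs[-1]
    return sorted({Fraction(s * e, d)
                   for e in divisors(a0) for d in divisors(an)
                   for s in (1, -1)})

def evaluate(coeffs, x):
    return sum(c * x**i for i, c in enumerate(coeffs))

p = [3, -7, 0, 2]                       # p(x) = 2x^3 - 7x + 3
cands = rational_root_candidates(p)
print(cands)                            # the eight fractions of the example
print(any(evaluate(p, c) == 0 for c in cands))  # False: no rational root
```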
Example 6.5.14. We show that the polynomial x^3 + 2x^2 + 2x + 3 is irreducible in F5 [x].
Evaluating it at each element of F5 gives

0^3 + 2 · 0^2 + 2 · 0 + 3 = 3 ≠ 0,
1^3 + 2 · 1^2 + 2 · 1 + 3 = 3 ≠ 0,
2^3 + 2 · 2^2 + 2 · 2 + 3 = 3 ≠ 0,
3^3 + 2 · 3^2 + 2 · 3 + 3 = 4 ≠ 0,
4^3 + 2 · 4^2 + 2 · 4 + 3 = 2 ≠ 0.
We observe that no element of the field is a root of the polynomial. So by Proposition 6.5.12, since
the polynomial is a cubic and has no roots, the polynomial is irreducible. ♦
Up to now, the propositions in this subsection have shown how to quickly determine if a polyno-
mial of degree 3 or less is irreducible. For polynomials of degree 4 or more in R[x], Proposition 6.5.11
helps quickly determine if a polynomial has a factor of degree 1. However, it requires more work
to determine if a polynomial has an irreducible factor of degree 2 or more. The following examples
illustrate what can be done for polynomials with coefficients in Z or in the finite field Fp .
Example 6.5.15. We propose to show that p(x) = x^4 + x + 2 is irreducible as a polynomial in
F3 [x]. By checking the three field elements 0, 1, 2 ∈ F3, it is easy to see that none of them are roots.
Hence, p(x) has no linear factors. Assume that p(x) is reducible. Then by degree considerations,
p(x) is the product of two quadratic polynomials. A priori, by considering the leading coefficient of
p(x), there appear to be two cases:

p(x) = (x^2 + ax + b)(x^2 + cx + d) or p(x) = (2x^2 + ax + b)(2x^2 + cx + d).

However, by factoring out a 2 from each of the terms of the second case, we see that it is equivalent
to the first case. Hence, we only need to consider the first situation. Expanding the product for
p(x) gives

a + c = 0, b + d + ac = 0, ad + bc = 1, bd = 2. (6.11)

The last of the four conditions gives (b, d) = (1, 2) or (2, 1). Applied to the second equation, we see
that ac = 0, so a = 0 or c = 0. This last result applied to the first equation shows that a = c = 0.
But then in the third equation, we deduce that 0 + 0 = 1, which is a contradiction. We conclude
that p(x) is irreducible. ♦
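The case analysis of Example 6.5.15 can be confirmed by brute force. This Python sketch (our illustration) multiplies out every monic quadratic pair over F3 and checks whether any pair gives x^4 + x + 2:

```python
from itertools import product

P = 3  # coefficient arithmetic in F_3

def mul_quadratics(a, b, c, d):
    """Coefficients (x^4, x^3, x^2, x, 1) of (x^2 + ax + b)(x^2 + cx + d) mod P."""
    return (1,
            (a + c) % P,
            (b + d + a * c) % P,
            (a * d + b * c) % P,
            (b * d) % P)

target = (1, 0, 0, 1, 2)  # x^4 + x + 2
splits = [(a, b, c, d)
          for a, b, c, d in product(range(P), repeat=4)
          if mul_quadratics(a, b, c, d) == target]
print(splits)  # []: no factorization into quadratics exists
```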
Example 6.5.16. We repeat the above example, but with q(x) = x^4 + x + 2 ∈ Z[x]. By Proposition 6.5.11,
if q(x) has a linear factor, then it has one of the following roots: ±1 or ±2. A quick
calculation shows that none of these four numbers is a root of q(x), so q(x) has no factors of degree
1. Assume that q(x) is reducible. Then

q(x) = (ax^2 + bx + c)(dx^2 + ex + f )

for some integers a, b, c, d, e, f. From ad = 1, we deduce that (a, d) = (1, 1) or (−1, −1). As in the previous example, the case
(a, d) = (−1, −1) can be made equivalent to the case (a, d) = (1, 1) by multiplying both quadratic
factors by −1. Consequently, we again have equations similar to (6.11) but in Z:
b + e = 0, c + f + be = 0, bf + ce = 1, cf = 2,

which become

e = −b, c + f − b^2 = 0, b(f − c) = 1, cf = 2.
From the third equation, f − c = 1 or −1. Furthermore, b = ±1 and hence the second equation gives
c + f = 1. Since we are in Z, the fourth equation implies that (c, f ) can be one of the four pairs
(1, 2), (2, 1), (−1, −2), or (−2, −1), but none of these four options satisfies c + f = 1. This is
a contradiction and hence we deduce that q(x) is irreducible in Z[x]. ♦
The above two examples illustrate a similar strategy for checking whether a quartic polynomial is
irreducible, applied to different coefficient rings. However, we could have immediately deduced
the result of Example 6.5.16 from Example 6.5.15 with much less work. The following proposition
generalizes this observation.
Proposition 6.5.17
Let I be a proper ideal in the integral domain R and let p(x) be a nonconstant monic
polynomial in R[x]. If the image of p(x) in (R/I)[x] under the reduction homomorphism is
an irreducible element, then p(x) is irreducible in R[x].
Proof. Suppose that p(x) = a(x)b(x), where a(x) and b(x) are not units in R[x]. Then the degrees
of a(x) and b(x) must be positive: if either a(x) or b(x) were constant, then because LC(p(x)) =
1 = LC(a(x)) LC(b(x)), the constant polynomial would need to be a unit in R.
Now let ϕ : R[x] → (R/I)[x] be the reduction homomorphism. (Recall that ϕ maps the coefficients
of p(x) to their images in R/I.) Since ϕ is a homomorphism, ϕ(p(x)) = ϕ(a(x))ϕ(b(x)).
We already established that the leading coefficients of a(x) and b(x) must be units, since the leading
coefficient of p(x) is 1. However, since I is a proper ideal of R, it contains no units. We deduce that
deg ϕ(a(x)) = deg a(x) ≥ 1 and deg ϕ(b(x)) = deg b(x) ≥ 1. Hence, neither ϕ(a(x)) nor ϕ(b(x)) is a
unit in (R/I)[x] and thus ϕ(p(x)) is reducible.
We have proven that if p(x) is reducible, then ϕ(p(x)) is reducible. The proposition is precisely
the contrapositive of this statement.
The following example gives another application of Proposition 6.5.17 even as it illustrates some
more reasoning with factorization of polynomials.
Example 6.5.18. We show that q(x) = x^4 + 3x^3 + 22x^2 − 8x + 3 is irreducible in Z[x]. We consider
the polynomial modulo 2: q̄(x) = x^4 + x^3 + 1 ∈ F2 [x]. It is obvious that neither 0 nor 1 in F2 is a
root of q̄(x). Hence, if q̄(x) is reducible in F2 [x], then it must be a product of two quadratics since
it does not have a factor of degree 1. However, we point out that F2 [x] has only one irreducible
polynomial of degree 2, namely x^2 + x + 1. (There are only 3 other monic quadratic polynomials in F2 [x],
namely x^2, x^2 + 1, and x^2 + x, each of which is reducible.) The only quartic polynomial that is the
product of two irreducible quadratic polynomials is

(x^2 + x + 1)(x^2 + x + 1) = x^4 + x^2 + 1.

Hence, x^4 + x^3 + 1 is irreducible in F2 [x], and by Proposition 6.5.17, q(x) is irreducible in Z[x].
We point out that our reasoning also establishes that x^4 + x + 1 is irreducible in F2 [x]. ♦
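The two facts used in Example 6.5.18 are quick to verify by machine. A Python sketch (our illustration):

```python
def eval_f2(coeffs, x):
    """Evaluate a polynomial over F_2; coeffs[i] is the x^i coefficient."""
    return sum(c * x**i for i, c in enumerate(coeffs)) % 2

qbar = [1, 0, 0, 1, 1]  # x^4 + x^3 + 1 in F_2[x]
print([eval_f2(qbar, x) for x in (0, 1)])  # [1, 1]: no roots, no linear factor

# (x^2 + x + 1)^2 = x^4 + x^2 + 1 over F_2, which is not qbar:
sq = [1, 0, 1, 0, 1]
print(qbar != sq)  # True: no quadratic factor either, so qbar is irreducible
```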
For nearly all integral domains R, determining whether a polynomial in R[x] is irreducible is
generally a difficult problem. Proposition 6.5.17 offers a sufficient condition for deciding whether a
polynomial is irreducible. Indeed, many theorems provide sufficient but not necessary conditions
that are relatively easy to verify. We conclude the section with one more sufficient condition for
irreducibility.
Eisenstein's Criterion
Let P be a prime ideal in an integral domain R and let

f (x) = x^n + an−1 x^(n−1) + · · · + a1 x + a0

be a monic polynomial in R[x]. Suppose that ai ∈ P for all i < n and a0 ∉ P^2. Then f (x) is
irreducible in R[x].

and 2 − i is not an associate of 2 + i. Hence, all the nonleading coefficients of f (x) are in the ideal
(2 + i) and 5 ∉ ((2 + i)^2 ). Hence, f (x) is irreducible in Z[i][x]. ♦
• If F is a field, then F [x] is not only a PID but a Euclidean domain. (Corollary 6.3.6)
Maple Function
irreduc(a); Tests whether a single variable or multivariate polynomial over an algebraic
number field is irreducible.
factor(a); Computes the factorization of a single variable or multivariate polynomial with
integer, rational, numeric, or algebraic number coefficients.
8. Let F be a finite field with q elements. By determining all the monic reducible polynomials of degree
2, prove that there are (q^2 − q)/2 monic irreducible quadratic polynomials in F [x].
9. List all irreducible monic quadratic polynomials in F5 [x].
10. Prove Corollary 6.5.3.
11. Let F be a field and let a be a nonzero element in F . Prove that if f (ax) is irreducible, then f (x) is
irreducible.
12. Prove a modification of Proposition 6.5.17 in which I is a prime ideal P in R and the polynomial p(x)
satisfies LC(p(x)) ∉ P.
13. Prove that f (x) = x3 + (2 + i)x + (1 + i) is irreducible in Z[i][x].
14. Let R be a UFD and let a ∈ R. Prove that p(x) ∈ R[x] is irreducible if and only if p(x + a) is irreducible.
15. Let p be a prime number in Z. Use Exercise 6.5.14 to prove that the polynomial x^(p−1) + x^(p−2) + · · · + x + 1
is irreducible in Z[x].
16. Consider the polynomial p(x) = x^3 + 3x^2 + 5x + 5 in Z[x]. Find a shift of the variable x so that you
can then use Eisenstein's Criterion to show that p(x) is irreducible.
17. Let c1 , c2 , . . . , cn ∈ Z be distinct integers. Consider the polynomial
p(x) = (x − c1 )(x − c2 ) · · · (x − cn ) − 1
in Z[x].
(a) Prove that if p(x) = a(x)b(x), then a(x) + b(x) evaluates to 0 at ci for i = 1, 2, . . . , n.
(b) Deduce that if a(x) and b(x) are nonconstant, then a(x) + b(x) is the 0 polynomial in Z[x].
(c) Deduce that p(x) is irreducible.
[Hint: Exercise 6.4.12.]
18. Prove the following generalization of Eisenstein’s Criterion. Let P be a prime ideal in an integral
domain R and let
f (x) = an x^n + an−1 x^(n−1) + · · · + a1 x + a0

be a polynomial in R[x]. Suppose that: (1) an ∉ P; (2) ai ∈ P for all i < n; and (3) a0 ∉ P^2. Then
f (x) is not the product of two nonconstant polynomials.
19. Let R be a UFD and let F be its field of fractions. Suppose that p(x), q(x) ∈ F [x] and that p(x)q(x)
is in the subring R[x]. Prove that the product of any coefficient of p(x) with any coefficient of q(x) is
an element of R.
20. Let F be a finite field of order |F | = q and let p(x) ∈ F [x] with deg p(x) = n. Prove that F [x]/(p(x))
has q^n elements. [Hint: Exercise 5.6.9.]
21. Prove that for all primes p, there is a field with p^2 elements.
22. Let p(x) = x^4 + Ax^3 + Bx^2 + Cx + D be a monic quartic (degree 4) polynomial in Z[x].
(a) Suppose that p(x) factors into two quadratics p(x) = (x^2 + ax + b)(x^2 + cx + d). Prove that if
a = 0 or c = 0, then ABC = A^2 D + C^2.
(b) Prove that if p(x) factors into two quadratics and if ABC ≠ A^2 D + C^2, then there exists an
integer a ≠ 0 such that C^2 − 4aD(A − a) is a square integer.
(c) Deduce that if D < 0, there are only a finite number of possibilities for a.
(d) Use the previous two parts to prove that p(x) = x^4 + 3x^3 − 2x − 7 is irreducible in Z[x].
23. Let F be a field. Consider the derivative function D : F [x] → F [x] defined by D(a0 ) = 0 for
constants and

D(an x^n + an−1 x^(n−1) + · · · + a1 x + a0 ) = nan x^(n−1) + (n − 1)an−1 x^(n−2) + · · · + a1.

(a) From this definition, prove that D satisfies the differentiation rules
24. Let R be a UFD and let P (x) ∈ R[x]. Let n ∈ N∗ and define for this exercise P^n (x) to be the
polynomial P (x) iterated n times, i.e.,

P^n (x) = P (P (· · · P (x) · · · ))   (n times).
6.6 RSA Cryptography
In Section 3.10, we presented the idea of public key cryptography: a protocol for two parties who
communicate entirely publicly to select a key that will nonetheless stay secret (not easily obtainable).
The Diffie-Hellman protocol relied on Fast Exponentiation to quickly calculate powers of group
elements, whereas determining a from g and g^a is relatively slow. The RSA public key protocol,
named after Ron Rivest, Adi Shamir, and Leonard Adleman, is a protocol between two parties A
and B in which party B allows party A to generate a key that will allow party A to send a secret
message to party B. Rivest, Shamir, and Adleman first introduced the protocol in the context of the
integers but it can be generalized to other rings. We will first describe the protocol over Z and
then generalize.
(1) The protocol starts with Alice initiating and telling Bob that she wants to talk secretly.
(2) Bob secretly chooses two prime numbers p and q but sends the product n = pq to Alice.
(3) Bob also chooses an integer e that is relatively prime to (p − 1)(q − 1) and sends this to Alice.
Together, the pair (n, e) form the public key of the protocol.
(4) The message Alice will send to Bob consists of an element m ∈ Z/nZ.
(5) Alice sends to Bob the ciphertext c = m^e ∈ Z/nZ, where the power is performed with fast
exponentiation.
(6) Bob calculates the inverse d of e in Z/(p − 1)(q − 1)Z so that ed ≡ 1 (mod (p − 1)(q − 1)).
Since we are always concerned with implementing the calculations quickly, Bob can use the
Extended Euclidean Algorithm to determine if e is invertible in Z/(p − 1)(q − 1)Z and to
calculate the inverse d. (See Example 2.2.11.)
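Step (6) can be sketched in Python with the Extended Euclidean Algorithm (our illustration; the sample numbers are hypothetical):

```python
def egcd(a, b):
    """Return (g, x, y) with a*x + b*y = g = gcd(a, b)."""
    if b == 0:
        return (a, 1, 0)
    g, x, y = egcd(b, a % b)
    return (g, y, x - (a // b) * y)

def mod_inverse(e, m):
    """Inverse of e in Z/mZ, or an error if gcd(e, m) != 1."""
    g, x, _ = egcd(e, m)
    if g != 1:
        raise ValueError("e is not invertible modulo m")
    return x % m

# For example, with (p - 1)(q - 1) = 60 and e = 7:
d = mod_inverse(7, 60)
print(d, (7 * d) % 60)  # 43 1
```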
(7) Then Bob calculates (using fast exponentiation) c^d = m^ed in Z/nZ. By the Chinese Remainder
Theorem, there is an isomorphism

ϕ : Z/pqZ → Z/pZ ⊕ Z/qZ.

Hence, by Proposition 5.1.14, U (pq) ≅ U (p) ⊕ U (q) and |U (pq)| = (p − 1)(q − 1). If m and
n are relatively prime, then m ∈ U (Z/nZ) = U (n), so

c^d = m^ed = m^(1+k(p−1)(q−1)) = m (m^((p−1)(q−1)))^k = m

by Lagrange's Theorem. If gcd(m, n) ≠ 1, we still have the following cases. If
m = 0 in Z/nZ, then c = m^e = 0 and c^d = m^ed = 0, so again c^d = m. If p | m but m ≠ 0 in
Z/nZ, then under the isomorphism ϕ(m) = (0, h) for some h ∈ Z/qZ. Then

ϕ(m)^ed = (0, h)^(1+k(p−1)(q−1)) = (0, h (h^(q−1))^(k(p−1))) = (0, h · 1^(k(p−1))) = (0, h) = ϕ(m).
Because of their roles, the integer e is called the encryption key and d is called the decryption
key.
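The whole exchange can be sketched with deliberately tiny (and therefore insecure) primes. The numbers below are a standard toy illustration, not from the text; Python's built-in pow performs fast modular exponentiation and, since version 3.8, modular inversion:

```python
# Bob's secret choices and public key
p, q = 61, 53
n = p * q                  # 3233, sent to Alice
phi = (p - 1) * (q - 1)    # 3120, kept secret
e = 17                     # public encryption key, gcd(17, 3120) = 1
d = pow(e, -1, phi)        # decryption key d = 2753

# Alice encrypts m; Bob decrypts c
m = 65
c = pow(m, e, n)           # fast exponentiation: c = 2790
recovered = pow(c, d, n)
print(c, recovered)        # 2790 65
```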
From Eve’s perspective, she knows everything that Bob does except what p and q are separately.
If p and q are very large prime numbers, then it is very quick for Bob to calculate the product n = pq
but it is very slow for Eve to find the prime factorization, namely p × q, of n. Furthermore, she would
need to know p and q separately in order to calculate (p − 1)(q − 1), which she needs to determine d.
As opposed to Diffie-Hellman, in which both Alice and Bob know the secret key g^ab, only Bob
knows the secret keys p and q. Diffie-Hellman is symmetric in the sense that Alice and Bob could
both use g^ab for communication.
both use g ab for communication. In RSA, Alice is in the same situation as Eve in that she cannot
easily determine d. Hence, the RSA protocol sets up a one-way secret communication: Only Alice
can send a secret message to Bob.
Again, as mentioned in the Diffie-Hellman protocol, it may seem unsatisfactory that it is
theoretically possible for Eve to find d from the information passed in the clear. However, if the primes
p and q are large enough, it may take over 100 years with current technology to find the prime
factorization of n and hence determine p and q separately. Very few secrets need to remain secret
for a century so the protocol is secure in this sense.
Even if n is large, it is not likely that a long communication can be encoded into a single element
m ∈ Z/nZ. Alice and Bob can use two strategies to allow for Alice to send a long communication
to Bob.
306 CHAPTER 6. DIVISIBILITY IN COMMUTATIVE RINGS
One strategy involves deciding upon an injective function H from the message space M into the
set of finite sequences of elements in Z/nZ. Then the message, regardless of alphabet, can be encoded
as a string of elements (m_1, m_2, . . . , m_ℓ) in Z/nZ, and Alice sends the sequence (m_1^e, m_2^e, . . . , m_ℓ^e) to
Bob. Bob then decodes each element in the string as described above to recover (m_1, m_2, . . . , m_ℓ).
Then, since H is an injective function, Bob can find the unique preimage of (m_1, m_2, . . . , m_ℓ) under
H and thereby recover the message in the message space M.
A second strategy uses the RSA protocol only to exchange a key for some subsequent encryption
algorithm. In other words, Alice and Bob agree (publicly) on some other encryption algorithm that
requires a secret key and they use RSA to decide what key to use for that algorithm. Essentially,
Alice is saying to Bob: “Let’s use m as the secret key.”
As one last comment about the speed of the protocol: prime factorization is always slow, so whenever
Bob needs to calculate the greatest common divisor of two integers, he uses the Euclidean Algorithm,
and when he calculates the inverse of e in Z/(p − 1)(q − 1)Z he uses the Extended Euclidean
Algorithm. The astute reader might wonder why we might not use Lagrange’s Theorem to calculate
d. Since gcd(e, (p − 1)(q − 1)) = 1, then e ∈ U((p − 1)(q − 1)) so d = e^{|U((p−1)(q−1))|−1}. However,
|U((p − 1)(q − 1))| = φ((p − 1)(q − 1)), where φ is Euler’s totient function. Calculating the totient
function of an integer requires one to perform the prime factorization of that integer.
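The Extended Euclidean Algorithm that the text relies on can be implemented iteratively; back-substitution is carried along as the coefficients x and y. The function name below is ours, and the numbers in the comment come from Example 6.6.1 below.

```python
def extended_euclid(a, b):
    """Return (g, x, y) with g = gcd(a, b) and a*x + b*y = g."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        quotient = old_r // r
        old_r, r = r, old_r - quotient * r
        old_x, x = x, old_x - quotient * x
        old_y, y = y, old_y - quotient * y
    return old_r, old_x, old_y

# Bob's computation in Example 6.6.1: invert e = 72569 modulo (p-1)(q-1) = 4124268.
g, x, _ = extended_euclid(72569, 4124268)
d = x % 4124268                      # normalize to a representative in Z/4124268Z
```

When g = 1, the coefficient x is the desired inverse; no factorization of (p − 1)(q − 1) is ever needed.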
Example 6.6.1. For a first example, we use small prime numbers and illustrate the full process.
In practice, of course, one uses computer programs to execute these calculations.
Suppose that Alice wants to communicate secretly with Bob and they agree to use RSA. Bob
selects p = 1759 and q = 2347 and sends n = 4128373 to Alice. He also sends the encryption key
e = 72569. Alice and Bob agree to encode strings of characters in the following way. Each character
will correspond to a digit by
Observe that the first character of the alphabet here is the space character. Note that they use
an alphabet of only 29 characters. Since 29^4 = 707281 < n, each quadruple of characters
(c_1, c_2, c_3, c_4) is converted to the integer

c_4 × 29^3 + c_3 × 29^2 + c_2 × 29 + c_1
and then viewed as an element of Z/4128373Z. If necessary, a message can be padded at the end
with spaces so that the message has 4k characters in it and then corresponds to a sequence of length
k of elements in Z/4128373Z.
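The block encoding just described can be sketched as follows. We assume the space is digit 0 and A through Z are digits 1 through 26 (an assumption, but one consistent with the values computed later in this example); the remaining two symbols of the 29-character alphabet are not needed here.

```python
ALPHABET = " ABCDEFGHIJKLMNOPQRSTUVWXYZ"      # digits 0..26 of the 29-symbol alphabet

def encode_block(block):
    """Map a 4-character block (c1, c2, c3, c4) to c4*29^3 + c3*29^2 + c2*29 + c1."""
    digits = [ALPHABET.index(ch) for ch in block]
    return sum(d * 29 ** i for i, d in enumerate(digits))

def decode_block(value):
    """Recover the 4-character block from its base-29 value."""
    chars = []
    for _ in range(4):
        value, digit = divmod(value, 29)
        chars.append(ALPHABET[digit])
    return "".join(chars)
```

For instance, encode_block("HELL") gives 302913 and encode_block("O TH") gives 211947, matching the first two values Bob decrypts at the end of this example.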
Alice wants to tell Bob “Hello there.” Her string of character numbers is
For completeness, we show all the steps in performing the Extended Euclidean Algorithm to
find d. The first sequence of integer divisions performs the Euclidean Algorithm until we obtain a
6.6. RSA CRYPTOGRAPHY 307
remainder of 1:
4124268 = 72569 × 56 + 60404
72569 = 60404 × 1 + 12165
60404 = 12165 × 4 + 11744
12165 = 11744 × 1 + 421
11744 = 421 × 27 + 377
421 = 377 × 1 + 44
377 = 44 × 8 + 25
44 = 25 × 1 + 19
25 = 19 × 1 + 6
19 = 6 × 3 + 1.
If Bob picked e at random, then it is possible that gcd(e, (p − 1)(q − 1)) > 1. In this case, Bob would
identify this error during the Euclidean Algorithm and he would simply pick another e. When we
get a remainder of 1, we are done since there is no smaller positive integer. Going backwards, we
get the following linear combinations of the intermediate remainders:
1 = 19 − 6 × 3
1 = 19 − (25 − 19 × 1) × 3 = 19 × 4 − 25 × 3
1 = (44 − 25 × 1) × 4 − 25 × 3 = 44 × 4 − 25 × 7
1 = 44 × 4 − (377 − 44 × 8) × 7 = 44 × 60 − 377 × 7
1 = (421 − 377) × 60 − 377 × 7 = 421 × 60 − 377 × 67
1 = 421 × 60 − (11744 − 421 × 27) × 67 = 421 × 1869 − 11744 × 67
1 = (12165 − 11744 × 1) × 1869 − 11744 × 67 = 12165 × 1869 − 11744 × 1936
1 = 12165 × 1869 − (60404 − 12165 × 4) × 1936 = 12165 × 9613 − 60404 × 1936
1 = (72569 − 60404 × 1) × 9613 − 60404 × 1936 = 72569 × 9613 − 60404 × 11549
1 = 72569 × 9613 − (4124268 − 72569 × 56) × 11549 = 72569 × 656357 − 4124268 × 11549.
Considering the last expression modulo 4124268, we get 1 ≡ 72569 × 656357 (mod 4124268) so
656357 is the inverse of 72569 modulo 4124268. Thus, Bob calculated that d = 656357.
In order to decrypt Alice’s message, using fast modular exponentiation, Bob calculates in
Z/4128373Z that

(c_1^d, c_2^d, c_3^d) = (302913, 211947, 660712).
Knowing the process by which Alice compressed her message into a string of elements in Z/4128373Z,
Bob can recover Alice’s message of “HELLO THERE.”
Definition 6.6.2
An integral domain R that is not a field is called an RSA domain if for every maximal ideal
M ⊆ R, the quotient ring R/M is a finite field.
The ring of integers Z is an RSA domain but R[x] is not. Indeed, for the irreducible element
x^2 + 1, R[x]/(x^2 + 1) ≅ C, which is not finite. On the other hand, as Exercise 5.6.10 hints, for
any prime element π in Z[i], the elements in Z[i]/(π) are the points in C with integral components
that are contained in a square with an edge along the segment from 0 to π. Hence, Z[i]/(π) is finite.
Furthermore, for every finite field F , any maximal ideal M in F [x] is principal so, by Exercise 6.6.5,
F [x]/M is finite.
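As a quick sanity check of this claim, the quotient F_2[x]/(x^2 + x + 1) has 2^2 = 4 elements and every nonzero one is invertible. A brief sketch; the pair representation (a0, a1) for a0 + a1·x is ours.

```python
from itertools import product

# Residues modulo the irreducible x^2 + x + 1 over F_2: polynomials a0 + a1*x.
residues = list(product([0, 1], repeat=2))

def mul(u, v):
    """Multiply residues modulo x^2 + x + 1 over F_2, using x^2 = x + 1."""
    a0, a1 = u
    b0, b1 = v
    c0 = a0 * b0
    c1 = a0 * b1 + a1 * b0
    c2 = a1 * b1                      # coefficient of x^2; reduce via x^2 = x + 1
    return ((c0 + c2) % 2, (c1 + c2) % 2)

# Every nonzero residue has a multiplicative inverse, so the quotient is a field.
units = [u for u in residues if u != (0, 0)
         and any(mul(u, v) == (1, 0) for v in residues)]
```

All three nonzero residues turn out to be units, so this quotient is the field with four elements.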
With the notion of an RSA domain, we adapt the RSA algorithm as follows.
[Protocol summary: (3) Bob sends e ∈ N with gcd(e, (|R/M1| − 1)(|R/M2| − 1)) = 1; (4)–(5) Alice
selects m ∈ R/I and sends c = m^e; (6) Bob calculates d = e^{−1} in Z/(|R/M1| − 1)(|R/M2| − 1)Z;
(7) Bob decrypts c^d = m^{ed} = m in R/I.]
(1) The protocol starts with Alice initiating and telling Bob that she wants to talk secretly.
(2) Bob secretly chooses two maximal ideals M1 and M2 but sends the product ideal I = M1 M2 to
Alice. If R is a PID (as all of the above examples are), then Bob can work with the generating
elements of any of the ideals.
(3) Bob also chooses an integer e that is relatively prime to (|R/M1 | − 1)(|R/M2 | − 1) and sends
this to Alice. Together, the pair (I, e) form the public key of the protocol.
(4) The message Alice will send to Bob consists of an element m ∈ R/I.
(5) Alice sends to Bob the ciphertext of c = me ∈ R/I, where the power is performed with fast
exponentiation.
(6) Bob calculates the inverse d of e in Z/(|R/M1| − 1)(|R/M2| − 1)Z so that ed ≡ 1
(mod (|R/M1| − 1)(|R/M2| − 1)). Since we are always concerned with implementing the
calculations quickly, Bob can use the Extended Euclidean Algorithm.
(7) Then Bob calculates (using fast exponentiation) cd = med in R/I. Since M1 and M2 are
distinct maximal ideals, then M1 + M2 = R and hence they are comaximal. By the Chinese
Remainder Theorem, there is an isomorphism
ϕ : R/I → (R/M1 ) ⊕ (R/M2 ).
Hence, by Proposition 5.1.14, U(R/I) ≅ U(R/M1) ⊕ U(R/M2) and since R/M1 and R/M2 are
fields, we have |U(R/I)| = (|R/M1| − 1)(|R/M2| − 1). If m = 0 ∈ R/I, then c = m^e = 0 and
c^d = 0 = m. If m ∈ U(R/I), then, writing ed = 1 + k(|R/M1| − 1)(|R/M2| − 1),

m^{ed} = m^{1+k(|R/M1|−1)(|R/M2|−1)} = m · (m^{|U(R/I)|})^k.

By Lagrange’s Theorem, we deduce that m^{ed} = m. If ϕ(m) = (0, h) with h ∈ U(R/M2), then

ϕ(m)^{ed} = (0, h)^{1+k(|R/M1|−1)(|R/M2|−1)} = (0, h · (h^{|R/M2|−1})^{k(|R/M1|−1)}) = (0, h · 1^{k(|R/M1|−1)}) = (0, h) = ϕ(m).

Hence, again m^{ed} = m in R/I, and similarly if ϕ(m) = (h, 0) with h ∈ U(R/M1). This allows
Bob to recover m.
In practice, it is convenient to work in an algebraic context in which there is a simple and perhaps
unique way to determine a representative of cosets in R/I. In particular, when calculating the powers
me or cd in R/I using fast exponentiation, it is convenient for memory storage to constantly reduce
the power to a smallest equivalent expression in R/I. Euclidean domains offer such a context. Since
Euclidean domains are PIDs, every ideal I is equal to (a) for some element a. Then, while performing
the fast exponentiation algorithm, each time we take a power of m, we replace it with the remainder
when we divide by a from the Euclidean division.
Some RSA domains that are also Euclidean domains include Z[i] and F [x], where F is a finite
field.
As an example, we adapt Example 6.6.1 to a scenario in Z[i] with different prime numbers. To
simplify the example, we leave off the issue of encoding characters into the quotient ring R/I.
Example 6.6.3. Alice indicates that she wants to communicate secretly with Bob and they agree
to use RSA over Z[i]. Bob selects p = 15 + 22i and q = 7 + 20i. According to Example 6.4.4, since
N (p) = 709 and N (q) = 449 are prime, p and q are primes in the Gaussian integers. Bob sends
n = pq = −335 + 454i to Alice as well as the encryption key e = 2221 = (100010101101)2 , where
the latter expression is in binary for use with the fast exponentiation algorithm (Section 3.10.2).
Alice wants Bob to receive the number m = 67+232i. Running the fast exponentiation algorithm,
Alice sets the power variable π of m ∈ Z[i]/(n) initially as π = 1. Note that 11 is the highest nonzero
power of 2 in the binary expansion of e = 2221. Also, in the following calculations, though we do
not use the bar notation ā, numbers are understood as elements in Z[i]/(n).
• b_11 = 1 so π := π^2 · m = 67 + 232i.
• b_10 = 0 so π := π^2 = (67 + 232i)^2 = −49335 + 31088i = 77 + 234i, where the last equality
holds after performing the Euclidean division of π^2 by n in Z[i].
This is the ciphertext c = m^e = 51 + 104i in Z[i]/(−335 + 454i) that Alice sends in the clear to Bob.
On Bob’s side, in order to calculate the decryption key d, he first needs to determine |Z[i]/(p)|
and |Z[i]/(q)|. By Exercise 6.6.6, the order of each of these quotient rings is |p|^2 = 709 and |q|^2 = 449,
respectively. Hence, d is the inverse of e = 2221 modulo (709 − 1)(449 − 1) = 317184.
Performing the Extended Euclidean Algorithm, Bob finds that d = 208933. In order to recover m,
he will calculate c^d using fast modular exponentiation in the finite ring Z[i]/(n), during which,
like Alice, he takes the Euclidean remainder after division by n at each stage of the algorithm. Since
in binary d = (110011000000100101)_2, the for loop in the algorithm will take only 18 steps.
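The computations of this example can be sketched as follows. We represent Gaussian integers as pairs (a, b) for a + bi; the reduction step rounds each coordinate of z/n to the nearest integer, which is one convention for Euclidean division in Z[i]. Helper names are ours.

```python
def nearest_int(p, q):
    """Nearest integer to the fraction p/q for q > 0, computed exactly."""
    return (2 * p + q) // (2 * q)

def gauss_mul(u, v):
    """(a + bi)(c + di) = (ac - bd) + (ad + bc)i."""
    return (u[0] * v[0] - u[1] * v[1], u[0] * v[1] + u[1] * v[0])

def reduce_mod(z, n):
    """Euclidean remainder of z divided by n in Z[i]."""
    a, b = z
    c, d = n
    norm = c * c + d * d                       # N(n) = n * conj(n)
    re, im = a * c + b * d, b * c - a * d      # z * conj(n)
    qr, qi = nearest_int(re, norm), nearest_int(im, norm)
    return (a - (c * qr - d * qi), b - (c * qi + d * qr))

def gauss_pow(m, exp, n):
    """Fast exponentiation in Z[i]/(n), reducing after every step."""
    result = (1, 0)
    for bit in bin(exp)[2:]:                   # scan bits from most significant
        result = reduce_mod(gauss_mul(result, result), n)
        if bit == "1":
            result = reduce_mod(gauss_mul(result, m), n)
    return result

n = (-335, 454)                                # n = pq = -335 + 454i
m = (67, 232)                                  # Alice's message m = 67 + 232i
c = gauss_pow(m, 2221, n)                      # ciphertext m^e
assert gauss_pow(c, 208933, n) == m            # Bob decrypts with d = 208933
```

The single squaring step shown in the text is reproduced by reduce_mod(gauss_mul(m, m), n), which returns (77, 234).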
where all the coefficients are understood to be in F_31. Let m(x) = x^3 + 2x. Calculate m(x)^3 in
F_31[x]/(n(x)).
6.7
Algebraic Integers
From a historical perspective, various other branches of mathematics—geometry, number theory,
analysis, and so forth—shaped various areas in algebra. Number theory motivated considerable
investigation in ring theory, and in particular topics related to divisibility. It is difficult to briefly
summarize the goal of number theory since it includes many different directions of investigations:
approximations of real numbers by rationals, distribution of the prime numbers in Z, multivariable
equations where we seek only integer solutions (Diophantine equations), and so on. In current
terminology, algebraic number theory is a branch of algebra applied particularly to studying Diophantine
equations and related generalizations.
This section offers a glimpse into algebraic number theory as an application of algebra internal
to mathematics. A more complete introduction to algebraic number theory requires field theory
(Chapter 7) including Galois theory (Chapter 11). For further study in algebraic number theory,
the reader may consult [46].
Since algebraic number theory is a vast area, in this section we content ourselves with introducing
algebraic integers and illustrating how unique factorization in the Gaussian integers answers an
otherwise challenging problem in number theory. In so doing, we will show the historical motivation
behind certain topics in ring theory.
We do not typically study this construction for an arbitrary field K; instead, we often restrict our
attention to subfields of C.
Definition 6.7.1
A field K that is a subfield of C and contains Q as a subfield is called a number field.
As we will see in more depth in Section 7.2, the adjective “algebraic” pertains to an element being
the solution to a polynomial. So an algebraic characterization of Z would have to do with how
integers arise as roots of polynomials in Q[x]. Now the solution set of any polynomial p(x) ∈ Q[x]
is equal to the set of solutions of q(x) = cp(x) ∈ Z[x], where c is the least common multiple of all
the denominators of the coefficients in p(x). Then Proposition 6.5.11 shows that if q(x) has only
integers for roots, then its leading coefficient is 1 or −1. This is tantamount to being monic. This
motivates the following definition.
Definition 6.7.2
Let K be a number field. The set of algebraic integers in K, denoted O_K, is the set of elements
in K that are solutions to monic polynomials p(x) ∈ Z[x].
It is not at all obvious from this definition that OK is a ring. We first establish an alternate
characterization of algebraic integers in a number field K before establishing this key result.
Lemma 6.7.3
Suppose α ∈ K. Then α ∈ OK if and only if (Z[α], +) is a finitely generated free abelian
group.
Proof. First suppose that α ∈ O_K. Then there exists a monic polynomial p(x) ∈ Z[x] such that
p(α) = 0. Hence, there exist a positive integer n and c_0, c_1, . . . , c_{n−1} ∈ Z such that

α^n = −(c_{n−1}α^{n−1} + · · · + c_1α + c_0).

Then by an induction argument, every power α^k with k ≥ n can be written as a Z-linear combination
of 1, α, α^2, . . . , α^{n−1}. Thus,

Z[α] ⊆ {c_{n−1}α^{n−1} + · · · + c_1α + c_0 | c_0, c_1, . . . , c_{n−1} ∈ Z}.

The right-hand side is a finitely generated free abelian group, so by Theorem 4.5.9, Z[α] is also a
finitely generated free abelian group.
Conversely, suppose that Z[α] is a finitely generated free abelian group. Then there exist
a1 , a2 , . . . , an ∈ Z[α] such that every element p(α) ∈ Z[α] can be written as a linear combination
p(α) = c1 a1 + c2 a2 + · · · + cn an with ci ∈ Z.
Note that for all a_i, the element αa_i ∈ Z[α]. Hence, for each i = 1, 2, . . . , n, there exist integers m_ij
such that

αa_i = ∑_{j=1}^{n} m_ij a_j.   (6.12)

Viewing M = (m_ij) as a matrix and ~a ∈ C^n as the vector whose ith coordinate is a_i, (6.12) can
be written as

α~a = M~a.

Since ~a ≠ ~0, ~a is an eigenvector of the matrix M and α is an eigenvalue. Thus, α is a root of
the equation det(xI − M) = 0. Since M is a matrix of integers, det(xI − M) is a polynomial in
Z[x]. By considering the cofactor (Laplace) expansion of the determinant, it is easy to see that
det(xI − M) is monic. Hence, α ∈ O_K.
Theorem 6.7.4
The set OK of algebraic integers in K is a subring of K.
6.7. ALGEBRAIC INTEGERS 313
for some qi (x) ∈ Z[x]. Also by Lemma 6.7.3, we deduce that every element in Z[α, β] is an integer
linear combination of the mn elements ai bj with 1 ≤ i ≤ n and 1 ≤ j ≤ m. In particular, every
element in Z[α − β] and in Z[αβ] can be written as a linear combination of ai bj . Hence, both α − β
and αβ are in OK and so OK is a subring of K.
φ^n = f_n φ + f_{n−1},
where f_n is the nth term of the Fibonacci sequence. Consequently, for every polynomial p(x) ∈ Z[x],
the element p(φ) can be written as a + bφ for some a, b ∈ Z.
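The identity φ^n = f_n φ + f_{n−1} can be checked mechanically by multiplying in Z[φ] with the rule φ^2 = φ + 1; the pair representation (a, b) for a + bφ is ours.

```python
def phi_mul(u, v):
    """Multiply a + b*phi by c + d*phi using phi^2 = phi + 1."""
    a, b = u
    c, d = v
    # (a + b*phi)(c + d*phi) = ac + (ad + bc)*phi + bd*(phi + 1)
    return (a * c + b * d, a * d + b * c + b * d)

def phi_pow(n):
    """Compute phi^n as a pair (a, b) meaning a + b*phi."""
    result = (1, 0)
    for _ in range(n):
        result = phi_mul(result, (0, 1))       # multiply by phi
    return result

# For example, phi^10 = f_10 * phi + f_9 = 55*phi + 34, i.e. the pair (34, 55).
```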
α^{−1} = ᾱ/(αᾱ).
This shows that Q[√D] is a field, so we write Q(√D).
The ring Z[√D] is a subring of Q(√D). However, just because

Z[√D] = {a + b√D | a, b ∈ Z}

does not imply that it is the ring of integers in Q(√D).
Theorem 6.7.5
Let D be a square-free integer. Then O_{Q(√D)} = Z[ω], where

ω = √D            if D ≡ 2, 3 (mod 4),
ω = (1 + √D)/2    if D ≡ 1 (mod 4).
Proof. Let α = a + b√D ∈ Q(√D). Then α is a root of m_α(x) = x^2 − 2ax + (a^2 − Db^2). Let
p(x) ∈ Q[x] be any polynomial that has α as a root. Then we know from elementary algebra that
the quadratic conjugate ᾱ must also be a root. But m_α(x) = (x − α)(x − ᾱ),
so from Proposition 6.5.9 we conclude that if p(x) has α as a root, then m_α(x) divides p(x). Write
p(x) = m_α(x)q(x).
We now assume that p(x) is monic and in Z[x]. Gauss’ Lemma implies that m_α(x) ∈ Z[x].
Hence, instead of considering all polynomials p(x), we only need to consider m_α(x). However, since
we conclude that m_α(x) ∈ Z[x], we must have 2a ∈ Z and a^2 − Db^2 ∈ Z.
Write a = s/2 for some s ∈ Z. Then we also have

s^2/4 − Db^2 = t ∈ Z.   (6.13)

If b is the fraction b = p/q in lowest terms, then after clearing the denominators we get
q^2 s^2 − 4Dp^2 = 4q^2 t. Reducing this equation modulo 4, we find that q^2 s^2 ≡ 0 (mod 4). This
leads to two nonexclusive cases:
s^2 − Dp^2 = 4t

where s and p are odd. Considering this equation mod 4 again, we see that this can only hold
if D ≡ 1 (mod 4). Hence, if D ≡ 1 (mod 4), then O_{Q(√D)} = Z[(1 + √D)/2], which we can write
more succinctly as Z[ω] where ω = (1 + √D)/2.
Case 2: s ≡ 0 (mod 2). Then a ∈ Q is in fact an integer, which implies that b is also an integer.
So if Case 1 does not hold, then Z[√D] = O_{Q(√D)}.
it is easy to check that N is multiplicative in the sense that N(αβ) = N(α)N(β) for all α, β ∈ Q(√D).
It is also easy to check that for all square-free D, the field norm takes integer values on the ring of
integers O_{Q(√D)}.
The above two properties lead to the following fact: An element α ∈ O_{Q(√D)} is a unit if and only
if N(α) = ±1, and the inverse of α is

α^{−1} = ᾱ/(αᾱ).
Example 6.7.6 (Eisenstein Integers). With D = −3, the algebraic integers in Q(√−3) form
the ring Z[ω], where ω = (1 + i√3)/2. Note that the cube roots of unity are the roots of the equation
x^3 − 1 = 0, which are given by

(x − 1)(x^2 + x + 1) = 0  ⟺  x = 1 or x = (−1 ± i√3)/2.
Writing ζ = (−1 + i√3)/2, we see that ω = ζ + 1 so O_{Q(√−3)} = Z[ζ]. As an element of Z[ζ], the
norm function is

N(a + bζ) = (a − b/2)^2 + 3(b/2)^2 = a^2 − ab + b^2.

The units in Z[ζ] satisfy a^2 − ab + b^2 = 1. Solving as a quadratic for a in terms of b requires the
discriminant b^2 − 4(b^2 − 1) ≥ 0, which leads to b^2 ≤ 4/3. Similarly for a, we must have a^2 ≤ 4/3. This
leads to only nine pairs (a, b) to check, and we find that the six units are

±1,  ±ζ = ±(−1 + i√3)/2,  ±(1 + ζ) = ±(1 + i√3)/2.

The ring Z[(−1 + i√3)/2] is called the ring of Eisenstein integers.
Definition 6.7.7
Let R ⊆ S, where R and S are commutative rings with an identity. An element s ∈ S is
called integral over R if there exists a monic polynomial f (x) ∈ R[x] such that f (s) = 0.
The ring S is called integral over R if every element in S is integral over R.
Definition 6.7.8
If R ⊆ S where R and S are as in the previous definition, then R is called integrally closed
in S if every element in S that is integral over R belongs to R. If R is an integral domain,
we simply say that R is integrally closed (without “in S”) if R is integrally closed in its
field of fractions.
Example 6.7.9. The ring Z is not integrally closed in Q(i) because i ∈ Q(i) is a root of the
monic polynomial x^2 + 1 and i ∉ Z. In contrast, as we will see in the following proposition, Z[i] is
integrally closed.
Proposition 6.7.10
If K is a number field, then the ring of algebraic integers OK is integrally closed.
is a generating set of Z[a0 , a1 , . . . , an−1 , c]. Since (Z[c], +) is a subgroup of (Z[a0 , a1 , . . . , an−1 , c], +),
it is finitely generated by Theorem 4.5.9. By Lemma 6.7.3, we conclude that c ∈ OK . Hence, OK is
integrally closed.
a + bi = ρ1 ρ2 · · · ρr
in a unique way (up to reordering and multiplication by units). Now consider also the unique
factorization of n in N
If for some i, p_i ≡ 3 (mod 4), then the prime p_i must divide some N(ρ_j). By Proposition 6.4.20, we
must have ρ_j = p_i and N(ρ_j) = p_i^2. If p_i ≡ 1, 2 (mod 4), then also by Proposition 6.4.20, p_i factors
into two irreducible elements that are complex conjugates of each other. This proves the following
theorem by Fermat.
Theorem 6.7.12
Given a positive integer n satisfying Theorem 6.7.11, the equation x^2 + y^2 = n with (x, y) ∈
Z^2 has 4(a_1 + 1)(a_2 + 1) · · · (a_ℓ + 1) solutions, where a_i = ord_{p_i}(n) for all the primes p_i
dividing n such that p_i ≡ 1 (mod 4).
Proof. Counting the number of ways n can be written as a sum of two squares is the same problem
as finding all distinct elements A + Bi ∈ Z[i] such that N(A + Bi) = n. Suppose that the prime
factorization of n in N is
n = 2^k p_1^{a_1} · · · p_ℓ^{a_ℓ} q_1^{2b_1} · · · q_m^{2b_m},
where p_i ≡ 1 (mod 4) and q_j ≡ 3 (mod 4). Then the prime factorization of n in Z[i] is

n = (−i)^k (1 + i)^{2k} (π_1 π̄_1)^{a_1} · · · (π_ℓ π̄_ℓ)^{a_ℓ} q_1^{2b_1} · · · q_m^{2b_m},

where each π_i is an irreducible element in Z[i] such that N(π_i) = π_i π̄_i = p_i. We notice that
1 − i = −i(1 + i), so not only is 1 − i a conjugate of 1 + i, it is also an associate. However, this
does not occur for any of the irreducible π_i.
A Gaussian integer α such that N(α) = αᾱ = n can only be of the form

α = u (1 + i)^k π_1^{c_1} π̄_1^{d_1} · · · π_ℓ^{c_ℓ} π̄_ℓ^{d_ℓ} q_1^{b_1} · · · q_m^{b_m},

where u is a unit and where for each 1 ≤ i ≤ ℓ, the nonnegative integers c_i and d_i satisfy c_i + d_i = a_i.
For each i, there are ai + 1 ways to choose the pairs (ci , di ). By the unique factorization in Z[i], for
all such pairs (ci , di ) and for all i, the resulting Gaussian integers α are distinct. There are 4 units
in Z[i] so there are exactly 4(a1 + 1)(a2 + 1) · · · (a` + 1) Gaussian integers α such that N (α) = n.
To illustrate Theorem 6.7.12, consider 325 = 5^2 × 13. In Z[i], we have 5 = (2 + i)(2 − i) and
13 = (3 + 2i)(3 − 2i). The six nonassociate Gaussian integers with a norm of 325 are

1 + 18i,  17 + 6i,  15 + 10i,  15 − 10i,  17 − 6i,  1 − 18i.
Multiplying by the three nontrivial units −1, i, and −i gives the 3 × 6 = 18 other solutions to
x2 + y 2 = 325.
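The count in Theorem 6.7.12 for n = 325 = 5^2 × 13, namely 4(2 + 1)(1 + 1) = 24 ordered solutions, can be confirmed by brute force (the helper name is ours):

```python
from math import isqrt

def two_square_solutions(n):
    """All ordered integer pairs (x, y) with x^2 + y^2 = n."""
    bound = isqrt(n)
    return [(x, y) for x in range(-bound, bound + 1)
                   for y in range(-bound, bound + 1)
                   if x * x + y * y == n]

solutions = two_square_solutions(325)          # expect 4 * 3 * 2 = 24 pairs
```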
(b) Consider a parallelogram with vertices 0, 1, ω, and ω + 1. Determine the furthest an interior
point P can be from the nearest of the four vertices. [Hint: Show first that P must be the
circumcenter of one of the triangles of the parallelogram.]
(c) Use the previous part to prove that every element in C is at most (1 + |D|)/(4√|D|) away from
some element in Z[ω].
(d) Show that (1 + |D|)^2/(16|D|) < 1 for D = −3, −7, −11.
(e) Show that d(z) = N(z) = |z|^2 is a Euclidean function on Z[ω].
[With previous results, this exercise shows that OQ(√D) is a Euclidean domain with the norm function
as the Euclidean function for D = −1, −2, −3, −7, −11. It turns out, though it is more difficult to
prove, that these are the only negative values of D for which OQ(√D) is a Euclidean domain.]
4. Do the Euclidean division of (21 + 13√−7)/2 by (3 + 5√−7)/2 in Z[(1 + √−7)/2]. (See Exercise 6.7.3.)
5. Consider the element (1/6)(4 + 4 · 28^{1/3} + 28^{2/3}) in the field F = Q(28^{1/3}). Show that this is an
algebraic integer in F by showing that it is a root of a monic cubic polynomial in Z[x].
6. Find all ways to write 91 as a sum of two squares.
7. Find all ways to write 338 as a sum of two squares.
8. Find all ways to write 61,000 as a sum of two squares.
9. Prove that if α is an algebraic integer in C, then α^{1/n} is another algebraic integer for any positive
integer n.
6.8
Projects
Project I. The Ring Z[√2]. In Example 5.2.3 we briefly encountered the ring Z[√2].
Recall that the norm function on Z[√2] is N(a + b√2) = |a^2 − 2b^2|. Show first that α ∈ Z[√2]
is a unit if and only if N(α) = 1. Primes in Z are not necessarily still irreducible in Z[√2].
For example, 7 = (3 + √2)(3 − √2). However, 3 + √2 is irreducible since N(3 + √2) = 7, so if
3 + √2 = αβ, then N(α)N(β) = 7 so either N(α) = 1 or N(β) = 1, and one of them would be
a unit.
Here are a few questions to pursue: Find some units in this ring. Can you find patterns in the
units, like some process that may give you many units? Investigate the irreducible elements
in Z[√2]. Try to find some. Make sure the ones you list are not associates of each other (i.e.,
multiples of each other via a unit). Try to find any patterns as to which elements in Z[√2]
are irreducible. (You could plot elements a + b√2 as points (a, b) on a graph and look for
patterns.)
Project II. Non-UFD Rings. Find examples of non-UFD rings. Give a number of examples of
nonequivalent factorizations into irreducibles. Prove that certain elements are irreducible. Can
you find nonequivalent factorizations of the same number in which the number of irreducible
factors is different? (Suggestion: Focus on rings of the form Z[√D] or O_{Q(√D)}.)
Project III. Eisenstein Integers. Revisit Example 6.7.6 about Eisenstein integers. Look for
irreducible elements in the ring of Eisenstein integers. The Eisenstein integers form a hexagonal
lattice in C. Can you discern any patterns in the irreducible elements in this lattice?
Project IV. RSA in Fp [x]. (For students who have addressed Project VII in Chapter 4.) Modify
the theory for RSA to work over the ring Fp [x]. What takes the role of the primes and the
product of two primes? Decide whether RSA over Z or RSA over Fp [x] is better and state
your reasons why.
6.8. PROJECTS 319
Introductory courses in modern algebra usually present groups, rings, and fields as the three most
important algebraic structures. Though fields are a particular class of rings, they possess unique
properties that lead to many fruitful investigations. This is why they are often viewed as their
own algebraic structure. Chapter 6 studied properties of divisibility in commutative rings. Since
every nonzero element in a field has a multiplicative inverse, questions concerning divisibility are
not interesting: Every nonzero element is an associate to every other. Nonetheless, fields possess a
rich structure.
Field theory has a number of applications internal to mathematics, as well as applications valuable
for digital communication and information security. Historically, however, it was the study of
polynomial equations (e.g., how to solve them, attempts to find patterns in the roots) that motivated
the development of field theory.
Matching up our study of fields to the list of key themes as outlined in the preface, this chapter
focuses on general properties, key examples, important objects, ways to conveniently describe fields,
and subobjects (subfields). Chapter 11 on Galois theory focuses on properties of homomorphisms
between fields. Both chapters introduce many applications. In fact, as we will see in this chapter
and in Chapter 11, field theory answered many problems in mathematics that had been unsolved
for centuries, not only in classical algebra but also in Euclidean geometry.
7.1
Introduction to Field Extensions
We have already seen that a field is a commutative ring with an identity in which every nonzero
element has an inverse. We have not, however, yet investigated subfields, homomorphisms between
fields, how to concisely describe (or generate) a field, and other important issues as outlined in the
preface of the book. The particular properties of these aspects of field theory are what warrant
studying fields in their own right.
Proposition 7.1.1
A homomorphism of fields ϕ : F → F′ is either identically 0 or injective.
Proof. The kernel Ker ϕ is an ideal in F . However, the only two ideals in F are (0) and F itself. If
Ker ϕ = (0), then ϕ is injective. If Ker ϕ = F , then ϕ is identically 0.
In previous sections, we termed an injective homomorphism (in group theory or ring theory)
an embedding. So we call an injective homomorphism an embedding of F into F′. By the first
isomorphism theorem for rings, when there is an embedding of F into F′, there exists a subring of F′
that is isomorphic to F. Consequently, we can view F as a subfield of F′. Thus, the existence of
nontrivial homomorphisms between fields is tantamount to containment.
Recall that the characteristic char(R) of a ring R is either 0 or the least positive integer n such
that n · 1 = 0 if such an n exists. By Exercise 5.1.21, the characteristic of an integral domain is
either 0 or a prime number p. Since fields are integral domains, this result applies. The concept of
characteristic of a field leads to a slightly more nuanced concept.
322 CHAPTER 7. FIELD EXTENSIONS
Definition 7.1.2
The prime subfield of a field F is the subfield generated by the multiplicative identity.
In other words, the prime subfield is the smallest (by inclusion) field in F that contains the
identity. Suppose that a field has positive characteristic p. Then the prime subfield must contain
the elements
0, 1, 2 · 1, . . . , (p − 1) · 1
and p · 1 = 0. Moreover, the multiplication on these elements as determined by distributivity gives this
set of elements the structure of Fp = Z/pZ. On the other hand, if we suppose that the field F has
characteristic 0, then F must contain the distinct elements

0, ±1, ±(2 · 1), ±(3 · 1), . . .

Therefore, Z is contained in F. But then the field F must also contain the field of fractions of Z,
namely Q. Thus, Q is the prime subfield of F. We have proven the following proposition.
Proposition 7.1.3
Let F be a field. The prime subfield of F is Q if and only if char(F ) = 0 and the prime
subfield of F is Fp if and only if char(F ) = p.
We have encountered a few fields before. For example, Q, R, and C are fields of characteristic 0
while the finite field Fp is by definition of characteristic p.
Definition 7.1.4
If K is a field containing F as a subfield, then K is called a field extension (or simply
extension) of F . This relationship of extension is often denoted by K/F .
The notation for a field extension resembles the usual notation for a quotient ring, but the two
constructions are not related. There is never a confusion of notation because the construction of
quotient fields never occurs in field theory. Indeed, the only ideals in a field are the trivial ideal and
the whole field, so the only resulting quotient rings are the trivial ring and the field itself.
By Proposition 7.1.3, every field is a field extension of either Q or Fp for some prime p.
Previous sections illustrated a few ways to construct some field extensions. These will become
central to our study of field extensions.
The first method uses a commutative ring generated by elements and then passes to the associated
field of fractions. In other words, suppose that F is a field contained in some integral domain
R. If α ∈ R − F, then F[α] is the subring of R generated by F and α. See Section 5.2.1. Since F[α]
is an integral domain, we can take the field of fractions F(α) of F[α]. See Section 6.2. In this way,
we construct the field extension K = F(α) of F. This construction extends to fields generated by
F and subsets S ⊆ R − F, where F(S) is the field of fractions of the integral domain F[S].
For example, the fields Q(√2) or Q(7^{1/3}, i) are field extensions of Q, inside C.
This method presupposes that F is a subring of some integral domain R and hence is a subfield
of the field of fractions of R.
The second method involves quotient rings of a polynomial ring. Let F be a field. Since F [x]
is a Euclidean domain, it is also a PID so every ideal I in F [x] is of the form I = (p(x)) for some
polynomial p(x) ∈ F [x]. By Proposition 5.7.10, the ideal (p(x)) is maximal if and only if it is prime
if and only if p(x) is irreducible. So if p(x) is an irreducible polynomial of degree greater than 0, then
the quotient ring F[x]/(p(x)) is a field. Furthermore, the inclusion of F into F[x]/(p(x)) is an injection,
so F is a subfield of F[x]/(p(x)).
Example 7.1.5. Consider the polynomial p(x) = x^2 − 5. By Proposition 6.5.11, it has no rational
roots, so by Proposition 6.5.12, p(x) is irreducible in Q[x]. Then Q[x]/(p(x)) ≅ Q[√5] is a field. It is
7.1. INTRODUCTION TO FIELD EXTENSIONS 323
√
obvious by construction that √Q[ 5] is an integral domain. To show that every nonzero element is
invertible, consider α = a + b 5 6= 0. Then
√ √
1 a−b 5 a−b 2
= √ √ = 2 . (7.1)
α (a + b 5)(a − b 5) a − 5b2
2
Since there is no rational number pq such that pq2 = 5, the denominator of this expression is a nonzero
rational number so (7.1) gives a formula for the inverse. 4
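Formula (7.1) is easy to check computationally. A minimal sketch, representing a + b√5 as a pair (a, b) of rationals (this pair encoding is our own bookkeeping, not notation from the text):

```python
from fractions import Fraction

def mul(x, y):
    """(a + b*sqrt(5)) * (c + d*sqrt(5)) = (ac + 5bd) + (ad + bc)*sqrt(5)."""
    a, b = x
    c, d = y
    return (a * c + 5 * b * d, a * d + b * c)

def inv(x):
    """Formula (7.1): 1/(a + b*sqrt(5)) = (a - b*sqrt(5)) / (a^2 - 5 b^2)."""
    a, b = x
    n = a * a - 5 * b * b  # nonzero, since 5 is not the square of a rational
    return (Fraction(a, n), Fraction(-b, n))

alpha = (Fraction(2), Fraction(3))        # the element 2 + 3*sqrt(5)
assert mul(alpha, inv(alpha)) == (1, 0)   # alpha * alpha^(-1) = 1
```

The assertion exercises exactly the argument in the example: the denominator a² − 5b² never vanishes for a rational pair (a, b) ≠ (0, 0).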
One may attempt to generalize the idea in Example 7.1.5 and ask if a ring such as Q[∛7] is also a field. This turns out to be true, but the proof becomes more difficult than the proof in Example 7.1.5. For example, finding the inverse of an element such as 1 + 3∛7 − (1/2)(∛7)² is not as simple.
Proposition 7.1.6
Let K be a field extension of F . Then K is a vector space over the field F .
Proof. With the addition, (K, +) is an abelian group. Furthermore, by distributivity and associativity of the multiplication in K, the following properties hold:
• r(α + β) = rα + rβ for all r ∈ F and all α, β ∈ K;
• (r + s)α = rα + sα for all r, s ∈ F and all α ∈ K;
• (rs)α = r(sα) for all r, s ∈ F and all α ∈ K;
• 1α = α for all α ∈ K.
Hence, K is a vector space over the field F. □
CHAPTER 7. FIELD EXTENSIONS
The great value of this simple proposition is that it makes it possible to use the theory of vector
spaces to derive information and structure about extensions of F . The concept of degree is an
important application of this connection between vector spaces over F and extensions of F .
Definition 7.1.7
If the extension K/F has a finite basis as an F -vector space, then the degree of the extension
K/F , denoted [K : F ], is the dimension of K as a vector space over F . In other words
[K : F ] = dimF K. If the extension K/F does not have a finite basis, we say that the
degree [K : F ] is infinite.
Example 7.1.8. Consider the extension Q(√5) over Q. According to Example 7.1.5, Q(√5) = Q[√5], so every element in Q(√5) can be written uniquely as a + b√5 for some a, b ∈ Q. Hence, {1, √5} is a basis for Q(√5) over Q. Thus,

    [Q(√5) : Q] = 2. ♦
The two methods outlined above for constructing field extensions of a given field F appear quite
different. However, a first key result that emerges from the identification of a field extension of F
with a vector space over F is that these two methods are in fact the same. We develop this result
in the theorems below.
Theorem 7.1.10
If [F (α) : F ] = n is finite, then α is the root of some irreducible polynomial p(x) ∈ F [x] of
degree n. Furthermore, F (α) = F [α].
Proof. Every element of F[α] is of the form

    a₀ + a₁α + a₂α² + · · · + aₖαᵏ

where aᵢ ∈ F. All these linear combinations are polynomials in F[x] evaluated at α. Since F(α)
has dimension n as a vector space over F , the set {1, α, α2 , . . . , αn } is linearly dependent. Thus,
there exists some nontrivial polynomial q(x) of degree at most n such that q(α) = 0. Let p(x) be a
polynomial of least degree such that p(α) = 0 and let us call deg p(x) = d. (The fact that such a
polynomial exists follows from the well-ordering principle of the integers and the fact that [F (α) : F ]
is finite.)
We claim that p(x) is irreducible in F[x]. Suppose that p(x) is not irreducible. Then p(x) = p₁(x)p₂(x) with p₁(x), p₂(x) ∈ F[x] of positive degree. Thus, evaluating p(x) at the root α gives

    0 = p(α) = p₁(α)p₂(α).

Since there are no zero divisors in K, p₁(α) = 0 or p₂(α) = 0. This contradicts the fact that p(x) is a polynomial in F[x] of minimal degree for which α is a root. Hence, we conclude that p(x) is irreducible.
Writing p(x) = a_d x^d + · · · + a₁x + a₀ with a_d ≠ 0, the equation p(α) = 0 gives

    α^d = −(1/a_d)(a_{d−1}α^{d−1} + · · · + a₁α + a₀).

This expresses α^d as a linear combination of {1, α, …, α^{d−1}}. By a recursion argument, for all m ≥ 0, the element α^m can also be written as a linear combination of {1, α, …, α^{d−1}}. Hence, the powers of α span F[α] as a vector space over F. However, a priori F[α] is only a subset of F(α).
By definition, every element in F(α) can be written as a rational expression in α, namely

    γ = a(α)/b(α)  where a(x), b(x) ∈ F[x] and b(α) ≠ 0.

Suppose that a(x) and b(x) are chosen such that γ = a(α)/b(α) and b(x) has minimal degree among all such expressions. Performing the Euclidean division of p(x) by b(x), we get p(x) = b(x)q(x) + r(x), where deg r(x) < deg b(x) or r(x) = 0. Assume that r(x) ≠ 0. Evaluating at α gives 0 = p(α) = b(α)q(α) + r(α), so r(α) = −b(α)q(α) and

    γ = a(α)/b(α) = a(α)q(α)/(b(α)q(α)) = −a(α)q(α)/r(α).

Hence, a(α)/b(α) can be written as a₂(α)/b₂(α) where deg b₂(x) < deg b(x). This contradicts the choice that b(x) has minimal degree. Consequently, r(x) = 0 and hence b(x) divides p(x). Since p(x) is irreducible and b(α) ≠ 0, the expression γ = a(α)/b(α) can only have b(x) of minimal degree if b(x) is a constant. Consequently, F(α) ⊆ F[α] and therefore F(α) = F[α].
Consequently, it also follows that d = dimF F (α) = n and p(x) is an irreducible polynomial such
that p(α) = 0.
Definition 7.1.11
Let F be a field. An extension field K over F is called simple if K = F (α) for some α ∈ K.
It is important to note, by contrast, that F(α) is not necessarily equal to F[α] if [F(α) : F] is infinite, which may occur if α is not the root of any nonzero polynomial in F[x]. For example, keeping t as a free parameter, F[t] is a subring of F(t). Furthermore, t is not a unit in F[t], whereas in F(t) the multiplicative inverse of the polynomial t is the rational expression 1/t. Hence, F[t] is a strict subring of F(t).
Theorem 7.1.10 may feel unsatisfactory because the hypotheses assumed that α was some element
in a field extension of F that remained unspecified. So one might naturally ask whether there exists
a field extension of F that contains some α such that p(α) = 0. We have already seen that the answer
to this question is yes and we encapsulate the result in the following converse to Theorem 7.1.10.
Theorem 7.1.12
Let p(x) ∈ F[x] be an irreducible polynomial of degree n. Then K = F[x]/(p(x)) is a field in which the element θ = x̄ = x + (p(x)) satisfies p(θ) = 0. Furthermore, the elements

    1, θ, θ², …, θ^{n−1}

form a basis of K as a vector space over F, so [K : F] = n.
Example 7.1.13. As an example of constructing a field extension, let F = F₂ and consider the polynomial p(x) = x³ + x + 1. Since p(x) has no roots in F₂ and since it is a cubic, it is irreducible by Proposition 6.5.12. Hence, K = F₂[x]/(x³ + x + 1) is a field extension of F₂ with [K : F₂] = 3. Consequently, K is a finite field containing 2³ = 8 elements. ♦
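This 8-element field can be realized concretely in a few lines, encoding c₂θ² + c₁θ + c₀ as the 3-bit mask c₂c₁c₀ (this encoding is our own choice, made for illustration):

```python
MOD = 0b1011  # the bits of x^3 + x + 1 over F_2

def f8_mul(a, b):
    """Multiply two elements of F_2[x]/(x^3 + x + 1), encoded as 3-bit masks."""
    prod = 0
    for i in range(3):          # carry-less ("XOR") polynomial multiplication
        if (b >> i) & 1:
            prod ^= a << i
    for i in (4, 3):            # reduce degrees 4 and 3 using x^3 = x + 1
        if (prod >> i) & 1:
            prod ^= MOD << (i - 3)
    return prod

# K is a field: every one of the 7 nonzero elements has an inverse.
inverses = {a: next(b for b in range(1, 8) if f8_mul(a, b) == 1)
            for a in range(1, 8)}
assert len(inverses) == 7
assert f8_mul(0b010, 0b101) == 1   # theta * (theta^2 + 1) = theta^3 + theta = 1
```

The last assertion uses the relation θ³ = θ + 1, which the reduction loop implements.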
Since p(x) is irreducible, any polynomial q(x) with q(θ) ≠ 0 is relatively prime to p(x), so the Extended Euclidean Algorithm produces polynomials a(x), b(x) with

    a(x)q(x) + b(x)p(x) = 1

in F[x]. In the quotient ring K, this implies that a(x)·q(x) = 1. Thus, in K, a(θ)q(θ) = 1, so that a(θ) is the inverse of q(θ). This is not the simplest method to find inverses in K. The example below illustrates this method and a faster algorithm that uses linear algebra.
Example 7.1.14. Let F = Q. Consider p(x) = x³ − 2. This is irreducible in Q[x]. Then K = Q[x]/(x³ − 2) is a field and we denote by θ an element in K such that θ³ − 2 = 0. (For simplicity, we could assume that θ = ∛2 ∈ C, but ∛2 is not the only complex number that solves x³ − 2 = 0.) Then

    K = {a + bθ + cθ² | a, b, c ∈ Q}.

Consider the elements α = 3 − θ + θ² and β = 5 + 3θ − (1/2)θ². For the sum, we have

    α + β = 8 + 2θ + (1/2)θ².

For the product, we remark that θ³ = 2, so also θ⁴ = 2θ, and thus

    αβ = 15 + 9θ − (3/2)θ² − 5θ − 3θ² + (1/2)(2) + 5θ² + 3(2) − (1/2)(2θ)
       = 22 + 3θ + (1/2)θ².
To find the inverse of α via the Extended Euclidean Algorithm, we first find a linear combination between x³ − 2 and x² − x + 3. The Euclidean Algorithm gives

    x³ − 2 = (x² − x + 3)(x + 1) + (−2x − 5),
    x² − x + 3 = (−2x − 5)(−x/2 + 7/4) + 47/4.

Back-substituting,

    47/4 = (x² − x + 3)(−x²/2 + (5/4)x + 11/4) + (x³ − 2)(x/2 − 7/4).

Evaluating at θ (where θ³ − 2 = 0) and multiplying by 4/47 gives (3 − θ + θ²)⁻¹ = (11 + 5θ − 2θ²)/47.
Linear algebra offers an easier way to find inverses (and, more generally, compute division).
Suppose that α⁻¹ = a₀ + a₁θ + a₂θ². Then this element satisfies

    1 = (3 − θ + θ²)(a₀ + a₁θ + a₂θ²)
      = 3a₀ + (3a₁ − a₀)θ + (3a₂ − a₁ + a₀)θ² + (−a₂ + a₁)θ³ + a₂θ⁴
      = 3a₀ + (3a₁ − a₀)θ + (3a₂ − a₁ + a₀)θ² + (−a₂ + a₁)(2) + a₂(2θ)
      = (−2a₂ + 2a₁ + 3a₀) + (2a₂ + 3a₁ − a₀)θ + (3a₂ − a₁ + a₀)θ².

Equating the coefficients of 1, θ, and θ² gives the linear system

    −2a₂ + 2a₁ + 3a₀ = 1,  2a₂ + 3a₁ − a₀ = 0,  3a₂ − a₁ + a₀ = 0,

whose solution is a₀ = 11/47, a₁ = 5/47, a₂ = −2/47. Interpreting this calculation gives precisely the same result for (3 − θ + θ²)⁻¹ as above. ♦
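Either procedure is easy to automate. A sketch of the Extended Euclidean approach (the helper functions and the list-of-Fractions representation, lowest degree first, are our own):

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients (but keep at least one entry)."""
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p

def polydivmod(a, b):
    """Euclidean division in Q[x]; coefficient lists, lowest degree first."""
    a, b = trim(list(a)), trim(list(b))
    q = [Fraction(0)] * max(1, len(a) - len(b) + 1)
    while len(a) >= len(b) and a != [0]:
        shift = len(a) - len(b)
        c = a[-1] / b[-1]
        q[shift] = c
        for i, bi in enumerate(b):
            a[i + shift] -= c * bi
        a = trim(a)
    return q, a

def polymul(a, b):
    prod = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] += ai * bj
    return prod

def inverse_mod(a, m):
    """Inverse of a(x) in F[x]/(m(x)) via the Extended Euclidean Algorithm."""
    r0, r1 = trim(list(m)), trim(list(a))
    s0, s1 = [Fraction(0)], [Fraction(1)]   # invariant: r_i = s_i * a  (mod m)
    while r1 != [0]:
        q, r = polydivmod(r0, r1)
        qs = polymul(q, s1)
        s_next = [(s0[i] if i < len(s0) else Fraction(0)) -
                  (qs[i] if i < len(qs) else Fraction(0))
                  for i in range(max(len(s0), len(qs)))]
        r0, r1, s0, s1 = r1, trim(r), s1, trim(s_next)
    lead = r0[-1]   # gcd(a, m) is a nonzero constant when m is irreducible
    return trim([c / lead for c in s0])

m = [Fraction(-2), Fraction(0), Fraction(0), Fraction(1)]   # x^3 - 2
alpha = [Fraction(3), Fraction(-1), Fraction(1)]            # 3 - x + x^2
print(inverse_mod(alpha, m))
# -> [Fraction(11, 47), Fraction(5, 47), Fraction(-2, 47)], i.e. (11 + 5θ − 2θ²)/47
```

The same routine inverts any nonzero element of F[x]/(p(x)) for irreducible p(x), which is the content of the Bézout argument above.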
If α and β are two roots of the same irreducible polynomial p(x) ∈ F[x], lying in possibly different extensions of F, then

    F[α] ≅ F[x]/(p(x)) ≅ F[β].

In fact, the composition of these isomorphisms gives f : F[α] → F[β] with f(c) = c for all c ∈ F, f(α) = β, and all other values of f determined by the axioms of a homomorphism.
It is not at all uncommon that F[α] ≠ F[β], so this isomorphism need not be trivial or even an automorphism. Recall that an automorphism is an isomorphism from a field to itself.
We have shown that there is a close connection between properties of subfields and morphisms (homomorphisms between fields). The only nontrivial morphisms are injections, which are embeddings, or isomorphisms. Despite, or perhaps because of, this restriction, the study of field extensions and automorphisms of fields is rich and has many applications.
Exercises for Section 7.1

6. Consider the field of order 8 constructed in Example 7.1.13. Call θ an element in F such that θ³ + θ + 1 = 0, so that we can write F = F₂[θ].
(a) Let α = θ² + 1, β = θ² + θ + 1, and γ = θ. Calculate: (i) αγ + β; (ii) α/γ; (iii) α² + β² + γ².
(b) Solve for x in terms of y in the equation y = αx + β.
(c) Prove that the function f : F → F defined by f(α) = α³ is a cyclic permutation on F.
7. Consider the field F of order 8 constructed in Example 7.1.13. Prove that U (F ) is a cyclic group.
8. Prove that {1, √2, √3} are linearly independent in C as a vector space over Q.
9. Prove that {1, √3, i, i√3} are linearly independent in C as a vector space over Q.
10. Consider the ring K = Q[√2, √5].
(a) Prove that K = Q[√2 + √5] and prove also that this is a field. Indicate [K : Q].
(b) Set γ = √2 + √5. Show that B₁ = {1, √2, √5, √10} and B₂ = {1, γ, γ², γ³} are two bases of K as a vector space over Q.
(c) Determine the change of coordinate matrix from B₂ coordinates to B₁ coordinates.
(d) Use part (c) to write 2 + 3γ² − 7γ³ in the basis B₁.
(e) Use part (c) to write −3 + √2 − √5 + 7√10 as a linear combination of {1, γ, γ², γ³}.
11. Construct a field of 9 elements and write down the addition and multiplication tables for this field.
12. Consider the field F₃(t) of rational expressions with coefficients in F₃. Let

    α = (2t + 1)/(t + 2),  β = 1/(2t² + 1),  γ = (t + 1)/(t² + 1).

Calculate (a) α + β; (b) βγ; (c) αγ/β.
13. Prove that an automorphism of a field F leaves the prime subfield of F invariant.
14. Prove that the function f : Q[√5] → Q[√5] defined by f(a + b√5) = a − b√5 is an automorphism.
15. Prove that there exists an isomorphism of fields f : Q(π) → Q(π) that maps π to −π.
16. Let K = F(α) where α is the root of some irreducible polynomial p(x) ∈ F[x]. Suppose that p(x) = aₙxⁿ + · · · + a₁x + a₀. Show that the function f_α : K → K defined by f_α(x) = αx is a linear transformation and that the matrix of f_α with respect to the ordered basis (1, α, α², …, α^{n−1}) is

    ⎛ 0 0 0 ⋯ 0 −a₀/aₙ ⎞
    ⎜ 1 0 0 ⋯ 0 −a₁/aₙ ⎟
    ⎜ 0 1 0 ⋯ 0 −a₂/aₙ ⎟
    ⎜ ⋮ ⋮ ⋮ ⋱ ⋮    ⋮   ⎟
    ⎝ 0 0 0 ⋯ 1 −a_{n−1}/aₙ ⎠
17. (Analysis) Prove that the only continuous automorphism on the field of real numbers is the identity
function.
18. Let φ : F → F′ be an isomorphism of fields. Let p(x) ∈ F[x] be an irreducible polynomial and let p′(x) be the polynomial obtained from p(x) by applying φ to the coefficients of p(x). Let α be a root of p(x) in some extension of F and let β be a root of p′(x) in some extension of F′. Prove that there exists an isomorphism

    Φ : F(α) → F′(β)

such that Φ(α) = β and Φ(c) = φ(c) for all c ∈ F.
19. Let D be a square-free integer and let K = Q[√D]. Prove that the function f : Q[√D] → M₂(Q) defined by

    f(a + b√D) = ⎛ a  Db ⎞
                 ⎝ b   a ⎠

is an injective ring homomorphism. Conclude that M₂(Q) contains a subring isomorphic to Q[√D].
20. Consider the field Q[∛2].
(a) Prove that the function φ : Q[∛2] → M₃(Q) defined by

    φ(a + b∛2 + c(∛2)²) = ⎛ a  2c  2b ⎞
                          ⎜ b   a  2c ⎟
                          ⎝ c   b   a ⎠

is an injective homomorphism. Conclude that M₃(Q) contains a subring that is a field isomorphic to Q[∛2].
(b) Use this homomorphism and matrix inverses to find the inverse of 3 − ∛2 + 5(∛2)².
21. Consider the field of rational expressions K₁ = Q(x) with coefficients in Q and also the field K₂ = Q(√p | p is prime). Prove that K₁ and K₂ are extensions of Q of infinite degree. Prove also that K₁ and K₂ are not isomorphic.
7.2 Algebraic Extensions
Section 7.1 introduced field extensions and emphasized the properties that follow from viewing an
extension of a field F as a vector space over F . Theorem 7.1.10 brought together two disparate
ways of constructing field extensions. It is also precisely this theorem that connects field theory
so closely with the study of polynomial equations. This section further develops consequences of
Theorem 7.1.10 by studying field extensions K/F as a field K containing roots of polynomials in
F [x].
Definition 7.2.1
An element α ∈ K is called algebraic over F if α is a root of some nonzero polynomial
f (x) ∈ F [x]. If α ∈ K is not algebraic over F , then α is called transcendental over F .
If every element of K is algebraic over F , then the extension K/F is called an algebraic
extension.
Consider the fields Q ⊆ R. The element √2 in R is algebraic over Q because it is a root of x² − 2. As another example, note that it is easy to show that cos(3θ) = 4cos³θ − 3cosθ. Hence, setting θ = π/9, we see that

    4cos³(π/9) − 3cos(π/9) = cos(π/3) = 1/2.

Hence, though we do not know the value of cos(π/9), we see that it is a root of the cubic equation 4x³ − 3x − 1/2 = 0 and so it is algebraic over Q. By an abuse of language, if we say that a number is algebraic (with no other qualifiers), we usually imply that K = C and F = Q.
A first important property about algebraic elements is that for each algebraic element α there
exists a naturally preferred polynomial with α as a root.
Proposition 7.2.2
Let K/F be a field extension and let α ∈ K be algebraic over F . There exists a unique
monic irreducible polynomial mα,F (x) ∈ F [x] such that α is a root of mα,F (x).
Proof. Consider the set of polynomials S = {p(x) ∈ F[x] − {0} | p(α) = 0}. Since α is algebraic over F, the set S is nonempty. By the well-ordering principle, S contains an element p(x) of least degree n. Assume p(x) is reducible with p(x) = p₁(x)p₂(x), where p₁(x) and p₂(x) have positive degree. Then 0 = p(α) = p₁(α)p₂(α), so p₁(α) = 0 or p₂(α) = 0. So p₁(x) ∈ S or p₂(x) ∈ S, but this contradicts the minimality of p(x) in S. Thus, any polynomial in S of minimum degree is irreducible.
Now let

    a(x) = aₙxⁿ + · · · + a₁x + a₀  and  b(x) = bₙxⁿ + · · · + b₁x + b₀

be two polynomials of least degree n in S. Then α is also a root of q(x) = bₙa(x) − aₙb(x), so q(x) is either 0 or in S. However, the subtraction cancels the leading terms, so q(x) is either 0 or has deg q(x) < n. Since n is the least degree of any polynomial in S, we conclude that q(x) = 0. Thus, bₙa(x) = aₙb(x) and any two polynomials in S of least degree are multiples of each other. Consequently, there exists a unique monic irreducible polynomial in S.
The notation mα,F(x) indicates the dependence of the polynomial on the specific field of coefficients. The polynomial may change with the field of coefficients but, as the following corollary shows, the corresponding polynomials over different fields are related.
Corollary 7.2.3
Let F ⊆ L ⊆ K be a chain of fields and suppose that α ∈ K is algebraic over F . Then α
is algebraic over L and mα,L (x) divides mα,F (x) in L[x].
Proof. Both polynomials mα,L(x) and mα,F(x) are in L[x] and have α as a root. The polynomial division in L[x] of the two polynomials gives

    mα,F(x) = mα,L(x)q(x) + r(x),

where r(x) = 0 or deg r(x) < deg mα,L(x). However, since α is a root of both polynomials, we deduce that r(α) = 0. Since deg mα,L(x) is the least degree of a nonzero polynomial in L[x] that has α as a root, then r(x) = 0 and hence mα,L(x) divides mα,F(x).
Definition 7.2.4
The polynomial mα,F (x) is called the minimal polynomial for α over F . The degree of the
algebraic element α over F is the degree deg mα,F (x).
The proof of Theorem 7.1.10 already established the following proposition but we restate it in
this context to make the connection explicit.
Proposition 7.2.5
Let α be algebraic over F. Then F(α) ≅ F[x]/(mα,F(x)) and [F(α) : F] = deg mα,F(x).
This proposition illustrates the reason for using the term “degree” as opposed to just “dimension”
for the quantity [F (α) : F ].
Example 7.2.6. Consider the element α = ∛7 over Q. It is a root of x³ − 7; however, we do not yet know that this is the minimal polynomial m∛7,Q(x). By Proposition 6.5.11, we can tell that x³ − 7 does not have a rational root. Since it is a cubic, we deduce that it is irreducible. Hence, mα,Q(x) = x³ − 7. ♦
Example 7.2.7. Consider the element α = √2 + √3 ∈ C. We determine the degree and the minimal polynomial over Q. Note that α² = 2 + 2√6 + 3, so then α² − 5 = 2√6. Hence, squaring both sides,

    α⁴ − 10α² + 25 = 24,  i.e.,  α⁴ − 10α² + 1 = 0.

So α is a root of p(x) = x⁴ − 10x² + 1. We do not yet know if p(x) is the minimal polynomial of α, since we have not checked if it is irreducible. By the Rational Root Theorem, since neither 1 nor −1 is a root of p(x), p(x) has no rational roots and hence no linear factors. If p(x) is reducible, then it must be the product of two quadratic polynomials. Furthermore, without loss of generality, we can assume the polynomials are monic. Furthermore, by Gauss' Lemma, if p(x) factors over Q, then it must factor over Z. Hence, we see that if p(x) factors, then there are two cases:

    x⁴ − 10x² + 1 = (x² + ax + 1)(x² − ax + 1)  or  x⁴ − 10x² + 1 = (x² + ax − 1)(x² − ax − 1),

with a ∈ Z. Comparing the coefficients of x² gives a² = 12 in the first case and a² = 8 in the second; neither has an integer solution, so p(x) is irreducible and mα,Q(x) = x⁴ − 10x² + 1. Over the field L = Q(√2), however, p(x) factors as

    x⁴ − 10x² + 1 = (x² − 2√2x − 1)(x² + 2√2x − 1).

Hence, α is a root of one of those two quadratics. By direct observation, we find that mα,L(x) = x² − 2√2x − 1.
To take a different approach in looking for mα,K(x), where K = Q(√3), notice that α − √3 = √2. After squaring, α² − 2√3α + 3 = 2, so α is a root of x² − 2√3x + 1. Since α ∉ Q(√3), α is not the root of a degree-1 polynomial over K, so we must have mα,K(x) = x² − 2√3x + 1. Again, we see that mα,K(x) divides mα,Q(x) since

    mα,Q(x) = mα,K(x)(x² + 2√3x + 1). ♦
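The relations in this example are easy to sanity-check numerically (a floating-point sketch, not a proof):

```python
import math

alpha = math.sqrt(2) + math.sqrt(3)

# alpha is a root of x^4 - 10x^2 + 1, the minimal polynomial over Q
assert abs(alpha**4 - 10 * alpha**2 + 1) < 1e-9

# over K = Q(sqrt(3)), alpha is a root of x^2 - 2*sqrt(3)*x + 1
assert abs(alpha**2 - 2 * math.sqrt(3) * alpha + 1) < 1e-9

# the claimed factorization m_{alpha,Q}(x) = m_{alpha,K}(x)(x^2 + 2*sqrt(3)x + 1),
# checked at an arbitrary sample point
x = 1.7
lhs = x**4 - 10 * x**2 + 1
rhs = (x**2 - 2 * math.sqrt(3) * x + 1) * (x**2 + 2 * math.sqrt(3) * x + 1)
assert abs(lhs - rhs) < 1e-9
```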
To recapitulate some of our results, Theorem 7.1.10 established that if [F(α) : F] is finite, then α is algebraic: if [F(α) : F] = n, then the set {1, α, α², …, αⁿ} is linearly dependent, so there exist cᵢ ∈ F, not all zero, such that

    cₙαⁿ + · · · + c₁α + c₀ = 0.

Thus, α is algebraic. Notice that this reasoning extends further and supports the following proposition.
Proposition 7.2.8
If the extension K/F is of finite degree then every element in K is algebraic and hence K
is an algebraic extension.
We should underscore that this implication is not an “if and only if” statement. Indeed, it is easy
to construct algebraic extensions that are not of finite degree. However, in order to make examples
of this precise, we need a few more facts about degrees.
Theorem 7.2.9
Let F ⊆ K ⊆ L be fields. The degrees of the extensions satisfy
[L : F ] = [L : K][K : F ].
Proof. Suppose first that either [L : K] or [K : F] is infinite. If [K : F] is infinite, then there does not exist a finite basis for K as a vector space over F. Since L contains K as an F-subspace, L does not have a finite basis over F either, so [L : F] is infinite. If [L : K] is infinite, then L does not have a finite basis as a vector space over K. If L possessed a finite basis as a vector space over F, then, since F ⊆ K, this basis would be a finite spanning set of L over K, so [L : K] would be finite. Hence, if [L : K] is infinite, then so is [L : F].
Now suppose that [L : K] = n and [K : F] = m. Let {α₁, α₂, …, αₙ} be a basis of L over K and let {β₁, β₂, …, βₘ} be a basis of K over F. Every element x ∈ L can be written as

    x = c₁α₁ + c₂α₂ + · · · + cₙαₙ

with cᵢ ∈ K, and each cᵢ can in turn be written as cᵢ = dᵢ₁β₁ + · · · + dᵢₘβₘ with dᵢⱼ ∈ F. Hence, every x ∈ L is an F-linear combination of the products αᵢβⱼ, so the set

    {αᵢβⱼ | 1 ≤ i ≤ n, 1 ≤ j ≤ m}

spans L over F. Now suppose that some F-linear combination of these products vanishes:

    d₁₁α₁β₁ + d₁₂α₁β₂ + · · · + dₙₘαₙβₘ = 0,  dᵢⱼ ∈ F.

Grouping the terms by αᵢ gives

    (d₁₁β₁ + · · · + d₁ₘβₘ)α₁ + · · · + (dₙ₁β₁ + · · · + dₙₘβₘ)αₙ = 0,

where each coefficient dᵢ₁β₁ + · · · + dᵢₘβₘ lies in K. Since the set {αᵢ} forms a basis of L over K, it is linearly independent, so dᵢ₁β₁ + · · · + dᵢₘβₘ = 0 for each i. Since {βⱼ} is linearly independent over F, dᵢⱼ = 0 for all pairs (i, j). Thus, {αᵢβⱼ} is a linearly independent set. Hence, it forms a basis of L over F and so

    [L : F] = dim_F L = nm = [L : K][K : F].
Though Theorem 7.2.9 describes how the degree behaves in towers of extensions, it can also be used to deduce information about field containment, as the following example shows.
Example 7.2.10. We prove that √7 ∉ Q(∛7). The minimal polynomial of ∛7 over Q is x³ − 7, so [Q(∛7) : Q] = 3. Assume that √7 ∈ Q(∛7). Then

    Q ⊆ Q(√7) ⊆ Q(∛7).

So by Theorem 7.2.9,

    [Q(∛7) : Q(√7)][Q(√7) : Q] = [Q(∛7) : Q].

Hence, 3 = 2[Q(∛7) : Q(√7)], which is a contradiction because degrees of extensions are integers. Thus, √7 ∉ Q(∛7).
Note that this reasoning is much easier than proving directly that there do not exist a, b, c ∈ Q such that √7 = a + b∛7 + c(∛7)². ♦
Example 7.2.11. In Example 7.2.7 we found that mα,Q(x) = x⁴ − 10x² + 1 for α = √2 + √3. Hence, [Q(α) : Q] = 4. Since Q ⊆ Q(√2) ⊆ Q(α) and [Q(√2) : Q] = 2, then by Theorem 7.2.9, [Q(α) : Q(√2)] = 2. Hence, α is the root of an irreducible quadratic polynomial in Q(√2)[x]. ♦
Definition 7.2.12
A field extension K over F is said to be a simple field extension if K = F(α) for some element α ∈ K. Moreover, the element α is called a primitive element of K over F.
The extension K is said to be finitely generated if K = F (α1 , α2 , . . . , αk ) for some elements
α1 , α2 , . . . , αk ∈ K.
This definition makes no assumption that the generating elements α1 , α2 , . . . , αk are algebraic
over F .
Note that if F is a field then F (α, β) = F (α)(β), or more precisely the field extension over F
generated by α and β is equal to the field extension over F (α) generated by β. Of course, this also
implies that F (α, β) = F (β)(α).
Suppose now that α1 , α2 , · · · , αk are all algebraic over a field F and have degree deg αi = ni .
We define a chain of subfields Fi by F0 = F and
Fi = F (α1 , α2 , · · · , αi ) for 1 ≤ i ≤ k.
We then have
F = F0 ⊆ F1 ⊆ F2 ⊆ · · · ⊆ Fk .
For all i we have nᵢ = deg mαᵢ,F(x). Furthermore, Fᵢ = Fᵢ₋₁(αᵢ), so by Corollary 7.2.3, mαᵢ,Fᵢ₋₁(x) divides mαᵢ,F(x). Thus, [Fᵢ : Fᵢ₋₁] ≤ nᵢ. Therefore,

    [Fₖ : F] = [Fₖ : Fₖ₋₁][Fₖ₋₁ : Fₖ₋₂] · · · [F₁ : F₀] ≤ n₁n₂ · · · nₖ.

This gives an upper bound for [Fₖ : F]. However, Theorem 7.2.9 also shows that each nᵢ = [F(αᵢ) : F] divides [Fₖ : F] because F ⊆ F(αᵢ) ⊆ Fₖ. This establishes the following important theorem.
Theorem 7.2.13
A field extension K/F is finite if and only if K is generated by a finite number of algebraic elements over F. If these algebraic elements have degrees n₁, n₂, …, nₖ over F, then

    lcm(n₁, n₂, …, nₖ) divides [K : F]  and  [K : F] ≤ n₁n₂ · · · nₖ.
which is less than 4 × 6 = 24, the product of the degrees of the generating algebraic elements. This
gives another example where [K : Q] is strictly less than the product of the degrees of the generating
algebraic elements. ♦
Theorem 7.2.13, along with other theorems on degrees, leads to a powerful corollary that would be rather difficult to prove directly from the definition of algebraic elements.
Corollary 7.2.17
Let α and β be two algebraic elements over a field F. Then the following elements are also algebraic:

    α + β,  α − β,  αβ,  α/β (for β ≠ 0).
Proof. Suppose that α and β are algebraic over F with degrees n1 and n2 , respectively. By Theo-
rem 7.2.13, [F (α, β) : F ] ≤ n1 n2 . Let γ be α + β, α − β, αβ, or α/β. Then F (γ) is a subfield of
F (α, β). By Theorem 7.2.9, [F (γ) : F ] divides [F (α, β) : F ] so [F (γ) : F ] = d is finite. By Theo-
rem 7.1.10, γ is the root of an irreducible polynomial of degree d in F [x]. Hence, γ is algebraic.
Example 7.2.18. Consider the element γ = (1 + √2)/(1 + √3). This is an element in Q(√2, √3), which has degree 4 over Q. Thus, γ is algebraic. For completeness, we can look for the minimal polynomial of γ. We first have

    γ(1 + √3) = 1 + √2  ⟹  γ − 1 = √2 − γ√3.

Squaring both sides gives

    γ² − 2γ + 1 = 2 − 2√6γ + 3γ²  ⟹  2√6γ = 2γ² + 2γ + 1.

Squaring once more eliminates the radical: 24γ² = (2γ² + 2γ + 1)², so γ is a root of 4x⁴ + 8x³ − 16x² + 4x + 1. ♦
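A quick floating-point check of this computation, including the quartic obtained by squaring the last relation once more (that continuation is our own step, not a claim from the text):

```python
import math

gamma = (1 + math.sqrt(2)) / (1 + math.sqrt(3))

# the relation derived above: 2*sqrt(6)*gamma = 2*gamma^2 + 2*gamma + 1
assert abs(2 * math.sqrt(6) * gamma - (2 * gamma**2 + 2 * gamma + 1)) < 1e-9

# squaring once more gives 24*gamma^2 = (2*gamma^2 + 2*gamma + 1)^2,
# i.e. gamma is a root of 4x^4 + 8x^3 - 16x^2 + 4x + 1
assert abs(4 * gamma**4 + 8 * gamma**3 - 16 * gamma**2 + 4 * gamma + 1) < 1e-9
```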
Let K be a field and consider the set of subfields of K, with the relation "is an extension of finite degree of." Theorem 7.2.9 proves that this relation is transitive. Since it also satisfies antisymmetry and reflexivity, it is a partial order on the subfields of K.
We can now mention a few field extensions that are algebraic and of infinite degree. For example, it is possible to show that Q(√2, √3, √5, √7, …) is an algebraic extension of infinite degree over Q. (See Exercise 7.2.19.) This also provides an example of a field extension that is not finitely generated but still algebraic.
Another important example of an algebraic extension of infinite degree is the subfield of C of all algebraic numbers, denoted Q̄, i.e., complex numbers that are roots of polynomials in Q[x]. The set of algebraic numbers forms a field by virtue of Corollary 7.2.17. It is easy to see that Q̄ does not have finite degree over Q since, for every positive integer n, it contains the field Q(ⁿ√2), which has degree n over Q. Thus, [Q̄ : Q] is greater than every positive integer n, so this degree is infinite.
An interesting property of algebraic numbers is that Q̄ is a countable set. (See Exercise 7.2.23.) To many who first encounter it, the result that Q̄ is countable feels counterintuitive, since the set of reals is uncountable and every real number can be approximated to arbitrary precision by rational numbers. As we think of all the possibilities covered by algebraic numbers and how few numbers we know for certain to be transcendental, it seems even more counterintuitive that Q̄ is a countable subset of the uncountable set C.
Proposition 7.2.19
Let Alg(L/F) denote the set of subfields of L that are algebraic extensions of F. Define the relation ≼ on Alg(L/F) by K₁ ≼ K₂ if K₂ is an algebraic extension of K₁. Then ≼ is a partial order on Alg(L/F).
Proof. For all K ∈ Alg(L/F), it is obvious that K is an algebraic extension of itself. Hence, the relation ≼ is reflexive. If K₁ ≼ K₂ and K₂ ≼ K₁, then K₁ ⊆ K₂ and K₂ ⊆ K₁, so K₁ = K₂. Hence, ≼ is antisymmetric.
Suppose that K₃ is an algebraic extension of K₂, which in turn is an algebraic extension of K₁. Let α ∈ K₃. Since K₃ is algebraic over K₂, α is the root of a minimal polynomial

    mα,K₂(x) = cₙxⁿ + · · · + c₁x + c₀ ∈ K₂[x].

Since K₂ is algebraic over K₁, each coefficient cᵢ is algebraic over K₁. By Theorem 7.2.13, the degree [K₁(c₀, c₁, …, cₙ) : K₁] is at most the product of the degrees of the cᵢ over K₁; in particular, it is finite. Furthermore, since α is a root of a polynomial of degree n over K₁(c₀, c₁, …, cₙ), Theorem 7.2.9 gives

    [K₁(α) : K₁] ≤ [K₁(c₀, …, cₙ, α) : K₁] = [K₁(c₀, …, cₙ, α) : K₁(c₀, …, cₙ)][K₁(c₀, …, cₙ) : K₁] ≤ n[K₁(c₀, …, cₙ) : K₁].

Hence, K₁(α) is a finite extension of K₁, so α is algebraic over K₁. Thus, K₃ is an algebraic extension of K₁ and ≼ is transitive.
For any two subfields K1 and K2 of L that are algebraic extensions of F , the intersection K1 ∩K2
is again subfield of L that is an algebraic extension of F . It is the greatest lower bound between K1
and K2 with respect to the partial order of “algebraic extension of.”
Let L be an extension of a field F and let K₁, K₂ be two subfields of L that are algebraic extensions of F. It is easy to show that the intersection K₁ ∩ K₂ is a field extension of F. In general, however, the union K₁ ∪ K₂ is not a field. In order to show that the partial order of algebraic extensions has a least upper bound for any two algebraic extensions of F, we need to introduce the composite of fields.
Definition 7.2.20
Let K1 and K2 be two subfields of any field E. Then the composite field K1 K2 is the
smallest subfield of E (by inclusion) that includes both K1 and K2 .
Proposition 7.2.21
Let K1 and K2 be two finite extensions of a field F , both contained in a field extension L.
Then [K1 K2 : F ] ≤ [K1 : F ][K2 : F ].
Proposition 7.2.22
Let K₁ and K₂ be two subfields of a field L. Let γ ∈ K₁K₂. Then

    γ = (α₁β₁ + α₂β₂ + · · · + αₘβₘ)/(a₁b₁ + a₂b₂ + · · · + aₙbₙ)    (7.2)

for some integers m and n and for some elements α₁, α₂, …, αₘ, a₁, a₂, …, aₙ ∈ K₁ and β₁, β₂, …, βₘ, b₁, b₂, …, bₙ ∈ K₂.
Proof. Let S be the set of all elements in L of the form (7.2), assuming the denominator is nonzero.
For all α ∈ K1 , we have α = (α · 1)/(1 · 1) ∈ S. Hence, K1 ⊆ S and similarly K2 ⊆ S.
Since K1 K2 contains K1 and K2 and is a field, S ⊆ K1 K2 . Performing distributivity on a
product of linear combinations
(α1 β1 + α2 β2 + · · · + αm βm )(a1 b1 + a2 b2 + · · · + an bn )
produces a linear combination (with possibly mn terms) of products of elements from K1 and K2 .
Consider the difference of two elements in S. By performing cross-multiplication and distributivity
on products of linear combinations, one recovers another expression of the form (7.2). Hence, by the
One-Step Subgroup Criterion, (S, +) is a subgroup of (L, +). Similarly, the division of two nonzero
elements in S is again an element in S. Thus, S is a subring of L, containing the identity and closed
under taking inverses. Thus, S is a subfield of L. Since K1 K2 is the smallest subfield of L containing
both K1 and K2 , then K1 K2 ⊆ S. Consequently, S = K1 K2 .
Proposition 7.2.23
Let L be an extension of a field F and let K1 and K2 be two subfields of L that are algebraic
over F . Then K1 K2 is another algebraic extension of F .
Proof. Let γ ∈ K1 K2 . Then, by Proposition 7.2.22, γ is equal to an expression of the form (7.2).
However, by a repeated application of Corollary 7.2.17, since αi , βi , aj , bj with i = 1, 2, . . . , m and
j = 1, 2, . . . , n are algebraic, then γ is also algebraic. Thus, K1 K2 is an algebraic extension of F .
We summarize the results of this section into a concise theorem.
Theorem 7.2.24
Let L be an extension of a field F . The relation “is an algebraic extension of” on the set of
algebraic extensions of F in L is a partial order. Furthermore, this partial order is a lattice
such that for any two algebraic extensions K1 and K2 of F in L, the least upper bound is
K1 K2 and the greatest lower bound is K1 ∩ K2 .
For any two subfields K1 and K2 of L that are algebraic extensions of F , the Hasse diagram
of the lattice Alg(L/F ) includes the following subdiagram illustrating the least upper bound K1 K2
and the greatest lower bound K1 ∩ K2 .
          K₁K₂
         /    \
       K₁      K₂
         \    /
        K₁ ∩ K₂
Theorem 7.2.25 (Liouville)
Let α be an irrational algebraic number of degree n over Q. Then there exists a real constant A > 0 such that for all rational numbers p/q with q > 0,

    |α − p/q| > A/qⁿ.
Proof. Let mα (x) be the minimal polynomial of α over Q. Let c be the least common multiple of the
denominators of the coefficients of mα (x) and set f (x) = cmα (x). Then f (x) ∈ Z[x] is a polynomial
of degree n, with α as a root, with integer coefficients, and such that the greatest common divisor
of the coefficients is 1.
Let δ be any positive real number less than the distance between α and any other root, namely
0 < δ < min(|α − α1 |, |α − α2 |, . . . , |α − αk |),
where α₁, α₂, …, αₖ are the roots of f(x) that are different from α. Let M be the maximal value of |f′(x)| over the interval [α − δ, α + δ] and let A be a real number with 0 < A < min(δ, 1/M).
Let p/q be an arbitrary rational number with q > 0. We consider two cases.
Case 1. Suppose p/q ∉ [α − δ, α + δ]. Then |α − p/q| > δ > A ≥ A/qⁿ.
Case 2. Now suppose that p/q ∈ [α − δ, α + δ]. Since δ was chosen smaller than the distance from α to any other root of f(x), and since α is irrational, f(p/q) ≠ 0. By the Mean Value Theorem, there exists a c between p/q and α such that

    f′(c) = (f(α) − f(p/q))/(α − p/q) = −f(p/q)/(α − p/q).

Since |f′(c)| ≤ M, it follows that

    |α − p/q| ≥ |f(p/q)|/M > A|f(p/q)|.

Finally, qⁿf(p/q) is a nonzero integer, so |f(p/q)| ≥ 1/qⁿ and hence |α − p/q| > A/qⁿ.
Liouville's Theorem offers a strategy to prove that some numbers are transcendental: find an irrational number α that violates the conclusion of the theorem for every positive integer n. The following corollary constructs a specific family of transcendental numbers using this strategy.
Corollary 7.2.26
Let b be a positive integer greater than 2 and let {aₖ}ₖ≥₁ be a sequence whose values are in {0, 1, 2, …, b − 1}, with infinitely many aₖ nonzero. Then the series

    a₁/b^{1!} + a₂/b^{2!} + a₃/b^{3!} + · · ·

converges to a transcendental number.
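The strategy behind this corollary can be illustrated computationally for the classical Liouville constant (taking every aₖ = 1 and b = 10; a sketch using exact rational arithmetic):

```python
import math
from fractions import Fraction

def liouville_partial(K, b=10):
    """Partial sum p/q of sum_{k>=1} 1/b^(k!), truncated after K terms."""
    return sum(Fraction(1, b ** math.factorial(k)) for k in range(1, K + 1))

# The truncation p/q has denominator q = b^(K!), while the tail of the series
# is smaller than 2/b^((K+1)!). Once K >= n this beats A/q^n for every fixed
# A > 0, violating Liouville's inequality for every degree n.
for n in range(2, 5):
    K = n + 1
    pq = liouville_partial(K)
    q = pq.denominator
    tail_bound = Fraction(2, 10 ** math.factorial(K + 1))
    assert q == 10 ** math.factorial(K)
    assert tail_bound < Fraction(1, q ** n)
```

The loop checks the key inequality for a few small degrees n; the full argument, of course, must treat all n at once.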
7.3 Solving Cubic and Quartic Equations
As early as middle school and certainly in high school, students encounter the quadratic formula.
The solutions to the generic quadratic equation ax² + bx + c = 0, with a ≠ 0, are

    x = (−b ± √(b² − 4ac))/(2a).
The original interest in solving quadratic equations came from applications to geometry. There is
historical evidence that as early as 400 B.C.E. Babylonian scholars knew the strategy of completing
the square to solve a quadratic equation. Solutions to the quadratic equation appeared in a variety
of forms throughout history.
Subsequent generations of scholars attempted to find formulas for the roots of equations of higher
degree. One can approach the problem of finding solutions in a variety of ways: radical expressions,
trigonometric sums, hypergeometric functions, continued fractions, etc. However, historically, by a
formula for the roots of a polynomial equation, people understood an expression in terms of radicals of
algebraic combinations of the coefficients of the generic polynomial. For centuries, mathematicians
only made progress on particular cases. Then, in 1545, Cardano published formula solutions for
both the cubic and the quartic equation in Ars Magna. Though Cardano often receives the credit,
Tartaglia (a colleague) and Ferrari (a student) contributed significantly.
We propose to look at some formulas for the solutions to the cubic and the quartic equation and
discuss the merits of the approach.
Throughout this section, we assume that the polynomials are in R[x] but the strategies can be
generalized to C[x].
Consider a monic cubic equation
\[ x^3 + ax^2 + bx + c = 0. \tag{7.4} \]
As a first step, we change variables by setting x = y − a/3. This shift of variables has a similar effect as completing the square in the derivation of the quadratic formula. We have
\[ \left(y - \frac{a}{3}\right)^3 + a\left(y - \frac{a}{3}\right)^2 = y^3 - ay^2 + \frac{a^2}{3}y - \frac{a^3}{27} + ay^2 - \frac{2a^2}{3}y + \frac{a^3}{9}. \]
This change of variables leads to an equation in y that is equivalent to the original equation but does not involve a quadratic term. We get
\[ y^3 + py + q = 0, \tag{7.5} \]
where
\[ p = b - \frac{a^2}{3} \qquad\text{and}\qquad q = c + \frac{2a^3 - 9ab}{27}. \]
Cardano’s strategy introduces two variables u and v, related to each other by
\[ \begin{cases} u + v = y \\ 3uv + p = 0. \end{cases} \]
In other words, u and v are the two roots of the quadratic equation t^2 − yt − p/3 = 0. Plugging
y = u + v into (7.5) gives
\[ u^3 + 3uv(u + v) + v^3 + p(u + v) + q = 0 \iff u^3 + v^3 + (3uv + p)(u + v) + q = 0 \iff u^3 + v^3 + q = 0. \]
Since 3uv = −p, we have v^3 = −p^3/(27u^3), and so multiplying u^3 + v^3 + q = 0 through by u^3 gives
\[ u^6 + qu^3 - \frac{p^3}{27} = 0. \]
This is a quadratic equation in u^3 with the two solutions
\[ u^3 = -\frac{q}{2} \pm \sqrt{\frac{q^2}{4} + \frac{p^3}{27}}. \tag{7.6} \]
By (7.3), the possible values of u are
\[ u = \omega^i \sqrt[3]{-\frac{q}{2} \pm \sqrt{\frac{q^2}{4} + \frac{p^3}{27}}} \qquad \text{for } i = 0, 1, 2, \tag{7.7} \]
from which we see that u^3 and v^3 are the two distinct roots of (7.6). We give u the + sign and v the − sign. However, the identity 3uv = −p leads to precisely three valid combinations of the possible powers of ω. The three roots of the cubic equation (7.5) are
\[ y_1 = u_0 + v_0 = \sqrt[3]{-\frac{q}{2} + \sqrt{\frac{q^2}{4} + \frac{p^3}{27}}} + \sqrt[3]{-\frac{q}{2} - \sqrt{\frac{q^2}{4} + \frac{p^3}{27}}}, \]
\[ y_2 = \omega u_0 + \omega^2 v_0 = \left(\frac{-1 + i\sqrt{3}}{2}\right)\sqrt[3]{-\frac{q}{2} + \sqrt{\frac{q^2}{4} + \frac{p^3}{27}}} + \left(\frac{-1 - i\sqrt{3}}{2}\right)\sqrt[3]{-\frac{q}{2} - \sqrt{\frac{q^2}{4} + \frac{p^3}{27}}}, \]
\[ y_3 = \omega^2 u_0 + \omega v_0 = \left(\frac{-1 - i\sqrt{3}}{2}\right)\sqrt[3]{-\frac{q}{2} + \sqrt{\frac{q^2}{4} + \frac{p^3}{27}}} + \left(\frac{-1 + i\sqrt{3}}{2}\right)\sqrt[3]{-\frac{q}{2} - \sqrt{\frac{q^2}{4} + \frac{p^3}{27}}}. \]
The three roots of the original cubic equation (7.4) are given by x_i = y_i − a/3.
The square root that appears in formula (7.7) indicates that there may be a bifurcation in
behavior for the solutions for whether the expression under the square root is positive or negative.
Indeed, the expression under the square root plays a similar role for the cubic equation as b2 − 4ac
plays in the quadratic formula.
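The formulas above can be checked numerically. The following is only a sketch (the function solve_depressed_cubic and its pairing of cube roots via v = −p/(3u) are our own choices); it uses complex arithmetic throughout so the same code handles a positive or negative expression under the square root:

```python
import cmath

def solve_depressed_cubic(p, q):
    """Approximate roots of y^3 + p*y + q = 0 by Cardano's formula."""
    omega = complex(-0.5, 3 ** 0.5 / 2)        # primitive cube root of unity
    if p == 0:
        r = complex(-q) ** (1 / 3)             # principal cube root of -q
        return [omega ** i * r for i in range(3)]
    u0 = (-q / 2 + cmath.sqrt(q * q / 4 + p ** 3 / 27)) ** (1 / 3)
    roots = []
    for i in range(3):
        u = omega ** i * u0                    # the three cube roots of u^3
        v = -p / (3 * u)                       # enforce the relation 3uv = -p
        roots.append(u + v)
    return roots

# y^3 - 3y - 1 = 0 has the three real roots ~1.8794, ~-0.3473, ~-1.5321.
for y in solve_depressed_cubic(-3, -1):
    print(y)
```

Pairing each u with v = −p/(3u), rather than taking cube roots independently, is what selects the three valid combinations of signs and powers of ω.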
Definition 7.3.1
When a cubic equation is written in the form y^3 + py + q = 0, the expression
\[ \Delta = -27q^2 - 4p^3 \]
is called the discriminant of the cubic equation.
The reader might wonder why we define the discriminant as above rather than the quantity
q 2 /4 + p3 /27, which arose naturally from Cardano’s method. The concept of discriminant has a
more general definition (see Definition 11.5.12) so we have stated the definition of the discriminant
of a cubic to conform to the more general definition.
Theorem 7.3.2
Consider the cubic equation y 3 + py + q = 0 with p, q ∈ R. Then
• if ∆ > 0, then the cubic equation has three distinct real roots;
• if ∆ = 0, then the cubic equation has a repeated root and all of its roots are real;
• if ∆ < 0, then the cubic equation has one real root and two nonreal complex conjugate roots.
Theorem 7.3.2 assumes that p and q are real numbers. The formulas for the solutions of the cubic equation remain correct when p and q are complex numbers. In this latter case, in the calculation of u0, any of the three possible values of the cube root of a complex number will recover all three distinct roots.
Example 7.3.3. Consider the equation x^3 − 3x − 1 = 0. Cardano’s solution for the cubic involves
\[ u^3 = -\frac{q}{2} \pm \sqrt{\frac{q^2}{4} + \frac{p^3}{27}} = \frac{1}{2} \pm \frac{\sqrt{3}}{2}i. \]
Though Cardano did not have complex numbers at his disposal, we can write u^3 = cos(π/3) + i sin(π/3) = e^{iπ/3}, taking the + sign. Thus, u0 = e^{iπ/9} and v0 = e^{−iπ/9}. The roots of the equation are
x1 = eiπ/9 + e−iπ/9 = 2 cos(π/9),
x2 = ei2π/3 eiπ/9 + e−i2π/3 e−iπ/9 = 2 cos(7π/9),
x3 = e−i2π/3 eiπ/9 + ei2π/3 e−iπ/9 = 2 cos(5π/9). 4
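A quick numerical check of the example, rebuilding the roots from u0 = e^{iπ/9} and v0 = e^{−iπ/9} (a sketch using only the standard library):

```python
import cmath

# u0 and v0 from the example, and w a primitive cube root of unity.
u0, v0 = cmath.exp(1j * cmath.pi / 9), cmath.exp(-1j * cmath.pi / 9)
w = cmath.exp(2j * cmath.pi / 3)
roots = [u0 + v0, w * u0 + w**2 * v0, w**2 * u0 + w * v0]
for x in roots:
    print(x.real, abs(x**3 - 3*x - 1))   # residuals are essentially 0
```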
The proof of Theorem 7.3.2 indicates that Cardano’s formula is not particularly easy to deal
with. If the cubic has three real roots then ∆ > 0, so q 2 /4 + p3 /27 must be negative, which makes
\[ \sqrt{\frac{q^2}{4} + \frac{p^3}{27}} \]
an imaginary number. It is precisely this case in which the solution to the cubic has three real roots.
In particular, in order to find these real roots, we must pass into the complex numbers.
Example 7.3.4. Consider the cubic equation x^3 − 15x − 20 = 0. We have ∆ = 2700, so the equation should have three real roots. Also,
\[ u_0 = \sqrt[3]{10 + 5i} \qquad\text{and}\qquad v_0 = \sqrt[3]{10 - 5i}. \]
Writing these complex numbers in polar form (see Appendix A.1) gives
\[ 10 + 5i = \sqrt{125}\,e^{i \arctan(1/2)} \implies \sqrt[3]{10 + 5i} = \sqrt{5}\,e^{i \arctan(1/2)/3}, \]
\[ 10 - 5i = \sqrt{125}\,e^{-i \arctan(1/2)} \implies \sqrt[3]{10 - 5i} = \sqrt{5}\,e^{-i \arctan(1/2)/3}. \]
In particular, one of the solutions is
\[ \sqrt[3]{10 + 5i} + \sqrt[3]{10 - 5i} = 2\sqrt{5} \cos\left(\frac{1}{3}\arctan\frac{1}{2}\right). \]
In a similar manner, we can find trigonometric expressions for the other two roots. 4
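The three trigonometric expressions can be spot-checked numerically; the angle offsets 2πk/3 below come from multiplying u0 by the powers of ω (this packaging of the formula is ours):

```python
import math

# The three real roots of x^3 - 15x - 20 = 0, written trigonometrically.
phi = math.atan(1 / 2)
roots = [2 * math.sqrt(5) * math.cos(phi / 3 + 2 * math.pi * k / 3)
         for k in range(3)]
for x in roots:
    print(f"x = {x:+.6f},  x^3 - 15x - 20 = {x**3 - 15*x - 20:+.1e}")
```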
Example 7.3.5. Consider the polynomial equation x^3 + 6x^2 + 18x + 18 = 0. Setting x = y − 2, we get the equation
\[ y^3 + 6y - 2 = 0. \]
The discriminant is
\[ \Delta = -4p^3 - 27q^2 = -4 \cdot 216 - 27 \cdot 4 = -972, \]
so there will be two complex roots and one real root. We calculate
\[ u_0 = \sqrt[3]{-q/2 + \sqrt{-\Delta/108}} = \sqrt[3]{1 + 3} = \sqrt[3]{4}, \qquad v_0 = \sqrt[3]{-q/2 - \sqrt{-\Delta/108}} = \sqrt[3]{1 - 3} = -\sqrt[3]{2}, \]
so the three roots of the original equation are
\[ x_1 = -2 + \sqrt[3]{4} - \sqrt[3]{2}, \]
\[ x_2 = -2 + \frac{-1 + i\sqrt{3}}{2}\sqrt[3]{4} - \frac{-1 - i\sqrt{3}}{2}\sqrt[3]{2} = -2 + \frac{1}{2}\left(-\sqrt[3]{4} + \sqrt[3]{2}\right) + i\,\frac{\sqrt{3}}{2}\left(\sqrt[3]{4} + \sqrt[3]{2}\right), \]
\[ x_3 = -2 + \frac{-1 - i\sqrt{3}}{2}\sqrt[3]{4} - \frac{-1 + i\sqrt{3}}{2}\sqrt[3]{2} = -2 + \frac{1}{2}\left(-\sqrt[3]{4} + \sqrt[3]{2}\right) - i\,\frac{\sqrt{3}}{2}\left(\sqrt[3]{4} + \sqrt[3]{2}\right). \] 4
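As a sanity check, the three closed-form roots can be evaluated numerically (a sketch; f is just the original polynomial):

```python
import math

c4, c2 = 4 ** (1 / 3), 2 ** (1 / 3)     # the cube roots in the closed forms
x1 = -2 + c4 - c2
x2 = complex(-2 + (-c4 + c2) / 2, math.sqrt(3) / 2 * (c4 + c2))
x3 = x2.conjugate()

def f(x):
    return x**3 + 6 * x**2 + 18 * x + 18

for x in (x1, x2, x3):
    print(x, abs(f(x)))                  # residuals are essentially 0
```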
We now turn to the quartic equation
\[ x^4 + ax^3 + bx^2 + cx + d = 0, \tag{7.8} \]
where we can assume the polynomial is monic after dividing by the leading coefficient.
cubic equation, the change of variables x = y − a4 eliminates the cubic term and changes (7.8) into
\[ y^4 + py^2 + qy + r = 0, \tag{7.9} \]
for p, q, and r depending on a, b, c, and d. We propose to solve (7.9). We follow the strategy
introduced by Ferrari in which we rewrite (7.9) as
\[ y^4 = -py^2 - qy - r \tag{7.10} \]
and add an expression that simultaneously makes both sides into perfect squares. Because y 4 is
alone on one side, we are limited to what we can add to create a perfect square. We choose to add
the quantity
\[ ty^2 + \frac{t^2}{4} \tag{7.11} \]
so that
\[ y^4 + ty^2 + \frac{t^2}{4} = \left(y^2 + \frac{t}{2}\right)^2. \]
The trick to this method is to choose a value of t that makes the right-hand side into a perfect
square as well. Adding (7.11) on the right-hand side of (7.10) gives
\[ (t - p)y^2 - qy + \left(\frac{t^2}{4} - r\right). \tag{7.12} \]
Now a quadratic expression Ax^2 + Bx + C is the square of a linear expression if and only if B^2 − 4AC = 0. Hence, for (7.12) to be a perfect square, t must satisfy
\[ q^2 - 4(t - p)\left(\frac{t^2}{4} - r\right) = 0 \iff t^3 - pt^2 - 4rt + (4rp - q^2) = 0. \]
This is called the resolvent equation for the quartic equation (7.9). We can solve for t using the
solution method for the cubic, and in fact any of the three solutions work for the rest of the algorithm
to finish solving the quartic. So when t solves the resolvent equation, (7.10) becomes
\[ \left(y^2 + \frac{t}{2}\right)^2 = (my + n)^2 \]
for some linear expression my + n, and extracting square roots reduces the quartic to two quadratic equations.
For example, consider the quartic equation y^4 + y^2 + 6y + 1 = 0. We rewrite it as
\[ y^4 = -y^2 - 6y - 1 \]
and add the quantity 4y^2 + 4 to both sides (corresponding to the resolvent root t = 4)
to get
\[ y^4 + 4y^2 + 4 = 3y^2 - 6y + 3 \implies (y^2 + 2)^2 = (\sqrt{3}y - \sqrt{3})^2. \]
Hence,
\[ (y^2 + 2)^2 - (\sqrt{3}y - \sqrt{3})^2 = 0 \implies (y^2 + \sqrt{3}y + 2 - \sqrt{3})(y^2 - \sqrt{3}y + 2 + \sqrt{3}) = 0. \]
Now applying the quadratic formula to the two separate quadratic factors, we get the four roots:
\[ y^2 + \sqrt{3}y + 2 - \sqrt{3} = 0 \implies y = \frac{1}{2}\left(-\sqrt{3} \pm \sqrt{3 - 4(2 - \sqrt{3})}\right) = \frac{1}{2}\left(-\sqrt{3} \pm \sqrt{4\sqrt{3} - 5}\right), \]
\[ y^2 - \sqrt{3}y + 2 + \sqrt{3} = 0 \implies y = \frac{1}{2}\left(\sqrt{3} \pm \sqrt{3 - 4(2 + \sqrt{3})}\right) = \frac{1}{2}\left(\sqrt{3} \pm \sqrt{-4\sqrt{3} - 5}\right). \]
The first two roots are real and the last two roots are complex. 4
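The four roots can be verified numerically (a sketch; cmath.sqrt is used so the same expression covers the real pair and the complex pair):

```python
import cmath
import math

s3 = math.sqrt(3)
roots = [(-s3 + cmath.sqrt(4 * s3 - 5)) / 2,
         (-s3 - cmath.sqrt(4 * s3 - 5)) / 2,
         ( s3 + cmath.sqrt(-4 * s3 - 5)) / 2,
         ( s3 - cmath.sqrt(-4 * s3 - 5)) / 2]

for y in roots:
    print(y, abs(y**4 + y**2 + 6*y + 1))   # residuals are essentially 0
```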
For Exercises 7.3.1 to 7.3.11 solve the equation using Cardano-Ferrari methods. For the solutions to a cubic
equation, if all the roots are real, then write the solutions without reference to complex numbers.
1. x^3 − 15x + 10 = 0
2. x^3 + 6x − 2 = 0
3. x^3 − 9x + 10 = 0
4. x^3 − 6x + 4 = 0
5. x^3 − 12x + 8 = 0
6. x^3 − 12x + 16 = 0
7. x^3 + 3x^2 + 12x + 4 = 0
8. x^3 − 9x^2 + 24x − 16 = 0
9. x^4 + 4x^2 + 12x + 7 = 0
10. x^4 + 4x^2 − 3x + 1 = 0
11. x^4 − 4x^3 + 4x^2 − 8x + 4 = 0
12. Consider the polynomial p(x) = x^3 − 6x^2 + 11x − 6.
Apply Cardano’s method to solve this equation. After finding the roots, explain why it was so easy
to solve.
14. Prove that a palindromic polynomial of odd degree has −1 as a root. Use this and Exercise 7.2.9 to find all the roots of x^5 + 2x^4 + 3x^3 + 3x^2 + 2x + 1 = 0.
15. Consider the polynomial p(x) = x^6 + 4x^4 + 4x^2 + 1. Use either the strategy provided by Exercise 7.2.9 to find the roots, or use Cardano's method to solve the equation in x^2 to find all the roots. Which do you think is easier?
7.4
Constructible Numbers
7.4.1 – Euclidean Geometry
In classical geometry, one of the most common types of exercises requests the student to construct
certain geometrical figures. Practical problems in surveying and architecture were likely the original
purpose for geometric constructions. All ancient civilizations possessed some geometric knowledge
but each expressed their scholarship in different ways.
Such construction problems ask for a method to create a specific configuration (circle, triangle,
line segment, point, etc.), with a specified property, and using specified tools (compass, straightedge,
ruler, and so on) at one’s disposal. The mathematics in some cultures would outline a recipe involving
specific numbers and then conclude, “and by this procedure, we have constructed a such and such
with such and such properties.” The numbers used needed to be generic enough that the validity of
the construction did not rely on any particular properties of those numbers.
Greek mathematics, exemplified by Euclid’s Elements, overlaid the practical geometric problems
with a philosophical approach. Instead of providing a recipe for a geometric construction (and merely claiming that it works), they defined their terms and common notions, and then, starting from a small
list of five postulates, proved propositions about geometric objects using logic. Some geometry
propositions in Euclid’s Elements establish certain measure relationships while others state “it is
possible to construct...” a specific configuration. For example, Proposition 12 in Book I states that
it is possible to draw a straight line perpendicular to a given infinite straight line L through a given
point not on L. The proof provides the construction using a compass and a straightedge and also
establishes through logic that the construction produces the described configuration.
It is particularly interesting that propositions in the Elements never refer to specific distance
values or angle values. (The Elements do refer to right angles and rational multiples thereof but no
angle is ever measured in degrees, radians, or any other unit.) Effectively, the propositions are true
regardless of any units used. Perhaps because of this feature, the geometric constructions assumed
the use of a straightedge, a ruler without distance markings.
Solutions to many such construction problems became jewels in the crown of Greek mathematics
and served as examples for the purity of proofs for many generations of mathematics education. The
ability to construct a circle inscribed in a triangle or the problem of constructing a regular pentagon
are interesting, though still elementary, examples of these achievements.
A few problems stymied mathematicians for centuries and even millennia. For example, Propo-
sition 9 in Book I of the Elements gives a construction of how to bisect a given angle α. More
specifically, given two lines that meet at a point P and span an angle α between them, construct a
line through P that makes equal angles with the other two lines. However, the problem of trisecting
the angle (constructing a line that cuts an angle by a third) remained an open problem for many
centuries. A few other problems that remained open for just as long included: constructing a reg-
ular heptagon; (“Squaring the circle”) construct a square with the same area as a given circle; and
(“Doubling the cube”) given a line segment a, construct a line segment b such that the cube with
side b has twice the volume as the cube with side a.
To the surprise of many, a large number of these open problems in geometry were either resolved
or proved impossible using field theory.
Definition 7.4.1
The set of constructible numbers, denoted C, is the set of real numbers a ∈ R such that, given a segment OR, it is possible to construct with a straightedge and compass a segment OA such that, as distances, OA = |a| · OR.
Proposition 7.4.2
Let a and b be any nonnegative constructible numbers. Then
\[ a \pm b, \qquad ab, \qquad \frac{a}{b} \ (\text{if } b \neq 0), \qquad \sqrt{a} \]
are also constructible.
Proof. Fix two points O and R in the plane and let L be the line through O and R. Let A and B
be two points on the line L with distances OA = a · OR and OB = b · OR, and such that A and B are on the same side of O as R is. Construct the circle Γ of center O and radius OA. The circle
Γ intersects L in two points, one of which is A. Call A′ the other point. Since O is between A′ and B, we have, for distances,
\[ A'B = A'O + OB = (a + b) \cdot OR. \]
Hence, a + b ∈ C. Suppose without loss of generality that A is between O and B. Then
\[ b \cdot OR = OB = OA + AB \implies AB = (b - a) \cdot OR. \]
Hence, b − a ∈ C.
[Figure: the points A′, O, R, A, and B on the line L.]
Next, we prove that if a, b ∈ C, then ab ∈ C. Let A and B be points on the line L with distances OA = a · OR and OB = b · OR such that A and B are on the same side of O as R. Construct (via Proposition 11 of Book I in the Elements) the line L′ that is perpendicular to L and that goes
through O. Construct the circle Γ of center O and radius OR. It intersects L′ in two points. Pick one of these intersections and call it R′. Construct also the circle Γ′ of center O and radius OB. It intersects L′ in two points. Call B′ the point that is on the same side of L as R′.
Construct the line L2 through R′ and A. Construct (via Proposition 31 of Book I in the Elements) the line L3 parallel to L2 going through B′. Since L2 intersects L (in A) and L3 is parallel to L2, then L3 intersects L in a point we call C.
[Figure: construction of the product ab. The perpendicular L′ carries R′ (with OR′ = OR) and B′ (with OB′ = b · OR); the line L3 through B′, parallel to L2, meets L at C with OC = ab · OR.]
By Thales’ Theorem,
\[ \frac{OC}{OA} = \frac{OB'}{OR'} \implies \frac{OC}{a \cdot OR} = \frac{b \cdot OR}{OR} \implies OC = ab \cdot OR. \]
Hence, ab ∈ C.
The proof that C is closed under division is similar. We suppose that we have already constructed the points O, R, R′, A, B, and B′, and the lines L and L′. Now construct the line L2 through A and B′. Construct also (via Proposition 31 of Book I in the Elements) the line L3 parallel to L2 going through R′. L3 intersects L in a point that we call C.
[Figure: construction of the quotient a/b. The line L3 through R′, parallel to L2, meets L at C with OC = (a/b) · OR.]
By Thales’ Theorem,
\[ \frac{OA}{OC} = \frac{OB'}{OR'} \implies OC = \frac{a \cdot OR \cdot OR}{b \cdot OR} = \frac{a}{b} \cdot OR. \]
Hence, a/b is a constructible number.
Finally, we prove that √a ∈ C for all positive a ∈ C. Construct a point A on the line L through O and R such that AO = a · OR and O is between A and R. In particular, AR = (a + 1) · OR. Construct (via Proposition 10 in Book I of the Elements) the midpoint M of the segment AR. Construct the circle Γ of center M and radius MA. Construct (via Proposition 11 in Book I of the Elements) the line L′ that is perpendicular to L and passes through O. The line L′ intersects Γ in two points. Call one of them P. We claim that the distance OP is equal to √a · OR.
[Figure: construction of √a. The perpendicular to L at O meets the circle Γ of diameter AR at P with OP = √a · OR; here AO = a · OR, OR corresponds to 1, and AR = (a + 1) · OR.]
The radius MP satisfies MP = MA = ((a + 1)/2) · OR, and we also have
\[ OM = MR - OR = \frac{a+1}{2} \cdot OR - OR = \frac{a-1}{2} \cdot OR. \]
By the Pythagorean theorem in the right triangle MOP,
\[ OP^2 = MP^2 - OM^2 = \left(\left(\frac{a+1}{2}\right)^2 - \left(\frac{a-1}{2}\right)^2\right) OR^2 = a \cdot OR^2, \]
so OP = √a · OR, which proves the claim.
Example 7.4.3. The angle α = 2π/5 satisfies cos 3α = cos(2π − 3α) = cos 2α. Using addition formulas, we find that cos 3α = 4 cos^3 α − 3 cos α and cos 2α = 2 cos^2 α − 1. Thus, cos α solves the equation
\[ 4x^3 - 2x^2 - 3x + 1 = 0 \iff (x - 1)(4x^2 + 2x - 1) = 0. \]
Since cos α ≠ 1, cos α must be a root of 4x^2 + 2x − 1. Using the quadratic formula and reasoning that cos(2π/5) > 0, we deduce that
\[ \cos\frac{2\pi}{5} = \frac{-1 + \sqrt{5}}{4}. \]
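The closed form for cos(2π/5) is easy to spot-check numerically:

```python
import math

c = math.cos(2 * math.pi / 5)
print(c)                          # 0.30901699...
print((math.sqrt(5) - 1) / 4)     # the same value
print(4 * c**2 + 2 * c - 1)       # essentially 0
```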
Having a value for cos(2π/5) suggests the following construction of the regular pentagon.
• Construct the circle Γ1 of center O and radius OR, and let Q be the second point of intersection of Γ1 with the line L.
• Construct the line L′ perpendicular to L at O and let R′ and Q′ be the intersection points of L′ with Γ1.
[Figure: the circle Γ1 of center O and radius OR, its second intersection Q with the line L, and the perpendicular L′ to L at O meeting Γ1 in R′ and Q′.]
• Construct (via Proposition 10 in Book I of the Elements) the midpoint M of the segment OQ, so that OM = (1/2) · OR.
• Construct the circle Γ2 of center M and radius MR′. Since MR′ = √((1/2)^2 + 1^2) · OR = (√5/2) · OR, the circle Γ2 intersects the segment OR in a point D with OD = ((√5/2) − (1/2)) · OR.
• Construct the midpoint E of the segment OD, so that OE = ((√5 − 1)/4) · OR.
• Construct the line L′′ perpendicular to L at E. It intersects Γ1 in two points; call one of them A1.
[Figures: the circle Γ2 and the points M, D, and E on the line L, with OE/OA1 = (√5 − 1)/4 = cos(2π/5).]
• Since OE/OA1 = (√5 − 1)/4 = cos(2π/5), we have ∠EOA1 = 2π/5. Consequently, the segment RA1 is one edge of a regular pentagon.
• Construct a circle of center A1 and radius A1 R. This circle intersects Γ in two points: R and
another point we call A2 . Note that A1 A2 = A1 R.
• Construct a circle of center A2 and radius A2 A1 . This circle intersects Γ in two points: A1
and another point we call A3 . Note that A2 A3 = A2 A1 .
• Construct a circle of center A3 and radius A3 A2 . This circle intersects Γ in two points: A2
and another point we call A4 . Note that A3 A4 = A3 A2 .
[Figure: the regular pentagon with vertices R, A1, A2, A3, and A4 inscribed in the circle Γ1.] 4
Proposition 7.4.2 is already quite interesting. Since 1 ∈ C, then using addition, subtraction and
division repeatedly, we deduce that Q ⊆ C. Since C is closed under taking square roots of positive
numbers, we also can construct numbers like
\[ \sqrt{3}, \qquad \sqrt{10 - 2\sqrt{3}}, \qquad \text{or} \qquad \frac{1}{7}\sqrt{2 + \sqrt{3 + \sqrt{5}}}. \]
Consider the set C′ defined recursively by 1 ∈ C′ and, for all positive a, b ∈ C′, the elements a ± b, ab, a/b, and √a are also in C′. Proposition 7.4.2 establishes that C′ ⊆ C.
However, the proposition falls short of the whole story. It does not yet determine whether C′ ⊆ C is a strict subset inclusion or whether C′ = C. In other words, can Euclidean constructions lead to other constructible numbers than those in C′? A further application of algebra answers this question.
Proposition 7.4.4
Let P be a point with coordinates (x0 , y0 ). The segment OP is constructible if and only if
the numbers x0 and y0 are constructible.
Proof. We assume that O is the intersection of the x-axis and the y-axis and suppose that R is on
the x-axis.
Suppose that the point P is constructible, in the sense that the segment OP is constructible.
Construct (via Proposition 11 in Book I of the Elements) the line L1 perpendicular to the x-axis
through the point P . The line L1 intersects the x-axis in a point P1 with coordinates (x0 , 0). Since
the segment OP1 is constructible and OP1 = x0 · OR, then x0 ∈ C. Following a similar construction
for the projection of P onto the y-axis, we deduce that x0 and y0 are constructible numbers.
Conversely, since x0 and y0 are constructible, we can construct P1 on the x-axis and P2 on the
y-axis such that OP1 = x0 · OR and OP2 = y0 · OR. Construct the line L1 perpendicular to the
x-axis that goes through the point P1 and construct also the line L2 perpendicular to the y-axis that
goes through the point P2 . Call P the intersection of L1 and L2 . We have given a construction for
the segment OP . Furthermore, since OP1 P P2 is a rectangle, the coordinates of P are (x0 , y0 ).
When tracing a line with a straightedge, we always draw a line that passes through two already
given points. The equation for a line through (x1 , y1 ) and (x2 , y2 ) is
\[ y = y_1 + \frac{y_2 - y_1}{x_2 - x_1}(x - x_1) \iff (x_2 - x_1)(y - y_1) = (y_2 - y_1)(x - x_1). \]
When using a compass, we trace out a circle with center A with radius AB where A and B are
points already obtained in the construction or specified in the hypotheses. The equation for a circle
of center (x0 , y0 ) and radius of length r is
(x − x0 )2 + (y − y0 )2 = r2 .
Beyond these tracing operations, we also will consider the intersection points between: (1) two lines;
(2) two circles; (3) a line and a circle.
Let P1, P2, P3, and P4 be four points obtained by some compass and straightedge construction
from the initial segment OR. Suppose that the coordinates of Pi are (xi , yi ). Now let L1 be the line
through P1 and P2 and let L2 be the line through P3 and P4 . Assuming that L1 and L2 are not
parallel, the point Q of intersection between L1 and L2 satisfies the system of linear equations
\[ \begin{cases} (y_2 - y_1)x - (x_2 - x_1)y = x_1(y_2 - y_1) - y_1(x_2 - x_1) \\ (y_4 - y_3)x - (x_4 - x_3)y = x_3(y_4 - y_3) - y_3(x_4 - x_3). \end{cases} \]
Via Cramer’s rule, if xi , yi ∈ F for i = 1, 2, 3, 4, where F is some field extension of Q, then the point
Q has coordinates in F .
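The claim that intersecting two lines does not leave the field of the coordinates can be illustrated with exact rational arithmetic (the helper names line_through and intersect are ours; Fraction plays the role of the field F = Q):

```python
from fractions import Fraction as F

def line_through(p1, p2):
    """Coefficients (A, B, C) with A*x + B*y = C for the line through p1 and p2."""
    (x1, y1), (x2, y2) = p1, p2
    return (y2 - y1, -(x2 - x1), x1 * (y2 - y1) - y1 * (x2 - x1))

def intersect(l1, l2):
    """Intersection point of two non-parallel lines, by Cramer's rule."""
    (a1, b1, c1), (a2, b2, c2) = l1, l2
    det = a1 * b2 - a2 * b1
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)

# Rational data in, rational coordinates out: the field is not enlarged.
L1 = line_through((F(0), F(0)), (F(1), F(1)))
L2 = line_through((F(0), F(1)), (F(1), F(0)))
print(intersect(L1, L2))          # (Fraction(1, 2), Fraction(1, 2))
```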
Consider now the intersection of two circles. Let P1 and P2 be two distinct points obtained
by some compass and straightedge construction from the initial segment OR. Suppose that the
coordinates of Pi are (xi , yi ) where xi , yi are in some field extension F of Q. Suppose also that r1
and r2 are two radii that are constructible numbers. The intersection points of the circle Γ1 with
center P1 and radius r1 and the circle Γ2 with center P2 and radius r2 satisfy the system of equations
\[ \begin{cases} (x - x_1)^2 + (y - y_1)^2 = r_1^2 \\ (x - x_2)^2 + (y - y_2)^2 = r_2^2. \end{cases} \]
Subtracting the second equation from the first gives
\[ (x^2 - 2x_1x + x_1^2) + (y^2 - 2y_1y + y_1^2) - (x^2 - 2x_2x + x_2^2) - (y^2 - 2y_2y + y_2^2) = r_1^2 - r_2^2 \]
\[ \iff 2(x_2 - x_1)x + 2(y_2 - y_1)y = r_1^2 - r_2^2 + x_2^2 + y_2^2 - x_1^2 - y_1^2. \]
This last equation is linear in x and y; substituting it back into one of the circle equations produces a quadratic equation. If the quadratic has no real solutions, we interpret this as the case when the two circles do not intersect. Otherwise, if the circles do intersect, both of the points of intersection have coordinates in a field extension K of F with [K : F] = 1 or 2.
Now consider the intersection of a circle and a line. Let Γ be a circle with center (x0 , y0 ) and
radius r and let (x1 , y1 ) and (x2 , y2 ) be two distinct points. Suppose that there is a field extension F
of Q such that xi , yi ∈ F and r ∈ F . The points of intersection of Γ and the line L through (x1 , y1 )
and (x2, y2) satisfy the system of equations
\[ \begin{cases} (x - x_0)^2 + (y - y_0)^2 = r^2 \\ (y_2 - y_1)x - (x_2 - x_1)y = x_1(y_2 - y_1) - y_1(x_2 - x_1). \end{cases} \]
Since the points are distinct, either x1 ≠ x2, in which case the equation for the line can be solved for y in terms of x, or y1 ≠ y2, in which case it can be solved for x in terms of y. Substituting into the equation for the circle leads to a quadratic equation, either in y or in x. If there is no
solution in real numbers to the equation, then we interpret this case to mean that Γ and L do not
intersect. If the equation has solutions, then these solutions are in a field extension K of F with
[K : F ] = 1 or 2.
This discussion leads to the following strengthening of Proposition 7.4.2.
Theorem 7.4.5
If a real number α ∈ R is constructible, then [Q(α) : Q] = 2^k for some nonnegative integer k. Furthermore, the set of constructible numbers is exactly the set C′ of real numbers defined recursively as the set that contains 1 and such that, for any positive elements a, b ∈ C′, the elements a ± b, ab, a/b, and √a are in C′.
Proof. First, suppose that α is a constructible number. In the Euclidean construction of a segment
OP with OP = α · OR, we construct a sequence of points P1 , P2 , . . . , Pn with Pn = P and such that
every controlling parameter of any geometric object (center and radius of a circle, two points of a
line) is O, R, or one of these points. Let αi be the constructible number such that OPi = αi · OR. Then α = αn, and the fields Q(αi) = Q(α1, . . . , αi) form a tower of extensions
\[ \mathbb{Q} \subseteq \mathbb{Q}(\alpha_1) \subseteq \mathbb{Q}(\alpha_1, \alpha_2) \subseteq \cdots \subseteq \mathbb{Q}(\alpha_1, \alpha_2, \ldots, \alpha_n). \]
Furthermore, from the above discussion, we know that [Q(αi) : Q(αi−1)] is 1 or 2. Hence, [Q(α) : Q] = 2^k for some nonnegative integer k.
Proposition 7.4.2 established that C′ ⊆ C. For the reverse inclusion, consider again a Euclidean construction, in which each [Q(αi) : Q(αi−1)] is 1 or 2. If [Q(αi) : Q(αi−1)] = 1, then a point obtained from previous points at the ith stage of the construction has coordinates that result from addition, subtraction, multiplication, or division of coordinates of previous points. If [Q(αi) : Q(αi−1)] = 2, then αi is the root of some quadratic polynomial with coefficients in Q(αi−1). In particular, αi is the sum of an element in Q(αi−1) with the square root of an element in Q(αi−1). Hence, we conclude that C = C′.
One of the profound consequences of Theorem 7.4.5 is that it gives a way to show that certain geometric configurations cannot be obtained by a compass and straightedge construction. For example,
Exercise 7.4.3 guides a proof that, unlike with a regular pentagon, it is impossible to construct a
regular heptagon with a compass and a straightedge. The following three corollaries, as innocuous
as they seem, when first stated answered long-standing open problems in geometry.
Corollary 7.4.6
It is impossible to double the cube by a compass and straightedge construction.
Proof. Let OR be one edge of a cube C. A cube with double the volume would have an edge of length ∛2 · OR. However, [Q(∛2) : Q] = 3, which is not a power of 2. By Theorem 7.4.5, ∛2 is not a constructible number, so it is impossible to construct a segment of length ∛2 · OR with a compass and straightedge.
Corollary 7.4.7
It is impossible to square the circle by a compass and straightedge construction.
Proof. Recall that “squaring the circle” refers to the construction of starting from a circle Γ of center
O and radius OR, to construct a square whose area is equal to that of Γ. The area of Γ is πOR2 .
The area of a square is a^2, where a is the length of the side. Constructing a square as desired would lead to constructing a line segment of length √π · OR. However, by Lindemann's theorem that π is transcendental, [Q(√π) : Q] is infinite. Hence, by Theorem 7.4.5, √π is not a constructible number, and thus it is impossible to construct a segment of the desired length.
For the last corollary we consider, we state a generalization of Theorem 7.4.5. The proof follows
from the same procedure as that given for Theorem 7.4.5.
Theorem 7.4.8
Suppose that O, R, C1 , C2 , . . . , Cn are points given in the plane such that OCi = γi OR.
If a point A can be obtained from O, R, C1 , C2 , . . . , Cn with a compass and straightedge
construction and OA = αOR, then
\[ [\mathbb{Q}(\alpha, \gamma_1, \gamma_2, \ldots, \gamma_n) : \mathbb{Q}(\gamma_1, \gamma_2, \ldots, \gamma_n)] = 2^k \]
for some nonnegative integer k.
Corollary 7.4.9
It is impossible to trisect every angle using a compass and straightedge construction.
Proof. Let θ = ∠AOR be an angle. There is no assumption that OA is constructible from OR.
[Figure: the angle θ = ∠AOR between the segment OA and the line L through O and R.]
Using angle addition formulas, it is easy to show that for any angle α,
\[ \cos(3\alpha) = 4\cos^3\alpha - 3\cos\alpha \implies \cos\theta = 4\cos^3\frac{\theta}{3} - 3\cos\frac{\theta}{3}. \]
So cos(θ/3) is a root of the polynomial 4x3 − 3x − cos θ whose coefficients are in the field Q(cos θ).
For an arbitrary angle θ, the polynomial 4x^3 − 3x − cos θ is irreducible in Q(cos θ)[x], so [Q(cos θ, cos(θ/3)) : Q(cos θ)] = 3, which is not a power of 2.
By Theorem 7.4.8, it is impossible to construct a point C from O, R, and A using a compass and
straightedge such that ∠ROC = (1/3)∠ROA.
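The triple-angle identity behind this argument is easy to spot-check numerically, here for θ = π/3 (so that cos(θ/3) = cos 20°):

```python
import math

theta = math.pi / 3                 # a 60 degree angle
x = math.cos(theta / 3)             # cos(20 degrees), the trisected cosine
print(4 * x**3 - 3 * x)             # essentially 0.5 = cos(60 degrees)
```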
7.5
Cyclotomic Extensions
In the study of polynomial equations of higher order, there is arguably a simplest equation of a given
degree, namely
\[ z^n - 1 = 0. \]
The roots of this polynomial are called the nth roots of unity. Without an understanding of the
properties of the roots of this polynomial, we should not expect to have a clear understanding of
roots of polynomials of degree n. This section studies the roots of unity and extensions of Q that
involve adjoining a root of unity.
Writing a root in polar form as z = re^{iθ}, the equation z^n = 1 becomes
\[ r^n e^{in\theta} = 1 \cdot e^{2\pi k i}. \]
With the condition that r is a positive real number, r = 1 and nθ = 2πk for some k. Thus,
\[ \theta = \frac{2\pi k}{n}, \qquad k \in \mathbb{Z}. \]
[Figure 7.1: the 10th roots of unity e^{2πik/10}, for k = 0, 1, . . . , 9, equally spaced around the unit circle, with e^{10πi/10} = −1.]
The values k = 0, 1, . . . , n − 1 give n distinct complex numbers. Since a nonzero polynomial of degree n in F[x] can have at most n roots in the field F, this gives all the roots. The nth roots of unity are
\[ e^{2\pi i k/n} = \cos\frac{2\pi k}{n} + i\sin\frac{2\pi k}{n} \qquad \text{for } k = 0, 1, 2, \ldots, n - 1. \]
We will often denote ζn = e2πi/n so that the nth roots of unity are ζnk . As in Figure 7.1, the elements
ζnk , with k = 0, 1, . . . , n − 1, form the vertices of a regular n-gon on the unit circle.
The set of nth roots of unity, denoted by µn, forms a subgroup of (C*, ×). Indeed, 1 ∈ µn, so µn is nonempty, and ζn^a (ζn^b)^{−1} = ζn^{a−b} ∈ µn, so µn is a subgroup of (C*, ×) by the One-Step Subgroup Criterion. Furthermore, µn is isomorphic to Z/nZ via a ↦ ζn^a.
Definition 7.5.1
A primitive nth root of unity is an nth root of unity that generates µn .
By Proposition 3.3.7, Z/nZ can be generated by a if and only if gcd(a, n) = 1. Therefore, the primitive nth roots of unity are of the form ζn^a where 1 ≤ a ≤ n with gcd(a, n) = 1.
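For example, with n = 12 the exponents relatively prime to 12 are 1, 5, 7, and 11, and each corresponding power of ζ12 generates all of µ12; a small sketch:

```python
from math import gcd
import cmath

n = 12
zeta = cmath.exp(2j * cmath.pi / n)
primitive_exponents = [a for a in range(1, n + 1) if gcd(a, n) == 1]
print(primitive_exponents)           # [1, 5, 7, 11]

for a in primitive_exponents:
    # The powers (zeta^a)^k for k = 0..n-1 should hit all n roots of unity.
    powers = {(round((zeta ** (a * k)).real, 6), round((zeta ** (a * k)).imag, 6))
              for k in range(n)}
    assert len(powers) == n
```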
Any solution to the equation xn − 1 = 0 over a field F is an element of finite order in the
multiplicative group of units U (F ). Combining results from group theory with field theory gives the
following result about the group of units in any field F .
Proposition 7.5.2
Let F be a field. Any finite group Γ in U (F ) is cyclic.
Proof. Let |Γ| = n. Since Γ is a finite abelian group, the Fundamental Theorem of Finitely Generated
Abelian Groups applies. Suppose that, as an invariant factor decomposition,
\[ \Gamma \cong Z_{n_1} \times Z_{n_2} \times \cdots \times Z_{n_\ell}. \]
Since n_{i+1} | n_i for 1 ≤ i ≤ ℓ − 1, every x ∈ Γ satisfies x^{n_1} − 1 = 0. Assume Γ is not cyclic. Then n_1 < n, and all n elements of Γ solve x^{n_1} − 1 = 0. This contradicts the fact that the number of distinct roots of a polynomial is less than or equal to its degree (Corollary 6.5.10). Hence, Γ is cyclic.
Definition 7.5.3
Let p be a prime number. An integer a whose class a ∈ Z/pZ generates U(Fp) is called a primitive root modulo p.
Primitive roots modulo p are not unique in Fp. Since U(Fp) is cyclic, U(Fp) ≅ Z_{p−1}, so there are φ(p − 1) generators. However, the existence of a primitive root modulo p is not at all obvious directly from modular arithmetic.
Example 7.5.4. Consider the prime p = 17. Then 2 is not a primitive root modulo 17 because 2 has order 8 and thus does not generate U(F17), which has order 16. On the other hand, whether by hand or assisted by a computer, we can show that 3 has order 16 in U(F17). Hence, 3 is a primitive root modulo 17. 4
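The orders quoted in the example are easy to confirm by brute force (the helper multiplicative_order is ours):

```python
def multiplicative_order(a, p):
    """Order of a in the group of units modulo the prime p (a not divisible by p)."""
    k, x = 1, a % p
    while x != 1:
        x = (x * a) % p
        k += 1
    return k

print(multiplicative_order(2, 17))   # 8
print(multiplicative_order(3, 17))   # 16
```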
Definition 7.5.5
The nth cyclotomic polynomial Φn (x) is the monic polynomial whose roots are the primitive
nth roots of unity. Namely,
\[ \Phi_n(x) = \prod_{\substack{1 \le a \le n \\ \gcd(a,n) = 1}} (x - \zeta_n^a). \]
A priori, the cyclotomic polynomials are elements of C[x], but we will soon see that Φn(x) ∈ Z[x] and that these polynomials satisfy many further properties.
We note right away that deg Φn(x) = φ(n), where φ is Euler's totient function. If we write
\[ x^n - 1 = \prod_{1 \le i \le n} (x - \zeta_n^i) = \prod_{d \mid n} \prod_{\substack{1 \le i \le d \\ \gcd(i,d) = 1}} \left(x - \zeta_n^{(n/d)i}\right) = \prod_{d \mid n} \prod_{\substack{1 \le i \le d \\ \gcd(i,d) = 1}} (x - \zeta_d^i), \]
then we deduce the following implicit formula for the cyclotomic polynomials,
\[ x^n - 1 = \prod_{d \mid n} \Phi_d(x). \tag{7.13} \]
The product identity (7.13) provides a recursive formula for Φn (x), starting with Φ1 (x) = x − 1.
For example, we have
\[ x^2 - 1 = \Phi_1(x)\Phi_2(x) \implies \Phi_2(x) = \frac{x^2 - 1}{x - 1} = x + 1. \]
By the same token, we have
\[ x^3 - 1 = \Phi_1(x)\Phi_3(x) \implies \Phi_3(x) = \frac{x^3 - 1}{x - 1} = x^2 + x + 1. \]
A few other examples of the nth cyclotomic polynomials are
\[ \Phi_4(x) = \frac{x^4 - 1}{\Phi_2(x)\Phi_1(x)} = \frac{x^4 - 1}{x^2 - 1} = x^2 + 1, \]
\[ \Phi_5(x) = \frac{x^5 - 1}{\Phi_1(x)} = x^4 + x^3 + x^2 + x + 1, \]
\[ \Phi_6(x) = \frac{x^6 - 1}{\Phi_3(x)\Phi_2(x)\Phi_1(x)} = \frac{x^3 + 1}{x + 1} = x^2 - x + 1, \]
\[ \Phi_7(x) = \frac{x^7 - 1}{\Phi_1(x)} = x^6 + x^5 + x^4 + x^3 + x^2 + x + 1, \]
\[ \Phi_8(x) = \frac{x^8 - 1}{\Phi_4(x)\Phi_2(x)\Phi_1(x)} = \frac{x^8 - 1}{x^4 - 1} = x^4 + 1. \]
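The recursion (7.13) translates directly into code: divide x^n − 1 by Φd(x) for every proper divisor d of n. The following sketch (helper names ours) works with integer coefficient lists, constant term first:

```python
def poly_div(num, den):
    """Exact division of integer polynomials (coefficient lists, low degree first)."""
    num = num[:]
    q = [0] * (len(num) - len(den) + 1)
    for i in reversed(range(len(q))):
        q[i] = num[i + len(den) - 1] // den[-1]
        for j, d in enumerate(den):
            num[i + j] -= q[i] * d
    return q

def cyclotomic(n, cache={}):
    """Coefficients of Phi_n(x), via x^n - 1 = prod over d | n of Phi_d(x)."""
    if n not in cache:
        num = [-1] + [0] * (n - 1) + [1]     # x^n - 1
        for d in range(1, n):
            if n % d == 0:
                num = poly_div(num, cyclotomic(d))
        cache[n] = num
    return cache[n]

print(cyclotomic(6))    # [1, -1, 1]        i.e. x^2 - x + 1
print(cyclotomic(8))    # [1, 0, 0, 0, 1]   i.e. x^4 + 1
```

As a further check, cyclotomic(105) reproduces the coefficient −2 of x^7 in Φ105(x).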
The first few calculations hint at a number of properties that cyclotomic polynomials might
satisfy. We might hypothesize that the Φn (x) are in Z[x] and furthermore, that they only involve
coefficients of 0, 1, and −1. We might also hypothesize that the polynomials are palindromic. Also,
since each Φn (x) is obtained by dividing out of xn − 1 all terms that we know must divide xn − 1,
we might also hope that the Φn (x) are irreducible. Some of these hypotheses are true and some are
not.
Proposition 7.5.6
The cyclotomic polynomial Φn (x) is a monic polynomial in Z[x] of degree φ(n).
Proof. We have already shown that these polynomials have degree φ(n). We need to prove that
Φn (x) ∈ Z[x]. We prove this by strong induction, noticing that Φ1 (x) = x − 1 ∈ Z[x].
Suppose that Φk(x) is monic and in Z[x] for all k < n. Let f(x) be the polynomial defined by
\[ f(x) = \prod_{\substack{d \mid n \\ d < n}} \Phi_d(x). \]
By the induction hypothesis, f(x) is monic and has coefficients in Z. We know that x^n − 1 = Φn(x)f(x), and so by polynomial division in Q[x], Φn(x) is a polynomial with coefficients in Q. By Gauss' Lemma, we conclude that Φn(x) ∈ Z[x].
Proposition 7.5.7
The polynomial Φn (x) is palindromic.
Proof. If z = ζn^a ∈ C is a root of Φn(x), then 1/z = ζn^{−a} = ζn^{n−a} is also a root of Φn(x), since n − a is relatively prime to n whenever a is. Therefore, Φn(x) = x^{φ(n)} Φn(1/x). However, given any polynomial p(x) ∈ Q[x], x^{deg p} p(1/x) is the polynomial obtained from p(x) by reversing the order of the coefficients.
It is possible to prove that, as polynomials, x^{gcd(m,n)} − 1 is the monic greatest common divisor of x^m − 1 and x^n − 1. We say that the sequence of polynomials {x^n − 1}_{n=1}^∞ is a strong divisibility
sequence in Z[x]. In [8], the authors proved that the strong divisibility property is sufficient to define
the cyclotomic polynomials Φn (x) in Z[x] that satisfy the recursive formula (7.13), without reference
to roots of unity.
The hypothesis that all the coefficients of Φn(x) are −1, 0, or 1 is not true. The first cyclotomic
polynomial that has a coefficient different from −1, 0, or 1 is Φ105 (x). As Exercise 7.5.15 shows, it
is not a coincidence that the integer 105 happens to be the first positive integer that is the product
of three odd primes. A direct calculation gives,
Φ105 (x) = 1 + x + x2 − x5 − x6 − 2x7 − x8 − x9 + x12 + x13 + x14 + x15 + x16 + x17 − x20
− x22 − x24 − x26 − x28 + x31 + x32 + x33 + x34 + x35 + x36 − x39 − x40
− 2x41 − x42 − x43 + x46 + x47 + x48 .
Theorem 7.5.8
For all n ∈ N∗ , the cyclotomic polynomial Φn (x) is an irreducible polynomial in Z[x] of
degree φ(n).
Proof. What is left to show is that Φn(x) is irreducible. The key step is to show that if p is any prime not dividing n and ζ is a root of an irreducible factor f(x) of Φn(x), then ζ^p is also a root of f(x).
7.5. CYCLOTOMIC EXTENSIONS 359
Suppose that Φn (x) factors into f (x)g(x) over Q and, without loss of generality, we suppose that
f (x) is irreducible with deg f (x) ≥ 1. Let ζ be a primitive nth root of unity. Then ζ p is also a
primitive root (since p - n).
Assume now that g(ζ^p) = 0. Then ζ is a root of g(x^p) and since f(x) is the minimal polynomial of ζ, we have g(x^p) = f(x)h(x). Reducing modulo p into Fp, we get
f̄(x)h̄(x) = ḡ(x^p) = ḡ(x)^p,
where the last equality holds by the Frobenius homomorphism. Therefore, f̄(x) and ḡ(x) have an
irreducible factor in common in the UFD Fp [x]. This implies that Φn (x) has a multiple root in
Fp and hence that xn − 1 has a multiple root in the finite field. We prove that this leads to a
contradiction. Recall the polynomial derivative D described in Exercise 6.5.23. If xn − 1 has a
multiple root in some field extension of Fp , then this multiple root must be a root of xn − 1 and of
the derivative D(xn −1) = nxn−1 . However, since p - n, then the only root of D(xn −1) is 0, whereas
0 is not a root of xn − 1. So by contradiction, we know that ζ p is not a root of g(x). Therefore, ζ p
must be a root of f (x).
Now let a be any integer that is relatively prime to n. Then we can write a = p1 p2 · · · pk as a
product of not necessarily distinct primes that are relatively prime to n, and hence which do not
divide n. From the above paragraph, if ζ is a root of f (x), then ζ p1 is also a root of f (x). Then
ζ p1 p2 = (ζ p1 )p2 is also a root of f (x). By induction, we deduce that ζ a is a root of f (x). This now
means that every primitive nth root is also a root of f (x) so g(x) is a unit and f (x) is an associate
of Φn (x). Hence, Φn (x) is irreducible.
Definition 7.5.9
The field extension Q(ζn ) is called the nth cyclotomic extension of Q.
Since the roots of Φn(x) are powers of ζn, all the roots of Φn(x) are in the cyclotomic
extension Q(ζn ). Theorem 7.5.8 immediately gives the following result.
Corollary 7.5.10
For any integer n ≥ 2, the degree of the cyclotomic field over Q is
[Q(ζn ) : Q] = φ(n).
Example 7.5.11. In Exercise 2.1.16, we proved that if 2n −1 is prime then n is prime. The converse
implication is not necessarily true. With cyclotomic polynomials at our disposal, we see that
2^n − 1 = ∏_{d|n} Φd(2) = (2 − 1) ∏_{d|n, d>1} Φd(2) = ∏_{d|n, d>1} Φd(2).
Therefore, if n is not prime, then 2^n − 1 is certainly not prime because it is a product of the factors Φd(2) over the divisors d of n with d > 1, and a composite n has at least two such divisors. 4
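This factorization is easy to check numerically. In the sketch below (the helper names are ours, not the text's), Φd(2) is evaluated using the nonrecursive formula proved in Proposition 7.5.13, with exact rational arithmetic:

```python
from fractions import Fraction

def mobius(n):
    # Möbius function: 0 if n has a squared prime factor,
    # otherwise (-1)^(number of distinct prime factors).
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def phi_at_2(n):
    # Evaluate Phi_n(2) via Phi_n(2) = prod_{d|n} (2^{n/d} - 1)^{mu(d)}.
    value = Fraction(1)
    for d in range(1, n + 1):
        if n % d == 0:
            value *= Fraction(2 ** (n // d) - 1) ** mobius(d)
    return int(value)

# 2^6 - 1 = 63 is composite and factors as Phi_2(2) * Phi_3(2) * Phi_6(2)
print([phi_at_2(d) for d in (2, 3, 6)])   # [3, 7, 3]
```

Since 3 · 7 · 3 = 63 = 2^6 − 1, the composite exponent n = 6 indeed produces a composite Mersenne number.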
Theorem 7.5.12 (Möbius Inversion Formula)
Let {bn}_{n≥1} be a sequence in an abelian group, written multiplicatively, and set an = ∏_{d|n} bd. Then
bn = ∏_{d|n} (a_{n/d})^{μ(d)},
where μ is the Möbius function.

Proof. The proof involves manipulations of double products. We begin with the product on the right of the Möbius inversion formula,
∏_{d|n} (a_{n/d})^{μ(d)} = ∏_{d|n} ( ∏_{e|(n/d)} be )^{μ(d)} = ∏_{d|n} ∏_{e|(n/d)} (be)^{μ(d)}.
Note that the pairs (d, e) with d | n and e | (n/d) are the same as those with e | n and d | (n/e).
Hence,
∏_{d|n} (a_{n/d})^{μ(d)} = ∏_{e|n} ∏_{d|(n/e)} (be)^{μ(d)} = ∏_{e|n} (be)^{Σ_{d|(n/e)} μ(d)}.    (7.15)
Now if k = 1, then Σ_{d|k} μ(d) = μ(1) = 1. On the other hand, suppose k is any integer with k > 1. Then the nonzero terms in Σ_{d|k} μ(d) correspond to products of distinct prime divisors of k. Suppose that p1, p2, ..., pr are the distinct prime divisors of k. In the sum Σ_{d|k} μ(d), we group together the terms arising from products of i distinct prime divisors of k. Then
Σ_{d|k} μ(d) = Σ_{i=0}^{r} (r choose i)(−1)^i = (1 − 1)^r = 0.
Hence, in (7.15) the only factor with a nonzero exponent is the one with e = n, so the right-hand side equals bn, as claimed.
Though the Möbius inversion formula has many applications in number theory, we presented it
here to provide an alternative nonrecursive formula for the cyclotomic polynomials.
Proposition 7.5.13
For all positive integers n,
Φn(x) = ∏_{d|n} (x^{n/d} − 1)^{μ(d)}.
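Proposition 7.5.13 gives a direct, nonrecursive way to compute Φn(x). The sketch below (helper names are ours) multiplies together the factors with μ(d) = 1, collects the factors with μ(d) = −1 into a denominator, and performs one exact division at the end:

```python
def mobius(n):
    # Möbius function: 0 if a square divides n, else (-1)^(# prime factors).
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def poly_mul(a, b):
    # Multiply coefficient lists (constant term first).
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def poly_div(num, den):
    # Exact division of integer-coefficient polynomials.
    num = list(num)
    quot = [0] * (len(num) - len(den) + 1)
    for i in range(len(quot) - 1, -1, -1):
        c = num[i + len(den) - 1] // den[-1]
        quot[i] = c
        for j, d in enumerate(den):
            num[i + j] -= c * d
    return quot

def cyclotomic_mobius(n):
    # Phi_n(x) = prod_{d|n} (x^{n/d} - 1)^{mu(d)}
    num, den = [1], [1]
    for d in range(1, n + 1):
        if n % d == 0:
            factor = [-1] + [0] * (n // d - 1) + [1]   # x^{n/d} - 1
            if mobius(d) == 1:
                num = poly_mul(num, factor)
            elif mobius(d) == -1:
                den = poly_mul(den, factor)
    return poly_div(num, den)

print(cyclotomic_mobius(12))   # x^4 - x^2 + 1  ->  [1, 0, -1, 0, 1]
```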
Maple Function
primroot(p); Returns the least positive primitive root modulo p.
primroot(n,p); Returns the least positive primitive root a modulo p with n < a < p.
cyclotomic(n,x); Returns the nth cyclotomic polynomial.
mobius(n); Returns the Möbius function of n.
gcd(xm − 1, xn − 1) = xgcd(n,m) − 1,
where in the first greatest common divisor we take the monic polynomial.
11. Suppose that p | n. Prove that Φpn (x) = Φn (xp ).
12. Suppose n = p1^{α1} p2^{α2} ··· pℓ^{αℓ}, where the pi are distinct primes. Prove that
Φn(x) = Φ_{p1 p2 ··· pℓ}(x^{p1^{α1−1} p2^{α2−1} ··· pℓ^{αℓ−1}}).
Exercises 7.5.20 through 7.5.25 deal with dynatomic polynomials defined as follows. Let F be a field and let
P (x) ∈ F [x]. Consider sequences in F that satisfy the recurrence relation xn+1 = P (xn ). A fixed point is an
element c in F or an extension of F that satisfies P(c) − c = 0. A 2-cycle is such a sequence that satisfies x2 = x0. An element on a 2-cycle satisfies the equation P(P(x)) − x = 0. However, fixed points also solve the equation P(P(x)) − x = 0. Consequently, elements that are on a 2-cycle but are not fixed points are solutions to the equation
(P(P(x)) − x)/(P(x) − x) = 0.
An n-cycle is a recurrence sequence as defined above such that xn = x0. Points on n-cycles satisfy P^n(x) − x = 0,
where by P n (x) we mean P (x) iterated n times. For example, P 3 (x) = P (P (P (x))). For any d that divides
n, all the points on a d-cycle also satisfy P n (x) − x = 0. Similar to cyclotomic polynomials, we define the
nth dynatomic polynomial of P(x) recursively by Φ_{P,1}(x) = P(x) − x and
P^n(x) − x = ∏_{d|n} Φ_{P,d}(x).
An n-cycle that is not also a d-cycle for any proper divisor d of n is called a primitive n-cycle. Points on a primitive n-cycle must be roots of Φ_{P,n}(x). It is possible, though not easy, to prove that Φ_{P,n}(x) ∈ F[x] for all P(x) ∈ F[x] [8].
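For n = 2, the definition can be applied directly: Φ_{P,2}(x) = (P(P(x)) − x)/(P(x) − x). The following sketch (our own function names; integer coefficients assumed so the division is exact) computes this quotient:

```python
def poly_mul(a, b):
    # Multiply coefficient lists (constant term first).
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def poly_div(num, den):
    # Exact division of integer-coefficient polynomials.
    num = list(num)
    quot = [0] * (len(num) - len(den) + 1)
    for i in range(len(quot) - 1, -1, -1):
        c = num[i + len(den) - 1] // den[-1]
        quot[i] = c
        for j, d in enumerate(den):
            num[i + j] -= c * d
    return quot

def compose(p, q):
    # p(q(x)) by Horner's rule on coefficient lists.
    out = [p[-1]]
    for c in reversed(p[:-1]):
        out = poly_mul(out, q)
        out[0] += c
    return out

def dynatomic2(P):
    # Phi_{P,2}(x) = (P(P(x)) - x) / (P(x) - x)
    numerator = compose(P, P)
    numerator[1] -= 1            # subtract x from P(P(x))
    denominator = list(P)
    denominator[1] -= 1          # subtract x from P(x)
    return poly_div(numerator, denominator)

# P(x) = x^2 + 1: points on 2-cycles satisfy x^2 + x + 2 = 0
print(dynatomic2([1, 0, 1]))   # [2, 1, 1]
```

The same routine is a convenient way to spot-check answers to Exercises 22 through 24.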
20. Prove that polynomial iteration satisfies (a) P n (P m (x)) = P m+n (x); (b) (P n )m (x) = P mn (x); and
(c) deg P^n(x) = k^n where deg P(x) = k.
21. Suppose that deg P(x) = m > 1. Prove that
deg Φ_{P,n}(x) = Σ_{d|n} μ(d) m^{n/d}.
Deduce that a sequence satisfying x_{n+1} = P(x_n) can have at most (1/n) Σ_{d|n} μ(d) m^{n/d} primitive n-cycles.
22. Let Q(x) = x2 − 2. Calculate ΦQ,2 (x), ΦQ,3 (x), and ΦQ,4 (x).
23. Let P (x) = x3 − 2. Calculate ΦP,2 (x), ΦP,3 (x), and ΦP,4 (x).
24. Let P (x) = x2 − 2x + 2. Calculate ΦP,2 (x), ΦP,3 (x), and ΦP,4 (x).
25. Let Q(x) = x^2 − 5/4. Prove by direct calculation that for this particular polynomial, ΦQ,2 (x) divides
ΦQ,4 (x).
7.6 Splitting Fields and Algebraic Closure

7.6.1 – Splitting Fields
Let K be a field extension of F . Proposition 7.2.2, one of the key propositions of the section, estab-
lished that for any element α ∈ K that is an algebraic element over F , there exists a unique monic
polynomial mα,F (x) of minimal degree in F [x] such that α is a root of mα,F (x). The motivating
observation of this section is that though F (α) contains the root α, it does not necessarily contain
all the roots of mα,F (x).
Example 7.6.1. Let F = Q and consider α = ∛7 ∈ R. The minimal polynomial for α is f(x) = x^3 − 7. However, the three roots of this polynomial are
∛7,   ∛7 · (−1 + i√3)/2,   ∛7 · (−1 − i√3)/2.
Obviously, Q(∛7) is a subfield of R but the other two roots of the minimal polynomial are not in R. 4
Definition 7.6.2
A field extension K of F is called a splitting field for the polynomial f (x) ∈ F [x] if f (x)
factors completely into linear factors in K[x] but f (x) does not factor completely into linear
factors in F′[x] where F′ is any field with F ⊆ F′ ⊊ K.
If K is an extension of F , we will also use the terminology that f (x) ∈ F [x] splits completely in
K to mean that f (x) factors into linear factors in K[x]. However, this does not mean that K is a
splitting field of f (x) but simply that K contains a splitting field of f (x).
Theorem 7.6.3
For any field F , if f (x) ∈ F [x], there exists an extension K of F that is a splitting field for
f (x). Furthermore, [K : F ] ≤ n! where n = deg f (x).
Proof. We proceed by induction on the degree of f . If deg f = 1, then F contains the root of f (x)
so F itself is a splitting field for f (x) and the degree of F over itself is 1.
Suppose that the theorem is true for all polynomials of degree less than or equal to n. Let f (x)
be a polynomial of degree n + 1. If f (x) is reducible, then f (x) = a(x)b(x) where deg a(x) = k with
1 ≤ k ≤ n. By induction, both a(x) and b(x) have splitting fields, E1 and E2 . Then the composite
of these two fields, E1 E2 , is a splitting field for f (x). Furthermore, by the induction hypothesis and
Proposition 7.2.21, [E1 E2 : F ] is less than or equal to [E1 : F ][E2 : F ] = k!(n + 1 − k)! ≤ (n + 1)!.
Suppose now that f (x) is irreducible. Then F 0 = F [x]/(f (x)) is a field extension of F in which
the element x̄ is a root of f (x). Note that [F 0 : F ] = (n + 1). In F 0 [t], (t − x̄) is a linear factor of
f(t). One obtains the factorization by division: since f(x̄) = 0, the division algorithm in F′[t] gives f(t) = (t − x̄)q(t) + r with constant remainder r = f(x̄) = 0. Therefore, in F′[t], f(t) factors into f(t) = (t − x̄)q(t), where q(t) has degree n. By the induction
hypothesis, q(t) has a splitting field K over F 0 such that [K : F 0 ] ≤ n!. Therefore, f (x) splits
completely in K. Also,
[K : F] = [K : F′][F′ : F] ≤ n! · (n + 1) = (n + 1)!.
By induction, the theorem holds for all fields and for all polynomials f(x).
We would like to shift the notion of a splitting field of a polynomial f (x) over F into a property
of a field extension, without necessarily referring to a specific polynomial f (x).
Definition 7.6.4
A normal extension is an algebraic extension K of F that is the splitting field for some
collection (not necessarily finite) of polynomials fi (x) ∈ F [x].
Example 7.6.5. The splitting field of f(x) = x^2 − x − 1 over Q is Q((1 + √5)/2). The degree of the extension is 2. 4
In fact, a splitting field of any quadratic polynomial f (x) ∈ F [x] is F (α) where α is one of the
roots.
Example 7.6.6. Consider the polynomial f(x) = x^3 − 7. Example 7.6.1 lists the three roots of f(x) in C. A splitting field K for f(x) must contain all three of the roots. In particular, K must contain ∛7 and ζ3 = (−1 + i√3)/2. This is sufficient, so a splitting field is K = Q(∛7, ζ3). Recall that ζ3 is algebraic with degree 2 and with minimal polynomial x^2 + x + 1. Furthermore, ζ3 is not a real number so it is not an element of Q(∛7), which is a subfield of R. Hence, [K : Q(∛7)] = 2 and so the degree of the extension is
[K : Q] = [Q(∛7, ζ3) : Q] = [K : Q(∛7)][Q(∛7) : Q] = 2 · 3 = 6. 4
Example 7.6.7. We point out that the extension Q(∛7) is not a normal extension of Q because it contains ∛7 but not the other two roots of the minimal polynomial x^3 − 7. 4
Example 7.6.8. Consider the cubic polynomial p(x) = x^3 + 6x^2 + 18x + 18 ∈ Q[x] in Example 7.3.5. We prove that the splitting field K of p(x) has degree [K : Q] = 6. Let x1 = −2 + ∛4 − ∛2. Note that x1 has degree 3 over Q since the polynomial is irreducible (by Eisenstein's Criterion with p = 2). However, x1 ∈ R, whereas x2 and x3 have nonzero imaginary parts. Thus, x2, x3 ∉ Q(x1) and the splitting field of p(x) is a nontrivial extension of Q(x1). We can conclude from Theorem 7.6.3 that the splitting field of p(x) has degree 6. However, we can also tell that
p(x) = (x − x1)(x^2 + (6 + x1)x − 18/x1).
So x2 and x3 are the roots of the quadratic factor. 4
Example 7.6.9. Consider the polynomial g(x) = x^4 + 2x^2 − 2. We can find the roots by first solving a quadratic polynomial for x^2. Thus,
x^2 = (−2 ± √(4 + 8))/2 = −1 ± √3.
Thus, the four roots of g(x) are ±√(−1 + √3) and ±√(−1 − √3). Since g(x) is irreducible, we have [Q(√(−1 + √3)) : Q] = 4. It is easy to tell that √(−1 − √3) ∉ Q(√(−1 + √3)) because Q(√(−1 + √3)) is a subfield of R whereas √(−1 − √3) is a complex number. Noticing first that √3 ∈ Q(√(−1 + √3)), as √3 = (√(−1 + √3))^2 + 1, we see that √(−1 − √3) is an algebraic element over Q(√(−1 + √3)) satisfying the polynomial equation
x^2 + 1 + √3 = 0.
Thus, a splitting field of g(x) over Q is K = Q(√(−1 + √3), √(−1 − √3)). Furthermore,
[K : Q] = [K : Q(√(−1 + √3))][Q(√(−1 + √3)) : Q] = 2 · 4 = 8.
This degree is a strict divisor of the upper bound 4! = 24 as permitted by Theorem 7.6.3. 4
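The arithmetic in this example is easy to verify numerically; a quick check with Python's cmath (tolerances are arbitrary):

```python
import cmath

r3 = cmath.sqrt(3)
alpha = cmath.sqrt(-1 + r3)   # a real root of g
beta = cmath.sqrt(-1 - r3)    # a non-real root of g

def g(x):
    return x**4 + 2 * x**2 - 2

# both families of roots satisfy g(x) = 0
assert abs(g(alpha)) < 1e-9 and abs(g(beta)) < 1e-9
# beta satisfies x^2 + 1 + sqrt(3) = 0, a quadratic over Q(sqrt(-1 + sqrt(3)))
assert abs(beta**2 + 1 + r3) < 1e-9
print("checks pass")
```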
From the examples, the splitting field of some polynomials seems a natural construction so it
may seem puzzling why we have been saying “a” splitting field. From the construction of a splitting
field as described in Theorem 7.6.3 it is not obvious that splitting fields are unique. The following
theorem establishes this important property.
Theorem 7.6.10
Let ϕ : F ∼= F 0 be an isomorphism of fields. Let f (x) ∈ F [x] and let f 0 (x) ∈ F 0 [x] be the
polynomial obtained by applying ϕ to the coefficients of f (x). Let E be a splitting field
for f (x) over F and let E 0 be a splitting field for f 0 (x) over F 0 . Then the isomorphism ϕ
extends to an isomorphism σ : E ∼ = E 0 such that σ|F = ϕ.
Corollary 7.6.11
Any two splitting fields for a polynomial f (x) ∈ F [x] over a field F are isomorphic.
Example 7.6.12 (Cyclotomic Fields). Recall that cyclotomic extensions are extensions of Q that contain the nth roots of unity, i.e., the roots of x^n − 1 = 0. As in Section 7.5, we call ζn the complex number
ζn = e^{2πi/n} = cos(2π/n) + i sin(2π/n).
Then all the nth roots of unity are of the form ζn^k for 0 ≤ k ≤ n − 1. This shows that all the nth roots of unity are in Q(ζn). Consequently, Q(ζn) is the splitting field of x^n − 1, and more precisely of the cyclotomic polynomial Φn(x). 4
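The fact that the powers of ζn exhaust the nth roots of unity is easy to illustrate numerically:

```python
import cmath

n = 12
zeta = cmath.exp(2j * cmath.pi / n)    # zeta_n = e^{2*pi*i/n}
powers = [zeta ** k for k in range(n)]

# every power of zeta_n is an nth root of unity ...
assert all(abs(w ** n - 1) < 1e-9 for w in powers)
# ... and the n powers are pairwise distinct, so they are all of the roots
assert all(abs(powers[i] - powers[j]) > 1e-6
           for i in range(n) for j in range(i + 1, n))
print("all", n, "roots of unity are powers of zeta_n")
```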
In some examples and exercises, we have seen that on occasion an extension F (α, β) of F is
nonetheless a primitive extension with F (α, β) = F (γ). Splitting fields allow us to prove that this
always happens under certain circumstances.
Theorem 7.6.13 (Primitive Element Theorem)
Let F be a field of characteristic 0 and let α and β be algebraic over F. Then there exists γ ∈ F(α, β) such that F(α, β) = F(γ).

Proof. Let K be the splitting field of mα,F(x) mβ,F(x). Let α1, α2, ..., αm be the roots of mα,F(x)
in K and let β1 , β2 , . . . , βn be the roots of mβ,F (x). We assume that α = α1 and β = β1 . Every
field of characteristic 0 has an infinite number of elements. Choose some element d ∈ F such that
d ≠ (α − αi)/(βj − β)   for i ≥ 1 and j > 1.
Set γ = α + dβ and consider the polynomial p(x) = mα,F(γ − dx) ∈ F(γ)[x]; note that p(β) = mα,F(α) = 0, so the minimal polynomial mβ,F(γ)(x) divides p(x).
Now, since mβ,F(γ)(x) divides mβ,F(x) in F(γ)[x], and mβ,F(x) splits completely in K[x], the zeros of mβ,F(γ)(x) must also be zeros of mβ,F(x), namely among β1, β2, ..., βn. However, a zero of mβ,F(γ)(x) must also be a zero of p(x), namely some x0 such that γ − dx0 = αi for some i, or in other words, some x0 such that
x0 = (γ − αi)/d = β + (α − αi)/d.
By the definition of d, the only x0 that satisfies this and is a root of mβ,F(x) is x0 = β. Thus, in K[x] and hence also in F(α, β)[x], we deduce that β is the only root of mβ,F(γ)(x). Since it is irreducible, mβ,F(γ)(x) = x − β. This shows that β ∈ F(γ). From this, we also deduce that α = γ − dβ ∈ F(γ). Hence, F(α, β) ⊆ F(γ) and the theorem follows.
Definition 7.6.14
Let F be a field. A field L is called an algebraic closure of F if L is algebraic over F and
if every polynomial f (x) ∈ F [x] splits completely in L.
Definition 7.6.15
A field F is said to be algebraically closed if every polynomial f (x) ∈ F [x] has a root in F .
Notice that if F is algebraically closed, then every polynomial f (x) ∈ F [x] has a root α in F .
Consequently, in F [x], the polynomial factors f (x) = (x − α)p(x) for some p(x) ∈ F [x]. Since p(x)
and any subsequent factors must have a root, then by induction, every polynomial splits completely.
Consequently, a field F is algebraically closed if it is an algebraic closure of itself. This remark
motivates the following easy proposition.
Proposition 7.6.16
If L is an algebraic closure of F , then L is algebraically closed.
Proof. Let f (x) ∈ L[x] and let α be a root of f (x). Then α gives an algebraic extension L(α) of L.
However, since L is algebraic over F , by Theorem 7.2.24, L(α) is an algebraic extension of F . Thus,
α is algebraic over F and hence α ∈ L. This shows that L is algebraically closed.
The concept of an algebraic closure of a field is a rather technical one. Though the previous
portion of this section outlined how to construct a splitting field of a polynomial over a field, finding
a field extension in which all polynomials split completely poses a problem of construction. Indeed,
though Definition 7.6.14 defines the notion of algebraic closure, it is not at all obvious that an
algebraic closure exists for a given field F . Also, if an algebraic closure of F exists, it is not readily
apparent whether algebraic closures are unique. This section provides answers to these questions
but the proofs of some of the results depend on Zorn’s Lemma, which is equivalent to the Axiom of
Choice.
It is also not at all clear that any algebraically closed fields exist. From the properties of complex
numbers, the quadratic formula, and Cardano’s cubic and quartic formula, one may hypothesize
that the field of complex numbers is algebraically closed. Indeed, as early as the 17th century,
mathematicians, including the likes of Euler, Laplace, Lagrange, and d’Alembert, attempted to
prove this. The first rigorous proof was provided by Argand in 1806. Since then, mathematicians
have discovered proofs involving techniques from disparate branches of mathematics. Because of its
importance in algebra and the difficulty of proving it, the fact that C is an algebraically closed field
became known as the Fundamental Theorem of Algebra.
The “simplest” proof of the Fundamental Theorem of Algebra uses theorems from complex
analysis that are outside the scope of this text. We will provide an algebraic proof in Section 11.5,
but it requires more theory than we have yet provided. Consequently, for the moment, we accept
this result without proof.
It should not surprise us that analysis might be required to prove that C is algebraically closed.
The construction of C depends on the construction of R and properties of real functions are precisely
the purview of analysis. For example, the concept of continuity leads to the Intermediate Value
Theorem, which can be used to show that any polynomial p(x) ∈ R[x] of odd degree has a root. It
is the notion of continuity that ensures the polynomial does not change signs without having a root.
The proof of Theorem 7.6.10 showed how to construct a splitting field K of a single polynomial
f (x) ∈ F [x] over the field F . An algebraic closure of a field F must essentially be a splitting field
for all polynomials in F [x]. It is hard to imagine what such a field would look like and how to
describe such a field. If we had only a finite number of polynomials f1 (x), f2 (x), . . . , fk (x), then the
composite of the k splitting fields, which is also the splitting field of f1 (x)f2 (x) · · · fk (x), contains
all the roots of these polynomials. However, F [x] contains an infinite number of polynomials, so we
are faced with a problem of constructibility.
To keep track of all the polynomials in F [x], Artin devised the strategy of introducing a separate
variable for each polynomial. We give his proof below.
Theorem 7.6.18
For any field F , there exists an algebraically closed field K containing F .
Proof. Let P be the set of associate classes of irreducible elements in F [x]. Every class in P can
be represented by a unique monic nonconstant irreducible polynomial p(x). Let S be a set of
indeterminate symbols that is in bijection with P via [p] ↔ xp , where p(x) is monic. Consider the
multivariable polynomial ring F[S]. In F[S], consider the ideal
I = ( p(x_p) : [p] ∈ P ),
generated by the polynomials p(x_p) as p(x) runs over the monic nonconstant irreducible polynomials in F[x]. We first prove that I is a proper ideal of F[S]. Assume that I = F[S]. Then there exist monic irreducible polynomials p1(x), p2(x), ..., pn(x) and polynomials g1, g2, ..., gn ∈ F[S] such that
g1 p1(x_{p1}) + g2 p2(x_{p2}) + ··· + gn pn(x_{pn}) = 1.
Let E be a field extension of F in which each pi(x) has a root αi. Evaluating this identity at x_{pi} = αi (and at 0 for every other variable) gives 0 = 1, a contradiction. Hence, I is a proper ideal and is therefore contained in a maximal ideal M of F[S]. The quotient K1 = F[S]/M is a field extension of F in which every monic nonconstant polynomial p(x) ∈ F[x] has a root, namely the class of x_p. Repeating the construction with K1 in place of F gives a field extension K2 of K1, and in general we construct the field extension Ki+1 of Ki in the same way. This iterated construction creates a chain
of nested field extensions of F ,
F = K0 ⊆ K1 ⊆ K2 ⊆ · · · ⊆ Kn ⊆ · · ·
in which every polynomial q(x) ∈ Ki[x] has a root in Ki+1. Let K be the union of all the fields,
K = ∪_{i≥0} Ki.
Then K is a field extension of F. To see that K is algebraically closed, let
q(x) = qk x^k + ··· + q1 x + q0
be any polynomial in K[x]. Each coefficient qj lies in some field K_{ij}; taking i to be the largest of the indices ij, we have q(x) ∈ Ki[x], so q(x) has a root in Ki+1 ⊆ K. Hence, K is algebraically closed.
It is interesting to observe that the existence of a maximal ideal M containing I follows from
Zorn’s Lemma.
The field K constructed in the above proof may seem woefully large. Indeed, the strategy of
the proof simply provides a well-defined construction of a field extension that is large enough to
be algebraically closed. However, this could be far larger than an algebraic closure. The following
proposition pares down the algebraically closed field K to an algebraic closure of F .
Proposition 7.6.19
Let L be an algebraically closed field and let F be a subfield of L. The set K of elements
in L that are algebraic over F is an algebraic closure of F .
Proof. By definition, K is algebraic over F. Furthermore, every polynomial f(x) ∈ F[x] ⊆ L[x] splits completely over L. But each root α of f(x) is algebraic over F and so is an element of K. Therefore, all the linear factors (x − α) of the factorization of f(x) are in K[x]. Hence, f(x) splits completely in K[x], and K is an algebraic closure of F.
Theorem 7.6.18 coupled with Proposition 7.6.19 establish the existence of algebraic closures for
any field F . This has not yet answered the important question of whether algebraic closures of a
field are unique (up to isomorphism). In order to establish this, we need an intermediate theorem.
Theorem 7.6.20
Let F be a field, let E be an algebraic extension of F and let f : F → L be an embedding
(injective homomorphism) of F into an algebraically closed field L. Then there exists an
embedding λ : E → L that extends f , i.e., λ|F = f .
Proof. Let S be the set of all pairs (K, σ), where K is a field with F ⊆ K ⊆ E and σ extends f to an embedding of K into L. We define a partial order ≼ on S where (K1, σ1) ≼ (K2, σ2) means that K1 ⊆ K2 and σ2 extends σ1, i.e., σ2|K1 = σ1. The set S is nonempty since it contains the pair (F, f). For any chain
{(Ki, σi)}_{i∈I}
in the poset (S, ≼), define K′ = ∪_{i∈I} Ki. Every element α ∈ K′ is in Ki for some i ∈ I. Define the function σ′ : K′ → L by σ′(α) = σi(α) if α ∈ Ki. This function is well-defined because if Ki ⊆ Kj, then σj|Ki = σi so σj(α) = σi(α). Therefore, the choice of index i to use for defining σ′ is irrelevant.
The pair (K 0 , σ 0 ) is an upper bound for the described chain. Consequently, Zorn’s Lemma applies
and we conclude that S contains maximal elements.
For a maximal element (K, λ) in S, the field K is a subfield of E and the function λ : K → L is
an embedding of K into L that extends f . Assume that there exists α ∈ E − K. Since E is algebraic
over F , by Corollary 7.2.3, it is algebraic over K. By Exercise 7.1.18, λ : K → L can be extended
to an embedding K(α) → L, contradicting the maximality of the pair (K, λ). Hence, E − K = ∅, so
K = E and the function λ : E → L is an extension of F → L.
Theorem 7.6.20 gives the following important proposition.
Proposition 7.6.21
Let F be a field and let E and E 0 be two algebraic closures of F . Then E and E 0 are
isomorphic.
10. Let p(x) ∈ F [x] be a polynomial of degree n and let K be the splitting field of p(x) over F . Prove
that [K : F ] in fact divides n!.
11. Let p(x), q(x) ∈ F [x] be two polynomials with deg p(x) = m and deg q(x) = n. Notice that p(q(x))
is a polynomial of degree mn. Prove that the splitting field E of p(q(x)) has a degree that satisfies
[E : F] ≤ m!(n!)^m. Prove also that for m, n ≥ 2, this quantity is a proper divisor of (mn)!.
12. Let P (x) be a polynomial in F [x]. Suppose that a dynatomic polynomial ΦP,n (x) has degree k. (See
Exercises 7.5.20 through 7.5.25.) Prove that if the roots of a dynatomic polynomial are only primitive
n-cycles, then k is divisible by n and the degree of the splitting field E of ΦP,n (x) has an index [E : F ]
that is less than or equal to
k(k − n)(k − 2n) · · · (2n) · n · 1.
13. Let p(x) ∈ Q[x] be a palindromic polynomial of even degree 2n. Let K be the splitting field of p(x).
Prove that [K : Q] ≤ 2^n n!. [Hint: See Exercise 7.2.9.] [Note: This degree is less than the value of
(2n)! allowed by Theorem 7.6.3.]
14. Prove that a field F is algebraically closed if and only if the irreducible polynomials in F[x] are precisely the polynomials of degree 1.
15. Prove that a field F is algebraically closed if and only if it has no proper algebraic extension.
16. Let K be an algebraic extension of an algebraically closed field F. Prove that K = F.
17. Prove that there exists no algebraically closed field F such that Q ( F ( Q(π).
7.7 Finite Fields
Fields of characteristic 0 and fields of characteristic p have a number of qualitative differences. This
section builds on theorems of Section 7.6 to analyze finite fields. In particular, the main theorem
of this section is that finite fields of a given cardinality are unique up to isomorphism. However, in
order to establish this foundational result, we must take a detour into the concept of separability.
Definition 7.7.1
Let f(x) ∈ F[x] and let K be an extension field of F. A root αi ∈ K of f(x) is said to have multiplicity ni if (x − αi)^{ni} divides f(x) in K[x] but (x − αi)^{ni+1} does not divide f(x). If ni > 1, we will say that αi is a multiple root.
Since K[x] is a UFD, we can use the order function ord_π : K[x] → N and say that α is a root of f(x) if ord_{(x−α)} f(x) > 0 and that the multiplicity of α is n = ord_{(x−α)} f(x). According to the definition, α is a multiple root whenever ord_{(x−α)} f(x) > 1.
Definition 7.7.2
A polynomial f (x) ∈ F [x] is called separable if it has no multiple roots in its splitting field
over F .
Definition 7.7.3
An algebraic extension K/F is called separable if for all α ∈ K, the minimal polynomial
mα,F (x) is a separable polynomial. An algebraic extension that is not separable is called
inseparable.
It may at first seem difficult to imagine a field extension that is not separable. In this section,
we will show that many field extensions that we have studied so far are separable. The following
example illustrates an inseparable extension.

Example 7.7.4. Let F = Fp(t), the field of rational functions in t over Fp, and let K = F(α), where α is a root of f(x) = x^p − t. The polynomial f(x) is irreducible over F (by Eisenstein's Criterion at t), so f(x) = mα,F(x). However, in K[x] we have x^p − t = x^p − α^p = (x − α)^p, so α is a root of mα,F(x) of multiplicity p. Hence, K is an inseparable extension of F. 4
Exercise 6.5.23 presented the concept of a derivative of a polynomial. In essence, let p(x) ∈ F[x]. We define the derivative of p(x) with respect to x as the polynomial Dx(p(x)) := p′(x), where p′(x)
is the derivative encountered in calculus. We know that deg Dx (p(x)) < deg p(x) regardless of the
field. Furthermore, the derivative of a polynomial satisfies the addition rule and the Leibniz rule for
multiplication. The polynomial derivative is particularly useful for the following proposition.
Proposition 7.7.5
A polynomial f (x) ∈ F [x] is separable if and only if f (x) and Dx f (x) are relatively prime.
Proof. Suppose that f(x) is not separable. Then there exists a root α of f(x) such that f(x) = (x − α)^2 q(x) in the splitting field K of f(x). Then by the properties of the derivative,
Dx(f(x)) = 2(x − α)q(x) + (x − α)^2 Dx(q(x)).
We see that α is a root of Dx(f(x)), so mα,F(x) divides Dx(f(x)). Then mα,F(x) divides both f(x) and Dx(f(x)), so these two polynomials are not relatively prime.
Conversely, suppose that f(x) and Dx(f(x)) are not relatively prime. Then there exists a monic irreducible polynomial a(x) of degree at least 1 that divides them both. Let α be a root of a(x) in the splitting field K of f(x). Since a(x) divides f(x), α is a root of f(x), so f(x) = (x − α)q(x) for some polynomial q(x) ∈ K[x]. Thus,
Dx(f(x)) = q(x) + (x − α)Dx(q(x)).
Since a(x) also divides Dx(f(x)), α is a root of Dx(f(x)), which forces q(α) = 0. Consequently, q(x) = (x − α)g(x) for some polynomial g(x) ∈ K[x] and we deduce that f(x) = (x − α)^2 g(x). Hence, f(x) has a multiple root and is not separable.
Proposition 7.7.6
If char F = 0, then every irreducible polynomial is separable.
Proof. Let a(x) be an irreducible polynomial in F [x]. If deg a(x) = 1, then a(x) is separable trivially.
Suppose that deg a(x) ≥ 2. If LT(a(x)) = an xn , then the leading term of Dx (a(x)) is nan xn−1 .
Hence, deg Dx(a(x)) = n − 1 ≥ 1. Since a(x) is irreducible, any polynomial b(x) that divides a(x) must be either a nonzero constant multiple of a(x) or a nonzero constant. If b(x) also divides Dx(a(x)), then deg b(x) ≤ n − 1, so b(x) cannot be a nonzero constant multiple of a(x); it must be a nonzero constant. Thus, Dx(a(x)) and a(x) are relatively prime and so by Proposition 7.7.5, a(x)
is separable.
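Proposition 7.7.5 turns separability into a finite computation: run the Euclidean algorithm on f(x) and Dx(f(x)) and see whether the result is a constant. A Python sketch over Q (our own helper names; exact arithmetic via fractions.Fraction):

```python
from fractions import Fraction

def trim(p):
    # drop trailing zero coefficients (the leading terms of the polynomial)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

def poly_mod(a, b):
    # remainder of a divided by b; coefficient lists, constant term first
    a, b = trim(list(a)), trim(list(b))
    while len(a) >= len(b) and any(a):
        c = a[-1] / b[-1]
        for j in range(len(b)):
            a[len(a) - len(b) + j] -= c * b[j]
        a = trim(a)
    return a

def poly_gcd(a, b):
    a = trim([Fraction(c) for c in a])
    b = trim([Fraction(c) for c in b])
    while any(b):
        a, b = b, poly_mod(a, b)
    return [c / a[-1] for c in a]     # normalize to a monic polynomial

def derivative(p):
    return [i * c for i, c in enumerate(p)][1:]

f = [-1, 0, 0, 0, 0, 1]               # x^5 - 1
print(poly_gcd(f, derivative(f)))     # gcd = 1, so x^5 - 1 is separable

g = [1, 2, 1]                         # (x + 1)^2
print(poly_gcd(g, derivative(g)))     # gcd = x + 1: the multiple root shows up
```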
Corollary 7.7.7
Let F be a field of characteristic 0. Every algebraic extension of F is separable.
Proof. Let K be an algebraic extension of F and let α ∈ K. Then mα,F (x) is irreducible and by
Proposition 7.7.6, mα,F (x) is separable. Thus, K is separable.
The proof of Proposition 7.7.6 might not work on all polynomials in F[x] when F has positive characteristic. In characteristic 0, if deg a(x) = n ≥ 1, then we know that deg Dx(a(x)) = n − 1. However, if char F = p and deg a(x) = n = pk, then the derivative of the leading term is
Dx(an x^{pk}) = pk · an x^{pk−1} = 0.
Hence, deg Dx(a(x)) < n − 1. Furthermore, any monomial whose degree is a multiple of p has a derivative that is identically 0. This leads to the following important point.
Proposition 7.7.8
Let F be a field of characteristic p. Suppose that a(x) is irreducible. Then a(x) is separable
if and only if one of the monomials of a(x) has a degree that is not a multiple of p. Furthermore, for any irreducible polynomial a(x), there exists an irreducible separable polynomial b(x) and a nonnegative integer k such that
a(x) = b(x^{p^k}).
Proof. By Proposition 7.7.5, a(x) is not separable if and only if a(x) and Dx (a(x)) are divisible by
a factor of degree greater than 0 in F [x]. However, since a(x) is irreducible, the only divisor of a(x)
of degree greater than 0 is any nonzero multiple of itself. Hence, a(x) is not separable if and only
if a(x) divides Dx (a(x)). Since either Dx (a(x)) = 0 or deg Dx (a(x)) < deg a(x), we conclude that
a(x) is not separable if and only if Dx(a(x)) = 0, if and only if all monomials of a(x) have a degree
that is a multiple of p. This proves the first claim of the proposition.
Consequently, if a(x) is not separable, then a(x) = a1(x^p) for some polynomial a1(x). Let k be the greatest nonnegative integer such that p^k divides the degree of every monomial of a(x). Then a(x) = b(x^{p^k}) and at least one term of b(x) has a degree not divisible by p. By the first part of the theorem, b(x) is separable. Furthermore, if b(x) were reducible with b(x) = b1(x)b2(x), then a(x) would be reducible with a(x) = b1(x^{p^k}) b2(x^{p^k}). By the contrapositive, since a(x) is irreducible, b(x) is irreducible.
Proposition 7.7.9
Let F be a finite field with |F | = q. Then q = pn for some prime p and some positive
integer n. In this case, F is an extension of Fp of degree n.
Definition 7.7.10
Let F be a field of characteristic p and let σp : F → F be the function defined by σp(a) = a^p. If F is finite, the function σp is called the Frobenius automorphism. If F is not finite, σp is called the Frobenius endomorphism.
By Fermat’s Little Theorem (Theorem 2.2.16), the Frobenius automorphism σp is the identity
function on Fp . However, on field extensions of Fp , the automorphism is nontrivial. For example,
consider the field of order 9 defined by F = F3 [x]/(x2 +x+2). Let us call θ the element corresponding
to x in F. Notice that θ^2 = 2θ + 1. Then F = {a + bθ | a, b ∈ F3}. Obviously, σ3(a) = a for all
a ∈ F3. However, σ3(θ) = θ^3 = θ · θ^2 = θ(2θ + 1) = 2θ^2 + θ = 2(2θ + 1) + θ = 2θ + 2 ≠ θ, so σ3
is not the identity function on F.
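The action of σ3 on F9 can be checked by machine. Below is a sketch of our own (the encoding of F9 as pairs (a, b) representing a + bθ, with θ^2 = 2θ + 1, is an implementation choice, not the text's notation):

```python
P = 3

def mul(u, v):
    """Multiply u = (a, b) ~ a + b*theta and v = (c, d) likewise in
    F_9 = F_3[x]/(x^2 + x + 2), using theta^2 = 2*theta + 1."""
    a, b = u
    c, d = v
    # (a + b*th)(c + d*th) = (ac + bd) + (ad + bc + 2*bd)*th
    return ((a * c + b * d) % P, (a * d + b * c + 2 * b * d) % P)

def frobenius(u):
    """sigma_3(u) = u^3."""
    return mul(u, mul(u, u))

theta = (0, 1)
print(frobenius(theta))        # (2, 2), i.e. sigma_3(theta) = 2 + 2*theta
# sigma_3 fixes the prime subfield F_3 = {0, 1, 2} inside F_9:
for a in range(3):
    assert frobenius((a, 0)) == (a, 0)
```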
Proposition 7.7.11
Every irreducible polynomial over a finite field F is separable.
Proof. Let a(x) be an inseparable polynomial over a finite field F of characteristic p. By Proposi-
tion 7.7.8, a(x) = b(x^p) for some polynomial b(x) = b_n x^n + · · · + b_1 x + b_0. Then

a(x) = b_n x^(pn) + · · · + b_1 x^p + b_0.

However, since the Frobenius automorphism is a bijection on the finite field, for each i = 0, 1, . . . , n,
there exists c_i ∈ F such that c_i^p = b_i. Hence,

a(x) = c_n^p x^(pn) + · · · + c_1^p x^p + c_0^p = (c_n x^n + · · · + c_1 x + c_0)^p,

so a(x) is not irreducible. By the contrapositive, every irreducible polynomial over a finite field is
separable.
Proposition 7.7.9 pointed out that any finite field has order pn . The converse is the main theorem
of this section.
Theorem 7.7.12
For all primes p and for all positive integers n, there exists a unique (up to isomorphism)
finite field of order pn . Furthermore, every finite field is isomorphic to one of these.
Proof. Proposition 7.7.9 established the second part of the theorem. We need to prove the first part.
Consider the polynomial x^(p^n) − x ∈ Fp[x]. Then

Dx(x^(p^n) − x) = p^n x^(p^n − 1) − 1 = −1,

which has no roots. Hence, x^(p^n) − x and its derivative are relatively prime and so x^(p^n) − x is separable.
Therefore, this polynomial has p^n distinct roots in its splitting field K.
374 CHAPTER 7. FIELD EXTENSIONS
Definition 7.7.13
If q is a prime power q = pn , we denote by Fq or Fpn the unique field of order q.
The uniqueness of finite fields of a given finite cardinality is not obvious from how we construct a
finite field. As a simple example, let us consider the field of 8 elements. The polynomials x^3 + x + 1
and x^3 + x^2 + 1 are irreducible cubic polynomials in F2[x]. Consequently, we could construct a field
of eight elements by

K1 = F2[x]/(x^3 + x + 1)   or   K2 = F2[x]/(x^3 + x^2 + 1).

Theorem 7.7.12 established that the unique field of 8 elements is the splitting field of x^8 − x. Using K1
as a reference,
• 0 is the root of x = 0;
• 1 is the root of x + 1 = 0;
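This partition of F8 corresponds to the factorization x^8 + x = x(x + 1)(x^3 + x + 1)(x^3 + x^2 + 1) over F2 (recall that x^8 − x = x^8 + x in characteristic 2), which the following Python sketch verifies by direct multiplication. The coefficient-list representation (lowest degree first) is our own.

```python
def polymul(u, v, p=2):
    """Multiply polynomials given as coefficient lists (low degree first) over F_p."""
    out = [0] * (len(u) + len(v) - 1)
    for i, a in enumerate(u):
        for j, b in enumerate(v):
            out[i + j] = (out[i + j] + a * b) % p
    return out

factors = [
    [0, 1],          # x
    [1, 1],          # x + 1
    [1, 1, 0, 1],    # x^3 + x + 1
    [1, 0, 1, 1],    # x^3 + x^2 + 1
]
prod = [1]
for f in factors:
    prod = polymul(prod, f)

target = [0, 1, 0, 0, 0, 0, 0, 0, 1]   # x^8 + x
print(prod == target)   # True
```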
Theorem 7.7.12 affirms that a similar partitioning occurs in the construction of every finite field.
The above remark about the factorization of x^8 − x generalizes to any prime p and any field extension
of degree n. Denote by Ψp,n(x) the product of all irreducible polynomials of degree n in Fp[x]. Then

x^(p^n) − x = ∏_{d|n} Ψp,d(x).

By Möbius inversion,

Ψp,n(x) = ∏_{d|n} (x^(p^d) − x)^(µ(n/d)),

where µ is the Möbius function on positive integers. In particular, this implies that

deg Ψp,n(x) = ∑_{d|n} µ(d) p^(n/d).

However, each irreducible factor of Ψp,n(x) has degree n, so we have proved the following result.
Proposition 7.7.14
There are

(1/n) ∑_{d|n} µ(d) p^(n/d)

monic irreducible polynomials of degree n in Fp[x].
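The counting formula in Proposition 7.7.14 is easy to evaluate by machine. A Python sketch (the helper names are ours):

```python
def mobius(n):
    """Möbius function mu(n), computed by trial factorization."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:      # squared prime factor
                return 0
            result = -result
        d += 1
    if n > 1:
        result = -result
    return result

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def num_irreducible(p, n):
    """(1/n) * sum over d | n of mu(d) * p^(n/d)."""
    return sum(mobius(d) * p ** (n // d) for d in divisors(n)) // n

print(num_irreducible(2, 3))   # 2: the cubics x^3+x+1 and x^3+x^2+1 above
print(num_irreducible(2, 4))   # 3
print(num_irreducible(5, 2))   # 10 (cf. Exercise 12(a) below)
```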
We conclude the section with a brief comment on the subfield structure of finite fields.
Exercise 7.7.8 asks the reader to prove that, for any prime p, the field Fpd is a subfield of Fpn if
and only if d|n. Consequently, the Hasse diagram representing the subfield structure of Fpn is the
same as the Hasse diagram of the partial order of divisibility on the divisors of n. For example, if
n = 100, for any prime p the subfield structure of Fp100 has the following Hasse diagram.
[Hasse diagram of the subfields of Fp^100: Fp at the bottom; Fp^2 and Fp^5 above it; Fp^4, Fp^10,
and Fp^25 next; Fp^20 and Fp^50 above those; and Fp^100 at the top, with edges following divisibility
of the exponents.]
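This subfield lattice can be generated programmatically: in the divisibility order, e covers d exactly when d | e and e/d is prime, and each such pair corresponds to a maximal subfield inclusion Fp^d ⊂ Fp^e. A sketch of our own:

```python
def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def is_prime(m):
    return m > 1 and all(m % k for k in range(2, int(m ** 0.5) + 1))

def hasse_edges(n):
    """Covering relations (d, e) of the divisibility order on divisors of n;
    each edge corresponds to F_{p^d} sitting maximally inside F_{p^e}."""
    divs = divisors(n)
    return [(d, e) for d in divs for e in divs
            if e % d == 0 and is_prime(e // d)]

print(hasse_edges(100))
# 12 edges, e.g. (1, 2), (1, 5), ..., (20, 100), (50, 100)
```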
Deduce that (p − 1)! ≡ −1 (mod p). (This fact is called Wilson’s Theorem.) Prove also that if n is a
positive composite integer greater than 4, then (n − 1)! ≡ 0 (mod n).
7. Let a ∈ Fp − {0}. Prove that the polynomial xp − x + a ∈ Fp [x] is irreducible.
8. Suppose that d | n. Prove that Fp^d ⊆ Fp^n and that [Fp^n : Fp^d] = n/d.
9. Consider the polynomial p(x) = x^4 + x + 1 ∈ F2[x].
(a) Show that p(x) is irreducible.
(b) Show that p(x) factors into two quadratics over F4 and exhibit these two quadratic polynomials.
10. Prove that a polynomial f (x) over a field F of characteristic 0 is separable if and only if it is the product
of irreducible polynomials that are not associates of each other. [Note: Consequently, separable
polynomials over a field of characteristic 0 are polynomials that are square-free in F [x].]
11. Find a generator of U (F27 ).
12. The polynomial p1 (x) = x2 + x + 1 ∈ F5 [x] is irreducible. Call θ an element in F25 = F5 [x]/(p1 (x))
that satisfies θ2 + θ + 1 = 0.
(a) Find all other irreducible monic quadratic polynomials in F5 [x].
(b) For each of the 10 polynomials found in the previous part, write the two roots in F25 as aθ + b
for a, b ∈ F5 .
13. Let q = p^n. Prove that the Frobenius automorphism ϕ = σp : Fq → Fq is an Fp-linear transformation.
Prove also that ϕ^n is the identity transformation.
14. Consider the Frobenius map ϕ from the previous exercise. Determine the eigenvalues and all corre-
sponding eigenspaces for ϕ.
15. Consider the Frobenius automorphism σ3 : F9 → F9 . Show how σ3 maps the elements of F9 . [Hint:
Use the identification F9 = F3 [x]/(x2 + x + 2).]
16. Prove that (1 + x^p)^n = (1 + x)^(pn) in Fp[x]. Deduce that C(pn, pk) ≡ C(n, k) (mod p), where
C(a, b) denotes the binomial coefficient.
17. Let Fq be a finite field and let f(x) be an irreducible polynomial of degree n in Fq[x]. Suppose that
α is one of the roots of f(x) in the field Fq^n. Prove that

α, α^q, α^(q^2), . . . , α^(q^(n−1))

is a complete list of the roots of f(x).
(x^(2^k) + x)/(x^(2^(k−1)) + x).
7.8
Projects
Project I. Field Extensions in Mn (F ). Revisit Exercises 7.1.19 and 7.1.20. Try to generalize
these results to other simple extensions, or to any simple extension F (α) of a field F . Use your
results to illustrate interesting multiplications and divisions in the field F (α).
Project II. Cardano’s Triangle. Recall Cardano’s method to solve the cubic equation. When
the discriminant is negative, so that the equation has three real roots, a geometric interpretation
of the method shows the roots arising as the projections onto the x-axis of the vertices of some
equilateral triangle rotated around some point on the x-axis. (For the equation x^3 + px + q = 0,
that point is the origin.) Explore the solution of the cubic from a geometric perspective. Can
you see how to pass from the geometry of projecting the vertices of an equilateral triangle onto
the x-axis to a cubic equation? Explain the solution to a cubic equation from this geometric perspective.
Project III. Cardano’s Method in C. Section 7.3 presented Cardano’s method for solving the
cubic and quartic equations with the assumption that the coefficients of the polynomial are
real. Discuss the method and the content of the section assuming that the coefficients of the
polynomial are complex numbers. How much changes and how much stays the same?
Project IV. Constructing a Regular 17-gon. The prime number 17 has φ(17) = 16 = 2^4.
Hence, [Q(ζ17) : Q] = 16. Call ζ = ζ17. Explain why Theorem 7.4.5 does not rule out the
possibility of constructing cos(2π/17) = (1/2)(ζ + ζ^(−1)). In fact, we will see in Section 11.2 that
this guarantees that cos(2π/17) is constructible. Show that:
• α1 = ζ + ζ^2 + ζ^4 + ζ^8 + ζ^9 + ζ^13 + ζ^15 + ζ^16 is real and is the root of a quadratic polynomial
over Q;
• α2 = ζ + ζ^4 + ζ^13 + ζ^16 is real and is the root of a quadratic polynomial over Q(α1);
• α3 = ζ + ζ^16 = 2 cos(2π/17) is the root of a quadratic polynomial over Q(α2).
Use this sequence to write cos(2π/17) as a combination of nested square root expressions. Also
use this sequence to find a straightedge and compass construction of the regular 17-gon. Justify
your construction.
Project V. Irreducible Polynomials in F2 [x]. In certain applications of cryptography, it is
particularly useful to have irreducible polynomials of degree n in F2 [x]. For each degree k = 2, 3, . . . , n,
attempt to find as many irreducible polynomials of degree k as you can, providing at least one
for each k. Do you see any patterns in which polynomials will be irreducible? Can you devise a
fast algorithm to find an irreducible polynomial of degree n in F2 [x]?
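As a starting point for this project, here is a brute-force search of our own (far from the fast algorithm the project asks for). It represents a polynomial in F2[x] as an integer bitmask, bit i holding the coefficient of x^i, so that addition is XOR and polynomial remainder is carry-less division.

```python
def polymod(a, b):
    """Remainder of carry-less (F_2[x]) division of a by b, as bitmasks."""
    db = b.bit_length()
    while a.bit_length() >= db:
        a ^= b << (a.bit_length() - db)
    return a

def is_irreducible(f, n):
    """A degree-n polynomial is irreducible iff no polynomial of degree
    1..n//2 divides it; trial-divide by every candidate bitmask."""
    return all(polymod(f, d) != 0 for d in range(2, 1 << (n // 2 + 1)))

def find_irreducible(n):
    """First irreducible polynomial of degree n in F_2[x], as a bitmask."""
    for f in range(1 << n, 1 << (n + 1)):
        if is_irreducible(f, n):
            return f

print(bin(find_irreducible(3)))   # 0b1011  = x^3 + x + 1
print(bin(find_irreducible(4)))   # 0b10011 = x^4 + x + 1
```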
Project VI. Epicycloids in Z/nZ? Let n be a somewhat large integer, say n ≥ 40, and consider
the group µn of nth roots of unity. This is a finite subgroup of (U (C), ×). Consider plotting
the elements of µn on the unit circle in C. For various n and for a given small positive integer
m, trace an edge between z and z^m for all z ∈ µn. The edges create the envelope of a certain
epicycloid. Explain why this is true. Study properties of the epicycloid depending on m and
n. If this graph were created by nail and string artwork, is it ever possible to create the work
with a single piece of string? (Why or why not?)
Project VII. Frobenius Automorphism. For various values of p and n, find a matrix corre-
sponding to the Frobenius automorphism σp on Fp^n, viewed as a linear transformation of Fp^n as a
vector space over Fp. Can you identify patterns in this associated matrix?
8. Group Actions
Though this textbook waited until this point to introduce group actions, from a historical perspec-
tive, group actions came first and motivated group theory. Mathematicians did not define groups ex
nihilo and study their properties from their axioms. Évariste Galois, often credited with defining a
group in the modern sense (Definition 3.2.1), formalized the axioms of groups while studying proper-
ties of symmetry among roots of a polynomial. Subsequent work by mathematicians simultaneously
revealed the richness of the algebraic structure of groups, discovered group-theoretic patterns in
many areas, and developed Galois theory, which applies group actions to the study of polynomial
equations. This textbook covers group actions in this chapter and Galois theory in Chapter 11.
Historically, mathematicians arrived at groups as a set S of bijective functions f : X → X (that
perhaps preserved some interesting property), in which S is closed under composition and taking
function inverses. In broad strokes, group actions involve viewing a group G as a subgroup of the
group of permutations on a set. More precisely, if a group G acts on a set X, then each group
element is a bijective function on X. Section 8.1 defines group actions in the modern sense and
offers many examples.
The perspective of group actions, which simultaneously considers properties of the group G and of
the set X on which it acts, yields more information than is available from the group alone.
Section 8.2 presents orbits and stabilizers, which specifically capture this interplay between the set
and the group. Section 8.3 presents some properties that are specific to transitive group actions,
including block structures in group actions.
The general theory of group actions proves to be particularly fruitful when we consider a group
acting on itself in some manner. Section 8.4 presents results pertaining to the action of a group on
itself by left multiplication and by conjugation, resulting in Cayley’s Theorem, Cauchy’s Theorem,
and the Class Equation. Section 8.5 introduces a specific action of a group on certain subsets
of its subgroups, which leads to Sylow’s Theorem, a profound result in group theory with many
consequences for the classification of groups.
Section 8.6 offers a brief introduction to the representation theory of finite groups. Though
representation theory is a broad branch of algebra, this section gives a glimpse into how the interplay
between a group acting on a set with some other structure often uncovers interesting results about
both structures. The section serves the dual role of further illustrating groups actions and whetting
the reader/student’s appetite for further study in that area.
8.1
Introduction to Group Actions
To introduce group actions, we consider the dihedral group as first presented in Section 3.1. Chap-
ter 3 presented many examples involving the dihedral group simply in terms of internal group
structure. However, at the outset, we introduced Dn as a set of bijections on the vertices of the
regular n-gon. Hence, if we label the vertices of the regular n-gon as {1, 2, . . . , n}, then Dn can be
viewed as a subgroup of Sn , the set of bijections on the vertices.
Group actions generalize as broadly as possible the perspective of viewing groups as sets of
functions on a set. More precisely, we take the algebraic structure of groups and connect them to
the algebraic structure of sets by seeing how groups can be understood as transformations on a set.
We warn the reader that since group actions arise in so many different contexts within mathe-
matics, there exist a variety of different notations and expressions.
379
Definition 8.1.1
A group action of a group G on a set X is a function from G × X → X, with outputs
written as g · x or simply gx, satisfying
(1) g1 · (g2 · x) = (g1 g2) · x for all g1, g2 ∈ G and all x ∈ X (compatibility);
(2) 1 · x = x for all x ∈ X, where 1 is the identity of G (identity).
If a group G acts on a set X, then X is sometimes called a G-set. As another point of terminology,
the function G×X → X is sometimes also called a pairing. Some authors use the shorthand notation
G ↷ X to mean “the group G acts on the set X.”
The axioms capture the desired intuition for groups as sets of functions on X. In essence, every
group element behaves like a function on X in such a way that function composition corresponds to
the group operation and the identity of the group behaves as the identity function. More precisely,
for each g ∈ G, the operation g · x is a function we can denote by σg : X → X with σg (x) = g · x.
Recall that we denote by SX the set of bijective functions from a set X to itself.
Proposition 8.1.2
Let G be a group acting on a set X. Then, for each g ∈ G, the function σg : X → X is a
bijection, and the function ρ : G → SX defined by ρ(g) = σg is a group homomorphism.

Proof. Let g ∈ G. For all x ∈ X,

σg−1 (σg (x)) = g^(−1) · (g · x) = (g^(−1) g) · x = 1 · x = x.

Similarly, σg (σg−1 (x)) = x. Hence, the function σg : X → X is bijective with inverse function
(σg )^(−1) = σg−1 .
To show that ρ is a homomorphism, let g1 , g2 ∈ G. Then ρ(g1 ) ◦ ρ(g2 ) is a bijection X → X such
that for all x,

(ρ(g1 ) ◦ ρ(g2 ))(x) = σg1 (σg2 (x)) = g1 · (g2 · x) = (g1 g2 ) · x = ρ(g1 g2 )(x).

Hence, ρ(g1 ) ◦ ρ(g2 ) = ρ(g1 g2 ).
In other words, actions of a group G on a set X are in one-to-one correspondence with homo-
morphisms from G to SX . Any homomorphism ρ : G → SX is called a permutation representation
because it relabels the elements of G with permutations. We say that a group action induces a
permutation representation of G. This inspires us to give an alternate definition for a group action
that is briefer than Definition 8.1.1.
In every group action, the group identity acts as the identity function on X. However, in an
arbitrary group action, many other group elements could have no effect on X. It is an important
special case when the group identity is the only group element that acts as the identity.
Definition 8.1.4
Suppose that a group action of G on X has permutation representation ρ. Then the
action is called faithful if Ker ρ = {1}. We also say that G acts faithfully on X.
Since Ker ρ = {1}, then ρ is injective. This means that ρ(g) ≠ ρ(h) for all g ≠ h in G.
Therefore, the action is faithful if and only if each distinct group element corresponds to a different
function on X. Furthermore, by the First Isomorphism Theorem, if an action is faithful, then
G ≅ G/(Ker ρ) ≅ Im ρ, which presents G as a subgroup of SX .
Definition 8.1.1 is sometimes called a left group action of G on the set X to reflect the notational
habit of applying functions on the left of the domain element. Beyond notation habit, in a left group
action, the composed element (g1 g2 ) · x involves first acting on x by g2 and then by g1 .
[Figure 8.1: a regular hexagon with vertices labeled 1 through 6 counterclockwise, with vertex 1
at the right.]
With the specific example of a hexagon, if we label the vertices of the hexagon {1, 2, 3, 4, 5, 6} as
in Figure 8.1, then the corresponding permutation representation ρ satisfies
ρ(r) = (1 2 3 4 5 6)
ρ(s) = (2 6)(3 5).
Note that if we labeled the vertices of the hexagon differently, then we would induce a different
homomorphism of D6 into S6 .
It is evident from the construction that Dn acts faithfully on {1, 2, . . . , n}. Indeed, when defining
elements in Dn we only care about bijections on the set of vertices and we do not consider two
functions different if they have the same effect on all elements of {1, 2, . . . , n}. 4
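The permutation representation of D6 described above can be verified mechanically. A Python sketch of our own, encoding ρ(r) and ρ(s) as dictionaries on {1, . . . , 6} and checking the dihedral relations r^6 = 1, s^2 = 1, and srs = r^(−1):

```python
# rho(r) = (1 2 3 4 5 6), rho(s) = (2 6)(3 5) on the hexagon's vertices
r = {1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 1}
s = {1: 1, 2: 6, 3: 5, 4: 4, 5: 3, 6: 2}

def compose(f, g):
    """(f o g)(x) = f(g(x))."""
    return {x: f[g[x]] for x in g}

def power(f, k):
    out = {x: x for x in f}
    for _ in range(k):
        out = compose(f, out)
    return out

identity = {x: x for x in r}
print(power(r, 6) == identity)                      # True: r^6 = 1
print(compose(s, s) == identity)                    # True: s^2 = 1
print(compose(compose(s, r), s) == power(r, 5))     # True: s r s = r^-1
```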
Example 8.1.6 (Permutation Group). As mentioned in the introduction to this section, the
permutation group Sn acts on the set {1, 2, . . . , n} by viewing each permutation σ ∈ Sn as a bijection
on {1, 2, . . . , n}. This example is not surprising since Sn was essentially defined by how bijections on
{1, 2, . . . , n} compose with each other. The permutation action of Sn on {1, 2, . . . , n} is faithful. 4
Example 8.1.7 (Linear Algebra). Let F be a field and consider the vector space V = F n over
the field F , for some positive integer n. Then the group GLn (F ) acts on V by multiplying a vector
by an invertible matrix. Explicitly, the pairing of the action GLn (F ) × V → V is A · ~v = A~v , where
the right-hand side is matrix-vector multiplication.
Note that though the notation A~v is familiar from linear algebra, this is the first time we give
precise algebraic context to matrix-vector multiplication.
We can show that this action is faithful by considering how GLn (F ) acts on the standard basis
vectors ei , where ei is the n-tuple that is all 0s except for a 1 in the ith entry. Recall that Aei is the
ith column of A. Therefore, if Aei = ei for all i = 1, 2, . . . , n, then the ith column of A is ei (as a
column), so A = I. 4
Example 8.1.8 (Trivial Action). Let G be a group and X any set. Then the action gx = x for
all g ∈ G and x ∈ X is called the trivial action of G on X. Every group element g acts as the identity
on X. In this sense, every group can act on every set. Intuitively, a trivial action is opposite from
a faithful action in that Ker ρ = G for a trivial action, whereas Ker ρ = {1} for a faithful action. 4
Example 8.1.9. Consider the group D6 and how it acts on the diagonals of the hexagon. The
diagonals of the hexagon are d1 = {1, 4}, d2 = {2, 5}, and d3 = {3, 6}. For any of these 2-element
subsets of vertices, any dihedral symmetry of the hexagon maps a diagonal into another diagonal.
Therefore, D6 acts on {d1 , d2 , d3 }.
This action is not faithful because r^3 · di = di for i = 1, 2, 3. Note that s ∉ Ker ρ because even
though s · d1 = d1 , we also have s · d2 = d3 . The permutation representation ρ of this group action
is completely defined by ρ(r) = (1 2 3) and ρ(s) = (2 3). 4
Example 8.1.10. Consider the group D5 and consider the set of 11 polygonal regions inside the
pentagon bordered by the complete graph on the set of vertices as shown below. The dihedral group
D5 acts on the set of polygonal regions. If we label the regions with the integers 1, 2, . . . , 11, the
action induces a homomorphism ρ : D5 → S11 .

[Figure: a regular pentagon with vertices v1 , v2 , v3 , v4 , v5 and all edges of the complete graph on
the vertices drawn, dividing the interior into 11 polygonal regions.] 4
Example 8.1.11 (Rigid Motions of the Cube). Consider the group G of rigid motions (solid
rotations) of the cube. There are many actions that are natural to consider. G acts on:
• the set of segments that connect centroids of opposite faces on the cube (3 elements).
[Figure 8.2: a cube with vertices labeled 1 through 8, showing the 180° rotation about the axis
(M M ′) through the midpoints M and M ′ of a pair of centrally symmetric edges.]
As one action of particular interest, consider the action of G on the diagonals through the center
of the cube. This action corresponds to a homomorphism ρ : G → S4 . Let us label the vertices of the
cube by {1, 2, 3, 4, 5, 6, 7, 8} and label these long diagonals by d1 = {1, 7}, d2 = {2, 8}, d3 = {3, 5},
and d4 = {4, 6}. See Figure 8.2.
Pick an edge e. Let e′ be the edge of the cube that is centrally symmetric to e through the middle
(centroid) O of the cube. Let M and M ′ be the midpoints of e and e′ , respectively. Let Re be the
rotation of the cube by 180◦ around the line (M M ′). The rigid motion Re interchanges the two
diagonals that lie in the plane defined by e and e′ (the plane defined by e and O), but it
leaves unchanged the diagonals that lie in the plane perpendicular to the plane defined by e and e′ .
Thus, ρ(Re ) is a transposition in S4 , the symmetric group on {d1 , d2 , d3 , d4 }. There are six pairs of
centrally symmetric edges, which lead to six distinct rigid motions of the form Re , and these induce
the 6 transpositions in S4 . Since S4 is generated by its transpositions, we deduce that ρ(G) = S4 .
Furthermore, it is not hard to show (see Exercise 3.3.27) that |G| = 24. Therefore, the
homomorphism ρ is a surjective function between two finite groups of the same size. We deduce
that ρ is bijective, so ρ is an isomorphism. We conclude that the group of rigid motions of the cube
is isomorphic to S4 . 4
Example 8.1.12 (Sets of Functions). Let Fun(X, Y ) be the set of functions from the set X to
a set Y .
If G acts on Y , then there is a natural action of G on Fun(X, Y ) via
(g · f )(x) = g · (f (x)).
It is an easy exercise to see that this is a group action. If G acts on X, then there also exists a
natural action of G on Fun(X, Y ) via

(g · f )(x) = f (g^(−1) · x). (8.1)

It is crucial that the right-hand side involve g^(−1). We check the compatibility axiom for this action.
Let g, h ∈ G and let f ∈ Fun(X, Y ). Writing h · f = f ′ , the function g · (h · f ) satisfies

(g · (h · f ))(x) = f ′ (g^(−1) · x) = f (h^(−1) · (g^(−1) · x)) = f ((gh)^(−1) · x) = ((gh) · f )(x).

It is easy to check the identity axiom. Hence, (8.1) does indeed define an action on Fun(X, Y ). 4
Example 8.1.13 (Rearrangement of n-tuples). Let A be a set, let n be a positive integer, and
let X = An be the set of n-tuples of A. Consider the pairing Sn × X → X that permutes the entries
of (a1 , a2 , . . . , an ) ∈ An according to a permutation σ. In other words, in the action of σ on the
n-tuple (a1 , a2 , . . . , an ), the ith entry is sent to the σ(i)th position. Note that in σ · (a1 , a2 , . . . , an ),
the ith entry is the σ −1 (i)th entry of (a1 , a2 , . . . , an ). Thus,
σ · (a1 , a2 , . . . , an ) = (aσ−1 (1) , aσ−1 (2) , . . . , aσ−1 (n) ). (8.2)
We show that this defines a group action of Sn on X = An . First, for all τ, σ ∈ Sn , we have
τ · (σ · (a1 , a2 , . . . , an )) = τ · (aσ−1 (1) , aσ−1 (2) , . . . , aσ−1 (n) )
= (aσ−1 (τ −1 (1)) , aσ−1 (τ −1 (2)) , . . . , aσ−1 (τ −1 (n)) )
= (a(τ σ)−1 (1) , a(τ σ)−1 (2) , . . . , a(τ σ)−1 (n) )
= (τ σ) · (a1 , a2 , . . . , an ).
Also, 1 · (a1 , a2 , . . . , an ) = (a1 , a2 , . . . , an ) since it does not permute the elements.
If (8.2) seems counterintuitive at first, observe that as sets An is equal to Fun({1, 2, . . . , n}, A)
and the action described in (8.2) is precisely the action defined in Example 8.1.12.
In contrast, it is important to realize that the function Sn × X → X defined by
σ · (a1 , a2 , . . . , an ) = (aσ(1) , aσ(2) , . . . , aσ(n) )
is not a group action of Sn on X, as it fails the compatibility axiom. 4
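The contrast between (8.2) and the naive pairing can be checked concretely. A Python sketch of our own (permutations are 0-indexed tuples, where sigma[i] is the image of position i) confirming that the inverse-indexed action satisfies compatibility while the naive pairing fails it:

```python
def act(sigma, t):
    """The ith entry of sigma . t is t[sigma^{-1}(i)], as in (8.2)."""
    inv = [0] * len(sigma)
    for i, s in enumerate(sigma):
        inv[s] = i
    return tuple(t[inv[i]] for i in range(len(t)))

def naive(sigma, t):
    """The 'obvious' pairing: ith entry of sigma . t is t[sigma(i)]."""
    return tuple(t[sigma[i]] for i in range(len(t)))

def compose(tau, sigma):
    """(tau sigma)(i) = tau(sigma(i))."""
    return tuple(tau[sigma[i]] for i in range(len(sigma)))

t = ('a', 'b', 'c')
tau, sigma = (1, 0, 2), (0, 2, 1)   # the transpositions (0 1) and (1 2)
print(act(tau, act(sigma, t)) == act(compose(tau, sigma), t))        # True
print(naive(tau, naive(sigma, t)) == naive(compose(tau, sigma), t))  # False
```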
There are many other types of group actions of considerable interest. The examples provided so
far just scratch the surface. The following subsection presents a few important actions of a group
acting on itself. Following that, the reader is encouraged to peruse the exercises for many other
examples.
Definition 8.1.14
Let G be a group and let X be a G-set. A G-subset of X is a subset S ⊆ X such that
g · x ∈ S for all g ∈ G and all x ∈ S. We also say that S is closed under the action of G or,
equivalently, that S is invariant under G.
Whenever a subset S of X is closed under the action of G, then the axioms of the action of G
on X restrict to S, giving S the structure of a G-set.
As an example, consider the plane R2 equipped with an origin O and with a labeled x-axis and
y-axis. Consider the natural action of the dihedral group Dn on R2 , where r corresponds to rotation
by 2π/n around the origin O and where s corresponds to reflection about the x-axis. Any Dn -subset
of R2 is a subset of R2 that has dihedral symmetry, i.e., is invariant under the action of Dn .
Example 8.1.15. Let G be the group of rigid motions of a cube described in Example 8.1.11 and
let V be the set of vertices of the cube. There is a natural action of G on V by how the rotation maps
the vertices. Namely for every vertex v ∈ V , g · v is the image of the vertex v under the rotation g.
Let P(V ) be the set of subsets of vertices of V . We equip P(V ) with the G-action defined by
g · {x1 , x2 , . . . , xn } = {g · x1 , g · x2 , . . . , g · xn }.
The set of edges E of the cube is a G-subset of P(V ) since every solid rotation of the cube maps an
edge to another edge. The G-set P(V ) has many other G-subsets, e.g., the set of unordered pairs of
vertices {S ⊆ V | |S| = 2}, the set of faces, the set of long diagonals, etc. However, not all subsets
of P(V ) are G-subsets. For example, given a fixed vertex v0 , the singleton set {v0 } is not a G-subset
since any solid rotation g ∈ G that does not leave v0 fixed satisfies g · v0 ∉ {v0 }. 4
Definition 8.1.16
Let G be a group and let X and Y be two G-sets (i.e., there is an action of G on X and
on Y ). A G-set homomorphism between X and Y is a function f : X → Y such that

f (g · x) = g · f (x)   for all g ∈ G and all x ∈ X.
Exercises 8.1.13 and 8.1.14 establish some results about G-set homomorphisms that we might
expect from standard results about group homomorphisms and ring homomorphisms.
Note that in Definition 8.1.16 the group G acting on X and Y is the same. In this perspective, if G
and G0 are nonisomorphic groups, then we consider the collection of G-sets and G0 -sets as two distinct
algebraic structures. Because of this restriction, the above definition might feel unsatisfactory. For
example, suppose that a group G acts on a set X and a group H acts on a set Y . We might consider
the group actions as equivalent if the actions are identical after a relabeling of the elements of G with
elements of H and a parallel relabeling of the elements of X with elements of Y . To name this desired
phenomenon, we use the following definition.
Definition 8.1.17
A group action homomorphism between two group actions (G, X, ρ1 ) and (H, Y, ρ2 ) is a
pair (ϕ, f ), where ϕ : G → H is a homomorphism and f : X → Y is a function such that

f (g · x) = ϕ(g) · f (x)   for all g ∈ G and all x ∈ X.

If, furthermore, ϕ is an isomorphism and f is a bijection, the pair is called a group action
isomorphism. If a group action isomorphism exists between two group actions, they are called
isomorphic (or permutation equivalent).
Example 8.1.18. Consider the natural action of GL2 (F2 ) on the vector space X = F22 of four
elements over F2 . Also consider the action of S3 on the set Y = {0, 1, 2, 3} by fixing 0 and permuting
{1, 2, 3} as usual. Let ϕ : GL2 (F2 ) → S3 be the isomorphism described in Example 3.7.20. Then
the bijection f : X → Y that maps (writing the column vectors of F2^2 as pairs)

(0, 0) ↦ 0,   (1, 0) ↦ 3,   (0, 1) ↦ 2,   (1, 1) ↦ 1
makes the pair (ϕ, f ) into an isomorphism between group actions. 4
5. Let F be a field, let G = GLn (F ), and let X = Mn×n (F ) be the set of n × n matrices with entries in
F . Discuss how the relation of similarity on square matrices in Mn×n (F ) is related to a group action
of G on X.
6. Fix a positive integer n. Let X = {1, 2, . . . , n} and consider the mapping Sn × P(X) → P(X) defined
by
σ · {x1 , x2 , . . . , xk } = {σ(x1 ), σ(x2 ), . . . , σ(xk )}.
(a) Prove that this pairing defines an action of Sn on P({1, 2, . . . , n}).
(b) For a given k with 0 ≤ k ≤ n, define Pk (X) as the set of subsets of X of cardinality k. Prove
that Pk (X) is closed under the action of Sn on P(X).
(c) Prove that a subset Y of P(X) is closed under the action of Sn if and only if Y is the union of
some Pk (X).
7. Consider the action defined in Exercise 8.1.6 where n = 4 and k = 2. The induced permutation
representation is a homomorphism ρ : S4 → S6 . Label the elements in P2 ({1, 2, 3, 4}) according to the
following chart.
label 1 2 3 4 5 6
subset {1, 2} {1, 3} {1, 4} {2, 3} {2, 4} {3, 4}
Give ρ(σ) as a permutation in S6 for the permutations σ = 1, (1 2), (1 2 3), (1 2)(3 4), and (1 2 3 4).
8. Let Pn be the set of polynomials in R[x] that have degree n or less (including the 0 polynomial). Consider
σ ∈ Sn+1 as a permutation on {0, 1, 2, · · · , n} and define
(g Ker ρ) · x = g · x
(g ? f )(x) = g · f (g −1 · x)
8.2
Orbits and Stabilizers
The action of a group on a set creates some interaction between the algebraic structure of sets and
the algebraic structure of groups. Nontrivial actions of a group G on a set X connect information
about the set X with information about the group G in interesting ways.
For example, if G and X are finite and ρ : G → SX is the permutation representation induced
from an action of G on X, then by Lagrange’s Theorem, | Im ρ| divides |SX |. Furthermore, by the
First Isomorphism Theorem, Im ρ ∼ = G/ Ker ρ so | Im ρ| = |G|/| Ker ρ| and thus | Im ρ| also divides
|G|. Just this connection puts some constraints on what can occur for actions of a given group G
on a set X. For example, if G ≅ Zp , where p is a prime number and if |X| = n < p, then the only
action of G on X is trivial.
8.2.1 – Orbits
One of the first nonobvious connections between groups and sets is that a group action of G on X
defines an equivalence relation.
Proposition 8.2.1
Let G be a group acting on a nonempty set X. The relation defined by x ∼ y if and only
if y = g · x for some g ∈ G is an equivalence relation.
Proof. Let x ∈ X and let 1 be the identity in G. Since 1 · x = x, then x ∼ x for all x ∈ X. Hence,
∼ is reflexive.
Suppose that x, y ∈ X with x ∼ y. Then there exists g ∈ G such that y = g · x. Consequently,

g^(−1) · y = g^(−1) · (g · x) = (g^(−1) g) · x = 1 · x = x,

so y ∼ x and ∼ is symmetric. Finally, suppose that x ∼ y and y ∼ z, say y = g · x and z = h · y
with g, h ∈ G. Then z = h · (g · x) = (hg) · x, so x ∼ z and ∼ is transitive.
Definition 8.2.2
Let G be a group acting on a nonempty set X. The ∼-equivalence class {g · x | g ∈ G},
denoted by G · x (or more simply Gx), is called the orbit of G containing x.
Recall that the equivalence classes of an equivalence relation form a partition of X. Consequently,
the orbits of G on X partition X.
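Orbits of a finite group action can be computed by closing each point under a set of generators; for a finite group, repeated application of the generators reaches every group element, so no explicit inverses are needed. A Python sketch of our own, run on the action of D6 on the three diagonals from Example 8.1.9:

```python
def orbits(generators, X, act):
    """Partition X into orbits under the (finite) group generated by `generators`;
    closure under the generators alone suffices for a finite group."""
    remaining, out = set(X), []
    while remaining:
        x = remaining.pop()
        orbit, frontier = {x}, [x]
        while frontier:
            y = frontier.pop()
            for g in generators:
                z = act(g, y)
                if z not in orbit:
                    orbit.add(z)
                    frontier.append(z)
        remaining -= orbit
        out.append(orbit)
    return out

# D6 on the diagonals d1, d2, d3, via rho(r) = (1 2 3), rho(s) = (2 3):
rho_r = {1: 2, 2: 3, 3: 1}
rho_s = {1: 1, 2: 3, 3: 2}
print(orbits([rho_r, rho_s], {1, 2, 3}, lambda g, y: g[y]))  # [{1, 2, 3}]

# The trivial action on {1, 2, 3} has singleton orbits:
ident = {1: 1, 2: 2, 3: 3}
print(sorted(orbits([ident], {1, 2, 3}, lambda g, y: g[y]), key=min))
# [{1}, {2}, {3}]
```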
The terminology of “orbit” might appear strange at first pass. Indeed, many algebraists use
the term freely without regularly recalling the etymology. The etymology evokes an application of
group actions to dynamical systems. Though the following example uses a fundamental result of
differential equations to illustrate the terminology, it is not essential for the reader to be familiar
with differential equations. Furthermore, we can only provide an intuitive treatment as the technical
details would take us far afield.
Example 8.2.3 (Dynamical Systems). Let X = Rn . A parametric curve in Rn has the form
~x(t) = (x1 (t), x2 (t), . . . , xn (t)) for real-valued functions xi (t). A first-order differential equation in
the vector function ~x(t) is an equation

x′i (t) = Fi (x1 (t), . . . , xn (t), t),   i = 1, 2, . . . , n,   (8.3)

where each function Fi involves the n space variables and the parameter (time) variable t. A
solution to the differential equation (or system of parametric equations) is any parametric curve
~x(t) that satisfies (8.3) for all t in some nonempty interval.
Existence and uniqueness theorems (see [11, Theorem 7.1.1] or [3]) in the theory of differential
equations establish conditions on the functions Fi under which solutions exist and are unique given
an initial condition ~x(t0 ) = ~a for a given initial parameter value t0 and some initial condition ~a. This
leads to the concept of a flow which is defined as a one-parameter family of functions φt : Rn → Rn ,
with t ∈ R, such that
φt ◦ φs = φt+s
and φ0 is the identity function on Rn . The solutions to (8.3) give a flow on Rn by defining

φs (~a) = ~x(s),
where ~x(t) is the unique solution to (8.3) with initial condition ~x(0) = ~a. A flow on Rn is an action
of the group (R, +) on the set Rn . Furthermore, for a fixed ~a, the flow φt (~a) describes the trajectory
(orbit) of a particle that is governed by the differential equation (8.3) as t evolves and that starts at
the initial point ~a. Therefore, the orbit for ~a as a group action is precisely the orbit as a trajectory
of a particle governed by this dynamical system starting at the point ~a.
Figure 8.3 shows the vector field F~ (x, y) = (0.8x(1 − y)^2 , 0.3y(x − 1)) along with the orbits of two
points, namely the solutions with ~x(0) = (0.3, 2) = ~a and ~x(0) = (0.3, 1) = ~b. These trajectories show the
flow φt of the differential equation applied to the two points ~a and ~b.
The fact that solutions to differential equations of this form (under appropriate conditions) form
a flow means that if we watch how points in Rn evolve during an interval of time t and then during
a subsequent interval of time s, then the points will have evolved as if we simply considered how
they evolved during a single interval of duration t + s. 4
Since flows refer to group actions of the group (R, +), the reader should understand that the
intuition provided by the above example, which motivated the term “orbit,” is limited in describing
what can happen in group actions in general. Indeed, we regularly consider group actions for finite
groups, nonabelian groups, or groups that do not have a natural total order on them.
Definition 8.2.4
Suppose that a group G acts on a set X. We say that G fixes an element x ∈ X if g · x = x
for all g ∈ G. Furthermore, the action is called
(1) free if g · x = h · x for some x ∈ X, then g = h;
(2) transitive if for any two x, y ∈ X, there exists g ∈ G such that y = g · x;
(3) regular if it is both free and transitive;
(4) r-transitive if for every two r-tuples (x1 , x2 , . . . , xr ) and (y1 , y2 , . . . , yr ) of distinct
elements in X, there exists an element g ∈ G such that
(y1 , y2 , . . . , yr ) = (g · x1 , g · x2 , . . . , g · xr ).
Note that an element x is fixed by G if and only if {x} is an orbit of G. From the opposite
perspective, the action of G on X is transitive if and only if there is only one orbit, namely all of X.
A group action is free if and only if the only element in G that fixes any element in X is the group
identity.
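These definitions lend themselves to direct verification on small examples. The following Python sketch (an illustration added here, with vertices renumbered 0 through 5) checks that the rotation group Z6, acting on the six vertices of a hexagon, is both free and transitive, hence regular.

```python
# The cyclic group Z6 acting on the vertices {0,...,5} of a hexagon;
# the element k acts by rotation: k . x = (x + k) mod 6.
G = range(6)
X = range(6)

def act(k, x):
    return (x + k) % 6

# Free: whenever g.x = h.x for some x, then g = h.
free = all(g == h for g in G for h in G for x in X if act(g, x) == act(h, x))

# Transitive: for any x, y there is some g with g.x = y.
transitive = all(any(act(g, x) == y for g in G) for x in X for y in X)

print(free, transitive)  # the action is regular: free and transitive
```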
Example 8.2.5. Consider the action of a group G on its set of subgroups Sub(G) by conjugation.
(See Exercise 8.4.10.) There exists a natural bijection between H and gHg −1 . In particular, if
G is a finite group, then each orbit of this action consists of subgroups of G of the same cardinality.
A subgroup H is fixed by this action if and only if gHg −1 = H for all g ∈ G. Hence, the fixed
subgroups are precisely the normal subgroups of G.
As a specific example, consider the group D6 , whose lattice of subgroups is given in Exam-
ple 3.6.8. The orbits of the action of D6 on Sub(D6 ) are
{D6 }, {⟨s, r²⟩}, {⟨r⟩}, {⟨sr, r²⟩}, {⟨s, r³⟩, ⟨sr, r³⟩, ⟨sr², r³⟩},
{⟨r²⟩}, {⟨s⟩, ⟨sr²⟩, ⟨sr⁴⟩}, {⟨r³⟩}, {⟨sr⟩, ⟨sr³⟩, ⟨sr⁵⟩}, {⟨1⟩}. 4
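This example can be cross-checked computationally. The following Python sketch (an added illustration) builds D6 as a permutation group of order 12, enumerates its subgroups by closing all pairs of elements (every subgroup of a dihedral group needs at most two generators), and computes the conjugation orbits; the singleton orbits are exactly the normal subgroups.

```python
n = 6
e = tuple(range(n))
r = tuple((i + 1) % n for i in range(n))   # rotation of order 6
s = tuple((-i) % n for i in range(n))      # a reflection

def mul(p, q):                              # composition of permutations
    return tuple(p[q[i]] for i in range(n))

def inv(p):
    q = [0] * n
    for i in range(n):
        q[p[i]] = i
    return tuple(q)

def close(gens):                            # subgroup generated by gens
    S = {e} | set(gens)
    while True:
        new = {mul(a, b) for a in S for b in S} - S
        if not new:
            return frozenset(S)
        S |= new

D6 = close({r, s})                          # dihedral group of order 12

# Every subgroup of a dihedral group is generated by at most two elements,
# so closing every pair of elements finds every subgroup.
subs = {close({a, b}) for a in D6 for b in D6}

def conj(g, H):
    return frozenset(mul(mul(g, h), inv(g)) for h in H)

orbits = {frozenset(conj(g, H) for g in D6) for H in subs}
normal = [H for H in subs if all(conj(g, H) == H for g in D6)]

print(len(D6), len(subs), len(orbits), len(normal))
```

This reports 12 group elements, 16 subgroups, 10 conjugation orbits, and 7 normal subgroups, matching the orbit list above.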
390 CHAPTER 8. GROUP ACTIONS
Example 8.2.7. Continuing with examples associated to group actions of D6 on various sets, con-
sider the standard action of D6 on the vertices of a regular hexagon, labeled as in Example 8.1.5.
This action is transitive. In particular, if a, b ∈ {1, 2, 3, 4, 5, 6}, then the group element r^{b−a}
maps a to b. However, the action is not 2-transitive (nor r-transitive for r ≥ 2). Indeed, if g · a = b
with a < 6, then g · (a + 1) can only be one of the two vertices on either side of b. 4
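As an illustrative check (with vertices renumbered 0 through 5), the following Python sketch realizes D6 as the 12 symmetries of the hexagon and confirms that the action is transitive but not 2-transitive; indeed, the 30 ordered pairs of distinct vertices cannot all be reached using only 12 group elements.

```python
n = 6
rot = [tuple((i + k) % n for i in range(n)) for k in range(n)]   # 6 rotations
ref = [tuple((k - i) % n for i in range(n)) for k in range(n)]   # 6 reflections
D6 = rot + ref                                                    # 12 symmetries

transitive = all(any(g[a] == b for g in D6) for a in range(n) for b in range(n))

pairs = [(a, b) for a in range(n) for b in range(n) if a != b]
two_transitive = all(any((g[a], g[b]) == (c, d) for g in D6)
                     for (a, b) in pairs for (c, d) in pairs)

print(transitive, two_transitive)  # True False
```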
Example 8.2.8. Let F be a field and consider the action of GLn (F ) on the vector space V = F n
by matrix-vector multiplication. See Example 8.1.7. This action has exactly two orbits. One orbit
corresponds to the fixed point ~0. On the other hand, consider any two nonzero vectors ~a and ~b. Let
M1 , M2 ∈ GLn (F ) be matrices such that the first column of M1 is ~a and the first column of M2 is
~b. Then M = M2 M1−1 ∈ GLn (F ) and M~a = ~b. Thus, V − {~0} is an orbit of this action.
We point out that the action of GLn (F ) on V − {~0}, though transitive, is not 2-transitive. We
can see this by taking ~a and ~b to be collinear vectors and ~u and ~v to be linearly independent vectors.
There is no invertible matrix g ∈ GLn (F ) such that g~a = ~u and g~b = ~v . 4
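The same phenomenon can be verified exhaustively over a small field. The following Python sketch (an added illustration using F3 in place of a general field) enumerates GL2(F3) and checks that it acts transitively on the 8 nonzero vectors of F3², yet no element sends the collinear pair ((1,0), (2,0)) to the linearly independent pair ((1,0), (0,1)).

```python
from itertools import product

p = 3                                        # the field F3
vecs = [v for v in product(range(p), repeat=2) if v != (0, 0)]

def det(M):                                  # M = (a, b, c, d) encodes [[a, b], [c, d]]
    return (M[0] * M[3] - M[1] * M[2]) % p

GL2 = [M for M in product(range(p), repeat=4) if det(M) != 0]

def apply(M, v):                             # matrix-vector multiplication mod p
    return ((M[0] * v[0] + M[1] * v[1]) % p,
            (M[2] * v[0] + M[3] * v[1]) % p)

transitive = all(any(apply(M, a) == b for M in GL2) for a in vecs for b in vecs)

# A matrix fixing (1,0) must send (2,0) = 2*(1,0) to (2,0), never to (0,1).
no_map = not any(apply(M, (1, 0)) == (1, 0) and apply(M, (2, 0)) == (0, 1)
                 for M in GL2)

print(len(GL2), transitive, no_map)  # 48 True True
```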
Example 8.2.9. Consider the set X = Mm×n (R) of real m×n matrices. The group G = GLm (R)×
GLn (R) acts on Mm×n (R) by
(A, B) · M = AM B −1 . (8.4)
Let T : Rn → Rm be a linear transformation and suppose that M is the matrix of T with respect to
a basis of Rn and a basis of Rm . Then the action described above corresponds to effecting a change
of basis on Rm with basis change matrix A and a change of basis on Rn with basis change matrix
B.
If N = AM B −1 , then AM = N B. Since A is invertible, rank AM = rank M , and since B is
invertible, rank N B = rank N . Hence, rank M = rank N , so matrices in the same orbit have the
same rank. We proceed to prove the converse.
Suppose that M has rank r. By definition, the linear transformation T (~x) = M~x, expressed
with respect to the standard bases on Rn and Rm , has dim Im T = r. By the Rank-Nullity Theorem,
dim Ker T + dim Im T = n so dim Ker T = n − r. Let {~ur+1 , ~ur+2 , . . . , ~un } be a basis of Ker T as
a subspace of Rn . Complement this basis with a set of vectors {~u1 , ~u2 , . . . , ~ur } to make an ordered
basis B = (~u1 , ~u2 , . . . , ~un ) of Rn .
Define ~vi = T (~ui ) for 1 ≤ i ≤ r as vectors in Rm . Consider a vanishing linear combination
c1~v1 + c2~v2 + · · · + cr~vr = ~0.
Then T (c1 ~u1 + c2 ~u2 + · · · + cr ~ur ) = ~0, so c1 ~u1 + · · · + cr ~ur ∈ Ker T . However, Span(~u1 , ~u2 , . . . , ~ur ) ∩
Ker T = {~0}. So we deduce that c1 ~u1 + c2 ~u2 + · · · + cr ~ur = ~0 in Rn . Since the ~ui are linearly
independent, we deduce that ci = 0 for 1 ≤ i ≤ r. This shows that {~v1 , ~v2 , . . . , ~vr } is a linearly
independent set. We now complement
this set with m − r vectors to make an ordered basis B 0 = (~v1 , ~v2 , . . . , ~vm ) on Rm .
By construction, the matrix of T with respect to the basis B on Rn and B 0 on Rm is the matrix
[ Ir 0 ]
[ 0  0 ] .
Consequently, every matrix M of rank r has the above matrix in its orbit. This example has
shown that the orbits of GLm (R) × GLn (R) acting on Mm×n (R) by (8.4) are the sets of matrices
of a fixed rank r, for 0 ≤ r ≤ min(m, n).
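The classification of orbits by rank can be confirmed exhaustively for small sizes over a finite field. The following Python sketch (an added illustration using F2 in place of R; letting B range over invertible matrices is the same as letting B⁻¹ do so) computes the orbits of GL2(F2) × GL3(F2) on the 64 matrices in M2×3(F2) and checks that the orbits are exactly the rank classes.

```python
from itertools import product

def mul(A, B):                               # matrix product over F2
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(len(B))) % 2
                       for j in range(len(B[0]))) for i in range(len(A)))

def rank(M):                                 # Gaussian elimination over F2
    rows = [list(r) for r in M]
    rk = 0
    for col in range(len(rows[0])):
        piv = next((i for i in range(rk, len(rows)) if rows[i][col]), None)
        if piv is None:
            continue
        rows[rk], rows[piv] = rows[piv], rows[rk]
        for i in range(len(rows)):
            if i != rk and rows[i][col]:
                rows[i] = [(a + b) % 2 for a, b in zip(rows[i], rows[rk])]
        rk += 1
    return rk

def all_mats(m, n):
    return [rows for rows in product(product(range(2), repeat=n), repeat=m)]

GL2 = [A for A in all_mats(2, 2) if rank(A) == 2]    # invertible 2x2, 6 matrices
GL3 = [B for B in all_mats(3, 3) if rank(B) == 3]    # invertible 3x3, 168 matrices
X = all_mats(2, 3)                                   # 64 matrices

seen, orbits = set(), []
for M in X:
    if M in seen:
        continue
    orb = {mul(mul(A, M), B) for A in GL2 for B in GL3}
    seen |= orb
    orbits.append(orb)

sizes = sorted(len(o) for o in orbits)
same_rank = all(len({rank(N) for N in o}) == 1 for o in orbits)
print(len(orbits), sizes, same_rank)  # one orbit per possible rank 0, 1, 2
```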
8.2.2 Stabilizers
Let G act on a set X. An element g ∈ G is said to fix an element x ∈ X if g · x = x. The axioms of
group actions lead to strong interactions between groups and subsets of X, especially in relation to
elements in G that fix an element x ∈ X or conversely all the elements in X that are fixed by some
element g ∈ G.
Proposition 8.2.10
Suppose that a group G acts on a set X. For any element x ∈ X, the subset Gx = {g ∈
G | g · x = x} is a subgroup of G.
Proof. Since 1 · x = x, we have 1 ∈ Gx . If g, h ∈ Gx , then (gh) · x = g · (h · x) = g · x = x, so
gh ∈ Gx . Finally, if g ∈ Gx , then
g −1 · (g · x) = g −1 · x =⇒ (g −1 · g) · x = g −1 · x =⇒ x = g −1 · x,
so g −1 ∈ Gx . Hence, Gx is a subgroup of G.
Definition 8.2.11
Given x ∈ X, the subgroup Gx = {g ∈ G | g · x = x} is called the stabilizer of x in G.
Group properties lead to the following important theorem and the subsequent Orbit Equation.

Theorem 8.2.12 (Orbit-Stabilizer Theorem)
Suppose that a group G acts on a set X. Then for all x ∈ X,
|G · x| = |G : Gx |.

Proof. We need to show a bijection between the elements in the equivalence class G · x of x and left
cosets of Gx .
Consider the function f from the orbit G · x to the set of left cosets of Gx in G defined by
f (g · x) = gGx .
We first verify that this association is even a function. Suppose that g1 ·x = g2 ·x. Then (g2−1 g1 )·x = x
so g2−1 g1 ∈ Gx and hence the cosets g1 Gx and g2 Gx are equal. This shows that the image of f is
independent of the chosen orbit representative, so f is a function from G · x to the set of left cosets
of Gx .
Now suppose that f (y1 ) = f (y2 ) for two elements in G · x, with y1 = g1 · x and y2 = g2 · x. Then
g1 Gx = g2 Gx so g2−1 g1 ∈ Gx . Hence, (g2−1 g1 ) · x = x and, by acting on both sides by g2 , we get
g1 · x = g2 · x. This proves that f is injective.
Finally, to prove that f is also surjective, let hGx be a left coset of Gx in G. Then hGx = f (h·x).
The element h · x is in the orbit G · x so f is surjective.
Interestingly enough, the proof of the Orbit-Stabilizer Theorem does not assume that the group
or the set X is finite. The equality of cardinality holds even if the cardinalities are infinite.
We notice as a special case that if G acts transitively on a set X, then |G : Gx | = |X| for all
elements x ∈ X. In particular, if X and G are both finite, then |G| = |X||Gx |, which implies that
|G| is a multiple of |X|. Furthermore, a group action can only be regular if |G| = |X|.
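For instance, for the transitive action of D6 on the six vertices of a hexagon (renumbered 0 through 5), each orbit has size 6 and each stabilizer has order 2, so |G| = 12 = |X||Gx|. The following Python sketch illustrates this.

```python
n = 6
rot = [tuple((i + k) % n for i in range(n)) for k in range(n)]
ref = [tuple((k - i) % n for i in range(n)) for k in range(n)]
D6 = rot + ref                               # |D6| = 12

checks = []
for x in range(n):
    orbit = {g[x] for g in D6}
    stab = [g for g in D6 if g[x] == x]      # the identity and one reflection
    checks.append((len(orbit), len(stab)))

print(checks)  # each pair (orbit size, stabilizer size) multiplies to 12
```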
The Orbit-Stabilizer Theorem leads immediately to the following important corollary.

Corollary 8.2.13 (Orbit Equation)
Let G act on a finite set X and let x1 , x2 , . . . , xk be representatives of the distinct orbits.
Then
|X| = Σ_{i=1}^{k} |G : Gxi |.

The Orbit Equation is a generic equation, applicable to any such action, that flows from the fact
that the orbits of G partition X. However, the Orbit Equation often gives rise to interesting
combinatorial formulas.
Example 8.2.14. As a simple illustration of the Orbit-Stabilizer Theorem, we prove that a group
G of order 15 acting on a set X of size 7 has a fixed point. By the Orbit-Stabilizer Theorem, the
orbit of an element x has order |G : Gx |, where Gx is the stabilizer of x. Hence, |G : Gx | can be
equal to 1, 3, 5, or 15. If |G : Gx | = 1, then G = Gx so all of G fixes x and hence x is a fixed point
of the action. Obviously, there can be no orbit of size 15 in a set of size 7. Assume there is no fixed
point. Then every orbit must have size 3 or 5. Hence, if there are r orbits of size 3 and s orbits of
size 5, then 3r + 5s = 7 for r, s ∈ N. If s = 0, then r = 7/3 ∉ N; if s = 1, then r = 2/3 ∉ N; and
if s ≥ 2, then r < 0. We have shown that there exist no solutions in nonnegative integers to the
equation 3r + 5s = 7. We conclude by contradiction that there must be a fixed point. 4
Example 8.2.15. Consider the natural action of G = Sn on the power set X = P({1, 2, . . . , n})
from Exercise 8.1.6. We can interpret the result of that exercise by saying that the orbits of this
action consist of the sets of subsets of a given cardinality k, for k ranging from 0 to n. For a fixed k,
define A = {1, 2, . . . , k}. The stabilizer GA is the set of permutations σ ∈ Sn that map {1, 2, . . . , k}
to itself. Hence,
|GA | = k! (n − k)!
because there are k! ways σ can permute {1, 2, . . . , k} and (n−k)! ways σ can permute the remaining
elements of {1, 2, . . . , n}. We recover the fact that there are
|G : GA | = |G| / |GA | = n! / (k! (n − k)!) = C(n, k)
subsets of X of size k. Furthermore, since |P({1, 2, . . . , n})| = 2^n , the Orbit Equation for this
action is the well-known combinatorial formula
2^n = Σ_{k=0}^{n} C(n, k). 4
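As an illustrative check for n = 4, the following Python sketch computes the orbits of S4 on the power set directly and recovers the Orbit Equation.

```python
from itertools import combinations, permutations
from math import comb

n = 4
X = [frozenset(c) for k in range(n + 1) for c in combinations(range(n), k)]
Sn = list(permutations(range(n)))

seen, orbit_sizes = set(), []
for S in X:
    if S in seen:
        continue
    orb = {frozenset(sigma[i] for i in S) for sigma in Sn}
    seen |= orb
    orbit_sizes.append(len(orb))

# One orbit per cardinality k, of size C(n, k); the orbit sizes sum to 2^n.
matches = sorted(orbit_sizes) == sorted(comb(n, k) for k in range(n + 1))
print(sorted(orbit_sizes), sum(orbit_sizes), matches)
```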
Definition 8.2.16
Let G be a group acting on a set X. For any g ∈ G, define the fixed subset
X g = {x ∈ X | g · x = x}.
We note that the fixed subset X g plays a role parallel to that of the stabilizer Gx of a set element x.

Lemma 8.2.17 (Cauchy-Frobenius Lemma)
Let G be a finite group acting on a set X with m orbits. Then
m |G| = Σ_{g∈G} |X g |.

Proof. Consider the set of elements S = {(x, g) ∈ X × G | g · x = x}. We count the number of
elements of S in two different ways, by summing first through elements in G and then by summing
first through X. By summing first through G, we get
|S| = Σ_{g∈G} |X g |.
Summing first through X instead gives |S| = Σ_{x∈X} |Gx |. Grouping this sum by orbits and
applying the Orbit-Stabilizer Theorem, each orbit O contributes Σ_{x∈O} |G|/|O| = |G|, so
|S| = m|G|, and the result follows by identifying the two ways of counting |S|.
The Cauchy-Frobenius Lemma has many interesting applications in counting problems and com-
binatorics. In particular, if a counting problem can be phrased in a manner to count orbits of a
group acting on a set, then the lemma provides a strategy to compute the number of orbits m.
Example 8.2.18. Suppose that we wish to design a bracelet with 8 beads using beads of 3 different
colors, such as the one in Figure 8.5. We consider two bracelets equivalent if the arrangement of
bead colors of one can be obtained from the other simply by rotating it. We propose to determine
how many inequivalent bracelets of 8 beads can be made using 3 colors.
The rotation on a bracelet with 8 beads corresponds to the action of Z8 . There are 38 different
ways of putting beads on the bracelet without considering the equivalence. Any group elements in Z8
of the same order d will fix the same number of bracelet colorings. Hence, instead of summing over
elements in Z8 , we can sum over the divisors of 8, corresponding to the order of various elements in
Z8 . Note that for any d|8, there are φ(d) elements of order d. Finally, note that a bracelet coloring
will be fixed by an element of order d if and only if the color pattern repeats every 8/d beads. But
then a contiguous run of 8/d beads can be colored in any way, giving 3^{8/d} fixed colorings. Hence,
the number of inequivalent bracelets
is m, where
8m = Σ_{d|8} φ(d) 3^{8/d} =⇒ m = (1/8)(3^8 + 3^4 + 2 · 3^2 + 4 · 3) = 834.
To connect this bead-coloring problem to the more theoretical language of group actions, we can
view the set of possible bracelets as the set of functions X = Fun({1, 2, . . . , 8}, {1, 2, 3}). The
domain corresponds to the bead position and the codomain is the bead color. The action of rotating
the bracelet corresponds to the action of Z8 on X by
(σ · f )(x) = f (σ −1 · x). 4
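The count of 834 can be confirmed by brute force. The following Python sketch (an added illustration) enumerates all 3^8 colorings, counts rotation orbits via canonical representatives, and compares the answer with the Cauchy-Frobenius count, using the fact that rotation by k positions fixes exactly 3^gcd(k,8) colorings.

```python
from itertools import product
from math import gcd

colorings = list(product(range(3), repeat=8))          # all 3^8 bead colorings

def canonical(c):
    return min(c[k:] + c[:k] for k in range(8))        # least rotation of c

bracelets = {canonical(c) for c in colorings}          # one representative per orbit

# Cauchy-Frobenius: rotation by k positions fixes the colorings whose pattern
# repeats with period gcd(k, 8), and there are 3^gcd(k, 8) of them.
m = sum(3 ** gcd(k, 8) for k in range(8)) // 8

print(len(bracelets), m)  # 834 834
```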
where A ∈ GL2 (R) and e, f ∈ R. Show that the natural action of G on R2 is 2-transitive.
6. Let G act on a nonempty set X. Suppose that x, y ∈ X and that y = g · x for some g ∈ G. Prove that
Gy = gGx g −1 . Deduce that if the action is transitive then the kernel of the action is
∩_{g∈G} gGx g −1 .
7. Let G be a group acting on a set X. Prove that a subset S of X is a G-invariant subset of X if and
only if S is a union of orbits.
8. Suppose that a group G acts on a set X and also on a set Y . Prove that a G-set homomorphism
f : X → Y maps orbits of G in X to orbits of G in Y .
9. Show that a group of order 55 acting on a set of size 34 must have a fixed point.
10. Suppose G is a group of order 21. Determine with proof the positive integers n such that an action of
G on a set X of size n must have a fixed point.
11. Let X = R[x1 , x2 , x3 , x4 ]. Consider the action of G = S4 on X by
(a) Find the stabilizer of the polynomial x1 + x2 and give its isomorphism type.
(b) Find a polynomial q(x1 , x2 , x3 , x4 ) whose stabilizer is isomorphic to D4 .
(c) Explicitly list the elements in the orbit of x1 x2 + 5x3 .
(d) Explicitly list the elements in the orbit of x1 x22 x33 .
12. Let G be a group acting on a set X. Let H be a subgroup of G. It acts on X with the action of G
restricted to H. Let O be an orbit of H in X.
(a) Prove that for all g ∈ G, the set gO is an orbit of the conjugate subgroup gHg −1 .
(b) Deduce that if G is transitive on X and if H E G, then all the orbits of H are of the form gO.
13. Let A = {1, 2, . . . , n} and consider the set X = {(a, S) ∈ A × P(A) | a ∈ S}. Consider the action of
Sn on X by σ · (a, S) = (σ(a), σ · S), where σ · S is the power set action. (See Exercise 8.1.4.) Prove
that the Orbit Equation for this action gives the combinatorial formula
Σ_{k=0}^{n} k · C(n, k) = n · 2^{n−1} .
14. Let A be a finite set of size k and consider the action of Sn on X = An via
(See Example 8.1.13.) Prove that the Orbit Equation for this action is
k^n = Σ_{s1 +···+sk =n} n!/(s1 ! s2 ! · · · sk !),
where the sum is over nonnegative integers s1 , s2 , . . . , sk .
15. Let A and B be disjoint finite sets with |A| = a and |B| = b, and let Pk (A ∪ B) denote the set of
subsets of A ∪ B of size k. Define the action of G = SA ⊕ SB on Pk (A ∪ B) as the standard set of
subsets action. Prove that the
Orbit Equation of G on X gives the Vandermonde Identity
C(a + b, k) = Σ_{i+j=k} C(a, i) C(b, j).
16. Let X be the set of functions from {1, 2, . . . , n} into itself and let G = Sn ⊕ Sn . Define the pairing
G × X → X by
((σ, τ ) · f )(a) = σ · f (τ −1 · a) for all a ∈ {1, 2, . . . , n}.
This action has the effect of rearranging the elements in the triple according to σ and then permuting
the outcome by τ . Explicitly calculate X g for all g ∈ G and verify the Cauchy-Frobenius Lemma.
18. Recall that the group of rigid motions of a tetrahedron is A4 . Suppose that we color the faces of a
tetrahedron with colors red, green, or blue. We consider two colorings equivalent if one coloring can be
obtained from another by rotating the tetrahedron. How many inequivalent such colorings are there?
19. We consider colorings of the vertices of a square as equivalent if one coloring can be obtained from
the other by any D4 action on the square. How many different colorings are there with (a) 3 colors;
(b) 4 colors; (c) 5 colors?
20. We consider colorings of the vertices of an equilateral triangle as equivalent if one coloring can be
obtained from the other by any D3 action on the triangle. How many different colorings are there
using p colors?
21. Repeat Example 8.2.18 but consider bracelet colorings equivalent if one is obtained from another under
the action of some D8 element. (Note that the bracelet in Figure 8.5 is only fixed by 1 under the Z8
action but by {1, sr} in D8 .)
8.3
Transitive Group Actions
The previous section discussed primarily the orbits of a group action on a set. From the emphasis
of the section, the reader might get the impression that transitive group actions are not interesting
since in that case there is only one orbit, namely the whole set. This could not be further from the
truth.
There still exists considerable structure within a transitive group action. In fact, a group acts
transitively on each orbit, so we may view any group action as a union of transitive group actions.
In order to look deeper into the analysis of group actions, we must address properties of transitive
group actions.
Definition 8.3.1
Let G be a group that acts transitively on a set X. A block is a nonempty subset B ⊆ X
such that for all g ∈ G, either g · B = B or (g · B) ∩ B = ∅.
For every transitive action of a group G on a set X, the singleton sets {x} in X are blocks as is
the whole set X. Since these subsets are always blocks, we call them trivial. However, as the group
of rigid motions of the cube illustrates, some group actions may possess other blocks. Any of the
long diagonals of a cube is a block whereas a face is not a block. Some group actions do not possess
any nontrivial blocks. We give these a specific name.
Definition 8.3.2
A transitive action of a group G on a set X is called primitive if the only blocks of the
action are the trivial ones.
Example 8.3.3. Consider the group Z8 and its action on X = {1, 2, 3, 4, 5, 6, 7, 8} where Z8 is given
as ⟨(1 2 3 4 5 6 7 8)⟩ in S8 . This action is obviously transitive. Besides the trivial blocks, the
subsets
{1, 3, 5, 7}, {2, 4, 6, 8}, {1, 5}, {2, 6}, {3, 7}, {4, 8}
are blocks, and the action of Z8 on X has no other blocks besides these. 4
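The claim can be verified by brute force over all subsets. The following Python sketch (an added illustration, with the points renumbered 0 through 7) tests the block condition for every nonempty subset and recovers exactly the six nontrivial blocks above.

```python
from itertools import combinations

n = 8
Z8 = [tuple((i + k) % n for i in range(n)) for k in range(n)]   # cyclic rotations

def is_block(B):
    for g in Z8:
        gB = frozenset(g[x] for x in B)
        if gB != B and gB & B:
            return False
    return True

blocks = [B for size in range(1, n + 1)
          for B in map(frozenset, combinations(range(n), size)) if is_block(B)]
nontrivial = sorted(sorted(B) for B in blocks if 1 < len(B) < n)

print(nontrivial)  # the six nontrivial blocks (0-indexed)
```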
Definition 8.3.4
Suppose that a group G acts transitively on a set X. A system of blocks for the action is a
partition Σ of X, each part of which is a block. If B is any block, then {g · B | g ∈ G} is a
system of blocks.

Proposition 8.3.5
If B is a block in the action of a group G on a finite set X, then |B| divides |X|.
If Σ is a system of blocks, then the group G acts transitively in the obvious way on Σ, thereby
inducing another group action but on a smaller set. Comparing the action of G on X with the action
of G on a system of blocks Σ, we introduce the following two types of stabilizers.
Definition 8.3.6
Let G act on X and let B ⊆ X, not necessarily a block. We define the pointwise stabilizer
of B as
G(B) = {g ∈ G | g · x = x for all x ∈ B}
and the setwise stabilizer as
G{B} = {g ∈ G | g · B = B}.
Note that if B is a block in the action of G on X, then G{B} is the usual stabilizer of B in the
action of G on Σ. In contrast, for the pointwise stabilizer
G(B) = ∩_{x∈B} Gx .
It is not hard to see that G(B) and G{B} are both subgroups and that G(B) E G{B} . (See
Exercise 8.3.9.) Obviously, if B is a singleton B = {b}, then G(B) = G{B} = Gb .
Example 8.3.7. Consider the subgroup H = ⟨(1 2 3), (4 5 6), (7 8 9), σ⟩ of S9 , where
σ = (1 4 7)(2 5 8)(3 6 9), acting on {1, 2, . . . , 9} with blocks B1 = {1, 2, 3}, B2 = {4, 5, 6}, and
B3 = {7, 8, 9}. The generating 3-cycles have the effect of cycling within the blocks individually,
while σ independently cycles through the blocks as a whole.
Figure 8.6 gives an intuitive picture of the action of H on {1, 2, 3, 4, 5, 6, 7, 8, 9}. The permutation
(1 2 3) corresponds to a clockwise rotation of the triangle {1, 2, 3}; the permutation σ corresponds
to a counterclockwise rotation of 120◦ of the whole figure; and the permutations (4 5 6) and (7 8 9)
correspond to clockwise rotations by 120◦ individually in the triangles {4, 5, 6} and {7, 8, 9}.
The setwise stabilizer of B1 is
H{B1 } = ⟨(1 2 3), (4 5 6), (7 8 9)⟩,
and this is the same setwise stabilizer for the other two blocks B2 and B3 . The pointwise stabilizer
is
H(B1 ) = ⟨(4 5 6), (7 8 9)⟩.
We point out that H is not the only subgroup of S9 that has {1, 2, 3} as a block. The cyclic
subgroup ⟨(1 4 7)(2 5 8)(3 6 9)⟩ of order 3 simply cycles through the blocks and has the same system of
blocks as H. The cyclic subgroup ⟨(1 4 7 2 5 8 3 6 9)⟩ of order 9 also has the same system of blocks. 4
The following proposition gives a characterization of primitive groups. It relies on the notion of
a maximal subgroup. We call a proper subgroup H of a group G a maximal subgroup if for all
subgroups K with H ≤ K ≤ G, either K = H or K = G.
Proposition 8.3.8
Let G act transitively on a set X. Then the action is primitive if and only if Gx is maximal
for all x ∈ X.
Proof. (⇐=) First, suppose that G has a nontrivial block B. Let x ∈ B. If g ∈ Gx , then x ∈ gB ∩ B,
so gB ∩ B ≠ ∅, so gB = B and thus g ∈ G{B} . Consequently, Gx ≤ G{B} ≤ G. Since B ≠ X, we
know that G{B} ≠ G. Furthermore, |B| ≠ 1, so there exists another element y ≠ x in the block B.
Since G acts transitively, there exists h ∈ G such that hx = y. Since y ∈ B and B is a block,
hB ∩ B ≠ ∅ so hB = B. Therefore, h ∈ G{B} but h ∉ Gx . Therefore, Gx is a strict subgroup of
G{B} . We have
shown that if the action is not primitive then there exists some x such that Gx is not a maximal
subgroup and the contrapositive gives us the desired implication.
(=⇒) Conversely, suppose that Gx is not maximal for some x ∈ X. Then let H be some subgroup
such that Gx < H < G. Let B be the orbit B = Hx. Since H − Gx ≠ ∅, the orbit B contains
more than one element, so |B| > 1. Assume that B = X. Then H would be transitive and hence
|X| = |G : Gx | = |H : Gx |, which is a contradiction since H < G. Thus, B ⊊ X. Now suppose that
gB ∩ B ≠ ∅ for some g ∈ G. Let b ∈ gB ∩ B. Then there exist h1 , h2 ∈ H such that b = h1 x = gh2 x.
Therefore, h1−1 gh2 · x = x, so h1−1 gh2 ∈ Gx ≤ H and hence g = h1 (h1−1 gh2 )h2−1 ∈ H. Consequently,
we deduce that if gB ∩ B ≠ ∅ then g ∈ H, so gB = B. Hence, B is a strict subset of X, not a
singleton, such that for every g ∈ G either gB = B or gB ∩ B = ∅. Hence, B is a nontrivial block.
Again, we have proven the contrapositive of the desired implication.
Consequently, we can rephrase the above proposition with the stronger result.
Corollary 8.3.9
A transitive group action (G, X, ρ) is primitive if and only if Gx is maximal for some x ∈ X.
Proof. Suppose that for some x the stabilizer Gx is not maximal and that Gx < H < G for some
subgroup H. Since the action is transitive, for all y ∈ X, there exists g ∈ G with y = gx. Then
Gy = Ggx = gGx g −1 , so Gy < gHg −1 < G and Gy is not maximal. Hence, there exists x such that
Gx is maximal if and only if Gx is maximal for all x ∈ X.
Proposition 8.3.10
Let (G, X, ρ) be a transitive group action and let N E G. If N fixes a point x, then
N ≤ Ker ρ.
Proof. If x is fixed by N , then Nx = N . Let y ∈ X be arbitrary and let g ∈ G with y = gx. Then
by (8.5), Ny = gNx g −1 = gN g −1 = N , so N fixes y. Since y was arbitrary, N ≤ Ker ρ.
Proposition 8.3.11
Let (G, X, ρ) be a transitive group action and let N E G. Then
(1) the orbits of N form a system of blocks for G;
(2) if the action of G is primitive, then either N ≤ Ker ρ or N acts transitively on X;
(3) if O and O′ are two orbits of N , then there exist a bijection f : O → O′ and an
automorphism ψ of N such that f (n · x) = ψ(n) · f (x) for all n ∈ N and x ∈ O.
Proof. (1) Let O be an orbit of the action of N on X and define the set Σ = {gO | g ∈ G}.
By Exercise 8.2.12, all the orbits of N acting on X are of the form gO. Hence, Σ is the set of orbits
of N and, since G is transitive, Σ is a partition of X. Thus, Σ is a system of blocks for the action
of G on X.
(2) If G acts primitively, then the action has only two systems of blocks, namely {{x} | x ∈ X} and
{X}. By part (1), the set of orbits of N must be one of these two options. In the first case, N
stabilizes all x ∈ X so N ≤ Ker ρ. In the second case, N is transitive.
(3) By part (1), if O and O′ are two orbits of N , then O′ = gO for some g ∈ G. Since N is a
normal subgroup, conjugation on N given by ψg (n) = gng −1 is an automorphism. Furthermore, the
function between N -orbits fg : O → O′ defined by fg (x) = gx is a bijection. Then for all n ∈ N and
all x ∈ O,
ψg (n) · fg (x) = (gng −1 ) · (gx) = (gn)x = g(nx) = fg (n · x).
In Example 8.3.3, we saw that the natural action of Z8 = ⟨z | z^8 = 1⟩ on {1, 2, . . . , 8} has two
nontrivial systems of blocks. They correspond directly to the proper nontrivial (normal) subgroups
of Z8 , namely ⟨z^2 ⟩ and ⟨z^4 ⟩.
Proposition 8.3.11 generalizes the defining property of normal subgroups. Consider the action of
G on itself by left multiplication and let H ≤ G be any subgroup. Left multiplication is a transitive
action. The orbits of the action of H on G by left multiplication are precisely the right cosets Hg.
The right cosets form a system of blocks if and only if for all g, x ∈ G, we have x(Hg) = Hg or
x(Hg) ∩ Hg = ∅. If H E G, then x(Hg) = x(gH) = (xg)H = H(xg) so x(Hg) is another right
coset, thereby satisfying the requirements for a block. Conversely, if H is not normal in G, then
there exists g ∈ G such that gH ≠ Hg. However, gH ∩ Hg is never the empty set since it contains g,
so the translate gH of the coset H is neither equal to nor disjoint from the coset Hg. Hence, the
right cosets do not form a system of blocks. Therefore, a subgroup H is normal if and only if its
right cosets form a system of blocks
under the action of left multiplication of G on itself.
Example 8.3.12 (Rigid Motions of the Cube). Consider again the group G of rigid motions
of a cube and its action on the set of vertices. (See Example 8.1.11 and Figure 8.2.) We know that
G ≅ S4 . Furthermore, S4 has only two proper nontrivial normal subgroups, namely A4 and
K = ⟨(1 2)(3 4), (1 3)(2 4)⟩.
As rigid motions, A4 is generated by the rotations by 120◦ around the axes along long
diagonals. Note that these rotations all have order 3 and correspond to the 3-cycles in S4 . Expressed
as rigid motions, the subgroup K consists of the identity and the three rotations of 180◦ around the
three different axes that join centers of opposite faces.
It is easy to check that (using the labeling as in Figure 8.2) the orbits of A4 are {1, 3, 6, 8} and
{2, 4, 5, 7}. Interestingly enough, the orbits of K are also {1, 3, 6, 8} and {2, 4, 5, 7}. Geometrically,
these orbits correspond to two regular tetrahedra of vertices that are separated from each other by
a diagonal edge in a face. We also observe that no normal subgroup of G has the system of blocks
of long diagonals through the cube as its set of orbits. 4
The above example shows that the converse to part (1) of Proposition 8.3.11 is not true. In other
words, given a transitive group action with a system of blocks Σ, there does not necessarily exist a
normal subgroup N whose orbits are the blocks in Σ.
When N is a normal subgroup of G, it is always natural to consider the quotient group G/N .
Let Σ be the set of orbits of N in X. For all g ∈ G and every orbit N x ∈ Σ,
(gN ) · (N x) = N (gx)
defines a transitive action of the quotient group G/N on the set of orbits of N (which is the quotient
set of X by the equivalence relation induced by the action of N ). This action need not be faithful,
but we are led to the following proposition.
Proposition 8.3.13
Let (G, X, ρ) be a transitive group action and let N E G. Then N has at most |G : N |
orbits and if |G : N | is finite, then the number of orbits of N divides |G : N |.
Proof. Since the action of G/N on Σ is transitive, the number of orbits of N , namely |Σ|, is at
most |G/N | = |G : N |. Let N x be one orbit. By the Orbit-Stabilizer Theorem,
|Σ| = |(G/N ) : (G/N )N x | = |G : N | / |(G/N )N x |.
Hence, |Σ| divides |G : N |.
Corollary 8.3.14
If (G, X, ρ) is a faithful, transitive, primitive group action such that no proper nontrivial
normal subgroup N E G is transitive, then G is simple.
Example 8.3.16. We have already seen that the standard action of Sn on X = {1, 2, . . . , n} is
n-transitive. However, consider the standard action of An on X. We can show that it is (n − 2)-
transitive but not (n−1)-transitive. Let (a1 , a2 , . . . , an−2 ) and (b1 , b2 , . . . , bn−2 ) be two (n−2)-tuples
of distinct elements in X. Complete both to ordered n-tuples (a1 , a2 , . . . , an ) and (b1 , b2 , . . . , bn ) of
distinct elements. Consider the permutations σ and τ defined by
σ(ai ) = bi for 1 ≤ i ≤ n,
and
τ (ai ) = bi if 1 ≤ i ≤ n − 2,   τ (an−1 ) = bn ,   τ (an ) = bn−1 .
They both map the (n − 2)-tuple (a1 , a2 , . . . , an−2 ) to (b1 , b2 , . . . , bn−2 ). However, τ = (bn−1 bn )σ so
either one or the other is even. Hence, An is (n − 2)-transitive on {1, 2, . . . , n}. On the other hand,
any subgroup of Sn that acts (n − 1)-transitively on {1, 2, . . . , n} also acts n-transitively. However,
An does not act n-transitively because the only permutation that maps the n-tuple (1, 2, . . . , n) into
(2, 1, 3, 4, . . . , n) is odd. Hence, An does not act (n − 1)-transitively. 4
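For n = 4 the claim can be checked exhaustively. The following Python sketch (an added illustration, on the points 0 through 3) verifies that A4 acts 2-transitively but not 3-transitively.

```python
from itertools import permutations

def sign(p):                                 # parity computed by cycle sort
    s, p = 1, list(p)
    for i in range(len(p)):
        while p[i] != i:
            j = p[i]
            p[i], p[j] = p[j], p[i]
            s = -s
    return s

A4 = [p for p in permutations(range(4)) if sign(p) == 1]   # 12 even permutations

def k_transitive(G, n, k):
    tuples = list(permutations(range(n), k))  # ordered k-tuples of distinct points
    return all(any(tuple(g[x] for x in a) == b for g in G)
               for a in tuples for b in tuples)

print(k_transitive(A4, 4, 2), k_transitive(A4, 4, 3))  # True False
```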
Proposition 8.3.17
Let G act transitively on a set X. If r ≥ 2, then G acts r-transitively if and only if for all
x ∈ X, the stabilizer Gx acts (r − 1)-transitively on X − {x}.
Proof. First, suppose that G is r-transitive on X. Let (a1 , a2 , . . . , ar−1 ) and (b1 , b2 , . . . , br−1 ) be two
ordered (r − 1)-tuples of distinct elements in X − {x}. Applying r-transitivity to the r-tuples
(x, a1 , . . . , ar−1 ) and (x, b1 , . . . , br−1 ), there exists g ∈ G such that g · x = x and g · ai = bi for each i.
Hence, g ∈ Gx , and Gx acts (r − 1)-transitively on X − {x}.
Exercise 8.3.13 asks the reader to prove that every 2-transitive group action is primitive. Con-
sequently, every group action that is r-transitive with r ≥ 2 is primitive. Again, coupled with
Proposition 8.3.11, this result gives a strategy to prove that a group is simple. We will use this
strategy in Section 9.2.4 to prove the simplicity of an important family of groups.
(Figure 8.7: the Fano plane on the seven points of X.)
16. Let F be a finite field of order pm for some prime p. Let G be the set of functions in Fun(F, F ) of the
form f (x) = αx + β such that α ∈ F − {0} and β ∈ F .
(a) Prove that G is a nonabelian group of order pm (pm − 1).
(b) Prove that the action of G on F via f · a = f (a) for a ∈ F is faithful and transitive.
(c) Prove that G acts 2-transitively on F .
(d) Prove that G contains a normal subgroup of order pm that is abelian.
(e) Determine all the maximal subgroups of G.
17. Consider the elements in X = {1, 2, . . . , 7} and the diagram shown in Figure 8.7. The diagram is called
the Fano plane and arises in the study of finite geometries. Consider the subset L (called lines) of
P(X) whose elements are the subsets of size 3 depicted in the Fano plane diagram either as a straight
line or the circle. So for example {1, 2, 5} and {5, 6, 7} are in L. Let G be the largest subgroup of S7
that maps lines to lines, i.e., acts on the set L.
(a) Prove that the action of G on X is 2-transitive.
(b) Prove that |G| = 168.
(c) Prove that G is simple.
8.4
Groups Acting on Themselves
A fruitful area of investigation comes from considering ways in which groups can act on themselves
or act on their own internal structure. Properties of group actions combined with considering actions
of groups on themselves lead to new results about the internal structure of groups. Two natural
actions of a group G on itself are the action of left multiplication and the action of conjugation.
The action of G on itself by left multiplication is defined by g · x = gx. This is indeed an action
since
g · (h · x) = ghx = (gh) · x
and 1 · x = x for all x ∈ G. Furthermore, the action by left multiplication is faithful because gx = x
for all x ∈ G implies g = 1 (take x = 1).
If G is a finite group with |G| = n, we can label all the group elements as G = {g1 , g2 , . . . , gn }.
Then the left multiplication action corresponds to an injective homomorphism ρ : G → Sn , via
ρ(g) = τ where ggi = gτ (i) .
Example 8.4.1. Example 8.1.5 presented the standard action of Dn on the labeled set of vertices of
the regular n-gon. This action is always different from the action of Dn on itself by left multiplication.
To compare with that example, take n = 6 and label the elements in D6 with integers {1, 2, . . . , 12}
listed in the same order as
1, r, r2 , . . . , r5 , s, sr, . . . , sr5 .
Then the permutation representation of D6 has
ρ(r) = (1 2 3 4 5 6)(7 12 11 10 9 8)
ρ(s) = (1 7)(2 8)(3 9)(4 10)(5 11)(6 12). 4
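These permutations can be generated mechanically. The following Python sketch (an added illustration) encodes D6 as pairs s^i r^j, computes the left-multiplication (Cayley) representation with the labeling above, and reproduces ρ(r) and ρ(s).

```python
# Elements of D6 written s^i r^j with i in {0, 1} and j in {0, ..., 5},
# subject to r^6 = s^2 = 1 and r s = s r^{-1}.
def mul(a, b):
    i, j = a
    k, l = b
    # Moving r^j past s^k flips the exponent sign when k = 1.
    return ((i + k) % 2, ((j if k == 0 else -j) + l) % 6)

elems = [(0, j) for j in range(6)] + [(1, j) for j in range(6)]
label = {g: idx + 1 for idx, g in enumerate(elems)}   # 1, r, ..., r^5, s, sr, ..., sr^5

def rho(g):                                           # one-line form on labels 1..12
    return tuple(label[mul(g, h)] for h in elems)

r, s = (0, 1), (1, 0)
print(rho(r))  # (2, 3, 4, 5, 6, 1, 12, 7, 8, 9, 10, 11) = (1 2 3 4 5 6)(7 12 11 10 9 8)
print(rho(s))  # (7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6) = (1 7)(2 8)(3 9)(4 10)(5 11)(6 12)
```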
The group action by left multiplication on itself leads immediately to a powerful result about
symmetric groups.

Theorem 8.4.2 (Cayley’s Theorem)
Every group G is isomorphic to a subgroup of a symmetric group; more precisely, G is
isomorphic to a subgroup of SG .

Proof. Since the permutation representation ρ : G → SG is injective, by the First Isomorphism
Theorem, G ≅ G/ Ker ρ ≅ Im ρ ≤ SG .
Because of this important theorem, the action of G on itself by left multiplication is also called
the Cayley representation.
Cayley’s Theorem is valuable for computational reasons. It is difficult to devise algorithms that
perform group operations for an arbitrary group. However, it is easy to devise algorithms to perform
the group operation in Sn . Cayley’s Theorem guarantees that a group G can be embedded into some
Sn , which reduces computations in G to computations in some corresponding Sn .
Since the action of G on itself by left multiplication is transitive, we are inspired to consider the
possibility of blocks in this action. It is not hard to see that for any subgroup H ≤ G, the set of left
cosets forms a system of blocks for this action. This is because for any left coset xH, the product
g(xH) = (gx)H is another left coset. The set of left cosets of H forms a partition of G, so the set
of left cosets of H forms a system of blocks in this action. Moreover, the converse holds.
Proposition 8.4.3
Let a group G act on itself by left multiplication. A set Σ of subsets of G is a system of blocks
for this action if and only if Σ is the set of left cosets of some subgroup H ≤ G.
Proof. We have already shown one direction. We now assume that Σ is a system of blocks for this
action. Let H ∈ Σ be the block that contains the identity 1. Note that the set g · H contains the
element g. Hence, if g · H = H then g ∈ H since 1 ∈ H. Conversely, suppose that g ∈ H. Then
g · H is a block that contains the element g · 1 = g. Since g · H ∩ H 6= ∅, we deduce that g · H = H.
The second natural action of G on itself is conjugation, defined by g · x = gxg −1 . For g, h, x ∈ G,
g · (h · x) = g(hxh−1 )g −1 = (gh)x(gh)−1 = (gh) · x.
Furthermore, 1 · x = 1x1−1 = x. This shows that conjugation satisfies the axioms of a group action.
The permutation representation is a homomorphism ρ : G → SG . We have already seen that for
each g ∈ G, the function ψg (x) = gxg −1 is an automorphism on G so ρ(G) ≤ Aut(G) ≤ SG . In
fact, we called the image subgroup ρ(G), the group of inner automorphisms of G and denote it by
Inn(G). (See Exercise 3.7.38.)
This action is not faithful in general. The kernel of the action is
Hence, the action of G on itself by conjugation is faithful if and only if Z(G) = {1}.
Note that if A is a subset of a group G, then G might not necessarily act on A by conjugation.
Indeed, gag −1 might not be in A for some a ∈ A. However, the normalizer NG (A) is the largest
subgroup of G that acts on A by conjugation.
If G is not the trivial group, then the action of G on itself by conjugation is not a transitive
action. The orbits are the conjugacy classes of G. The orbit equation for this action turns out to
lead to another identity pertaining to the internal structure of a group that we could not get without
the formalism of group actions.
The fixed elements (singleton orbits) in the action by conjugation are precisely the elements of the center Z(G). Now suppose that x ∉ Z(G). The stabilizer of x is
G_x = {g ∈ G | gxg^-1 = x} = C_G(x),
so by the Orbit-Stabilizer Theorem, the conjugacy class of x has order |G : C_G(x)|. This shows the surprising result that the cardinality of every conjugacy class must divide |G|. Furthermore, by grouping the set of fixed elements into a single term, the Orbit Equation immediately gives the following result.
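Both facts are easy to verify computationally for a small group. The following Python sketch (our own illustration, with permutations of {0, 1, 2, 3} stored as tuples of images) computes the conjugacy classes of S_4 and checks that every class size divides |G| and that the sizes sum to |G|:

```python
from itertools import permutations

n = 4
G = list(permutations(range(n)))          # the 24 elements of S_4

def mul(p, q):                            # (p*q)(x) = p(q(x))
    return tuple(p[q[x]] for x in range(n))

def inv(p):
    r = [0] * n
    for i, v in enumerate(p):
        r[v] = i
    return tuple(r)

def conj_class(x):                        # { g x g^-1 : g in G }
    return {mul(mul(g, x), inv(g)) for g in G}

classes, seen = [], set()
for x in G:
    if x not in seen:
        c = conj_class(x)
        classes.append(c)
        seen |= c

sizes = sorted(len(c) for c in classes)
assert all(len(G) % s == 0 for s in sizes)   # each class size divides |G| = 24
assert sum(sizes) == len(G)                  # the class equation
print(sizes)                                 # -> [1, 3, 6, 6, 8]
```

The five sizes correspond to the five cycle types of S_4: the identity, the double transpositions, the transpositions, the 4-cycles, and the 3-cycles.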
As another example of a group action relevant to group theory, let G be a group and consider
the associated group of automorphisms, Aut(G). Of course, Aut(G) acts on G by ψ · g = ψ(g) but
Aut(G) also acts on the set of subgroups Sub(G) with the pairing Aut(G) × Sub(G) → Sub(G)
defined by ψ · H = ψ(H). Recall the concept of a characteristic subgroup of a group G: a subgroup
H such that ψ(H) = H for all automorphisms ψ ∈ Aut(G). So a characteristic subgroup of G is a
subgroup that remains unchanged by the action of Aut(G) on Sub(G) as we just described.
We can contrast the notion of a characteristic subgroup with that of a normal subgroup: the normal subgroups are precisely the subgroups that remain unchanged by the action of Inn(G), the subgroup of Aut(G) consisting of the automorphisms ψ_g, where ψ_g(x) = gxg^-1.
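For a small group, the distinction can be made concrete by brute force. The following Python sketch (our own illustration; all helper names are ours) encodes S_3 as 0-indexed permutation tuples, enumerates Aut(S_3) directly, lists all subgroups, and picks out the characteristic ones:

```python
from itertools import permutations, combinations

# Brute-force computation of Aut(S_3) and the characteristic subgroups of S_3.
n = 3
G = list(permutations(range(n)))      # the six elements of S_3

def mul(p, q):                        # (p*q)(x) = p(q(x))
    return tuple(p[q[x]] for x in range(n))

# Automorphisms: bijections G -> G that respect the product.
autos = []
for images in permutations(G):
    psi = dict(zip(G, images))
    if all(psi[mul(a, b)] == mul(psi[a], psi[b]) for a in G for b in G):
        autos.append(psi)

# Subgroups: nonempty subsets closed under the product (enough for finite groups).
subgroups = []
for r in range(1, len(G) + 1):
    for S in combinations(G, r):
        S = set(S)
        if all(mul(a, b) in S for a in S for b in S):
            subgroups.append(S)

characteristic = [H for H in subgroups
                  if all({psi[h] for h in H} == H for psi in autos)]

print(len(autos))                              # -> 6
print(sorted(len(H) for H in characteristic))  # -> [1, 3, 6]
```

The output reflects that Aut(S_3) = Inn(S_3) has order 6, and that the characteristic subgroups are {1}, A_3, and S_3; the three order-2 subgroups are normalized by nothing beyond themselves and are permuted by the inner automorphisms.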
Recall that the cyclic group Z_p = ⟨z⟩ acts on the set X = {(g_1, g_2, …, g_p) ∈ G^p | g_1 g_2 ⋯ g_p = 1} by cyclically shifting the entries:
z · (g_1, g_2, …, g_p) = (g_2, g_3, …, g_p, g_1).
An element in X that is fixed by the action of H = Z_p on X has the form (g, g, …, g). Such elements have the property that g^p = 1. Let us suppose that there are r such fixed elements in X. If an element x ∈ X is not fixed by the action of Z_p, then the stabilizer H_x is a proper subgroup of H, which implies that the stabilizer is trivial and that the orbit H·x has cardinality |H·x| = |H : H_x| = p. Let us suppose that there are s such nontrivial orbits in X. The Orbit-Stabilizer Theorem implies that
r + sp = |G|^(p-1).
Since p divides |G|, p divides r. We know that (1, 1, …, 1) is a fixed point of the Z_p action on X, so r ≥ 1. Since r is a nonzero multiple of p, there are at least p − 1 more elements g ∈ G satisfying g^p = 1, and all such elements other than the identity have order exactly p.
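The counting behind this argument can be checked directly in a small case. The following Python sketch (our own illustration, not part of the text's proof) takes G = S_3 and p = 3, builds the set X of p-tuples with product 1, and confirms that |X| = |G|^(p-1) and that the number r of fixed points of the cyclic shift is divisible by p:

```python
from itertools import permutations, product

# G = S_3 (permutations of {0,1,2} as tuples of images), p = 3.
n, p = 3, 3
G = list(permutations(range(n)))
e = tuple(range(n))

def mul(a, b):                  # (a*b)(x) = a(b(x))
    return tuple(a[b[x]] for x in range(n))

# X = set of p-tuples whose product is the identity.
X = [t for t in product(G, repeat=p) if mul(mul(t[0], t[1]), t[2]) == e]
assert len(X) == len(G) ** (p - 1)        # |X| = |G|^(p-1) = 36

# Fixed points of the cyclic shift are the constant tuples (g, g, g), g^3 = 1.
r = sum(1 for t in X if t == (t[1], t[2], t[0]))
print(r, r % p)                           # -> 3 0
```

Here r = 3 counts the identity together with the two 3-cycles of S_3, which are exactly the elements of order dividing 3.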
We have seen that Lagrange's Theorem does not have a full converse, in the sense that if d divides |G|, there does not necessarily exist a subgroup H ≤ G such that |H| = d. However, Cauchy's Theorem gives a partial converse in the sense that if d is a prime number dividing |G|, then there exists a subgroup H ≤ G such that |H| = d.
Maple 16 Function
with(group); Loads the group theory package. (Many commands.)
permgroup(n,gens); Defines a subgroup of Sn using the list gens of generators.
grouporder(G); Calculates the order of a permutation group G.
groupmember(s,G); Tests whether the permutation s is in the permutation group G.
Starting with version 17, Maple included a much larger group theory package. We only give a
few elementary commands and invite the reader to explore the package further.
Maple 17 Function
with(GroupTheory); Loads the (new) group theory package. (Many commands.)
Perm(list); Defines a permutation given a list that represents the cycle type.
PermProduct(a,b); Calculates the product of the permutations.
Group(list); Given a few permutations as created from the previous command,
creates the subgroup generated by that list of permutations.
GroupOrder(G); Calculates the order of any group, defined as a permutation group
or some other way.
IsTransitive(G); Returns true or false depending on whether G, as a subgroup of
Sn , acts transitively on {1, 2, . . . , n}.
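For readers without Maple, a rough analogue of permgroup, grouporder, and groupmember can be sketched in a few lines of Python. Here permutations of {0, …, n−1} are stored as tuples of images, and generate (a helper name of our own) computes the closure of a generating set; this is illustrative code, not an official interface of any package:

```python
# Pure-Python analogue of the Maple commands above, for small permutation groups.
def mul(p, q):                  # (p*q)(x) = p(q(x))
    return tuple(p[q[x]] for x in range(len(p)))

def generate(gens):
    """Analogue of permgroup: the subgroup generated by gens (finite closure)."""
    elems = set(gens)
    frontier = list(elems)
    while frontier:
        new = [mul(a, b) for a in frontier for b in elems] + \
              [mul(b, a) for a in frontier for b in elems]
        frontier = [p for p in set(new) if p not in elems]
        elems.update(frontier)
    return elems

# <(0 1 2 3), (0 1)> generates all of S_4:
r = (1, 2, 3, 0)             # the 4-cycle (0 1 2 3)
s = (1, 0, 2, 3)             # the transposition (0 1)
G = generate([r, s])
print(len(G))                # grouporder analogue -> 24
print((0, 1, 3, 2) in G)     # groupmember analogue -> True
```

Because the group is finite, closing the generating set under products alone suffices: powers of each element eventually produce the identity and all inverses. For serious computations, a dedicated system such as Maple's GroupTheory package, GAP, or SymPy is preferable.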
15. Let p be prime. Use Exercise 8.4.14 to prove that every group of order p2 is abelian. In particular, if
|G| = p2 , then G is isomorphic to Zp2 or Zp ⊕ Zp .
16. If G is a p-group and H is a proper subgroup, show that the normalizer NG (H) properly contains H.
[Hint: Use Exercise 8.4.14.]
17. Let G be a group. Show that the pairing (G ⊕ G) × G → G defined by (g, h) · x = gxh−1 is an action
of G ⊕ G on G. Show that the action is transitive. Also determine the stabilizer of the identity 1.
18. (Cauchy’s Theorem) The original proof to Cauchy’s Theorem did not use the group action described
in the proof we gave, but it relied instead on the Class Equation. Let p be a prime that divides the
order of a finite group G.
(a) Prove Cauchy’s Theorem for finite abelian groups. [Hint: Use induction on |G|.]
(b) Prove Cauchy’s Theorem for finite nonabelian groups by induction on |G| and using the Class
Equation.
19. Suppose that G is a finite group with m conjugacy classes. Show that the number of ordered pairs
(x, y) ∈ G × G such that yx = xy is equal to m|G|.
8.5
Sylow’s Theorem
Sylow’s Theorem is a partial converse to Lagrange’s Theorem in that it states that a group has a
subgroup of a certain order. Sylow’s Theorem leads to a variety of profound consequences for the
internal structure of a group simply based on its order. Therefore, it also provides vital information
in the classification problems (Section 9.4)—theorems that decide what groups exist of a given order.
We present Sylow’s Theorem in this section because it follows from a clever application of a
group action on certain sets of subgroups within the group.
Example 8.5.1. Before presenting the necessary group action and proving the theorem, we illustrate Sylow's Theorem with an example. Consider the group G = S_6. Obviously, |G| = 720 = 2^4 · 3^2 · 5. Sylow's Theorem will guarantee that G has subgroups of order 16, 9, and 5. The theorem also gives a condition on how many such subgroups G has. That such subgroups exist is not immediately obvious.
• Finding a subgroup of order 5 is easy. Indeed, ⟨(1 2 3 4 5)⟩ works. There are (6 choose 5) · 4!/4 = 36 such subgroups.
• Finding a subgroup of order 9 = 3^2 is not hard either: H = ⟨(1 2 3), (4 5 6)⟩ works. In fact, since there are no 9-cycles in S_6, every subgroup of order 9 must be isomorphic to Z_3 ⊕ Z_3. Such subgroups must be generated by two disjoint 3-cycles. It is easy to show that there are (1/2)(6 choose 3) = 10 such subgroups.
• Subgroups of order 16 = 2^4 exist as well; it is not hard to show that there are 45 different subgroups of this order. 4
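The count of Sylow 5-subgroups can be confirmed by brute force: the elements of order 5 in S_6 are exactly the 5-cycles, distinct subgroups of order 5 intersect trivially, and each subgroup of order 5 contains four 5-cycles. A short Python check (our own sketch, using 0-indexed permutation tuples):

```python
from itertools import permutations

def order(p):
    """Order of a permutation given as a tuple of images."""
    n = len(p)
    e = tuple(range(n))
    q, k = p, 1
    while q != e:
        q = tuple(p[q[x]] for x in range(n))
        k += 1
    return k

# Count elements of order 5 in S_6; dividing by 4 counts the subgroups of
# order 5, since each such subgroup contains exactly four 5-cycles.
five_cycles = [p for p in permutations(range(6)) if order(p) == 5]
print(len(five_cycles), len(five_cycles) // 4)   # -> 144 36
```

The 144 five-cycles arise as (6 choose 5) · 4! choices of support and cycle, in agreement with the count above.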
Definition 8.5.2
Let G be a group and p a prime.
• A group of order p^k for some k ∈ N* is called a p-group. Subgroups of G that are p-groups are called p-subgroups.
• If |G| = p^k m with p ∤ m, then a subgroup of G of order p^k is called a Sylow p-subgroup of G. The set of Sylow p-subgroups of G is denoted by Syl_p(G), and the number of Sylow p-subgroups of G is denoted by n_p(G) (or simply n_p).
If p is a prime that does not divide |G|, then the notion of a p-subgroup is not interesting.
However, to be consistent with notation, if p does not divide |G| then trivially Syl_p(G) = {⟨1⟩} and n_p(G) = 1.
Proof. We use (strong) induction on the size of G. If |G| = 1, there is nothing to do, and the
theorem is satisfied trivially. Assume that the theorem holds for all groups of size strictly less than
n. We prove that the theorem holds for all groups G with |G| = n.
Let p be a prime and assume that |G| = p^k m with p ∤ m. If p divides |Z(G)|, then by Cauchy's Theorem Z(G) contains an element of order p and hence a subgroup N of order p. Since N ≤ Z(G), the subgroup N is normal in G, and the group G/N has order p^(k-1) m, which is smaller than |G|. Hence, by the induction hypothesis, G/N contains a Sylow p-subgroup P̄ of order p^(k-1). By the Fourth Isomorphism Theorem, there exists a subgroup P of G such that P̄ = P/N. Then |P| = p^k and hence G contains a Sylow p-subgroup.
We are reduced now to the case where p ∤ |Z(G)|. Consider the Class Equation (Proposition 8.4.4),
|G| = |Z(G)| + ∑_{i=1}^{r} |G : C_G(g_i)|,
where {g_1, g_2, …, g_r} is a complete list of distinct representatives of the nontrivial conjugacy classes.
Since p divides |G| but not |Z(G)|, there exists some g_{i_0} such that p does not divide |G : C_G(g_{i_0})|. Then C_G(g_{i_0}) has order p^k ℓ where p ∤ ℓ. Moreover, since g_{i_0} ∉ Z(G), the centralizer C_G(g_{i_0}) is a proper subgroup of G. Thus, again by strong induction, C_G(g_{i_0}) has a Sylow p-subgroup of order p^k, which is also a Sylow p-subgroup of G.
Before establishing the rest of Sylow's Theorem, we require two lemmas. The first gives a property of Sylow p-subgroups concerning intersections with other p-subgroups.
Lemma 8.5.4
Let P ∈ Sylp (G). If Q is any p-subgroup of G, then Q ∩ NG (P ) = Q ∩ P .
Consider the set of conjugates of P,
S_P = {gPg^-1 | g ∈ G} = {P_1 = P, P_2, …, P_r}.
By definition of orbits, G acts transitively on SP . Let H be any subgroup of G. It also acts on
SP by conjugation but perhaps not transitively. Then under the action of H, the set SP may
get partitioned into s(H) distinct orbits {O1 , O2 , . . . , Os(H) }, where s(H) is a positive integer that
depends on H. Obviously, r = |O1 | + |O2 | + · · · + |Os(H) |. The Orbit-Stabilizer Theorem applied to
the action of H on SP states that if Pi is any element in the orbit Oi , then
|Oi | = |H : NH (Pi )|. (8.6)
If H happens to be another p-subgroup, this formula simplifies.
Lemma 8.5.5
Let P ∈ Sylp (G) and let Q be any p-subgroup of G. Suppose that Q acts on the orbit SP
by subgroup conjugation. If the orbit of some Pi ∈ SP is Oi , then
|Oi | = |Q : Q ∩ Pi |.
np ≡ 1 (mod p).
Proof. By Theorem 8.5.3, we know that Sylp (G) is nonempty. Let P be a Sylow p-subgroup of G
and let SP be the orbit of P in the action of G acting on the set of subgroups of G by conjugation.
Let r = |SP |.
We first show that r ≡ 1 (mod p) as follows. Apply Lemma 8.5.5 with Q = P itself. Then
O1 = {P } so |O1 | = 1. Then for all integers i with 1 < i ≤ s(P ), the orbit Oi satisfies
|Oi | = |P : Pi ∩ P |,
which is divisible by p. Thus,
r = |O1 | + |O2 | + · · · + |Os(P ) | ≡ 1 (mod p).
We prove by contradiction that the action of G by conjugation on Sylp (G) is transitive. As above,
let P be an arbitrary Sylow p-subgroup. Suppose that there exists a Sylow p-subgroup P 0 that is
not conjugate to P . Now consider the action of P 0 on SP by conjugation and apply Lemma 8.5.5
with Q = P 0 .
Then for 1 ≤ i ≤ s(P′), the p-group P′ ∩ P_i is a proper subgroup of P′ (if P′ ∩ P_i = P′, then P′ = P_i, contradicting that P′ is not conjugate to P), so by Lemma 8.5.5,
|O_i| = |P′ : P′ ∩ P_i| > 1.
Thus, p divides every |P′ : P′ ∩ P_i|, which implies that p divides r. Since r ≡ 1 (mod p), we have a contradiction. Thus, we conclude that there does not exist a Sylow p-subgroup that is not conjugate to P. These results prove (1) and (2).
For part (3), notice that since the action of G on Sylp (G) by conjugation is transitive, then
r = np so np ≡ 1 (mod p). Also, the Orbit-Stabilizer Theorem tells us that np = |G : NG (P )|. Then
by the chain of subgroups
P ≤ NG (P ) ≤ G,
we deduce that
p^k m = |G| = |G : N_G(P)| · |N_G(P) : P| · |P| = n_p · |N_G(P) : P| · p^k,
so np divides |G : P | = m.
By part (3), n_p = |G : N_G(P)|, so n_p = 1 means that the one Sylow p-subgroup P satisfies N_G(P) = G and is therefore a normal subgroup. (Also, in Exercise 4.2.12, we saw that if there is only one subgroup of a given order, then that subgroup is normal.) In particular, if n_p = 1 for some prime p that divides |G|, then we immediately conclude that G is not simple. This result often gives a quick way to determine that no group of a certain order is simple. The following examples illustrate this.
Part 1 of Theorem 8.5.6 implies that for a given prime p, all Sylow p-subgroups are conjugate to
each other. This implies that they are all isomorphic to each other.
Example 8.5.7. We revisit Example 8.5.1, which discussed S_6 as motivation at the beginning of this section. Sylow's Theorem affirms that there exist subgroups of order 16, 9, and 5. Since for a given p, all Sylow p-subgroups are isomorphic, every Sylow p-subgroup is isomorphic to the Sylow p-subgroups that we illustrated for p = 2, p = 3, and p = 5. We had determined that (1) there are 36 Sylow 5-subgroups, which conforms to n_5 ≡ 1 (mod 5); (2) there are 10 Sylow 3-subgroups (of order 9), which conforms to n_3 ≡ 1 (mod 3); (3) there are 45 subgroups of order 16, which conforms to n_2 ≡ 1 (mod 2). 4
Example 8.5.8. Consider groups of order 385. Note that 385 = 5 · 7 · 11. By part (2), n_11 ≡ 1 (mod 11), while by part (3) we also have n_11 | 35. The divisors of 35 are 1, 5, 7, and 35. The
only divisor of 35 that satisfies both conditions is n11 = 1. Hence, every group of order 385 has a
normal subgroup of order 11.
If we continue similar analysis with the other prime factors of 385, we notice that n7 ≡ 1 (mod 7)
and n7 | 55. Again, the only possibility is n7 = 1 so groups of order 385 must also possess a normal
subgroup of order 7. However, for the prime p = 5, the conditions give n5 ≡ 1 (mod 5) and n5 | 77.
Here, we have two possibilities, namely that n5 = 1 or 11. 4
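Filtering the divisors of m that are congruent to 1 modulo p is a purely mechanical computation. A short Python helper (a hypothetical function of our own, not from any library) reproduces the candidate lists in this example:

```python
# Candidate values of n_p for a group of order p^k * m with p not dividing m:
# the divisors d of m with d = 1 (mod p), by Sylow's Theorem.
def sylow_candidates(order, p):
    m = order
    while m % p == 0:
        m //= p
    return [d for d in range(1, m + 1) if m % d == 0 and d % p == 1]

# For |G| = 385 = 5 * 7 * 11, as in the example above:
print(sylow_candidates(385, 11))   # -> [1]
print(sylow_candidates(385, 7))    # -> [1]
print(sylow_candidates(385, 5))    # -> [1, 11]
```

The lists confirm that n_11 = n_7 = 1 is forced, while n_5 could a priori be 1 or 11.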
The situation in which np (G) = 1 is particularly important for determining the structure of the
group G. We already commented that np (G) = 1 implies that G has a normal Sylow p-subgroup.
However, the converse is also true.
Proposition 8.5.9
Let P be a Sylow p-subgroup of a group G. The following are equivalent:
(1) np (G) = 1;
(2) P E G;
(3) P is a characteristic subgroup of G.
Proof. (1) =⇒ (3): Since n_p(G) = 1, there is only one Sylow p-subgroup P of G. Every automorphism ψ ∈ Aut(G) maps subgroups of G to subgroups of the same cardinality, so ψ(P) is again a Sylow p-subgroup of G. Hence, ψ(P) = P and P is characteristic.
(3) =⇒ (2): Follows from the fact that conjugation by any g ∈ G is an automorphism of G.
(2) =⇒ (1): By part 1 of Sylow's Theorem, the action of G on Syl_p(G) by conjugation is transitive. Since gPg^-1 = P for all g ∈ G, Syl_p(G) = {P} and n_p(G) = 1.
Example 8.5.10 (Groups of Order pq). Let G be a group of order pq where p and q are primes. As an application of the Class Equation, we saw in Exercise 8.4.15 that groups of order p^2 are isomorphic either to Z_{p^2} or to Z_p ⊕ Z_p. We assume from now on that p < q.
Consider the center of the group Z(G) and, in particular, its order |Z(G)|. If |Z(G)| = pq, then the group is abelian, and by the FTFGAG, we know that G ≅ Z_{pq}.
In Exercise 4.3.21, we saw that if G/Z(G) is cyclic, then G is abelian. If |G| = pq with p ≠ q, then we cannot have |Z(G)| = p or q, because otherwise G/Z(G) would be isomorphic to Z_q or Z_p respectively, making G abelian with |Z(G)| = pq, contradicting |Z(G)| = p or q.
Now assume that |Z(G)| = 1. By Sylow’s Theorem, nq = 1 + kq (with k ≥ 0) and nq divides
|G|/q = p. However, if k > 0, then nq > q > p which contradicts nq | p, so we must have nq = 1.
Therefore, G contains one subgroup Q ≤ G of order q and it is normal. Similarly, np ≡ 1 (mod p)
and np must divide q. This leads to two cases.
Let us first suppose that p ∤ (q − 1). Then we must have n_p = 1, and so G has a normal subgroup P of order p. Then by the Direct Sum Decomposition Theorem (Theorem 4.3.12), G ≅ P ⊕ Q, so G ≅ Z_p ⊕ Z_q ≅ Z_{pq}. Hence, G is abelian again, contradicting Z(G) = {1}.
Now suppose that p | (q − 1). Then, a priori, it is possible that n_p > 1. We now provide a constructive proof of the existence of a nonabelian group of order pq. Let x be a generator of Q, so x has order q. Also, by Cauchy's Theorem, G has an element y of order p. Then ⟨y⟩ is a Sylow p-subgroup and all Sylow p-subgroups are conjugate to P = ⟨y⟩. By Corollary 4.2.10, PQ ≤ G and hence PQ = G by Lagrange's Theorem, since |PQ| > q. The subgroup P acts by conjugation on Q and this action defines a homomorphism of P into Aut(Q). Note that Aut(Q) ≅ Aut(Z_q) ≅ U(q), the multiplicative group of units in Z/qZ. (See Exercise 3.7.40.)
Proposition 7.5.2 establishes that U(q) is a cyclic group and hence has a generator of order q − 1. Since Q is cyclic with generator x, automorphisms of Q are determined by where they map the generator: ψ_k(x) = x^k, where gcd(k, q) = 1. Let a be a positive integer such that ψ_a(x) = x^a has order q − 1 in Aut(Q). If d = (q − 1)/p, then ψ_a^d = ψ_{a^d} has order p. Then the action by conjugation of P on Q determined by the homomorphism P → Aut(Q) given by y ↦ ψ_{a^d} is a nontrivial action. This gives a nonabelian group of order pq with presentation
⟨x, y | x^q = y^p = 1, yxy^-1 = x^α⟩
where α and β are both elements of order p in the multiplicative group U(q). Now, in a cyclic group there exists a unique subgroup of any given order. Thus, ⟨β⟩ is the unique subgroup of order p in U(q) and α ∈ ⟨β⟩, so α = β^c for some integer 1 ≤ c ≤ p − 1. Consider a function ϕ : G_1 → G_2 that maps ϕ(x) = g and ϕ(y) = h^c. Obviously, g^q = 1 and (h^c)^p = 1, but also
(3) if p ≠ q and p | (q − 1), then G is isomorphic to Z_{pq} or the unique nonabelian group of order pq. 4
Example 8.5.11 (Groups of Order 39). As a specific illustration of the previous classification result, consider n = 39 = 3 · 13. The group Z_39 is the only abelian group of order 39. Note that 3 | (13 − 1), so by the previous example there also exists a nonabelian group of order 39. The cyclic group U(13) is generated by 2 because, in arithmetic modulo 13, the successive powers of 2 are 2, 4, 8, 3, 6, 12, 11, 9, 5, 10, 7, 1. We have d = (13 − 1)/3 = 4, so, using a = 2, we have α = a^d mod 13 = 2^4 mod 13 = 3. The sequence of powers of 3 modulo 13 is 1, 3, 9, 1, 3, 9, …, so 3 has order 3 in U(13). As a presentation, the group
G = ⟨x, y | x^13 = y^3 = 1, yxy^-1 = x^3⟩
is a nonabelian group of order 39. It is possible to construct this example even more explicitly as a subgroup of S_13. Let σ = (1 2 3 … 13) and let τ be the permutation such that τστ^-1 = σ^α = σ^3. Since σ^3 = (1 4 7 10 13 3 6 9 12 2 5 8 11), by Example 4.2.13, we find that the appropriate permutation is τ = (2 4 10)(3 7 6)(5 13 11)(8 9 12). Then ⟨σ, τ⟩ is isomorphic to this nonabelian group of order 39. 4
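This construction can be verified by brute force, assuming 0-indexed permutations stored as tuples of images (the code and helper names are ours):

```python
# Verify that sigma = (1 2 ... 13) and the given tau generate a nonabelian
# group of order 39 inside S_13 (everything 0-indexed here).
n = 13

def mul(p, q):                 # (p*q)(x) = p(q(x))
    return tuple(p[q[x]] for x in range(n))

def inv(p):
    r = [0] * n
    for i, v in enumerate(p):
        r[v] = i
    return tuple(r)

sigma = tuple((i + 1) % n for i in range(n))   # the 13-cycle (0 1 ... 12)

# tau = (2 4 10)(3 7 6)(5 13 11)(8 9 12), each entry shifted down by one:
tau = list(range(n))
for a, b, c in [(1, 3, 9), (2, 6, 5), (4, 12, 10), (7, 8, 11)]:
    tau[a], tau[b], tau[c] = b, c, a
tau = tuple(tau)

sigma3 = mul(sigma, mul(sigma, sigma))
assert mul(mul(tau, sigma), inv(tau)) == sigma3    # tau sigma tau^-1 = sigma^3

# Close {sigma, tau} under products to get the generated subgroup.
H = {sigma, tau}
while True:
    new = {mul(a, b) for a in H for b in H} - H
    if not new:
        break
    H |= new
print(len(H), mul(sigma, tau) == mul(tau, sigma))  # -> 39 False
```

The closure has exactly 13 · 3 = 39 elements, and σ and τ fail to commute, as the relation τστ^-1 = σ^3 ≠ σ requires.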
Example 8.5.12 (Groups of Order 30). We now prove that, by virtue of |H| = 30 alone, a group H of order 30 must have a normal (and hence unique) Sylow 5-subgroup and Sylow 3-subgroup. Let Q_1 ∈ Syl_3(H) and let Q_2 ∈ Syl_5(H). If either Q_1 or Q_2 is normal in H, then Q_1Q_2 is a subgroup of H of order 15. Since 15 is half of 30, the subgroup Q_1Q_2 has index 2 and so Q_1Q_2 ⊴ H. Since Q_1 and Q_2 are characteristic subgroups of Q_1Q_2 by the corollary to Sylow's Theorem, Q_1 and Q_2 are both normal subgroups of H. Therefore, we have proven that either both Q_1 and Q_2 are normal in H or neither is. If neither is, then n_3(H) = 10 and n_5(H) = 6. But this would give 10 · 2 + 6 · 4 = 44 elements of order 3 or 5, whereas the group H has only 30 elements. This is a contradiction. Hence, both Q_1 and Q_2 are normal in H. 4
The last example illustrates how counting elements of a given order may allow us to gain more
information beyond that given immediately from Sylow’s Theorem.
Example 8.5.13 (Groups of Order 105). Let G be a group of order 105 = 3 · 5 · 7. Using the criteria of Sylow's Theorem, we find that n_3(G) = 1 or 7, that n_5(G) = 1 or 21, and that n_7(G) = 1 or 15. So by divisibility considerations alone, it would appear possible for G not to have any normal Sylow p-subgroups. However, that is not the case. Assume that n_3(G) = 7, that n_5(G) = 21, and that n_7(G) = 15. Each Sylow 5-subgroup would contain 4 elements of order 5, and these subgroups would intersect pairwise in the identity (since they are distinct cyclic subgroups of prime order). Hence, n_5(G) = 21 accounts for 4 × 21 = 84 elements of order 5. By the same reasoning, if n_7(G) = 15 then G contains 15 distinct cyclic subgroups of order 7, which accounts for 6 × 15 = 90 elements of order 7. However, this count already gives 84 + 90 = 174 elements of order 5 or 7, and this number is greater than the order of the group, 105. Hence, every group of order 105 must contain a normal subgroup of order 5 or a normal subgroup of order 7. 4
Example 8.5.14 (Groups of Order 2115). Let G be a group of order 2115 = 3^2 · 5 · 47. It is easy to see that the conditions n_47 ≡ 1 (mod 47) and n_47 | 45 imply that n_47 = 1. Hence, G must contain a normal subgroup N of order 47. However, because of the numerical relationships in this case, more can be said about N. Consider the action of G on N by conjugation. Since N is normal, this conjugation engenders a homomorphism ψ : G → Aut(N). However, Aut(N) ≅ Aut(Z_47), which has order 46. But gcd(46, 2115) = 1, so the only homomorphism ψ : G → Aut(N) is the trivial homomorphism. Thus, the action of G on N by conjugation is trivial, and we conclude that N commutes with all of G, so N ≤ Z(G). 4
(b) Use Sylow's Theorem to conclude that (1/2)(2p choose p)((p − 2)!)^2 ≡ 1 (mod p).
9. Show that a group of order 418 has a normal subgroup of order 11 and a normal subgroup of order
19.
10. Prove that there is no simple group of order 225.
11. Prove that there is no simple group of order 825.
12. Prove that there is no simple group of order 2907.
13. Prove that there is no simple group of order 3124.
14. Prove that there is no simple group of order 4312.
15. Prove that there is no simple group of order 132.
16. Prove that there is no simple group of order 351.
17. Prove that a group of order 273 has a normal subgroup of order 91.
18. Prove that if |G| = 2015, then G contains a normal subgroup of order 31 and a subgroup of order 13 in Z(G).
19. Prove that if |G| = 459, then G contains a Sylow 17-subgroup in Z(G).
20. Prove that every group of order 1001 is abelian.
21. Prove that if |G| = 9163, then G has a Sylow 11-subgroup in Z(G).
22. How many elements of order 7 must exist in a simple group of order 168?
23. Prove that np (G) = 1 is equivalent to the property that all subgroups of G generated by elements of
order p are p-subgroups.
24. Let p be an odd prime. Show that every group of order 2p is isomorphic to Z2p or to Dp .
25. Suppose that |G| = pm where p is a prime and p - m. Prove that gcd(p, m − 1) = gcd(p − 1, m) = 1 if
and only if G has a normal Sylow p-subgroup in Z(G).
26. Suppose that for every prime p dividing |G|, the Sylow p-subgroups are nonabelian. Prove that |G| is
divisible by a cube.
27. Suppose that |G| = p2 q 2 with p and q distinct primes. Prove that if p - (q 2 − 1) and q - (p2 − 1), then
G is abelian.
28. Suppose that H is a subgroup of G such that gcd(| Aut(H)|, |G|) = 1. Prove that NG (H) = CG (H).
29. Prove that if N E G, then np (G/N ) ≤ np (G).
30. Let P be a normal Sylow p-subgroup of a group G and let H ≤ G. Prove that P ∩ H is the unique
Sylow p-subgroup of H.
31. Let P ∈ Sylp (G) and let N E G. Prove that P ∩ N ∈ Sylp (N ). Prove also that P N/N is a Sylow
p-subgroup of G/N .
32. Let G1 and G2 be two groups, both of which have orders divisible by a prime p. Prove that all Sylow
p-subgroups of G1 ⊕ G2 are of the form P1 ⊕ P2 , where P1 ∈ Sylp (G1 ) and P2 ∈ Sylp (G2 ).
33. Let G be a finite group and let M be a subgroup such that NG (P ) ≤ M ≤ G for some Sylow p-subgroup
P . Prove that |G : M | ≡ 1 (mod p).
34. Let p be a prime dividing |G|. Prove that the intersection of all Sylow p-subgroups is the largest
normal p-subgroup in G.
8.6
A Brief Introduction to Representations of Groups
At first pass, representation theory of finite groups is the study of how to represent finite groups
using matrices in GLn (F ), where F is a field. This addresses the goal, mentioned in the book’s
preface, of conveniently describing groups. Indeed, we understand matrix multiplication well, so if a group can be represented as a subgroup of some GL_n(F), then we can study properties of specific group elements from this perspective. In this sense, representation theory is not unlike group actions,
in which faithful actions provide permutation representations of a group.
More precisely, in a representation of group G, the group acts on a vector space V (over a field F ),
but in which group elements act not just as bijections on V but as invertible linear transformations.
This requirement brings together the group structure with the structure of a vector space in a way
that uncovers interesting results for both theories.
Like group actions, the collection of representations of a group G has all the characteristics of an
algebraic structure. We sometimes generically call representations of a group an action structure,
in which one structure acts on another algebraic structure. As we will see in Chapter 10, there are
many fruitful and interesting action structures from rings, including representations of rings.
The representation theory of groups is a broad subfield of algebra and often stands as a course
in its own right. In this section, we introduce the notion of representations of groups, give some
examples, show how they provide another example of an algebraic structure, and then illustrate
some of the interesting interplay between group theory and linear algebra by establishing two key
reducibility results. In so doing, we hope to whet the reader’s appetite for more.
With respect to the standard basis of R^2, these matrices correspond respectively to the rotation by angle 2π/n about the origin and the reflection through the x-axis. The homomorphism ϕ is what we intend by a representation of D_n. We will call this the standard representation of D_n.
Definition 8.6.1
Let V be a vector space over a field F . The group of invertible linear transformations from
V to V is called the general linear group on V and is denoted by GL(V ).
Definition 8.6.2
A representation of a group G is a homomorphism ρ : G → GL(V ) for some vector space
V over a field F . If dim V = n, then we sometimes call ρ a representation of G over F of
degree n.
Example 8.6.3. Consider the presentation of Q8 described in Exercise 3.8.5. Consider a function
ϕ : Q8 → GL2 (C) such that
ϕ(i) = ( i   0 )          ϕ(j) = ( 0  −1 )
       ( 0  −i )   and           ( 1   0 ) .
It is not hard to show that this function satisfies the hypotheses of the Generator Extension Theorem
so ϕ extends to a homomorphism from all of Q8 . 4
By now, the reader should be able to define the kernel of a representation for him- or herself.
Definition 8.6.4
Let ρ : G → GL(V ) be a representation of the group G. The kernel of ρ is Ker ρ = {g ∈
G | ρ(g) = idV }. If Ker ρ = {1}, then the representation is called faithful .
Example 8.6.5 (Trivial Representation). Let G be a group and let V be any vector space over
any field F . The trivial homomorphism ρ : G → GL(V ) that maps all group elements to the identity
linear transformation from V to V is called the trivial representation. This representation is the
opposite of faithful. 4
Example 8.6.6 (Regular Representation). Let F be a field and G a group. The group ring
(see Section 5.2.3) F [G] has the structure of a vector space over F with the following addition and
scalar multiplication operators:
∑_{g∈G} a_g g + ∑_{g∈G} b_g g = ∑_{g∈G} (a_g + b_g) g,
r ∑_{g∈G} a_g g = ∑_{g∈G} (r a_g) g.
σ · a := (a_{σ^-1(1)}, a_{σ^-1(2)}, …, a_{σ^-1(n)}).
This action of S_n on F^n has the effect of sending the ith basis vector to the σ(i)th basis vector. For example, if ϕ is the standard representation of S_5 on R^5, then with respect to the standard basis,

ϕ((1 2 4)(3 5)) = [ 0 0 0 1 0 ]
                  [ 1 0 0 0 0 ]
                  [ 0 0 0 0 1 ]
                  [ 0 1 0 0 0 ]
                  [ 0 0 1 0 0 ] .  4
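Such a permutation matrix can be generated mechanically from the definition: since (σ · a)_i = a_{σ^-1(i)}, the matrix M of σ satisfies M[σ(j)][j] = 1 (equivalently, it sends the jth basis vector to the σ(j)th). A short Python sketch of ours, 0-indexed:

```python
# Build the permutation matrix of sigma = (1 2 4)(3 5) acting on F^5,
# written 0-indexed as (0 1 3)(2 4).
n = 5
sigma = {0: 1, 1: 3, 3: 0, 2: 4, 4: 2}

# M sends the jth basis vector to the sigma(j)th: M[sigma(j)][j] = 1.
M = [[0] * n for _ in range(n)]
for j in range(n):
    M[sigma[j]][j] = 1

for row in M:
    print(row)
```

Multiplying M against a coordinate vector a reproduces (a_{σ^-1(1)}, …, a_{σ^-1(5)}), matching the action defined above.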
Definition 8.6.9
Let V be a representation of a group G. A subrepresentation of V is a subspace W such
that for all g ∈ G and all w ∈ W , we have gw ∈ W . A subrepresentation of V is also called
a G-invariant subspace of V .
Definition 8.6.10
A representation V of a group G is called irreducible if V ≠ {0} and if V has no subrepresentations besides the subspaces {0} and V itself.
Example 8.6.12. Following up on the previous example, consider the regular representation of S_3 described in Example 8.6.6. Let W be the invariant subspace
W = {v ∈ R^6 | v_1 + v_2 + ⋯ + v_6 = 0}
described in the previous example. Consider the ordered basis B for W with vectors
u_1 = (1, −1, 0, 0, 0, 0)^T, u_2 = (0, 1, −1, 0, 0, 0)^T, …, u_5 = (0, 0, 0, 0, 1, −1)^T.
It is not hard to show that ρ((1 2 3)) maps u_1, u_2, …, u_5 respectively to
(0, 1, −1, 0, 0, 0)^T, (−1, 0, 1, 0, 0, 0)^T, (1, 0, 0, 0, 0, −1)^T, (0, 0, 0, −1, 0, 1)^T, (0, 0, 0, 1, −1, 0)^T,
which are respectively
u_2, −u_1 − u_2, u_1 + u_2 + u_3 + u_4 + u_5, −u_4 − u_5, u_4.
Therefore, with respect to the ordered basis B on W, the matrix of ρ_W((1 2 3)) is

[ρ_W((1 2 3))]_B = [ 0 −1 1  0 0 ]
                   [ 1 −1 1  0 0 ]
                   [ 0  0 1  0 0 ]
                   [ 0  0 1 −1 1 ]
                   [ 0  0 1 −1 0 ] .  4
We can give a characterization of subrepresentations in terms of matrices. Let V be a representation of finite dimension n with homomorphism ρ : G → GL(V). A subspace W of V with dim W = m is a subrepresentation of V if and only if V has an ordered basis B = (v_1, v_2, …, v_n) such that B_1 = (v_1, v_2, …, v_m) is an ordered basis of W and, for all g ∈ G, with respect to the basis B,

[ρ(g)]_B = ( [ρ_W(g)]_{B_1}   A(g) )
           (       0          B(g) ) ,        (8.7)

for some matrices A(g) and B(g) depending on g.
Definition 8.6.13
Let G be a group and let V and W be two representations of G. A (representation)
homomorphism from V to W is a linear transformation T : V → W such that
T(g · v) = g · T(v) for all g ∈ G and all v ∈ V.
In this definition, the action g · T(v) involves the action of G on W. More precisely, this definition
states that if ρV : G → GL(V ) is a representation of G and if ρW : G → GL(W ) is another
representation of G, then for all g ∈ G,
T ◦ ρV (g)(v) = ρW (g) ◦ T (v).
We depict this function relationship with the following commutative diagram.

          ρ_V(g)
      V --------> V
      |           |
    T |           | T
      v           v
      W --------> W
          ρ_W(g)
A function diagram of sets (resp. groups, rings, vector spaces, etc.) is a directed graph in which every vertex corresponds to a set (resp. group, ring, vector space, etc.) and every directed edge corresponds to a function (resp. homomorphism, ring homomorphism, linear transformation, etc.). A directed path of arrows corresponds to the composition of the functions associated to the directed edges involved. We call a function diagram commutative if any two directed paths from a domain to a codomain correspond to equal compositions. In particular, the above diagram is commutative because T ∘ ρ_V(g) = ρ_W(g) ∘ T.
Example 8.6.14. Consider the regular representation of S3 into R6 as described in Example 8.6.6.
Consider also the function ϕ : R6 → R defined by
ϕ(~x) = x1 + x2 + · · · + x6 .
This is a linear transformation. As pointed out in Example 8.6.12, in the regular representation,
the sum of the coordinates of a vector does not change under the action of S3 on R6 . Thus, for all
σ ∈ S3 ,
ϕ(σ~x) = ϕ(~x) = σϕ(~x),
assuming that R is equipped with the trivial representation structure. Thus, if V is the regular
representation of S3 over the field R and if U is the trivial representation of dimension 1 over R,
then ϕ : V → U is a group representation homomorphism. 4
The reader may notice that the subspace W described in Example 8.6.12 is the kernel of the group
representation homomorphism described in Example 8.6.14. That the kernel of a homomorphism is
a subrepresentation should not come as a surprise. Indeed, a similar result holds with vector spaces,
groups, rings, and other algebraic structures.
Proposition 8.6.15
Let G be a group and let ϕ : V → W be a homomorphism of representations of G.
(1) The kernel Ker ϕ is a subrepresentation of V .
(2) The image Im ϕ is a subrepresentation of W.
In all algebraic structures, two given objects are considered the same if there exists an invertible
homomorphism between them. Again, we define this notion for representations of a group G.
Definition 8.6.16
Let G be a group. An isomorphism between two G-representations V and W is a homo-
morphism of representations that is also a bijection between V and W . If there exists an
isomorphism of representations between V and W, then V and W are called isomorphic or equivalent representations, and we write V ≅ W.
It is not hard to show that these matrices satisfy the same relations as r and s in D8 . Therefore,
by the Generator Extension Theorem, the mapping ρ extends to a unique homomorphism, thereby
defining a representation of D8 . It turns out that this representation is equivalent to the standard
representation of D8 on R2 because
[ 2√2  −5√2/2 ]   [ 2  1 ] [ √2/2  −√2/2 ] [ 2  1 ]−1
[ √2   −√2   ] = [ 1  1 ] [ √2/2   √2/2 ] [ 1  1 ]
and
[ 3  −4 ]   [ 2  1 ] [ 1   0 ] [ 2  1 ]−1
[ 2  −3 ] = [ 1  1 ] [ 0  −1 ] [ 1  1 ] .  4
Consequently, ρ2 (r) and ρ2 (s) satisfy the relations of r and s and hence they determine a homomor-
phism from D4 to GL2 (C).
The two representations ρ and ρ2 are equivalent because they both map D4 into the same general
linear group GL2 (C) and because
[ 0  −1 ]   [ i−1  −i−1 ] [ i   0 ] [ i−1  −i−1 ]−1
[ 1   0 ] = [ i+1  −i+1 ] [ 0  −i ] [ i+1  −i+1 ]
and
[ 1   0 ]   [ i−1  −i−1 ] [ 0   i ] [ i−1  −i−1 ]−1
[ 0  −1 ] = [ i+1  −i+1 ] [ −i  0 ] [ i+1  −i+1 ] .
Consequently, for all g ∈ G,
         [ i−1  −i−1 ]          [ i−1  −i−1 ]−1
ρ(g) =   [ i+1  −i+1 ] ρ2 (g)   [ i+1  −i+1 ]  .  4
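Conjugation identities like these can be verified numerically. The following sketch (the helper functions are our own; the matrices are those of the example) checks ρ(g) = S ρ2 (g) S −1 on the generators, where S has rows (i − 1, −i − 1) and (i + 1, −i + 1):

```python
def matmul(A, B):
    """Product of 2x2 matrices given as [[a, b], [c, d]]."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv(A):
    """Inverse of an invertible 2x2 matrix."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det, A[0][0] / det]]

def close(A, B, eps=1e-12):
    return all(abs(A[i][j] - B[i][j]) < eps for i in range(2) for j in range(2))

S = [[1j - 1, -1j - 1], [1j + 1, -1j + 1]]      # the change-of-basis matrix
rho_r = [[0, -1], [1, 0]];   rho2_r = [[1j, 0], [0, -1j]]
rho_s = [[1, 0], [0, -1]];   rho2_s = [[0, 1j], [-1j, 0]]

# rho(g) = S rho2(g) S^{-1} on the generators r and s:
assert close(matmul(matmul(S, rho2_r), inv(S)), rho_r)
assert close(matmul(matmul(S, rho2_s), inv(S)), rho_s)
```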
422 CHAPTER 8. GROUP ACTIONS
Lemma 8.6.19
The projection function projW,U : V → V is a linear transformation with image W .
Proof. Consider the projection function projW,U . Let v1 , v2 ∈ V be two vectors and suppose that
they can be written (uniquely) as
v1 = w1 + u1 and v2 = w2 + u2
with w1 , w2 ∈ W and u1 , u2 ∈ U . Then v1 + v2 = (w1 + w2 ) + (u1 + u2 ) is the unique such expression for v1 + v2 ,
so projW,U (v1 + v2 ) = projW,U (v1 ) + projW,U (v2 ). Similarly, if λ ∈ F and v ∈ V , then writing
v uniquely as v = w + u with w ∈ W and u ∈ U , we have a unique expression for λv, namely
λv = (λw) + (λu). Together these prove that the projection function is a linear transformation.
It is important to note that we used the cumbersome notation projW,U because this projection
depends not only on the subspace W but on the choice of complementary subspace U . Notice from
the construction that projW,U w = w for all w ∈ W . Hence, the composition of projW,U with itself is
again projW,U . This motivates the following definition general definition of a projection, applicable
even if V is not necessarily finite dimensional.
Definition 8.6.20
Let V be a vector space over a field F . A projection is a linear transformation π : V → V
such that π is idempotent, i.e., π ◦ π = π. We say that the linear transformation π projects
from V onto W = Im π.
Proposition 8.6.21
Let π : V → V be a projection. Then V = Im π ⊕ Ker π.
Proof (Theorem 8.6.22). Let π : V → V be any projection of V onto W . Define the function
ψ : V → V by
ψ(v) = (1/|G|) Σg∈G g −1 π(gv).
Since π maps into W and W is a subrepresentation, ψ also maps into W . Furthermore, if w ∈ W , then gw ∈ W and π(gw) = gw for all g ∈ G, so
ψ(w) = (1/|G|) Σg∈G g −1 gw = (1/|G|) Σg∈G w = (|G|/|G|) w = w,
so ψ ◦ ψ = ψ.
Finally, we show that ψ is also a G-representation homomorphism. Let x ∈ G and let v ∈ V .
Then
ψ(xv) = (1/|G|) Σg∈G g −1 π(gxv).
Note that as g runs through all the elements of G, the elements gx also run through all the elements of
G. So we can change the summation variable via h = gx so that g = hx−1 and g −1 = xh−1 . Thus,
ψ(xv) = (1/|G|) Σg∈G x(gx)−1 π(gxv) = x (1/|G|) Σg∈G (gx)−1 π(gxv) = x (1/|G|) Σh∈G h−1 π(hv) = x ψ(v).
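The averaging construction can be watched in action on a small example of our own choosing: G = Z2 acts on R2 by swapping coordinates, W is the invariant diagonal, and π is a projection onto W that is deliberately not G-equivariant.

```python
# Toy check of the averaging trick: G = Z2 acts on R^2 by swapping coordinates,
# W = span{(1, 1)} is invariant, and pi is a non-equivariant projection onto W.
def swap(v):           # the non-identity element of G acting on R^2
    return (v[1], v[0])

def pi(v):             # projects onto W along span{(1, 0)}: v = (x - y)e1 + y(1, 1)
    return (v[1], v[1])

def psi(v):            # psi(v) = (1/|G|) * sum over g of g^{-1} pi(g v)
    a = pi(v)                      # g = identity
    b = swap(pi(swap(v)))          # g = swap (it is its own inverse)
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)

v = (3.0, 7.0)
assert psi(v) == (5.0, 5.0)              # orthogonal projection onto the diagonal
assert psi(psi(v)) == psi(v)             # psi is idempotent
assert psi(swap(v)) == swap(psi(v))      # psi is G-equivariant
assert pi(swap(v)) != swap(pi(v))        # pi was not
```

Averaging over the group replaces the arbitrary projection π by the equivariant projection ψ onto the same subspace, exactly as in the proof.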
With respect to B, the matrix of ρ(g) is not just block upper triangular but block diagonal.
Definition 8.6.23
A representation V of a group G is called completely reducible if
V = W1 ⊕ W2 ⊕ · · · ⊕ Ws ,
where each Wi is an irreducible subrepresentation of V .
Corollary 8.6.24
Let G be a finite group and let F be a field of characteristic 0 or such that char F does not
divide |G|. Let V be a finite dimensional representation of G over F . Then V is completely
reducible.
Proof. Let s be the largest integer such that V can be written as a direct sum of subrepresentations
of dimension 1 or greater. Since dim V is finite, by the well-ordering principle, such an integer exists.
Let
V = W 1 ⊕ W 2 ⊕ · · · ⊕ Ws (8.8)
be an expression of V as a direct sum of subrepresentations. Assume that one of the Wi is not
irreducible. Suppose, without loss of generality, that Ws has a proper nontrivial subrepresentation. Then by Maschke’s
Theorem, Ws = U1 ⊕ U2 , where U1 and U2 are subrepresentations of Ws , and thus are in turn
subrepresentations of V . Hence,
V = W1 ⊕ W2 ⊕ · · · ⊕ Ws−1 ⊕ U1 ⊕ U2 ,
contradicting the maximality of s. Hence, by contradiction, we conclude that in (8.8), all the Wi
subrepresentations are irreducible.
We reiterate that this section merely offered a glimpse into the study of representations of groups.
Indeed, the theory of representations of finite groups offers many interesting surprises about the
internal structure of a group.
defines a representation of Z6 on R2 .
W = {~x ∈ V | x1 + x2 + · · · + xn = 0}
is a subrepresentation of V .
11. Let G = GLn (R) and let V = Mn×n (R) be the vector space of n × n matrices of real coefficients.
(a) Prove that the action of G on V defined by g · A = gAg −1 is a representation ρ of GLn (R).
(b) Identifying M2×2 (R) with R4 via the standard basis on M2×2 (R), express ρ([a, b; c, d]), where [a, b; c, d] denotes the 2 × 2 matrix with rows (a, b) and (c, d).
V G = {v ∈ V | gv = v for all g ∈ G}
is a subrepresentation of V .
13. Prove Proposition 8.6.15.
14. Let G be a finite group and let V be a finite-dimensional representation of G over C. Prove that for each g ∈ G the linear transformation ρ(g) is diagonalizable and that all of its eigenvalues are roots of unity.
Show that ρ is a representation that is not completely reducible. (This gives a counterexample to
Maschke’s Theorem when G is infinite.)
17. Let ρ : G → GL(V ) be a representation of a group G. Prove that ρ gives a faithful representation of
the group G/ Ker ρ.
18. (Schur’s Lemma) Let G be a group and let V and W be irreducible representations of G over the field C.
(a) Prove that a G-representation homomorphism ϕ : V → W is either trivial (maps to 0) or is an
isomorphism.
(b) Prove that all isomorphisms ϕ : V → V are of the form λidV . [Hint: Let λ be an eigenvalue of ϕ and
consider the linear transformation ϕ − λidV .]
19. Use Schur’s Lemma (Exercise 8.6.18) to prove the following result. Let V be a representation of G
over the field C and suppose that V = W1 ⊕ W2 ⊕ · · · ⊕ Ws with each Wi an irreducible subrepresentation. Prove that
every irreducible subrepresentation U of V is isomorphic to Wi for some i.
20. Use Schur’s Lemma (Exercise 8.6.18) to prove that every irreducible representation over C of a finite
abelian group is one dimensional.
8.7
Projects
Project I. Ebbs and Flows. As a first example, consider the differential equation dx/dt = x with x(t) a
real-valued function of a real variable. Show how the solutions of this differential equation form a flow on
R, thereby producing an action of the group (R, +) on R. (See Example 8.2.3.) Repeat the
same discussion, emphasizing the flow and action of (R, +), with other (more complicated)
differential equations, including some systems of differential equations.
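Taking the equation to be dx/dt = x, whose solutions x(t) = x0 e^t exist for all time, the flow property that makes (R, +) act on R can be checked numerically:

```python
import math
import random

def flow(t, x):
    """Time-t flow map of dx/dt = x: solutions are x(t) = x0 * e^t."""
    return x * math.exp(t)

random.seed(0)
for _ in range(100):
    s = random.uniform(-2, 2)
    t = random.uniform(-2, 2)
    x = random.uniform(-5, 5)
    # Group-action axioms for (R, +) acting on R:
    assert abs(flow(0.0, x) - x) < 1e-9                       # identity acts trivially
    assert abs(flow(s, flow(t, x)) - flow(s + t, x)) < 1e-9   # compatibility
```

The compatibility assertion is precisely the statement that flowing for time t and then time s is the same as flowing for time s + t, i.e., that the flow defines an action of (R, +).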
Project II. Sudoku and Group Actions. Project IV in Chapter 3 discussed the group of
symmetries within the set of permissible Sudoku fillings. Again, let S be the set of all possible
solutions to a Sudoku puzzle, i.e., all possible ways of filling out the grid according to the rules,
and G the group of transformations described in that project. By construction, the group G acts
on S. With the formalism of group actions, we can address questions about Sudoku more
effectively. For example, we now call two Sudoku fillings equivalent if they are in the same G
orbit.
Try to answer some of the following questions and any others you can imagine. Is the action
of G on S faithful or transitive or free? If it is not transitive, are the orbits all the same size?
If not, is there a range on what the sizes of the orbits can be? How many orbits are there?
Project III. Young’s Geometry. The Fano plane described in Exercise 8.3.17 is an example
of a finite geometry. In the study of finite geometries, one starts with a set of axioms and
deduces as many theorems as possible. In a finite geometry, the axioms do not explicitly refer
to a finite number of points or lines but imply that only finitely many exist. In Young’s
geometry, it is possible to prove that there are 9 points and 12 lines. Furthermore, a possible
configuration representing the incidence structure of points and lines is as follows.
8.7. PROJECTS 427
A B C
D E F
G H I
Study the subgroup of the permutation group S9 acting on the points that preserves the line
structure in Young’s geometry.
Project IV. Invariants in Polynomial Rings. Let X = C[x1 , x2 , . . . , xn ] be the ring of multivariable polynomials and consider the action of Sn on X that permutes the variables x1 , x2 , . . . , xn .
Find X σ for various permutations σ. Can you deduce any conclusions about these fixed
subsets? For various subgroups H ≤ Sn , can you determine or say anything about the subset
of X fixed by all of H?
Project V. Quotient G-Sets. Discuss the possibility of taking quotient objects (similar to quo-
tient groups or quotient sets) in the context of G-sets. Give interesting examples. Try to
prove as many interesting propositions as you can about quotient G-sets that illustrate the
connection between this quotient process and the group action.
Project VI. Coloring Tilings. Consider one or more of M. C. Escher’s paintings in the sym-
metry category. (See the official website http://www.mcescher.com/gallery/symmetry/.) Dis-
cuss different strategies of coloring the copies of the fundamental region so that the number
of possible colorings for a given strategy is finite. Discuss the subgroup or quotient group
corresponding to periodicity of a coloring.
Project VII. Coloring the Soccer Ball. Use the Cauchy-Frobenius Lemma to discuss the
number of ways to color the soccer ball using 2, 3, 4, or more colors, where we view two
colorings as the same if one can be obtained from the other by some rigid motion of the ball.
The standard black-and-white coloring of a soccer ball is invariant under the group of rigid
motions of the ball. Discuss the number of colorings that are invariant under some subgroups
of the group of rigid motions.
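As a warm-up for the soccer ball, the same Cauchy-Frobenius computation can be scripted for the six faces of a cube; the face indexing and the choice of generating rotations below are our own:

```python
# Faces indexed 0..5 = Up, Down, Front, Back, Left, Right.
def perm_from_cycle(cycle, n=6):
    p = list(range(n))
    for i, c in enumerate(cycle):
        p[c] = cycle[(i + 1) % len(cycle)]
    return tuple(p)

def compose(s, t):
    return tuple(s[t[i]] for i in range(6))

r1 = perm_from_cycle([2, 5, 3, 4])   # 90-degree turn about the Up-Down axis
r2 = perm_from_cycle([0, 5, 1, 4])   # 90-degree turn about the Front-Back axis
G = {tuple(range(6))}
while True:                          # close under the generators
    new = {compose(g, r) for g in G for r in (r1, r2)} - G
    if not new:
        break
    G |= new
assert len(G) == 24                  # the rotation group of the cube

def cycle_count(p):
    """Number of cycles of the face permutation p."""
    seen, count = set(), 0
    for i in range(6):
        if i not in seen:
            count += 1
            while i not in seen:
                seen.add(i)
                i = p[i]
    return count

def colorings(k):
    """Cauchy-Frobenius: distinct k-colorings of the faces up to rotation."""
    return sum(k ** cycle_count(g) for g in G) // len(G)

assert colorings(2) == 10            # classic count for 2-coloring the faces
```

The same skeleton works for any polyhedron once its rotation group is encoded as permutations of the regions being colored.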
Project VIII. Sylow p-Subgroup Conjecture. Let G be a finite group. We saw in Section 8.5
that if np = 1 for some prime p that divides |G|, then that unique Sylow p-subgroup is normal.
Therefore, if G is simple then np > 1 for all p dividing |G|. Discuss the conjecture of the
converse: If np (G) > 1 for all primes p dividing |G| then G is simple.
Project IX. Simple Isn’t So Simple. Write a computer program that uses Sylow’s Theorem to eliminate
all orders between 1 and, say, 1000 (or more) for which a group of that order cannot be simple.
For any orders that could have a simple group, list the np for all p dividing the order.
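A sketch of such a program appears below. It applies only the crudest test, namely that a forced unique Sylow p-subgroup is normal; orders that survive this test are merely candidates, and a full program would layer further counting arguments on top:

```python
def factorize(n):
    """Prime factorization of n as a dict {p: exponent}."""
    factors, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def sylow_candidates(n, p, a):
    """Possible values of n_p: divisors d of n / p^a with d = 1 (mod p)."""
    m = n // p ** a
    return {d for d in range(1, m + 1) if m % d == 0 and d % p == 1}

def ruled_out_by_sylow(n):
    """True if Sylow's Theorem alone forces n_p = 1 for some p, so the Sylow
    p-subgroup is normal and no group of order n is simple.  (Prime orders
    give simple groups Z_p; prime-power orders p^a with a >= 2 are never
    simple, but that uses the class equation, not this test.)"""
    factors = factorize(n)
    if len(factors) == 1:
        return False               # prime-power orders: handle separately
    return any(sylow_candidates(n, p, a) == {1} for p, a in factors.items())

# Composite orders up to 100 that this crude test does NOT eliminate:
survivors = [n for n in range(2, 101)
             if len(factorize(n)) > 1 and not ruled_out_by_sylow(n)]
```

For example, the test eliminates n = 20 (since n5 must be 1) but not n = 60, consistent with the simplicity of A5 .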
Project X. Sylow p-Subgroups Action. Sylow’s Theorem affirms that a group G acts tran-
sitively by conjugation on Sylp (G). Explore whether this action is multiply transitive. If it
is not, explore whether the action of G on Sylp (G) admits a system of blocks. Discuss theoretically or
with examples.
Project XI. The S4 Tetrahedron. In (R+ )3 (the first octant in Euclidean 3-space), there are
6 permutations of the inequalities x1 ≤ x2 ≤ x3 , each one corresponding to the action of an
element in S3 on 0 ≤ x1 ≤ x2 ≤ x3 . Each of the inequalities corresponds to a region of (R+ )3 .
Intersecting these regions with the plane x1 + x2 + x3 = 1 produces 6 regions in an equilateral triangle.
In this project, consider the action of S4 on (R+ )4 that permutes the coordinates. There are 24
permutations of the inequalities x1 ≤ x2 ≤ x3 ≤ x4 , each one corresponding to a cone-shaped
region in (R+ )4 . The intersection of (R+ )4 with the plane x1 + x2 + x3 + x4 = 1 is a regular
tetrahedron. The regions 0 ≤ xi ≤ xj ≤ xk ≤ x` for different indices cut this tetrahedron into
24 regions, each one corresponding to the action of an element in S4 .
Here are just a few ideas to explore: Physically construct a regular tetrahedron. Trace (at
least on the surface) these 24 regions and show which ones correspond to which element in S4 .
Discuss orbits of subgroups of S4 or quotient groups of S4 in reference to the tetrahedron. Use
this tetrahedron to provide a faithful representation of S4 in R3 .
Project XII. Combinatorial Identities. Example 8.2.15 and Exercises 8.2.13, 8.2.14, and
8.2.15 established known combinatorial formulas from the Orbit Equation. Find a number
of other combinatorial identities in a discrete mathematics or combinatorics textbook and try
to recover these combinatorial identities as the orbit equations of some group actions.
9. Classification of Groups
On a finite set, there can only be a finite number of distinct binary operations. (If the set S has
|S| = n, there are only n^(n²) binary operations.) Imposing the three axioms for a group restricts
the possibilities considerably, especially when we consider two groups as the same if there exists an
isomorphism between them.
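For very small n, one can brute-force this count and see how drastically the group axioms cut down the n^(n²) binary operations; the following exhaustive search is feasible only for n ≤ 3:

```python
from itertools import product

def is_group(table, n):
    """Check the group axioms for a Cayley table on {0, ..., n-1}."""
    elems = range(n)
    for a, b, c in product(elems, repeat=3):           # associativity
        if table[table[a][b]][c] != table[a][table[b][c]]:
            return False
    ids = [e for e in elems                            # two-sided identity
           if all(table[e][a] == a and table[a][e] == a for a in elems)]
    if not ids:
        return False
    e = ids[0]
    return all(any(table[a][b] == e and table[b][a] == e for b in elems)
               for a in elems)                         # inverses

def count_group_operations(n):
    """Among all n^(n^2) binary operations on an n-element set,
    count those satisfying the group axioms."""
    count = 0
    for flat in product(range(n), repeat=n * n):
        table = [list(flat[i * n:(i + 1) * n]) for i in range(n)]
        if is_group(table, n):
            count += 1
    return count
```

Of the 3^9 = 19683 binary operations on a 3-element set, only 3 are group operations, and all of them give groups isomorphic to Z3 .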
For its usefulness in further study and simply as a research goal, it is a common theme in algebra
to find all objects with a given structure. Such problems are called classification problems. Though
we did not specifically refer to this direction of study in the preface, classification problems arise
naturally in the context of different algebraic structures. This is particularly true when there exists
a finite number of objects or a finite number of infinite families of objects with a certain property.
Some theorems and exercises already addressed some classification results concerning finite
groups. Early on, we presented the classification of groups of order 8 (Example 3.3.11: 3 abelian
groups, D4 and Q8 ) and of order 6 (Exercise 3.3.31: Z6 and D3 ). With Lagrange’s Theorem, we
know immediately that every group of prime order p must be isomorphic to the cyclic group Zp . In
Exercise 4.1.35, we showed that if p is prime then every group of order 2p is isomorphic to Z2p or to
Dp . The Fundamental Theorem of Finitely Generated Abelian Groups is a profound result in group
theory that completely answers the classification problem, but only for finite abelian groups.
In this chapter we present ideas in the program of finding all groups of a given finite order. Section 9.1
introduces the Jordan-Hölder Program, a general strategy to find all groups of a given size. From the
perspective of the program, we also introduce the notion of solvable groups. The name “solvable”
groups foreshadows the importance of these groups in the application of Galois Theory (Chapter 11).
The following two sections expand on the program to classify groups. In Section 9.2 on finite
simple groups, we prove the simplicity of a few families of groups and state the classification theorem
of finite simple groups, one of the crowning achievements of group theory in the 20th century. The
second part of the Jordan-Hölder Program involved finding ways to combine small groups into bigger
ones. The direct product is an example of this process of combining smaller groups into a bigger one.
Section 9.3 introduces the operation of a semidirect product—a more general method to combine
small groups of a given size into a single larger group.
Using methods of the semidirect product, we spend Section 9.4 presenting various classification
results about groups. Finally, in Section 9.5 we consider nilpotent groups and solvable groups, two
classes of groups that arise naturally in the Jordan-Hölder Program.
9.1
Composition Series and Solvable Groups
The Lattice Isomorphism Theorem (or Fourth Isomorphism Theorem) of groups states that if N is a normal subgroup of
G, then the subgroup lattice of the quotient group G/N is the same as the portion of the
subgroup lattice of G that lies above N . Similarly, the lattice of G below N is simply the subgroup lattice of the group
N . Intuitively speaking, much of the structure of G is contained in the structure of G/N and N .
Furthermore, if G has a normal subgroup N , then it is somehow “made up” from the smaller groups
G/N and N . This intuitive perspective gave rise to the Jordan-Hölder Program, whose ultimate
goal is a method to find all finite groups of a given order.
430 CHAPTER 9. CLASSIFICATION OF GROUPS
of finding the prime factorization of a positive integer. In this metaphor, simple groups are similar
to the prime factors because simple groups have no normal subgroups besides the trivial subgroup
and the group itself.
We need to move away from metaphor to precise notions relevant to group theory.
Definition 9.1.1
Let G be a group and consider a sequence of subgroups
1 = N0 ≤ N1 ≤ N2 ≤ · · · ≤ Nk = G.
Example 9.1.2. Consider the dihedral group D6 with the following two composition series:
{1} ⊴ ⟨r2 ⟩ ⊴ ⟨s, r2 ⟩ ⊴ D6 and {1} ⊴ ⟨r3 ⟩ ⊴ ⟨r⟩ ⊴ D6 .
The composition factors of the first series are
D6 /⟨s, r2 ⟩ ≅ Z2 with generator r⟨s, r2 ⟩, ⟨s, r2 ⟩/⟨r2 ⟩ ≅ Z2 with generator s⟨r2 ⟩, and ⟨r2 ⟩ ≅ Z3 with generator r2 ;
those of the second series are
D6 /⟨r⟩ ≅ Z2 with generator s⟨r⟩, ⟨r⟩/⟨r3 ⟩ ≅ Z3 with generator r⟨r3 ⟩, and ⟨r3 ⟩ ≅ Z2 with generator r3 . 4
However, in this last example there does appear to be some structure, namely that the list of
composition factors is the same after reordering. The following theorem formalizes what can happen
among different composition series of a given group.
Theorem 9.1.3 (Jordan-Hölder Theorem)
Let G be a finite group. Then G possesses a composition series. Furthermore, if
1 = M0 ⊴ M1 ⊴ M2 ⊴ · · · ⊴ Mr = G and
1 = N0 ⊴ N1 ⊴ N2 ⊴ · · · ⊴ Ns = G
(9.1)
are two composition series of G, then r = s and there is some permutation σ ∈ Sr such
that
Mi /Mi−1 ≅ Nσ(i) /Nσ(i)−1
for all 1 ≤ i ≤ r.
Before proving this theorem, we will need the Butterfly Lemma (also called the Zassenhaus
Lemma), evocatively named because of the diagram of subgroups involved.
9.1. COMPOSITION SERIES AND SOLVABLE GROUPS 431
[Diagram: the butterfly-shaped lattice formed by the subgroups A, K, B(A ∩ K), (A ∩ K)L, A ∩ K, B(A ∩ L), (B ∩ K)L, (B ∩ K)(A ∩ L), B, L, B ∩ K, and A ∩ L.]
and
B(A ∩ K)/B(A ∩ L) ≅ (A ∩ K)L/(B ∩ K)L.
Proof. Throughout this proof, we use the fact that for subgroups H, K ≤ G, we have HK ≤ G if
and only if HK = KH as sets. (See Exercise 4.1.36.)
See Figure 9.1 for the relative configuration of subgroups involved. In this diagram, if H3 is the
immediate successor of H1 and H2 , then H3 = H1 H2 and if H3 is the immediate predecessor of
H1 and H2 , then H3 = H1 ∩ H2 . It is not hard to check that all the shown subsets are subgroups
because of B ⊴ A and L ⊴ K. The only tricky verification occurs at the subgroup where the butterfly
head is (B ∩ K)(A ∩ L). It has a few equivalent expressions:
(B ∩ K)(A ∩ L) = B(A ∩ L) ∩ (B ∩ K)L = B(A ∩ L) ∩ (A ∩ K) = (A ∩ K) ∩ (B ∩ K)L.
The first equality is obvious. However, the second equality holds because if y = bx ∈ A ∩ K
with b ∈ B and x ∈ A ∩ L, then we must have b ∈ K since L is only a subgroup of K. Thus,
B(A ∩ L) ∩ (A ∩ K) ⊆ (B ∩ K)(A ∩ L) and the reverse inclusion is obvious. The fourth equality
holds via the same reasoning as for the third.
Since L ⊴ K, then A ∩ L ⊴ A ∩ K. Let g ∈ A ∩ K and let bx ∈ B(A ∩ L). Then
g(bx)g −1 = (gbg −1 )(gxg −1 ).
But gbg −1 ∈ B because g ∈ A and B ⊴ A, and gxg −1 ∈ A ∩ L because A ∩ L ⊴ A ∩ K. Hence,
A ∩ K ≤ NG (B(A ∩ L)) and so we can apply the Second Isomorphism Theorem to conclude that
B(A ∩ L)(A ∩ K)/B(A ∩ L) = B(A ∩ K)/B(A ∩ L) ≅ (A ∩ K)/(B ∩ K)(A ∩ L).
These quotient groups correspond to the wing on the left side of Figure 9.1. Applying the same
reasoning to the right side, we deduce that
B(A ∩ K)/B(A ∩ L) ≅ (A ∩ K)/(B ∩ K)(A ∩ L) ≅ (A ∩ K)L/(B ∩ K)L
and the lemma follows.
Proof (Theorem 9.1.3). We first prove that every nontrivial finite group G has a composition series
by induction on |G|. First, suppose that |G| = 2. Then G ≅ Z2 and G is simple so it has a
composition series of length 1. Now suppose that |G| = n and that all groups H of order |H| < n
have a composition series. If G is simple, then again it has a composition series {1} ⊴ G that is of
length 1. On the other hand, if G is not simple then it has a proper nontrivial normal subgroup N . By the induction
hypothesis, both N and G/N have composition series
1 = K0 ⊴ K1 ⊴ K2 ⊴ · · · ⊴ Kr = N and
1 = M̄0 ⊴ M̄1 ⊴ M̄2 ⊴ · · · ⊴ M̄s = G/N.
By the Fourth Isomorphism Theorem, there exist subgroups Mi with 1 ≤ i ≤ s such that N ≤ Mi ≤
G with Mi−1 ⊴ Mi and M̄i = Mi /N . Hence, G has a chain of successively normal subgroups
1 = K0 ⊴ K1 ⊴ K2 ⊴ · · · ⊴ Kr = M0 ⊴ M1 ⊴ M2 ⊴ · · · ⊴ Ms = G.
Insert these chains into the composition series given in (9.1) respectively to create two new chains,
each of length rs. We temporarily call these chains expanded composition series because they are
composition series except that the quotients between successive groups are either simple or trivial.
Using the Butterfly Lemma with A = Mi , B = Mi−1 , K = Nj , and L = Nj−1 , we deduce that
Mi /Mi−1 ≅ Mi,σ(i) /Mi,σ(i)−1 ≅ Nσ(i),i /Nσ(i),i−1 ≅ Nσ(i) /Nσ(i)−1 .
Note that if G is a group with a given list of composition factors K1 , K2 , . . . , Kn , then the order
|G| is the product of the orders of Ki . Consequently, knowing all the finite simple groups of a given
order and knowing |G| we can deduce the possible composition factors of G. The second part of
the Hölder Program would then allow us to find all groups with those possible composition factors.
Hence, this additional strategy would allow us to find all groups of any given order.
This program drove much of the research in group theory during the 20th century. One of the
greatest collaborative achievements of group theory is the complete solution of the first part of the
program. We will discuss this in the following section. The second part of the Hölder Program
is much more complicated. However, under some circumstances, there are a few strategies for this
second part, in particular using the semidirect product, discussed in Section 9.3. By virtue of Sylow’s
Theorem and the semidirect product, for many integers n especially if n is not too large, it is possible
to classify all groups of order n.
Definition 9.1.5
A group G is solvable if there is a chain of subgroups
{1} = M0 ⊴ M1 ⊴ M2 ⊴ · · · ⊴ Mr = G
such that each successive quotient Mi /Mi−1 is abelian.
For finite groups, this definition has a simpler equivalent. In the metaphor that compared a
composition series of a group to a prime factorization of a positive integer, prime numbers correspond
to simple groups. However, among the simple groups there is a family of groups closely connected
to prime numbers, namely the cyclic groups of prime order Zp .
Proposition 9.1.6
A finite group G is solvable if and only if each of its composition factors is a finite cyclic
group.
By using Sylow’s Theorem for low values of a positive integer n, it is possible to show that n = 60
is the first integer for which there may exist a simple group that is not isomorphic to Zp for some
prime p. Furthermore, we can show that the only simple group of order 60 is A5 . (Exercise 9.2.10.)
Hence, every group of order less than 60 is solvable.
As we will see in Chapter 11, solvable groups are important in the study of solutions of polynomial
equations. Consequently, classifying solvable groups is a key ingredient in the advanced study of
roots of polynomials. However, the effort to classify all solvable groups involves challenging group
theory. For example, using representation theory of groups, it is possible to prove (Burnside’s
Theorem) that if |G| = pa q b for some primes p and q, then G is solvable. A far more challenging
result is that (Feit-Thompson Theorem) if |G| is odd, then G is solvable. When this latter theorem
was first proved, it occupied 255 pages [23].
We conclude this section with a characterization of solvable groups involving commutators.
Definition 9.1.7
Let G be a group.
(1) The commutator of two group elements x, y ∈ G is [x, y] = x−1 y −1 xy.
(2) If H, K ≤ G, we denote by [H, K] the subgroup generated by commutators of elements
in H and elements in K, namely [H, K] = h[h, k] | h ∈ H and k ∈ Ki.
(3) The commutator subgroup of G is G0 = [G, G].
If x and y commute, then x, y, x−1 , and y −1 all commute, so [x, y] = x−1 y −1 xy = x−1 xy −1 y = 1. Consequently,
if G is abelian, then G0 = {1}. The converse is also true, so G is abelian if and only if
G0 = {1}. We observe two simple relations among commutators:
[y, x] = [x, y]−1 and [x, y] = [x, xy] = [yx, y].
Note that from the form of the commutator, if x, y, a, b ∈ G, there is no reason why [x, y][a, b]
should be the commutator of two other group elements. In fact, by taking the extreme case of a free
group on 4 elements ⟨x, y, a, b⟩, we see that
[x, y][a, b] = x−1 y −1 xy a−1 b−1 ab
does not simplify and therefore cannot be put into the form of a commutator of two elements.
Therefore, the subset of commutators from elements in H and K might not be a subgroup and so
we must consider the subgroup generated by the subset of commutators. This observation holds for
[G, G] as well.
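For a group given explicitly by permutations, the commutator subgroup can be computed by brute force: generate all commutators and close the resulting set under the product. The sketch below confirms that the commutator subgroup of S4 has order 12 (it is A4 ):

```python
from itertools import permutations, product

def compose(s, t):
    """(s o t)(i) = s(t(i))."""
    return tuple(s[t[i]] for i in range(len(t)))

def inverse(s):
    inv = [0] * len(s)
    for i, si in enumerate(s):
        inv[si] = i
    return tuple(inv)

def commutator_subgroup(G):
    """Subgroup generated by all commutators [x, y] = x^-1 y^-1 x y."""
    gens = {compose(compose(inverse(x), inverse(y)), compose(x, y))
            for x, y in product(G, repeat=2)}
    H = set(gens)
    while True:                       # close under the group operation
        new = {compose(a, b) for a, b in product(H, repeat=2)} - H
        if not new:
            return H
        H |= new

S4 = list(permutations(range(4)))
H = commutator_subgroup(S4)
assert len(H) == 12                   # [S4, S4] = A4, of order 12
```

The closure loop is needed precisely because of the observation above: the set of commutators alone need not be a subgroup in general.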
Example 9.1.8. Consider the dihedral group Dn on the regular n-gon. We calculate [r, s] = r−1 s−1 rs = r−1 (s−1 rs) = r−1 r−1 = r−2 , and in fact Dn0 = ⟨r2 ⟩. 4
Proposition 9.1.9
Let G be a group. Then G0 is a characteristic subgroup of G. Furthermore, G0 is the
smallest (by inclusion) normal subgroup of G such that G/G0 is abelian.
Proof. Let ψ be any automorphism of G. For x, y ∈ G, we have
ψ([x, y]) = ψ(x−1 )ψ(y −1 )ψ(x)ψ(y) = ψ(x)−1 ψ(y)−1 ψ(x)ψ(y) = [ψ(x), ψ(y)].
Since G0 is the subgroup generated by all commutators, then ψ applied to any generator (commu-
tator) is again another generator of G0 so ψ(G0 ) = G0 and G0 is characteristic.
Since G0 is characteristic, it is also normal. Let xG0 and yG0 be in the quotient G/G0 . Then
(xG0 )(yG0 ) = (xy)G0 = (xy[y, x])G0 = (xyy −1 x−1 yx)G0 = (yx)G0 = (yG0 )(xG0 ),
so G/G0 is abelian.
Now suppose that N is any normal subgroup such that G/N is abelian. The criterion that
(xN )(yN ) = (yN )(xN ) is equivalent to
N = (xN )−1 (yN )−1 (xN )(yN ) = x−1 y −1 xyN = [x, y]N,
which holds if and only if [x, y] ∈ N for all x, y ∈ G. Since N then contains all commutators, G0 ≤ N .
The characterizing property of G0 can be restated to say that G/G0 is the largest abelian quotient
group of G in the sense that if N ⊴ G with G/N abelian, then G0 ≤ N .
Definition 9.1.10
Let G be a group. The commutator series (or derived series) of G is the chain of subgroups
G ⊵ G(1) ⊵ G(2) ⊵ G(3) ⊵ · · · ,
where G(1) = G0 and G(i+1) = [G(i) , G(i) ] for each i ≥ 1.
Example 9.1.12. Consider the group S4 . This group contains only two proper nontrivial normal subgroups, namely
A4 and K = ⟨(1 2)(3 4), (1 3)(2 4)⟩. However, S4 /K is not abelian so S40 ≠ K. On the other hand,
S4 /A4 ≅ Z2 , which is abelian, so A4 = S40 . In Exercise 9.1.13, we prove that the commutator subgroup
of A4 is precisely K. Finally, K ≅ Z2 ⊕ Z2 so it is abelian. This shows that the commutator series
of S4 is
{1} ≤ K ≤ A4 ≤ S4 . 4
Example 9.1.13. Let G = S5 . The symmetric group S5 has only one proper nontrivial normal subgroup, namely
A5 . By Exercise 4.2.27, A5 is simple. Since S5 /A5 ≅ Z2 is abelian, A5 is the commutator
subgroup. Note that from the definition, it is obvious that the commutator of any two elements in
S5 can be written using an even number of transpositions and hence S50 ≤ A5 . Then, since A5 is
simple and nonabelian, its commutator subgroup is itself. Hence, the commutator series of S5 is
A5 ≤ S5 ,
without the subgroup {1} in the chain, and G(i) = A5 for all i ≥ 1. 4
If we compare properties of commutator series to those of composition series, it may seem strange
that a commutator series need not terminate at {1}. As the following proposition shows, terminating
at {1} is precisely the condition for a group to be solvable.
Proposition 9.1.14
A group G is solvable if and only if G(s) = {1} for some positive integer s.
9.2
Finite Simple Groups
The classification of finite simple groups “is generally regarded as a milestone of twentieth-century
mathematics” [31]. How the theorem came about exemplifies the collaborative nature of mathemat-
ical investigation. Hundreds of mathematicians contributed to the effort.
Because of the extreme length and the number of disparate results necessary for a full classifica-
tion, the realization that the classification of finite simple groups was within reach arose slowly. In
1972, when the classification felt close, Gorenstein laid out a 16-step program to break the project
down into cases covering all possibilities [30]. In 1986, Gorenstein wrote a summary article declaring
at long last that all finite simple groups had been found [31]. It was estimated at the time that the
work spanned 15,000 pages of articles both published and unpublished.
However, as group theorists labored to synthesize the work, it became apparent that a gap
remained in some unpublished material related to so-called quasithin groups. Aschbacher and Smith
began working to rectify this gap and completed their work in 2004 in a pair of monographs [4, 5].
The form itself of the classification theorem is surprising. In retrospect, it is almost more sur-
prising that so much work can be summarized in so brief a statement. Of course, to understand all
parts of the theorem requires considerable effort and advanced study. As with any fruitful problem,
the project to classify all finite simple groups drove investigations in many areas, which in turn
produced many theorems not directly related to the classification theorem.
9.2. FINITE SIMPLE GROUPS 437
In this section, we state the Classification Theorem and offer some explanation of the terms. (A
complete treatment is outside the scope of this book. Whole books have been written about finite
simple groups and we encourage the reader to consult [14, 61]). Subsequently, we remind the reader
of a few necessary conditions for a group to be simple. Then we give proofs that two of the families
of groups mentioned in the theorem are simple.
Some explanation is in order. We already know that cyclic groups of the form Zp with p prime
are simple. Indeed, by Lagrange’s Theorem, the groups Zp have no proper nontrivial subgroups, let
alone normal subgroups. We are also familiar with alternating groups. In Exercise 4.2.27, we proved
that A5 is simple. We will show below that An is simple for all n ≥ 5. (A3 ≅ Z3 is simple as well.)
Without going into detail, a Lie group is a set that carries both the structure of a group and
the differential geometric structure of a manifold. Lie groups can often be viewed as certain groups
of matrices, subgroups of GLn (F ), where F is R or C. If F is replaced with a finite field, then the
resulting groups are no longer Lie groups but are called groups of Lie type.
Finally, the Classification Theorem states that there are precisely 26 other finite simple groups.
Since they are not in one of the 18 families, they are given the name sporadic groups. The discovery of
these groups spanned over a century from the Mathieu groups denoted by M11 , M12 , M22 , M23 , and
M24 , discovered in 1861, to the Fischer-Griess Monster Group M whose existence was conjectured
in 1973 but only proven in 1989. The sizes of sporadic groups start with the Mathieu group M11 ,
with order 7920, and rise quickly, ending with the Baby Monster Group B of order
|B| = 4,154,781,481,226,426,191,177,580,544,000,000
and the Monster Group, which tops the scales with order
|M | = 808,017,424,794,512,875,886,459,904,961,710,757,005,754,368,000,000,000.
The monster group is so complicated that, using representation theory to describe it, the minimal
degree of a faithful representation over the field C is 196,883. In other words, if we faithfully represent
M as a group of invertible linear transformations on some Cn , we would need the dimension n to
be 196,883 (or greater)!
Classification theorems are common in algebra. We encountered a few simple classification
theorems in group theory. The Fundamental Theorem of Finitely Generated Abelian Groups is
an important classification theorem. And yet, the classification of finite simple groups is indeed a
milestone; in intuitive language, it provides a complete list of all finite irreducible (simple) patterns
of symmetry.
Proposition 9.2.2
Let G be a group with conjugacy classes K1 , K2 , . . . , Kr with K1 = {1}. If the only subsets
S ⊆ {1, 2, . . . , r} with 1 ∈ S such that Σi∈S |Ki | divides |G| are S = {1} and S = {1, 2, . . . , r},
then G is simple.
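This proposition gives a purely arithmetic simplicity test. Applied to A5 , whose conjugacy class sizes are 1, 15, 20, 12, and 12 (a standard computation we take as given), a short search over subsets shows that A5 is simple:

```python
from itertools import combinations

# Conjugacy class sizes of A5: the identity, 15 double transpositions,
# 20 three-cycles, and two classes of 12 five-cycles.
class_sizes = [1, 15, 20, 12, 12]
order = sum(class_sizes)              # 60

# Subsets of classes that contain the identity class and whose total
# size divides the group order: the candidates for normal subgroups.
candidates = []
for r in range(len(class_sizes)):
    for rest in combinations(range(1, len(class_sizes)), r):
        total = 1 + sum(class_sizes[i] for i in rest)
        if order % total == 0:
            candidates.append(total)

# Only the trivial subgroup and A5 itself survive, so A5 is simple.
assert sorted(candidates) == [1, 60]
```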
Proof. By Sylow’s Theorem (Theorem 8.5.6), the number of Sylow p-subgroups np divides m and
satisfies np ≡ 1 (mod p). So if the only divisor d of m satisfying d ≡ 1 (mod p) is d = 1, then
np = 1. Hence, there is only one Sylow p-subgroup. Being the only subgroup of a given order, that
Sylow p-subgroup is normal.
Proposition 9.2.4
Let p be the smallest prime dividing the order of a group G. If G has a subgroup H
satisfying |G : H| = p, then H E G so G is not simple.
Proof. Consider the action of G on the set X of left cosets of H by left multiplication and let
ρ : G → SX be the associated homomorphism. Since gH = H if and only if g ∈ H, then Ker ρ ≤ H
and in particular Ker ρ ( G. The order of the permutation group SX is |G : H|!. By the First
Isomorphism Theorem, G/ Ker ρ is isomorphic to a subgroup of SX so |G|/| Ker ρ| divides |G : H|!.
Since |G| does not divide |G : H|!, then we must have | Ker ρ| > 1. Hence, Ker ρ is a nontrivial,
proper, normal subgroup of G.
Proposition 9.2.6
If (G, X, ρ) is a faithful, transitive, primitive group action such that no proper nontrivial subgroup of G acts transitively on X, then G is simple.
Theorem 9.2.7
For all n ≥ 5, the alternating group An is a simple group.
9.2. FINITE SIMPLE GROUPS 439
Proof. By Theorem 3.4.16, An consists of all permutations that can be expressed as a product of an even number of transpositions. A simple calculation gives (a b)(a c) = (a c b) and (a b)(c d) = (a c b)(a c d) for distinct a, b, c, d, so every element of An is a product of 3-cycles; in other words, the 3-cycles generate An . Let N be a nontrivial normal subgroup of An . We show that N contains a 3-cycle by considering cases on the cycle type of a nontrivial element of N .
Case 1. Suppose that N contains a permutation that can be written with disjoint cycles as σ = (a1 a2 · · · ar )τ with r ≥ 4. Since N is normal, it contains the element (a1 a2 a3 )−1 σ(a1 a2 a3 ). Hence, N also contains the element

σ −1 (a1 a2 a3 )−1 σ(a1 a2 a3 ) = (a2 a3 ar ),

which is a 3-cycle.
Case 2. Suppose that N contains a permutation that can be written with disjoint cycles as σ = (a1 a2 a3 )(a4 a5 a6 )τ . Since N is normal, it contains the element (a1 a2 a4 )−1 σ(a1 a2 a4 ). Hence, N also contains the element

σ −1 (a1 a2 a4 )−1 σ(a1 a2 a4 ) = (a1 a2 a4 a3 a6 ).

Since N contains a 5-cycle, then by the previous case, N also contains a 3-cycle.
Case 3. Suppose that N contains a permutation that can be written with disjoint cycles as σ = (a1 a2 a3 )τ , where τ is a product of disjoint transpositions. Then τ 2 = 1, so σ 2 = (a1 a2 a3 )2 = (a1 a3 a2 ), and N contains a 3-cycle.
Case 4. Suppose that N contains a permutation that can be written with disjoint cycles as σ = (a1 a2 )(a3 a4 )τ , where τ is a product of transpositions. Since N is normal, it also contains the element (a1 a2 a3 )−1 σ(a1 a2 a3 ). Hence, it also contains

σ −1 (a1 a2 a3 )−1 σ(a1 a2 a3 ) = (a1 a4 )(a2 a3 ).

Since n ≥ 5, then N contains (a1 a2 a5 )−1 (a1 a4 )(a2 a3 )(a1 a2 a5 ) and also

(a1 a2 a5 )−1 (a1 a4 )(a2 a3 )(a1 a2 a5 ) · (a1 a4 )(a2 a3 ) = (a1 a5 a4 a3 a2 ),

a 5-cycle, so N contains a 3-cycle by Case 1.
From the above four cases, we conclude that N contains a 3-cycle, say (a b c). Then for d distinct from a, b, and c, N contains the conjugate

((a b)(c d)) (a b c) ((a b)(c d))−1 = (b a d).

In a similar fashion, we find that N contains all 3-cycles that include exactly 2 of the three integers a, b, or c. Furthermore, if {d, e} ∩ {a, b, c} = ∅, then N contains

((b d)(c e)) (a b c) ((b d)(c e))−1 = (a d e).

By the same reasoning, N contains all 3-cycles that include exactly 1 of the three integers a, b, or c. Finally, if {d, e, f } ∩ {a, b, c} = ∅, then N contains (a d e) and also

((a f )(d e)) (a d e) ((a f )(d e))−1 = (f e d).

Hence, N contains every 3-cycle. Since the 3-cycles generate An , we conclude that N = An , so An is simple.
Intuitively, we remove the ~0 from F n and then consider any two vectors equivalent if they are multiples of each other: v ∼ w if and only if w = λv for some λ ∈ F ∗ . The (n − 1)-dimensional projective space over F , denoted P(F n ), is the set of ∼-equivalence classes. Without going into the geometry details, the dimensional difference arises because the parameter λ removes a degree of freedom. We denote the ∼-equivalence class of (a1 , a2 , . . . , an ) by (a1 : a2 : · · · : an ). If F = R, then P(Rn ) classifies the concept of direction or n-dimensional slope.
Definition 9.2.8
Let F be a field and n ≥ 2 an integer. The projective general linear group of order n, denoted by PGLn (F ), is

PGLn (F ) = GLn (F )/Z(GLn (F )).

Similarly, the projective special linear group is PSLn (F ) = SLn (F )/Z(SLn (F )).
The first family of Lie type listed in the Classification Theorem of Finite Simple Groups is the
projective special linear group.
Theorem 9.2.9
Let F be a finite field of order q and n ≥ 2. The group PSLn (F ) is simple, except for
PSL2 (F2 ) and PSL2 (F3 ).
Lemma 9.2.10
If i 6= j, define the matrix Xij (λ) as 1 on the diagonal, λ in the (i, j)th entry, and 0
elsewhere. The group SLn (F ) is generated by matrices Xij (λ) with λ ∈ F and 1 ≤ i, j ≤ n
and i 6= j.
Proof. We will temporarily call the matrices Xij (λ) elementary matrices. We will also interpret SL1 (F ) as the trivial group consisting of the 1 × 1 matrix with entry 1.
Note first that Xij (λ)−1 = Xij (−λ). It suffices to show that for all A ∈ SLn (F ), there are
sequences X1 , X2 , . . . , Xr and Y1 , Y2 , . . . , Ys of elementary matrices such that
X1 X2 · · · Xr AY1 Y2 · · · Ys = I (9.2)
since then
A = Xr^{-1} · · · X2^{-1} X1^{-1} Ys^{-1} Y(s−1)^{-1} · · · Y1^{-1} .
Let A ∈ SLn (F ). Suppose first that a21 ≠ 0. Then X12 ((1 − a11 )/a21 )A is a matrix with 1 in the (1, 1)th entry. Suppose instead that a21 = 0. Then there is some row i with ai1 ≠ 0, and X2i (1)A has ai1 ≠ 0 in the (2, 1)th entry; repeating the previous step, we can multiply by another elementary matrix and get 1 in the (1, 1)th entry.
Suppose now that A has a11 = 1. By multiplying on the left by Xi1 (−ai1 ) and on the right by X1j (−a1j ), we obtain a matrix of the form

( 1 0 )
( 0 B ),
where B ∈ SLn−1 (F ). By induction on n, we obtain every matrix A ∈ SLn (F ) as (9.2) and the
lemma follows.
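The proof's row-reduction procedure can be carried out concretely for n = 2 over a finite field. The sketch below is my own illustration (helper names are not from the text): it reduces a matrix A ∈ SL2(F7) to the identity by left-multiplying with elementary matrices Xij(λ), recording the operations used, exactly as in the lemma.

```python
# Reduce A in SL_2(F_p) to I using only the elementary row operations
# "add lam * (row j) to row i", i.e. left multiplication by X_ij(lam).
# Rows are 0-indexed here, unlike the text's 1-indexed X_ij.

p = 7

def addrow(A, i, j, lam):
    """Left-multiply by X_ij(lam): add lam * (row j) to row i, mod p."""
    B = [row[:] for row in A]
    B[i] = [(B[i][k] + lam * B[j][k]) % p for k in range(2)]
    return B

def reduce_to_identity(A):
    """Return (reduced matrix, list of (i, j, lam) operations applied)."""
    ops = []
    def do(i, j, lam):
        nonlocal A
        A = addrow(A, i, j, lam)
        ops.append((i, j, lam % p))
    if A[1][0] == 0:                 # ensure the (2,1) entry is nonzero
        do(1, 0, 1)
    inv = pow(A[1][0], p - 2, p)     # a21^{-1} by Fermat's little theorem
    do(0, 1, (1 - A[0][0]) * inv)    # make the (1,1) entry equal to 1
    do(1, 0, -A[1][0])               # clear (2,1); the (2,2) entry is now det = 1
    do(0, 1, -A[0][1])               # clear (1,2)
    return A, ops

A = [[3, 5], [1, 2]]                 # det = 3*2 - 5*1 = 1 in F_7
I, ops = reduce_to_identity(A)
print(I)                             # [[1, 0], [0, 1]]
```

Since each Xij(λ) has inverse Xij(−λ), the recorded operations express A as a product of elementary matrices, as the lemma asserts.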
Lemma 9.2.11
If n ≥ 3 or if n = 2 and |F | ≥ 4, then the matrices Xij (λ) are commutators of elements in
SLn (F ).
Proof. For n = 2, a direct computation of the commutator gives

[ ( α 0 ; 0 α^{-1} ), ( 1 β ; 0 1 ) ] = ( 1 (α^2 − 1)β ; 0 1 ),

where ( a b ; c d ) denotes the 2 × 2 matrix with rows (a, b) and (c, d).
For any λ ∈ F , the equation λ = β(α2 − 1) can be solved for β as long as there exists α ∈ U (F )
such that α 6= ±1. This is possible for all fields F that contain more than 3 elements.
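The commutator identity in the proof is easy to verify numerically. The following sketch checks it in SL2(F5) for one choice of α and β (the specific field, values, and helper names are mine).

```python
# Check [A, B] = A B A^{-1} B^{-1} = [[1, (α²-1)β], [0, 1]] in SL_2(F_5)
# for A = diag(α, α⁻¹) and B = [[1, β], [0, 1]].

p = 5

def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) % p for j in range(2)]
            for i in range(2)]

def commutator(A, B, Ainv, Binv):
    return mul(mul(A, B), mul(Ainv, Binv))

alpha, beta = 2, 3
A    = [[alpha, 0], [0, pow(alpha, p - 2, p)]]   # diag(α, α⁻¹)
Ainv = [[pow(alpha, p - 2, p), 0], [0, alpha]]
B    = [[1, beta], [0, 1]]
Binv = [[1, -beta % p], [0, 1]]

expected = [[1, (alpha * alpha - 1) * beta % p], [0, 1]]
print(commutator(A, B, Ainv, Binv) == expected)   # True
```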
Corollary 9.2.12
Let F be a field. The following are equivalent:
(1) SLn (F ) is solvable;
(2) the commutator subgroup SLn (F )0 is a proper subgroup of SLn (F );
(3) (n, F ) is (2, F2 ) or (2, F3 ).
Proof. By Lemmas 9.2.11 and 9.2.10, if n ≥ 3 or if n = 2 and |F | ≥ 4, then all the generators Xij (λ) of SLn (F ) are commutators, so they lie in the commutator subgroup of SLn (F ). Hence, SLn (F )′ = SLn (F ). In particular, every term of the derived series satisfies SLn (F )(s) = SLn (F ) ≠ {1}, so by Proposition 9.1.14, SLn (F ) is not solvable.
In the remaining two cases, SL2 (F2 ) = GL2 (F2 ) ∼= S3 , which is solvable. For SL2 (F3 ), note that Z(SL2 (F3 )) has order 2. Then PSL2 (F3 ) has order 12, and it is easy to check by cases that all groups of order 12 are solvable. By Exercise 9.1.9, SL2 (F3 ) is solvable.
With this result about SLn (F ), we are in a position to prove the theorem.
442 CHAPTER 9. CLASSIFICATION OF GROUPS
Proof (Theorem 9.2.9). In Example 8.2.8 we saw that the group GLn (F ) acts transitively on F n −
{(0, 0, . . . 0)}. By multiplying the second column of M2 in Example 8.2.8 by an appropriate factor,
we can select M2 M1−1 ∈ SLn (F ). Hence, SLn (F ) acts transitively on F n − {(0, 0, . . . 0)}. This action
does have a system of blocks, namely the linear subspaces Span(v) − {0} = {λv | λ ∈ F ∗ }. These
blocks are precisely the elements in the projective space P(F n ).
We claim that the action of G = SLn (F ) on P(F n ) is 2-transitive. Let (a1 , a2 ) and (b1 , b2 ) be
two pairs of nonparallel vectors in F n − {(0, 0, . . . , 0)}. Each pair of vectors represents a pair of
distinct points in P(F n ). Let M1 ∈ GLn (F ) be any matrix whose first two columns are a1 and a2 , respectively, and let M2 ∈ GLn (F ) be any matrix whose first two columns are b1 and b2 . Note that in P(F n ), two elements are equivalent if they are multiples of each other. Hence, if necessary, we may replace b2 with a scalar multiple of itself so that M2 M1^{-1} ∈ SLn (F ). The matrix M1^{-1} maps a1 and a2 to

e1 = (1, 0, 0, . . . , 0)T and e2 = (0, 1, 0, . . . , 0)T ,

respectively, and M2 maps e1 and e2 to b1 and b2 . Hence, M2 M1^{-1} maps the pair (a1 , a2 ) to (b1 , b2 ), which proves that the action of SLn (F ) on P(F n ) is 2-transitive.
Let P be the stabilizer in SLn (F ) of the point (1 : 0 : · · · : 0) ∈ P(F n ). Then P consists of the matrices of the block form

( a w )
( 0 B ),

where a ∈ F ∗ , w is a row vector in F n−1 , and B ∈ GLn−1 (F ) with det(B) = a^{-1} . By Proposition 8.3.8, P is a maximal subgroup of SLn (F ). Furthermore, if v ∈ F n − {(0, 0, . . . , 0)} with v = ge1 for some g ∈ SLn (F ), then the stabilizer of the point of P(F n ) corresponding to v is the conjugate group gP g −1 .
We now claim that every normal subgroup N E SLn (F ) is in the center Z(SLn (F )). We consider
two cases.
Case 1. Suppose that N ≤ P . Since N is normal, N = gN g −1 ≤ gP g −1 for all g ∈ SLn (F ), so N stabilizes every point of P(F n ). Hence, N lies in the kernel of the action of SLn (F ) on P(F n ), which consists of the scalar matrices in SLn (F ), that is, Z(SLn (F )). Thus, N ≤ Z(SLn (F )).
Case 2. Suppose that N ≰ P . Since N is normal, P N is a subgroup strictly larger than the maximal subgroup P , so P N = SLn (F ). Take K to be the subgroup of P of matrices of the form

( 1 w )
( 0 In−1 ),

where w is a row vector in F n−1 .
Given i ≠ j, choose a bijection σ from {3, . . . , n} to {1, 2, . . . , n} − {i, j} and let g ∈ SLn (F ) be a matrix whose columns are ei , ej , eσ(3) , . . . , eσ(n) , scaling one of the columns eσ(k) if necessary so that det g = 1. Then
gX12 (λ) = g ( e1 | λe1 + e2 | e3 | · · · | en )
= ( ei | λei + ej | eσ(3) | · · · | eσ(n) )
= Xij (λ) ( ei | ej | eσ(3) | · · · | eσ(n) )
= Xij (λ)g.
Hence, Xij (λ) = gX12 (λ)g −1 . However, X12 (λ) ∈ K ⊆ KN and since KN E SLn (F ), then
Xij (λ) ∈ KN . Since these elementary matrices generate SLn (F ), we deduce that KN =
SLn (F ).
By the Second Isomorphism Theorem, SLn (F )/N = KN/N ∼= K/(K ∩ N ). But K/(K ∩ N ) is abelian since K ∼= F n−1 is abelian. By Proposition 9.1.9, the commutator subgroup satisfies SLn (F )′ ≤ N , so SLn (F )′ is a proper subgroup of SLn (F ). By Corollary 9.2.12, we conclude that the assumption that N is not contained in P can only occur if n = 2 and F = F2 or F3 .
Suppose now that (n, F ) is neither (2, F2 ) nor (2, F3 ). Then only Case 1 can occur, so every normal subgroup of SLn (F ) is in the center. By the Fourth Isomorphism Theorem, the quotient group
PSLn (F ) = SLn (F )/Z(SLn (F ))
has no nontrivial proper normal subgroups. Hence, PSLn (F ) is a simple group. On the other hand, if (n, F ) is either (2, F2 ) or (2, F3 ), then by Corollary 9.2.12, SLn (F ) is nonabelian and solvable and therefore not simple.
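As a sanity check on Theorem 9.2.9, one can count SL2(F5) and its center by brute force; the quotient PSL2(F5) then has order 60 (it is in fact isomorphic to the simple group A5). This is an illustrative sketch of mine, not a computation from the text.

```python
# Brute-force count of SL_2(F_5) and its center Z = {±I},
# confirming |PSL_2(F_5)| = 120 / 2 = 60.

from itertools import product

p = 5
# encode a matrix [[a, b], [c, d]] as the tuple (a, b, c, d)
sl2 = [M for M in product(range(p), repeat=4)
       if (M[0] * M[3] - M[1] * M[2]) % p == 1]
# central elements of SL_2 are the scalar matrices λI with λ² = 1
center = [M for M in sl2 if M[1] == 0 and M[2] == 0 and M[0] == M[3]]

print(len(sl2), len(center), len(sl2) // len(center))  # 120 2 60
```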
(c) Show that the space P(F2^3 ) along with its set of lines has the geometry of the Fano plane.
(d) The group of symmetries of the Fano plane is the subgroup of S7 (acting on the points of the Fano plane) consisting of collineations, i.e., permutations that map lines to lines. Prove that this group of symmetries is PGL3 (F2 ) and observe that this group is equal to PSL3 (F2 ).
9.3
Semidirect Product
The Classification Theorem of Finite Simple Groups completely solves the first part of the Hölder
Program. The second part involves classifying all the groups that have a given set of composition
factors. Taking the direct sum of all the composition factors is one way to find a group with those
given factors. However, it is not the only way, even if the group is abelian.
The second part of the Hölder Program turns out to be very challenging. In this section, we
introduce the semidirect product, a construction that creates a larger group from smaller ones that
generalizes the direct sum construction. Furthermore, we discuss under what circumstances this
construction is sufficient to find all the groups with a given set of composition factors.
We begin this section with a brief comment on terminology.
Definition 9.3.1
Let {Gi }i∈I be a collection of groups. A choice function for this collection is a function f : I → ∪i∈I Gi such that f (i) ∈ Gi for all i ∈ I. The set of all choice functions is called the direct product of the collection {Gi }i∈I , and it is denoted by

∏i∈I Gi .
Proposition 9.3.2
The direct product of a collection of groups is a group under the operation ·, where f1 · f2 is the choice function defined by

(f1 · f2 )(i) = f1 (i)f2 (i) for all i ∈ I.

Furthermore, the direct product is abelian if and only if Gi is abelian for all i ∈ I.
Proof. Let f1 , f2 , f3 ∈ ∏i∈I Gi . Then for all i ∈ I,
(f1 · (f2 · f3 ))(i) = f1 (i) (f2 (i)f3 (i)) = (f1 (i)f2 (i)) f3 (i) = ((f1 · f2 ) · f3 )(i).
Hence, the operation · is associative. The choice function f (i) = 1i ∈ Gi for all i ∈ I serves as the
identity. The inverse of an element f in the direct product is f −1 with f −1 (i) = f (i)−1 taken in
each Gi .
9.3. SEMIDIRECT PRODUCT 445
It is easy to see that if each Gi is abelian, then the operation · on the direct product is abelian.
Conversely, suppose that · is abelian. Then f g = gf for all choice functions in the direct product.
Fix an index i0 ∈ I and let x, y ∈ Gi0 . Consider the choice functions f and g such that f (i0 ) = x,
g(i0 ) = y and f (i) = g(i) = 1 for all i ∈ I − {i0 }. Then since f g = gf , we deduce that xy = yx
in Gi0 . Since the choices of index and elements were arbitrary, we deduce that every group Gi is abelian.
If I is finite with |I| = n ≥ 1, then I is in bijection with {1, 2, . . . , n}. A choice function from
I = {1, 2, . . . , n} is tantamount to an n-tuple (f1 , f2 , . . . , fn ) ∈ G1 × G2 × · · · × Gn with fi = f (i)
for 1 ≤ i ≤ n.
Definition 9.3.3
The direct sum of a collection {Gi }i∈I of groups is the subset of the direct product consisting
of choice functions f such that f (i) = 1 ∈ Gi for all but a finite number of indices i ∈ I.
The direct sum is denoted by ⊕i∈I Gi .
If I is a finite set, then the subset condition in Definition 9.3.3 is trivially satisfied by all choice functions. Hence, the direct sum and the direct product are the same whenever I is a finite set. On the other hand, if I is infinite, then the direct sum is a proper subgroup of the direct product.
Proposition 9.3.4
Let H and K be groups and let ϕ : K → Aut(H) be a homomorphism. The Cartesian product H × K, equipped with the operation

(h1 , k1 ) · (h2 , k2 ) = (h1 ϕ(k1 )(h2 ), k1 k2 ), (9.3)

is a group. Furthermore, if H and K are finite, then this group has order |H||K|.
Proof. Let (h1 , k1 ), (h2 , k2 ), and (h3 , k3 ) be three elements in the Cartesian product H × K. Then
(h1 , k1 ) · ((h2 , k2 ) · (h3 , k3 ))
= (h1 , k1 ) · (h2 ϕ(k2 )(h3 ), k2 k3 )
= (h1 ϕ(k1 )(h2 ϕ(k2 )(h3 )), k1 (k2 k3 ))
= (h1 ϕ(k1 )(h2 )ϕ(k1 )(ϕ(k2 )(h3 )), (k1 k2 )k3 ) because ϕ(k1 ) is a homomorphism
= (h1 ϕ(k1 )(h2 )ϕ(k1 k2 )(h3 ), (k1 k2 )k3 ) because ϕ is a homomorphism
= (h1 ϕ(k1 )(h2 ), k1 k2 ) · (h3 , k3 )
= ((h1 , k1 ) · (h2 , k2 )) · (h3 , k3 ).
This proves associativity.
The element (1, 1) serves as the identity because
(1, 1) · (h, k) = (1ϕ(1)(h), k) = (h, k) and
(h, k) · (1, 1) = (hϕ(k)(1), k) = (h, k),
where ϕ(1)(h) = h because ϕ(1) is the identity function and because ϕ(k)(1) = 1 since any homo-
morphism maps 1 to 1.
Let (h, k) ∈ H × K. We prove that (h, k)−1 = (ϕ(k −1 )(h−1 ), k −1 ). Indeed,

(h, k) · (ϕ(k −1 )(h−1 ), k −1 ) = (h ϕ(k)(ϕ(k −1 )(h−1 )), k k −1 ) = (h ϕ(kk −1 )(h−1 ), 1) = (h h−1 , 1) = (1, 1)

and

(ϕ(k −1 )(h−1 ), k −1 ) · (h, k) = (ϕ(k −1 )(h−1 ) ϕ(k −1 )(h), k −1 k) = (ϕ(k −1 )(h−1 h), 1) = (1, 1).
Definition 9.3.5
Suppose that H and K are groups such that there exists a homomorphism ϕ : K → Aut(H).
The Cartesian product H × K, equipped with the operation defined in (9.3), is called the
semidirect product of H and K with respect to ϕ and is denoted by H oϕ K.
Proposition 9.3.6
Suppose that G = H ⋊ϕ K. Then H̃ = {(h, 1) | h ∈ H} and K̃ = {(1, k) | k ∈ K} are subgroups of G with H̃ ∼= H and K̃ ∼= K. Furthermore, H̃ E H ⋊ϕ K and G/H̃ ∼= K.
Because of this proposition, we will often abuse notation and write H E H ⋊ϕ K and K ≤ H ⋊ϕ K instead of H̃ E H ⋊ϕ K and K̃ ≤ H ⋊ϕ K.
Before developing this example further and presenting more examples, it is useful to explore the relationship of H and K inside H ⋊ϕ K. Implicit in Example 9.3.7 is that there always exists a homomorphism K → Aut(H), namely the trivial homomorphism, which maps all elements of K to the identity automorphism. The following proposition describes this situation.
Proposition 9.3.8
Let H and K be groups and let ϕ : K → Aut(H) be a homomorphism. The following are equivalent:
(1) ϕ is the trivial homomorphism;
(2) H ⋊ϕ K ∼= H ⊕ K;
(3) K̃ E H ⋊ϕ K.
Proof. (1) =⇒ (2) If ϕ is trivial, then ϕ(k) : H → H is the identity function. Hence,

(h1 , k1 ) · (h2 , k2 ) = (h1 ϕ(k1 )(h2 ), k1 k2 ) = (h1 h2 , k1 k2 ).

Thus, H ⋊ϕ K ∼= H ⊕ K.
(2) =⇒ (3) We know that K̃ E H ⊕ K.
(3) =⇒ (1) Suppose that K̃ E H ⋊ϕ K. Then for all h2 ∈ H and all k1 , k2 ∈ K, the following element is in K̃:

(h2 , k1 ) · (1, k2 ) · (h2 , k1 )−1 = (h2 ϕ(k1 k2 k1−1 )(h2−1 ), k1 k2 k1−1 ).

For the first coordinate to equal 1 for all h2 ∈ H and all k1 , k2 ∈ K, we need ϕ(k)(h2−1 ) = h2−1 for every k ∈ K, so ϕ is trivial.
More generally, in H ⋊ϕ K we have (1, k)(h, 1)(1, k)−1 = (ϕ(k)(h), 1). This calculation shows that ϕ(k)(h) corresponds to conjugation by the subgroup K on the normal subgroup H.
This inspires us to provide a characterization of groups that arise as semidirect products.
Proposition 9.3.9
Suppose that a group G contains a normal subgroup H and a subgroup K such that
G = HK and H ∩ K = {1}. Then G ∼ = H oϕ K, where ϕ : K → Aut(H) is the
homomorphism defined by conjugation ϕ(k)(h) = khk −1 .
Proof. Consider the function f : H oϕ K → G given by f (h, k) = hk. Since G = HK, this function
is a surjection. If f (h1 , k1 ) = f (h2 , k2 ), then h1 k1 = h2 k2 , so h2−1 h1 = k2 k1−1 . Since H ∩ K = {1}, then h2−1 h1 = 1 = k2 k1−1 , so (h1 , k1 ) = (h2 , k2 ). Thus, this function is injective and therefore bijective.
Let (h1 , k1 ), (h2 , k2 ) ∈ H ⋊ϕ K. Then

f ((h1 , k1 ) · (h2 , k2 )) = f (h1 ϕ(k1 )(h2 ), k1 k2 ) = h1 ϕ(k1 )(h2 ) k1 k2

and

f (h1 , k1 )f (h2 , k2 ) = h1 k1 h2 k2 = h1 (k1 h2 k1−1 )k1 k2 = h1 ϕ(k1 )(h2 ) k1 k2 .
We conclude that f is a homomorphism and thus an isomorphism.
Example 9.3.10. Let us revisit Example 9.3.7 in light of these propositions. We now give presentations for all three of the semidirect products.
Case 1. If ϕ1 is the trivial homomorphism, then the semidirect product is the direct sum Z7 ⋊ϕ1 Z3 = Z7 ⊕ Z3 ∼= Z21 .
Case 2. If ϕ2 is such that ϕ2 (x) = ψ2 , then a presentation for the semidirect product is
Z7 ⋊ϕ2 Z3 = ⟨x, y | x3 = y 7 = 1, xyx−1 = y 2 ⟩.
Case 3. If ϕ3 is such that ϕ3 (x) = ψ4 , then a presentation for the semidirect product is
Z7 ⋊ϕ3 Z3 = ⟨u, v | u3 = v 7 = 1, uvu−1 = v 4 ⟩.
Again, this is a nonabelian group.
Now consider the mapping f : Z7 ⋊ϕ2 Z3 → Z7 ⋊ϕ3 Z3 defined by f (x) = u2 and f (y) = v. It is easy to check that (u2 )3 = v 7 = 1, but we also have
u2 vu−2 = u(uvu−1 )u−1 = uv 4 u−1 = (uvu−1 )4 = (v 4 )4 = v 16 = v 2 .
Hence, u2 and v satisfy the same relations as x and y so f defines a homomorphism by the Generator
Extension Theorem. It is not hard to see that f is bijective and we deduce that the groups obtained
from Case 2 and Case 3 are isomorphic. Consequently, we refer to the group with the simplified
notation of Z7 o Z3 because it is the only nondirect semidirect product.
Proposition 9.3.9 allows us to take this example one step further to a classification result. Let
G be a group of order 21. By Sylow’s Theorem, we deduce that n7 (G) = 1 so G contains a normal
subgroup H of order 7. G must also contain a subgroup K of order 3. The action of K on H by
conjugation corresponds to a homomorphism ϕ : K → Aut(H). We have seen that there are only
two ϕ that lead to nonisomorphic groups, which are Z21 and Z7 o Z3 . 4
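The relation u²vu⁻² = v² used above can be confirmed by modeling Z7 ⋊ϕ3 Z3 as pairs (a, b) ∈ Z7 × Z3, with the multiplication dictated by uvu⁻¹ = v⁴ (an illustrative encoding of my own):

```python
# Model Z_7 ⋊ Z_3 with u v u⁻¹ = v⁴:
# (a, b) stands for v^a u^b, and (a, b)(c, d) = (a + 4^b c mod 7, b + d mod 3).

def op(x, y):
    (a, b), (c, d) = x, y
    return ((a + pow(4, b, 7) * c) % 7, (b + d) % 3)

u, v = (0, 1), (1, 0)
u2 = op(u, u)
assert op(u2, u) == (0, 0)   # u has order 3, so u⁻² = u
lhs = op(op(u2, v), u)       # u² v u⁻²
print(lhs == op(v, v))       # True: u² v u⁻² = v²
```

This is the computational content behind the isomorphism between the Case 2 and Case 3 groups.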
Example 9.3.11. A group H is abelian if and only if the inversion function h ↦ h−1 is an automorphism. As an automorphism, the inversion has order 2. Consider the cyclic group Z2 = ⟨x | x2 = 1⟩ and the map ϕ : Z2 → Aut(H) such that ϕ(x)(h) = h−1 and, of course, ϕ(1)(h) = h. This allows us to construct the semidirect product H ⋊ϕ Z2 . However, the function ϕ might not give the only nondirect semidirect product. 4
Example 9.3.12 (Dihedral Groups). A particular example of the previous construction occurs
with dihedral groups. Every dihedral group Dn is defined as Dn = Zn oϕ Z2 where ϕ : Z2 → Aut(Zn )
is defined by ϕ(y)(x) = x−1 where x generates Zn and y generates Z2 . This leads to the presentation
Zn oϕ Z2 = hx, y | xn = y 2 = 1, yxy −1 = x−1 i = Dn .
However, if n is not prime, there may be other nontrivial homomorphisms ϕ : Z2 → Aut(Zn ).
For example, if n = 15, then Aut(Z15 ) = U (15). We need to determine U (15). By the Chinese
Remainder Theorem, Z/15Z = Z/3Z ⊕ Z/5Z so
Aut(Z15 ) = U (Z/3Z ⊕ Z/5Z) = U (Z/3Z) ⊕ U (Z/5Z) = U (3) ⊕ U (5) ∼
= Z2 ⊕ Z4 .
Hence, U (15) contains three elements of order 2, namely 4, 11, and 14 = −1. Writing Z15 as Z3 ⊕Z5 ,
we see that these elements of order 2 in Aut(Z15 ) correspond to inversion on the Z5 component alone,
inversion on the Z3 component alone, or inversion on both. The three resulting nondirect semidirect
products of Z15 with Z2 are Z3 ⊕ D5 , Z5 ⊕ D3 , and D15 . 4
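The claim about U(15) can be verified directly (a quick sketch):

```python
# The elements of order 2 in U(15) are exactly 4, 11, and 14 = -1.

from math import gcd

units = [a for a in range(1, 15) if gcd(a, 15) == 1]
order2 = [a for a in units if a != 1 and (a * a) % 15 == 1]
print(order2)   # [4, 11, 14]
```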
Remark 9.3.13. It is not always a simple task to determine if H oϕ1 K and H oϕ2 K are isomorphic,
given two different homomorphisms ϕ1 , ϕ2 : K → Aut(H). Exercises 9.3.3 and 9.3.4 show two general
situations in which there does exist an isomorphism between semidirect products. 4
Implicit in the second part of the Hölder Program is the ability to classify all groups G that have a known normal subgroup N along with a known quotient group G/N . We reiterate that
the semidirect product of two groups captures situations in which G contains a subgroup K that
is isomorphic to G/N and G = N K. It is possible that a group G does not contain a subgroup
isomorphic to G/N . The simplest example occurs with Q8 . Every subgroup of Q8 is normal.
Consider first the normal subgroup N = ⟨−1⟩ with quotient group Q8 /⟨−1⟩ ∼= Z2 ⊕ Z2 . However, Q8 has no subgroup isomorphic to Z2 ⊕ Z2 , so Q8 does not arise as a semidirect product over N . Consider now the normal subgroup N ′ = ⟨i⟩ with quotient group Q8 /⟨i⟩ ∼= Z2 . The only element of order 2 in Q8 is in N ′ , so there is no subgroup K such that N ′ K = G and N ′ ∩ K = {1}, because such a K would need to contain an element of order 2. The same reasoning holds for N ′ = ⟨j⟩ and N ′ = ⟨k⟩. Hence, Q8 does not arise as a semidirect product of its subgroups. In this case, in order to find such groups
G given N and G/N , we need group cohomology, a theory that is beyond the scope of this textbook.
Proposition 9.3.14
If n = p^k with p an odd prime and k ∈ N∗ , then Aut(Z_{p^k} ) ∼= U (p^k ) is a cyclic group of order p^{k−1} (p − 1).
Proof. (Left as a guided exercise for the reader. See Exercise 9.3.10.)
Proposition 9.3.15
The group U (2) is trivial. For k ≥ 2, the automorphism group of the cyclic group Z_{2^k} is U (2^k ) ∼= Z2 ⊕ Z_{2^{k−2}} .
Proof. The proposition is obvious for U (2k ) with k = 1, 2. We will suppose henceforth that k ≥ 3.
We show that U (2k ) = h5i ⊕ h−1i.
We first claim that 5^{2^{k−3}} ≡ 1 + 2^{k−1} (mod 2^k ) for all k ≥ 3. This is obvious for k = 3. Suppose that it is true for some k, say 5^{2^{k−3}} = 1 + 2^{k−1} + c2^k for some integer c. Note that

(1 + 2^{k−1} + c2^k )2 = (1 + 2^{k−1} )2 + 2(1 + 2^{k−1} )c2^k + c^2 2^{2k} ≡ (1 + 2^{k−1} )2 (mod 2^{k+1} ).

Therefore,

5^{2^{k−2}} ≡ (1 + 2^{k−1} )2 = 1 + 2^k + 2^{2k−2} ≡ 1 + 2^k (mod 2^{k+1} ),

where the last congruence holds because 2k − 2 ≥ k + 1 for all k ≥ 3. This establishes the claim by induction.
We see from the claim that 5^{2^{k−3}} ≡ 1 + 2^{k−1} ≢ 1 (mod 2^k ), while squaring gives 5^{2^{k−2}} ≡ (1 + 2^{k−1} )2 ≡ 1 (mod 2^k ). Hence, the order |5| divides 2^{k−2} but does not divide 2^{k−3} . So |5| = 2^{k−2} .
Assume that −1 ≡ 5^b (mod 2^k ) for some b. Since 5^b ≡ 1 (mod 4), this would imply that −1 ≡ 1 (mod 4), which is a contradiction. Hence, −1 ∉ ⟨5⟩. By the Direct Sum Decomposition Theorem for groups, we deduce that U (2^k ) ∼= ⟨−1⟩ ⊕ ⟨5⟩. Knowing the order of 5, the result follows.
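A quick numerical check (my sketch, not from the text) confirms both parts of the argument for small k: the element 5 has order 2^(k−2) modulo 2^k, and −1 is never a power of 5.

```python
# Verify Proposition 9.3.15's key facts for k = 3, ..., 9.

def mult_order(a, n):
    """Multiplicative order of a modulo n (assumes gcd(a, n) = 1)."""
    x, r = a % n, 1
    while x != 1:
        x, r = (x * a) % n, r + 1
    return r

for k in range(3, 10):
    n = 2 ** k
    assert mult_order(5, n) == 2 ** (k - 2)
    assert all(pow(5, b, n) != n - 1 for b in range(2 ** (k - 2)))
print("verified for k = 3..9")
```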
The previous two propositions, coupled with the Chinese Remainder Theorem, give a complete
description of the automorphism groups of cyclic groups. However, if a group is not cyclic, even if
it is abelian, the automorphism groups can become rather complicated. The following proposition
begins to show this.
Proposition 9.3.16
Let p be any prime and let Zp^n denote the elementary abelian group Zp ⊕ Zp ⊕ · · · ⊕ Zp with n copies of Zp . Then Aut(Zp^n ) ∼= GLn (Fp ).
Proof. Let V be the vector space of dimension n over the finite field Fp . The group with addition
(V, +) is isomorphic to Zpn . An automorphism ϕ of (V, +) is an invertible homomorphism with
ϕ(a + b) = ϕ(a) + ϕ(b) for all vectors a and b in (V, +). Furthermore, for any positive integer k,

ϕ(k · a) = ϕ(a) + ϕ(a) + · · · + ϕ(a) (k times) = k · ϕ(a).

Since this holds for all 1 ≤ k ≤ p, we see that ϕ is an invertible linear transformation on V = Fp^n . Hence, Aut(Zp^n ) = Aut(Fp^n , +) ∼= GLn (Fp ).
Example 9.3.17. From the previous proposition, Aut(Z5 ⊕ Z5 ) = GL2 (F5 ). Note that | GL2 (F5 )| = (25 − 5)(25 − 1) = 480. By Cauchy's Theorem, GL2 (F5 ) contains an element of order 3. One such element is

g = ( 1 2 ; 1 3 ).

Now Z5 ⊕ Z5 is isomorphic to the additive group (F5^2 , +) under the isomorphism x^a y^b ↔ (a, b). Since

g (a, b)T = (a + 2b, a + 3b)T ,

the homomorphism ϕ : Z3 → Aut(Z5 ⊕ Z5 ) sending a generator of Z3 to g yields a nonabelian semidirect product (Z5 ⊕ Z5 ) ⋊ϕ Z3 of order 75.
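One can confirm that g has order 3 with a short computation mod 5 (a sketch of mine):

```python
# Check that g = [[1, 2], [1, 3]] has order 3 in GL_2(F_5).

p = 5

def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) % p for j in range(2)]
            for i in range(2)]

g = [[1, 2], [1, 3]]
g2 = mul(g, g)
g3 = mul(g2, g)
print(g2, g3)   # [[3, 3], [4, 1]] [[1, 0], [0, 1]]
```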
As a last example of an automorphism group, we prove in the exercises (Exercise 9.3.13) that if
n 6= 6, then Aut(Sn ) = Sn .
Consider the subgroup G of S9 generated by (1 2 3), (4 5 6), (7 8 9), and σ = (1 4 7)(2 5 8)(3 6 9). It consists of permutations that cycle within the blocks {1, 2, 3}, {4, 5, 6}, and {7, 8, 9} and permutations that cycle through the three blocks. The part of the action that stays within the blocks is the subgroup

H = ⟨(1 2 3), (4 5 6), (7 8 9)⟩ ∼= Z3 ⊕ Z3 ⊕ Z3 ,

and conjugation by σ permutes the blocks:

σ(1 2 3)σ −1 = (4 5 6), σ(4 5 6)σ −1 = (7 8 9), and σ(7 8 9)σ −1 = (1 2 3).
Hence, H E G. Setting K = hσi, the subgroups also satisfy G = HK. By Proposition 9.3.9,
G = H oϕ K where ϕ corresponds to K acting on H by conjugation. If x is a generator of Z3 , we
can describe G as
(Z3 ⊕ Z3 ⊕ Z3 ) ⋊ϕ Z3 .
This example generalizes as follows. Let L and K be groups and let ρ : K → Sn be a homomorphism. Then K acts on the direct sum of n copies of L,

L ⊕ L ⊕ · · · ⊕ L (n times),

by permuting the coordinates: for k ∈ K with σ = ρ(k),

σ · (x1 , x2 , . . . , xn ) = (xσ−1 (1) , xσ−1 (2) , . . . , xσ−1 (n) ).

This defines a homomorphism ϕ : K → Aut(L ⊕ L ⊕ · · · ⊕ L).
Definition 9.3.18
The wreath product of K on L by the homomorphism ρ : K → Sn is the semidirect product

L ≀ρ K = (L ⊕ L ⊕ · · · ⊕ L) ⋊ϕ K,

where ϕ is the coordinate-permuting action induced by ρ.
We point out that the order of the wreath product is |L ≀ρ K| = |L|^n |K| and that elements of a wreath product are (n + 1)-tuples in the set L^n × K.
It is possible to give an alternative approach to the wreath product. Set Γ = {1, 2, . . . , n}.
Consider the isomorphism between Fun(Γ, L) and Ln defined by f 7→ (f (1), f (2), . . . , f (n)) and
where the group operation on functions f, g ∈ Fun(Γ, L) is
(f · g)(i) = f (i)g(i),
where the latter operation is in the group L. Then elements of the wreath product L ≀ρ K are pairs (f, k) ∈ Fun(Γ, L) × K. The operation between elements in the wreath product is

(f1 , k1 ) · (f2 , k2 ) = (f1 · (f2 ∘ ρ(k1 )−1 ), k1 k2 ).
Example 9.3.19. The motivating example for this subsection is a wreath product of Z3 by Z3 . The homomorphism ρ : Z3 → S3 sends the generator w of Z3 to the 3-cycle (1 2 3). So Z3 ≀ρ Z3 = (Z3 ⊕ Z3 ⊕ Z3 ) ⋊ϕ Z3 , a group of order 3^4 = 81. We leave it as an exercise for the reader to prove that up to isomorphism there is only one nonabelian wreath product of Z3 on Z3 and to find a presentation of it.
We saw earlier that given a group N and a group K, the semidirect product does not give us all
groups G that have a normal subgroup N with an associated quotient group K = G/N . However,
the following theorem gives a structural upper bound on groups that have a known normal subgroup
with a known associated quotient group.
Theorem (Universal Embedding Theorem). Suppose that G has a normal subgroup N with quotient group K = G/N , let π : G → K be the canonical projection, and for each k ∈ K fix a coset representative tk ∈ G with π(tk ) = k. Then G is isomorphic to a subgroup of the wreath product N ≀ K, where ρ : K → SK is the regular permutation representation of K.
f_g (k) = t_k^{-1} g t_{π(g)^{-1} k} ,
so for all g ∈ G and all k ∈ G/N , the element f_g (k) is in ker π = N . Second, expressing the wreath product N ≀ K in its functional form, consider the function Ψ : G → N ≀ K defined by Ψ(g) = (f_g , π(g)).
= t_k^{-1} x y t_{π(xy)^{-1} k}
= f_{xy} (k).
(a) Prove that if k ≥ 2 and a ∈ Z with p ∤ a, then (1 + ap)^{p^{k−2}} ≡ 1 + ap^{k−1} (mod p^k ).
(b) Deduce that for any a with p ∤ a, the element 1 + ap has order p^{k−1} in U (p^k ).
(c) By Proposition 7.5.2, U (Fp ) = U (p) is a cyclic group. Show that there exists g ∈ Z that is a generator of U (p) and such that g^{p−1} ≢ 1 (mod p^2 ).
(d) Prove that a g found in the previous part generates U (p^k ) to deduce that U (p^k ) is cyclic.
[Hint: Recall that p divides the binomial coefficient C(p, j) for all j with 1 ≤ j ≤ p − 1.]
11. Determine the isomorphism type of Aut(Z40 ) and express the result in invariant factors form.
12. Determine the isomorphism type of Aut(Z210 ) and express the result in invariant factors form.
13. This exercise guides a proof that Aut(Sn ) = Sn for all n 6= 6.
(a) Prove that for all ψ ∈ Aut(Sn ) and all conjugacy classes K of Sn , the subset ψ(K) is another
conjugacy class.
(b) Let K be the conjugacy class of transpositions and let K0 be another conjugacy class of elements
of order 2 (e.g., cycle type like (a b)(c d)). Prove that |K| 6= |K0 |, unless possibly if n = 6.
(c) Prove that for each ψ ∈ Aut(Sn ) and for all k with 2 ≤ k ≤ n, we have ψ((1 k)) = (a b_k ) for some distinct integers a, b_2 , b_3 , . . . , b_n in {1, 2, . . . , n}.
(d) Show that the transpositions (1 2), (1 3), . . . , (1 n) generate Sn .
(e) Deduce that Aut(Sn ) = Inn(Sn ) ∼ = Sn .
14. Let G be a group. Consider the homomorphism ϕ : G → Aut(G) defined by ϕ(g)(x) = gxg −1 . Prove
that the resulting semidirect product G oϕ G is equal to G ⊕ G if and only if G is abelian. Find a
presentation for D3 oϕ D3 .
15. Give a presentation for a nonabelian semidirect product (Z7 ⊕ Z7 ) oϕ Z3 .
16. Let ϕ : Z4 → S5 be the homomorphism that sends the generator x of Z4 to ϕ(x) = (1 2 3 4). Exer-
cise 9.3.13 showed that Aut(S5 ) = Inn(S5 ) = S5 . Let G = S5 oϕ Z4 . Perform the following calculations
in G.
(a) ((1 4 3)(2 5), x2 ) · ((2 4 5 3), x).
(b) ((1 4 3)(2 5), x2 )−1 .
(c) ((1 3 5 2 4), x3 ) · ((1 3)(2 4), 1) · ((1 3 5 2 4), x3 )−1 .
17. Let G be any group. We define the holomorph of G as the group Hol(G) = G o Aut(G), where
the semidirect product is the natural one where ϕ : Aut(G) → Aut(G) is the identity (not trivial)
homomorphism.
(a) Prove that the holomorph of Zp is a nonabelian group and give a presentation of it.
(b) Prove that Hol(Z2 × Z2 ) ∼= S4 .
18. Let ρ be the standard permutation representation of S3 acting on {1, 2, 3} and let G = Z5 ≀ρ S3 . Use the presentation Z5 = ⟨x | x5 = 1⟩.
(a) Calculate the product in G of (x, x2 , x, (1 2)) · (x3 , 1, x2 , (1 2 3)).
(b) Calculate the inverse (x, x2 , x4 , (1 3))−1 .
(c) Calculate the general conjugate (xa , xb , xc , σ) · (xp , xq , xr , 1) · (xa , xb , xc , σ)−1 .
19. Give a presentation for the group Z5 ≀ Z3 .
20. Let n be a positive integer and let d be a nontrivial divisor. Show that the largest subgroup of Sn acting naturally on X = {1, 2, . . . , n} that has n/d blocks of size d is a wreath product Sd ≀ S_{n/d} .
Calculate the order of this group.
21. Prove that Zp ≀ Zp is a nonabelian group of order p^{p+1} that is isomorphic to the Sylow p-subgroup of S_{p^2} . (See Exercise 8.5.3.)
9.4
Classification Theorems
This section consists primarily of examples of classification of groups of a given order. For certain
integers n, Sylow’s Theorem may allow us to find a normal subgroup of a given order. Then, it may
be possible to use a semidirect product construction and determine all possible groups of a given
order.
Example 9.4.1 (Groups of Order pq). Example 8.5.10 and Exercise 9.3.5 already established this classification. We restate the results here. Let |G| = pq with primes p < q. If p does not divide q − 1, then G ∼= Zpq . If p divides q − 1, then G is isomorphic either to Zpq or to the unique (up to isomorphism) nonabelian group Zq ⋊ Zp .
Example 9.4.2 (Groups of Order 12). Let G be a group of order 12 = 2^2 · 3, let P ∈ Syl2 (G), and let Q ∈ Syl3 (G). Sylow's Theorem shows that P or Q is normal in G; the abelian possibilities are Z12 and Z6 ⊕ Z2 .
Case 1. Suppose that P ∼= Z2 ⊕ Z2 is normal in G. The automorphism group Aut(Z2 ⊕ Z2 ) ∼= GL2 (F2 ) contains exactly two elements of order 3, namely the matrices ( 0 1 ; 1 1 ) and ( 1 1 ; 1 0 ).
So there exists a nontrivial homomorphism ϕ of Z3 = ⟨x⟩ into Aut(Z2 ⊕ Z2 ) that sends the generator x to one of the above automorphisms, described by the matrices. With the first matrix, ( 0 1 ; 1 1 ), we have ϕ(x)(a, b) = (b, a + b) for (a, b) ∈ Z2 ⊕ Z2 .
So we can create the group (Z2 ⊕ Z2 ) oϕ Z3 . However, this is isomorphic to a group we already
know, namely A4 . Furthermore, the two different homomorphisms into Aut(P ) given by the
two different matrices both produce semidirect products that are isomorphic to A4 .
Case 2. Suppose now that Q is a normal subgroup of G of order 3. The quotient group G/Q is
isomorphic to Z4 or Z2 ⊕ Z2 and we look for nontrivial homomorphisms of each of these groups into Aut(Z3 ) ∼= U (3) ∼= Z2 . The only nontrivial element of Aut(Z3 ) is inversion, which we will call λ.
If G/Q ∼= Z4 = ⟨x⟩, then the only nontrivial homomorphism into Aut(Z3 ) has ϕ(x) = λ, or in other words, ϕ(x)(h) = h−1 for all h ∈ Q. This is a new semidirect product with a presentation of
Z3 ⋊ Z4 = ⟨x, y | x4 = y 3 = 1, xyx−1 = y −1 ⟩.
9.4. CLASSIFICATION THEOREMS 455
On the other hand, if G/Q ∼= Z2 ⊕ Z2 , then we have three choices for ϕ depending on which two (nontrivial) elements of Z2 ⊕ Z2 get sent to λ. One can easily check that the resulting semidirect products are all isomorphic to S3 ⊕ Z2 ∼= D6 .
In conclusion, if |G| = 12, then G is isomorphic to one of the following (nonisomorphic) groups:
Z12 , Z6 ⊕ Z2 , D6 , A4 , Z3 o Z4 . 4
Example 9.4.3 (Groups of Order 1225). Let G be a group of order |G| = 1225 = 5^2 · 7^2 . We prove that G is abelian. The number n5 must satisfy n5 | 49 and n5 ≡ 1 (mod 5). The divisors of 49 are 1, 7, and 49. Only 1 satisfies the second condition, so n5 = 1. Hence, G has a normal Sylow 5-subgroup P . Similarly, n7 | 25 and n7 ≡ 1 (mod 7) force n7 = 1, so G also has a normal Sylow 7-subgroup Q. Then P Q ≤ G and

|P Q| = |P ||Q| / |P ∩ Q| = (5^2 · 7^2 )/1 = 1225,

so G = P Q. Since P and Q are both normal with P ∩ Q = {1}, elements of P commute with elements of Q, so G ∼= P ⊕ Q. Groups of order p^2 are abelian, so P , Q, and hence G are abelian.
Example 9.4.4 (Groups of Order 286). Let G be a group of order |G| = 286 = 2 · 11 · 13. By Sylow's Theorem, n11 (G) divides 26 and n11 (G) ≡ 1 (mod 11). This implies that n11 (G) = 1, so G has a normal Sylow 11-subgroup P . Similarly, by Sylow's Theorem, n13 (G) divides 22 and n13 (G) ≡ 1 (mod 13). This implies that n13 (G) = 1, so G has a normal Sylow 13-subgroup Q. The subgroup P Q is a group of order 143. It is normal since |G : P Q| = 2. By Example 9.4.1, P Q ∼= Z143 .
Write P Q = ⟨x⟩ with |x| = 143. By Cauchy's Theorem, G has an element of order 2, say the element y. Obviously, P Q⟨y⟩ = G, so G is a semidirect product Z143 ⋊ϕ Z2 .
Since as rings Z/143Z ∼= Z/11Z ⊕ Z/13Z, the automorphism group is

Aut(Z143 ) ∼= U (143) ∼= U (11) ⊕ U (13) ∼= Z10 ⊕ Z12 .

The group Z10 ⊕ Z12 has three elements of order 2. In U (143), these elements are 12, −1 = 142, and −12 = 131. These elements lead to three homomorphisms ϕi : Z2 → Aut(Z143 ) with ϕ1 (y)(x) = x12 , ϕ2 (y)(x) = x131 , and ϕ3 (y)(x) = x−1 . These give three semidirect products Gi = Z143 ⋊ϕi Z2 for i = 1, 2, 3.
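The three elements of order 2 in U(143) can be found by brute force (a sketch of mine):

```python
# Solutions of x² ≡ 1 (mod 143): the identity plus the three
# elements of order 2 in U(143).

roots = [a for a in range(1, 143) if (a * a) % 143 == 1]
print(roots)   # [1, 12, 131, 142]
```

By the Chinese Remainder Theorem there are exactly 2 · 2 = 4 such square roots, matching this count.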
We could have approached the classification somewhat differently and we do so now to show the
benefit. First note that Z143 = Z11 ⊕ Z13 and that Aut(Z143 ) = U (11) ⊕ U (13) = Aut(Z11 ) ⊕
Aut(Z13 ). Setting g1 and g2 as generators for Z11 and Z13 respectively, the homomorphisms ϕi
correspond to
ϕ1 (y) : g1 ↦ g1 , g2 ↦ g2−1 ;  ϕ2 (y) : g1 ↦ g1−1 , g2 ↦ g2 ;  ϕ3 (y) : g1 ↦ g1−1 , g2 ↦ g2−1 .
Consequently,

G1 ∼= D13 ⊕ Z11 , G2 ∼= D11 ⊕ Z13 , and G3 ∼= D143 .
Furthermore, we easily determine that no two of these groups are isomorphic by counting the ele-
ments of order 2: 13 in G1 , 11 in G2 , and 143 in G3 . 4
In some of the examples above, we encountered a few common situations that we delineate in
the following propositions.
Proposition 9.4.5
Suppose that G is a group with |G| = pa q b where p and q are distinct primes and a, b ≥ 1
are integers. Suppose also that np (G) = 1 or nq (G) = 1. Then G is a semidirect product
between a Sylow p-subgroup and a Sylow q-subgroup.
Proof. If np (G) = 1 or nq (G) = 1 then there is a normal Sylow p-subgroup or a normal Sylow
q-subgroup. Let P be a Sylow p-subgroup and let Q be a Sylow q-subgroup. Then P ∩ Q is a
subgroup of P and of Q so its order must divide gcd(pa , q b ) = 1. Hence, P ∩ Q = {1}. Also, since
P or Q is normal, P Q is a subgroup of G and it has order |P ||Q|/|P ∩ Q| = pa q b . Thus, P Q = G.
By Proposition 9.3.9, G is a semidirect product P oϕ Q or Q oϕ P .
Example 9.4.6 (Groups of Order p2 q, with p ≠ q). Let p and q be primes and let G be a
group of order p2 q. Let P ∈ Sylp (G) and let Q ∈ Sylq (G). We break this example into cases.
Case 1: p > q. Since np divides q and np ≡ 1 (mod p) then we must have np = 1. Thus, P E G
and G = P oϕ Q for some homomorphism ϕ : Q → Aut(P ). We now have two subcases:
• P ∼= Zp ⊕ Zp . Then Aut(P ) ∼= GL2 (Fp ) and | Aut(P )| = p(p − 1)^2 (p + 1). There exist
nontrivial homomorphisms ϕ when q|(p − 1) or q|(p + 1). Note that if q^1 is the highest
power of q dividing p + 1, then by Sylow's Theorem applied to Aut(P ), all Sylow q-
subgroups of Aut(P ) are conjugate to each other. Hence, by Exercise 9.3.3, there exists
a unique (up to isomorphism) semidirect product P oϕ Q.
• P ∼= Zp2 . Then Aut(P ) = U (p2 ) ∼= Zp(p−1) . There exist nontrivial homomorphisms ϕ
when q|(p − 1).
Case 2: p < q. Then nq = 1 + kq and since nq divides p2 then nq must be 1, p or p2 . If nq = 1,
then Q E G. Since q > p then if nq 6= 1, we cannot have nq = 1 + kq = p. Thus, nq = p2 .
We then have kq = p2 − 1 = (p − 1)(p + 1). Hence, q divides p − 1 or p + 1. Since q > p then
q = p + 1 which leads us to the case p = 2 and q = 3, so we are left with discussing the case
|G| = 12. This case was settled in Example 9.4.2. We suppose now that |G| ≠ 12.
We know from Proposition 9.4.5 that G = Q oϕ P for some homomorphism ϕ : P → Aut(Q).
We have two subcases.
• P ∼= Zp ⊕ Zp . Since Aut(Q) = U (q) ∼ = Zq−1 , then if p|(q − 1), there exist nontrivial
homomorphisms ϕ : P → Aut(Q).
• P ∼= Zp2 . Again, Aut(Q) = U (q) ∼= Zq−1 . A homomorphism ϕ : Zp2 → Aut(Zq ) is
determined by where it maps the generator x of Zp2 . By Lagrange’s Theorem |ϕ(x)|
divides gcd(p2 , q − 1), which, depending on p and q might be 1, p, or p2 . 4
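The Sylow counting used in the cases above is mechanical and can be sketched in Python (our own illustration, not from the text). The helper lists the values of np allowed by Sylow's Theorem, namely the divisors of the p′-part of |G| that are congruent to 1 mod p:

```python
def sylow_candidates(n, p):
    """Divisors of the p'-part of n that are congruent to 1 mod p."""
    m = n
    while m % p == 0:
        m //= p
    return [d for d in range(1, m + 1) if m % d == 0 and d % p == 1]

print(sylow_candidates(75, 5))   # |G| = 3·5^2 with p > q: forces n5 = 1
print(sylow_candidates(12, 3))   # |G| = 12: n3 may be 1 or 4
```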
Example 9.4.7 (Groups of Order p3 ). Let p be an odd prime and let G be a group of order
p3 . (If p = 2, it is easy to find all the groups of order 8 so we refer the reader to the table in
Section A.2.1.)
For this example, we cannot play Sylow p-subgroups off each other since the group itself is a
p-group. By FTFGAG, we know that there are three nonisomorphic abelian groups of order p3 ,
namely Zp3 , Zp2 ⊕ Zp , and Zp ⊕ Zp ⊕ Zp .
9.4. CLASSIFICATION THEOREMS 457
From now on assume G is nonabelian. By the Class Equation, it is possible to prove (Exer-
cise 8.4.14) that every p-group has a nontrivial center. Also, we know that if G/Z(G) is cyclic, then
G is abelian. (See Exercise 4.3.21.) So we must have Z(G) ∼= Zp . Then G/Z(G) is a p-group of
order p2 , which contains a normal subgroup N̄ of order p. By the Fourth Isomorphism Theorem, G
contains a normal subgroup N of order p2 such that N/Z(G) = N̄ .
There are two cases for the isomorphism type of N .
Case 1. G has a normal subgroup N that is isomorphic to Zp2 . Let N = hxi. Assume that G − N
does not contain an element of order p so that all elements of order p are in N . In general, if
g is an element of order p2 , then ⟨g⟩ contains p(p − 1) generators (of order p2 ). Also, if
g1 and g2 are both of order p2 and g1^k = g2^ℓ with k and ℓ both relatively prime to p2 , then
⟨g1 ⟩ = ⟨g2 ⟩. Hence, the assumption that N is the only subgroup that contains elements of order
p implies that, if there are k distinct subgroups of order p2 , then p3 = 1 + (p − 1) + kp(p − 1).
Thus, p2 = k(p − 1). Since gcd(p2 , p − 1) = 1 and p − 1 > 1 for odd p, this is a contradiction. Hence, G − N contains an element y of order p.
Thus, by Proposition 9.3.9, G is a semidirect product of Zp2 by Zp .
We know that Aut(N ) ∼= (Z/p2 Z)× ∼= Zp(p−1) . Hence, again by Cauchy's Theorem, Aut(N )
contains an element of order p. By Exercise 9.3.10, the element 1 + p has order p modulo p2 .
A nontrivial homomorphism ϕ : ⟨y⟩ → Aut(N ) has ϕ(y)(x) = x^(1+p) . As a group presentation,
Zp2 ⋊ϕ Zp = ⟨x, y | x^(p²) = y^p = 1, yxy^−1 = x^(1+p) ⟩.
Again, though there are choices for the homomorphism ϕ : ⟨y⟩ → Aut(N ), because of the
options for generators of N , the different nontrivial semidirect products are all isomorphic.
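The fact that 1 + p has order p modulo p² (cited from Exercise 9.3.10) is easy to check computationally; a quick Python sketch (the code is our illustration, not the book's):

```python
def mult_order(a, n):
    """Multiplicative order of a modulo n (assumes gcd(a, n) = 1)."""
    k, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        k += 1
    return k

# 1 + p has order exactly p in U(p^2) for each odd prime tested
for p in [3, 5, 7, 11, 13]:
    print(p, mult_order(1 + p, p * p))
```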
Case 2. G does not contain a normal subgroup that is isomorphic to Zp2 . Note that if |x| = p2 ,
then by Proposition 9.2.4, ⟨x⟩ E G. Hence, this case implies that all the nonidentity elements in
G have order p. So the normal subgroup N of order p2 has N ∼ = Zp ⊕ Zp and G − N contains
an element z of order p. So G = N hzi and by Proposition 9.3.9, G is a semidirect product
(Zp ⊕ Zp ) oϕ Zp .
By Proposition 9.3.16, Aut(N ) ∼= GL2 (Fp ), which has order (p2 − p)(p2 − 1). By Cauchy's
Theorem, since p divides (p2 − p)(p2 − 1), Aut(N ) contains an automorphism ψ of order p.
But p^1 is the highest power of p dividing | GL2 (Fp )| so, by Sylow's Theorem, all subgroups of
order p in GL2 (Fp ) are conjugate. Consequently, by Exercise 9.3.3, all nontrivial homomorphisms
ϕ : Zp → Aut(Zp ⊕ Zp ) produce isomorphic semidirect products. 4
We reiterate that the methods at our disposal up to this point to classify groups of a given order
n often involve using Sylow p-subgroups, proving that all groups of order n must be a semidirect
product of their Sylow p-subgroups, and then determining how many such products are
distinct. Remark 9.3.13 pointed out that it is not always easy to tell when certain semidirect
products are isomorphic. However, there are a few situations in which it is possible to tell, as seen
in Exercises 9.3.3 and 9.3.4. In particular, in Exercise 9.3.3, the desired condition when comparing
H oϕ1 K and H oϕ2 K is that Im ϕ1 and Im ϕ2 are conjugate subgroups in Aut(H). As an additional
strategy using the result of this exercise, if Im ϕ1 and Im ϕ2 happen to be Sylow subgroups of Aut(H),
then by Sylow’s Theorem, they are conjugate subgroups.
9.5
Nilpotent Groups
We finish this chapter with a brief section on a specific class of groups, called nilpotent groups.
Nilpotent groups have classifications that are essentially as easy as one can expect. In essence, the
only complexity of the group's structure comes from the Sylow subgroups. Consequently, we first
explore in more detail properties of p-groups.
Proposition 9.5.1
Let P be a p-group. If N is a nontrivial normal subgroup of P , then N ∩ Z(P ) is nontrivial.
Proof. Since N E P , the subgroup N is a union of conjugacy classes of P . Let K1 , K2 , . . . , Ks be the conjugacy classes in N consisting of a single element; every other conjugacy class in N has size divisible by p. Writing |N | = p^k and counting elements gives p^k = s + pm for some integer m. Since s = p^k − pm, we see that p|s and in particular s > 1.
Singleton conjugacy classes correspond to elements in the center. Thus, K1 ∪ K2 ∪ · · · ∪ Ks =
N ∩ Z(P ) and |N ∩ Z(P )| = s. Since s > 1, the intersection N ∩ Z(P ) is a nontrivial subgroup.
In the classification of groups of order p3 (Example 9.4.7), we repeatedly used the fact that
Z(P ) 6= {1} for all p-groups P . This generalizes to the following more profound property.
Proposition 9.5.2
Let P be a p-group of order pk . Then P contains a normal subgroup of order pa for all
0 ≤ a ≤ k.
Proof. We first claim that every p-group of order pk has subgroups of all orders pa for 0 ≤ a ≤ k. We
prove the claim by induction on k. The case k = 1 is trivial. Suppose that the claim is true for k; we
will prove the claim for k + 1. Let P be a p-group of order pk+1 . The center is nontrivial p-subgroup
so by Cauchy’s Theorem, Z(P ) contains an element x of order p. Then hxi E P since x ∈ Z(P ).
But P/⟨x⟩ is a p-group with order pk so by the induction hypothesis, it contains subgroups H̄i with
|H̄i | = pi for i = 1, 2, . . . , k. However, by the Fourth Isomorphism Theorem, P contains subgroups
Hi such that H̄i = Hi /⟨x⟩. But in P , the subgroups have order |Hi | = pi+1 for 1 ≤ i ≤ k, so
together with hxi, we deduce that P has subgroups of order pj for 1 ≤ j ≤ k + 1.
We now prove that a p-group has normal subgroups for all orders pa . Again we prove this by
induction. The case k = 1 is trivial. Suppose the proposition is true for all ` < k; we prove it is true
for k. Let P be a p-group of order pk . Then |Z(P )| = p^m for some m ≥ 1. Then P/Z(P ) has order
pk−m and k − m < k. By our first claim, Z(P ) has subgroups of all orders,
1 = K0 ≤ K1 ≤ K2 ≤ · · · ≤ Km = Z(P )
with |Ki | = pi . Since they are in Z(P ), they are all normal in P . The quotient group P/Z(P ) is
a p-group of order strictly less than pk . By the induction hypothesis, this group contains normal
subgroups H̄j of all orders pj with 0 ≤ j ≤ k − m. By the Fourth Isomorphism Theorem, P contains
normal subgroups Hj such that Hj /Z(P ) = H̄j . Furthermore, |Hj | = pj+m . Hence, the list
1 = K0 , K1 , K2 , . . . , Km = Z(P ) = H0 , H1 , H2 , . . . , Hk−m = P
consists of normal subgroups of P of all orders pa for 0 ≤ a ≤ k, which proves the proposition.
Cauchy’s Theorem and Sylow’s Theorem gave partial converses to Lagrange’s Theorem; this
proposition gives us yet another partial converse. By Sylow’s Theorem, every group G possesses a
Sylow p-subgroup P but by Proposition 9.5.2, every group possesses a subgroup of order pi for all
1 ≤ i ≤ k, where |G| = pk m and p - m.
Combining the previous two propositions leads to a slightly stronger result about normality.
Proposition 9.5.3
Let P be a p-group and let N E P with |N | = pa . Then N contains subgroups, normal in P ,
of all orders pi with 0 ≤ i ≤ a.
Proof. We prove this by induction on k, where pk is the order of P . If k ≤ 1, the result is trivial.
Suppose that the proposition is true for k; we will show the proposition holds for k + 1. Let P have
order pk+1 and suppose that N is a nontrivial normal subgroup of P of order pa . By Proposition 9.5.1,
N ∩ Z(P ) is nontrivial and by Cauchy’s Theorem contains an element x of order p. By the Third
Isomorphism Theorem, N̄ = N/⟨x⟩ E P/⟨x⟩ = P̄ and N̄ has order pa−1 . The group P̄ has order p^k so
by the induction hypothesis, N̄ contains subgroups N̄j normal in P̄ of orders pj for all 0 ≤ j ≤ a − 1.
By the Fourth Isomorphism Theorem, for each j, there is a normal subgroup Nj E P satisfying
⟨x⟩ ≤ Nj ≤ N
and |Nj | = pj+1 for all 0 ≤ j ≤ a − 1. This proves the claim for k + 1. By induction, the proposition
holds for all positive integers k.
The point about this proposition is that not only does P have normal subgroups of all possible
orders by Proposition 9.5.2, but every normal subgroup also contains normal subgroups in P of all
possible orders dividing |N |.
In the above proofs, we used Cauchy’s Theorem and the Fourth Isomorphism Theorem to prove
results by “working our way up” from an element of order p in Z(G). However, it is also possible to
deduce some properties about the subgroup lattice of a p-group from the top down.
Lemma 9.5.4
If H is a proper subgroup of a p-group P , then H ⊊ NP (H), i.e., H is strictly contained in
its normalizer.
Proof. The proof of this lemma is similar to the above proofs and uses induction on k = logp |P |.
For k = 1 or 2, the lemma holds trivially since P is abelian. Suppose the lemma is true for all ℓ < k;
we will prove it for k. Since Z(P ) commutes with all elements in P , it satisfies Z(P ) ≤ NP (H). But
H ≤ NP (H) so we have two cases. Case 1: If Z(P ) is not contained in H, then ⟨Z(P ), H⟩ ≤ NP (H)
and H is a strict subgroup of ⟨Z(P ), H⟩, thereby establishing the claim in this case. Case 2: If Z(P ) ≤ H
then by the Fourth Isomorphism Theorem, H corresponds uniquely with a subgroup H̄ in P̄ =
P/Z(P ), via H̄ = H/Z(P ). Since Z(P ) 6= {1}, then |P/Z(P )| < pk so by the induction hypothesis
H̄ ⊊ NP̄ (H̄). It follows from the Fourth Isomorphism Theorem that NP̄ (H̄) = NP (H)/Z(P ) and
also that H ⊊ NP (H).
A priori, maximal subgroups of a group are maximal by containment (not order). Hence, maximal
subgroups do not all have to be of the same order. The subgroup lattice of D6 in Example 3.6.8
shows that D6 has maximal subgroups of order 6 and of order 4. With p-groups the situation is
more limited.
Proposition 9.5.5
Every maximal subgroup of a p-group has index p and is a normal subgroup.
Proof. Let M be a maximal subgroup of P . By Lemma 9.5.4, M is strictly contained in NP (M ), so by maximality NP (M ) = P and M E P . If |P/M | were greater than p, then by Cauchy's Theorem and the Fourth Isomorphism Theorem, a subgroup of P/M of order p would correspond to a subgroup strictly between M and P , contradicting maximality. Hence, M has index p.
Definition 9.5.6
The chain of subgroups
Z0 (G) ≤ Z1 (G) ≤ Z2 (G) ≤ · · · , where Z0 (G) = {1}, Z1 (G) = Z(G), and, for i ≥ 1, Zi+1 (G) is
the subgroup of G containing Zi (G) for which Zi+1 (G)/Zi (G) = Z(G/Zi (G)),
is called the upper central series of G.
Definition 9.5.7
A group G is called nilpotent if Zc (G) = G for some index c. The minimum index c such
that Zc (G) = G is called the nilpotence class of G.
If G is abelian, then Z1 (G) = G and hence Zi (G) = G for all i ≥ 1. In fact, a group is
abelian if and only if it is nilpotent of nilpotence class 1. In contrast, Z(Sn ) = {1} whenever
n ≥ 3, so Zi (Sn ) = {1} for all i ≥ 1. If G is a finite group, then the upper
central series eventually stabilizes but, as the two extreme examples just mentioned show, it need
not terminate at G.
Proposition 9.5.8
A nilpotent group is solvable. In contrast, not every solvable group is nilpotent.
Proof. Suppose that G is nilpotent with Zc (G) = G. Then {1} = Z0 (G) ≤ Z1 (G) ≤ · · · ≤ Zc (G) = G
is a chain of successively normal subgroups of G in which Zi (G)/Zi−1 (G) is the center of G/Zi−1 (G).
The center of any group is abelian so the quotients are all abelian. Hence, G is solvable.
For the second part of the proposition, recall that the group S4 is solvable. (See the commutator
series in Example 9.1.12.) However, S4 is not nilpotent. Hence, though all nilpotent groups are
solvable, the reverse is not true.
Example 9.5.9. We propose to calculate the upper central series of D12 . Recall that Z(Dn ) is
⟨r^(n/2) ⟩ if n is even and {1} if n is odd.
Z1 (D12 ) = ⟨r^6 ⟩ =⇒ D12 /Z1 (D12 ) ∼= D6 ,
Z2 (D12 )/⟨r^6 ⟩ = Z(D12 /Z1 (D12 )) ∼= Z(D6 ) =⇒ Z2 (D12 ) = ⟨r^3 ⟩ =⇒ D12 /Z2 (D12 ) ∼= D3 ,
Z3 (D12 )/⟨r^3 ⟩ = Z(D12 /Z2 (D12 )) ∼= Z(D3 ) = {1} =⇒ Z3 (D12 ) = ⟨r^3 ⟩.
The upper central series now stabilizes. Note that Zi (D12 ) = ⟨r^3 ⟩ ≠ D12 for all i ≥ 2. Hence, D12 is not
a nilpotent group. 4
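This computation can be replicated by brute force using the equivalent characterization Zi+1(G) = {x ∈ G : [x, g] ∈ Zi(G) for all g ∈ G}. The following Python sketch (ours, not the book's) models D12 as pairs (a, f) standing for r^a s^f:

```python
N = 12
G = [(a, f) for a in range(N) for f in range(2)]  # r^a s^f in D12

def mul(x, y):
    (a, f), (b, g) = x, y
    return ((a + (b if f == 0 else -b)) % N, (f + g) % 2)

def inv(x):
    # brute-force inverse; fine for a group of 24 elements
    return next(y for y in G if mul(x, y) == (0, 0))

def comm(x, g):
    # the commutator [x, g] = x g x^(-1) g^(-1)
    return mul(mul(x, g), mul(inv(x), inv(g)))

# Z_{i+1} = { x : [x, g] in Z_i for all g }, starting from Z_0 = {1}
Z = [{(0, 0)}]
while True:
    nxt = {x for x in G if all(comm(x, g) in Z[-1] for g in G)}
    if nxt == Z[-1]:
        break
    Z.append(nxt)

print([len(z) for z in Z])  # prints [1, 2, 4]: the series stops at <r^3>
```

The series stabilizes at a subgroup of order 4, strictly smaller than D12, confirming that D12 is not nilpotent.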
The following proposition shows why nilpotent groups generalize p-groups. We leave the proof as
an exercise for the reader since it is similar to many of the proofs for the propositions on p-groups.
Proposition 9.5.10
Let p be a prime and let P be a p-group of order pk , with k ≥ 2. Then P is nilpotent of
nilpotence class at most k − 1. (If k = 1, then the nilpotence class is 1.)
The class of nilpotent groups is important because of the characterization provided in the follow-
ing theorem. Most importantly, part (4), along with knowledge of the possibilities for p-groups, gives a
classification of nilpotent groups.
Theorem 9.5.11
Let G be a finite group of order p1^α1 p2^α2 · · · ps^αs , with pi distinct primes. For all i = 1, 2, . . . , s,
let Pi ∈ Sylpi (G). The following are equivalent:
(1) G is nilpotent;
(2) Every proper subgroup of G is a proper subgroup of its normalizer;
(3) npi (G) = 1 for all pi ;
(4) G ∼= P1 ⊕ P2 ⊕ · · · ⊕ Ps .
Proof. (1) =⇒ (2): Similar to the proof of Lemma 9.5.4. (Left as an exercise for the reader. See
Exercise 9.5.4.)
(2) =⇒ (3): Call Hi = NG (Pi ) for all 1 ≤ i ≤ s. Since Pi E Hi , Pi is in fact characteristic in
Hi by Proposition 8.5.9. Furthermore, since Hi E NG (Hi ) and Pi is characteristic in Hi , we get
Pi E NG (Hi ), so NG (Hi ) ≤ NG (Pi ) = Hi . By definition of the normalizer, this entails that Hi = NG (Hi ). The hypothesis (2) implies that Hi is
not a proper subgroup, so Hi = G for all i, which means that Pi E G.
(3) =⇒ (4): This implication follows immediately from the Direct Sum Decomposition Theorem.
(4) =⇒ (1): This follows from the property that the direct sum of nilpotent groups is again
nilpotent. We leave this as an exercise for the reader. (Exercise 9.5.5.)
9.6
Projects
Project I. Automorphism Groups. Section 9.3 gave the isomorphism type of Aut(G) for a few
groups G. Can you come up with others? For example, can you determine the automorphism
group of Z3 o Z4 , Z5 o Z4 , Z7 o Z3 , (Z3 × Z3 ) o Z2 , or others?
Project II. Semidirect Products Zq o Zp in R3 . From a geometric perspective, the dihedral
group Dn = Zn oZ2 can be viewed as a group of transformations in R3 generated by a rotation
by π around one axis and a rotation by 2π/n around a perpendicular axis. Are there primes
p < q such that the nontrivial semidirect product Zq oZp can be viewed as a group of rotations
in R3 ? (In other words, is there an injective homomorphism f : Zq o Zp → GL3 (R)?) If so,
what is the angle between the generating axes? For a specific example, give rotation matrices
for the generators of Zq o Zp .
Project III. Abelian by Number. Explore conditions on integers n such that a group of order
n must be abelian.
Project IV. Groups of Order p4 . Explore the automorphism groups of groups of order p3 . Use
this information to try to classify groups (or find as many groups as possible) of order p4 .
Project V. Groups of Order p3 q. Revisit Exercise 9.4.14. Explore how much of the exercise
generalizes to groups of order 3^3 · 37. Explore how much of the exercise generalizes to groups
of order 8p, where p is an odd prime. Generalize the explorations as much as possible.
Project VI. Big & Nasty Groups. Be creative and use semidirect products to come up with
large groups whose centers are trivial. Give presentations for your groups.
Project VII. Unipotent Matrices. If F is a finite field, the group of unipotent triangular
matrices is the group Un (F ) of upper triangular matrices in GLn (F ) with 1s on
the diagonal. Obviously, Un (Fp ) is a p-group of order p^(n(n−1)/2) . Try to determine the upper
central series of Un (Fp ); also study the commutator series.
Project VIII. Exploring the Universal Embedding Theorem. Recall the Universal Embed-
ding Theorem (Theorem 9.3.20). Explore this theorem with some explicit examples of groups
G with a normal subgroup N such that G is not a semidirect product of N and G/N (i.e.,
such that G does not contain a subgroup isomorphic to G/N ). The quaternion group Q8 with
N = h−1i satisfies this property, so determine the appropriate wreath product and explicitly
describe the embedding of Q8 in this wreath product. Find and study the same questions with
at least one other group besides Q8 to which this criterion applies.
10. Modules and Algebras
Chapter 8 presented the concept of the action of a group on a set. Recall that if a group G acts on a
set X, the elements of G behave as functions on X in such a way that the group operation behaves
as the function composition on X.
The general concept of an action of one algebraic structure on another structure is similar. The
elements of a first structure behave as functions/morphisms on the second structure but in such
a way that operations in the first structure behave as operations on the relevant set of functions.
Section 8.6 began to explore properties of groups acting on vector spaces. There, we emphasized
that the resulting action becomes a new algebraic structure in its own right. Furthermore, we also
saw how fruitful the study of action structures is to the understanding of both structures separately.
This chapter introduces action structures of rings. These are called modules and algebras de-
pending on the structure upon which the ring acts. Sections 10.1 and 10.2 present specific examples.
Boolean algebras are important in their own right and could be studied independently of our theme
of action structures. In Section 10.2, we review concepts of vector spaces, emphasizing the points
listed in the preface.
Sections 10.3 through 10.5 introduce the structure of a module over a ring. This includes most
of the guiding themes in the Outline for each algebraic structure: modules, submodules, examples,
homomorphisms, quotient modules, and convenient ways of describing the internal structure.
As an application of the theory of modules and their decompositions, we discuss finitely generated
modules over a PID in Sections 10.6 and 10.7. This leads to a generalization of the Fundamental
Theorem of Finitely Generated Abelian Groups. In turn, applying theorems for decomposition of
modules to F [x]-modules leads us to classification results for linear transformations. Section 10.8
presents the method of applying the Fundamental Theorem for Finitely Generated Modules over a
PID to linear transformations and presents the rational canonical form. Section 10.9 introduces the
Jordan canonical form of a matrix, an important result for many applications of linear algebra.
Finally, Section 10.11 introduces path algebras and their modules. Though there are many other
important modules and algebras that arise in various branches of mathematics (e.g., Lie algebras,
differential graded algebras, etc.), path algebras and their modules have drawn some interest in
research recently and have interesting consequences for linear algebra.
10.1
Boolean Algebras
Logic underlies mathematical reasoning. The epistemological strength of mathematical theorems relies
on the strict rules of logic. It may come as a surprise to some readers (though not to philosophers)
that there exist various types of logic [51]. Mathematical reasoning follows Boolean logic.
Boolean logic presupposes incontrovertible notions of true and false and involves statements,
called propositions, that are decidably true or false. A philosopher would view the previous sentence
as riddled with deficiencies. The word “incontrovertible” is problematic; the linguistic construct of
“statement” would require a clear definition; the bond between a string of symbols and meaning
ascribed to it by a reader would need careful investigation; and the process of deciding whether a
proposition is true or false poses yet more challenges. These issues and many more are the purview
of the philosophy of mathematics. In fact, a few philosophers of mathematics propose the use of
alternate logics, perhaps most notably intuitionist logic.
Boolean logic holds many of the deep problems of philosophy at bay by proposing an algebraic
model for logical reasoning. This algebra provides a method to deduce when one proposition is
equivalent to another or when one implies the other.
The propositional calculus of Boolean logic is just one example of an algebraic structure called a
Boolean algebra.
Definition 10.1.1
A Boolean algebra is a quadruple (B, ∧, ∨, ¯ ) where B is a set, ∨ and ∧ are binary operations
B × B → B, and ¯ is a function B → B such that there exist elements 0, 1 ∈ B and for all
x, y, z ∈ B the following identities hold:
Commutative laws: x ∧ y = y ∧ x and x ∨ y = y ∨ x;
Associative laws: x ∧ (y ∧ z) = (x ∧ y) ∧ z and x ∨ (y ∨ z) = (x ∨ y) ∨ z;
Distributive laws: x ∧ (y ∨ z) = (x ∧ y) ∨ (x ∧ z) and x ∨ (y ∧ z) = (x ∨ y) ∧ (x ∨ z);
Identity laws: x ∧ 1 = x and x ∨ 0 = x;
Complement laws: x ∧ x̄ = 0 and x ∨ x̄ = 1.
We pronounce the operations ∧ as "wedge" and ∨ as "vee," and we call x̄ the complement of x. Because
of applications to logic, which we will explain below, we also call these operations ∧ as AND, ∨ as
OR, and ¯ as NOT.
Boolean algebras satisfy a number of other “laws.” However, we do not list them in the definition
because these other laws can be proven from the given set of axioms.
Proposition 10.1.2
Let (B, ∧, ∨, ¯ ) be a Boolean algebra. The following properties also hold for all x, y ∈ B.
Basic negations: 0̄ = 1 and 1̄ = 0.
Dominance laws: 0 ∧ x = 0 and 1 ∨ x = 1.
Double negative: the complement of x̄ is x.
Idempotent laws: x ∧ x = x and x ∨ x = x.
DeMorgan laws: the complement of x ∨ y is x̄ ∧ ȳ, and the complement of x ∧ y is x̄ ∨ ȳ.
Absorption laws: x ∧ (x ∨ y) = x and x ∨ (x ∧ y) = x.
Proof. Let us first prove the basic negations. Applying the first complement law to 1 gives
1 ∧ 1̄ = 0. However, by the first identity law (and commutativity), 1 ∧ 1̄ = 1̄, so 1̄ = 0. The second basic negation is proved
similarly.
Next, we prove the idempotent laws. For any x ∈ B,
x = x ∧ 1 = x ∧ (x ∨ x̄) = (x ∧ x) ∨ (x ∧ x̄) = (x ∧ x) ∨ 0 = x ∧ x,
and the computation for x ∨ x = x is similar.
To prove the first DeMorgan law, we prove a ∧ b = 0 and a ∨ b = 1 for a = x ∨ y and b = x̄ ∧ ȳ. First,
(x ∨ y) ∧ (x̄ ∧ ȳ) = (x ∧ x̄ ∧ ȳ) ∨ (y ∧ x̄ ∧ ȳ) = (0 ∧ ȳ) ∨ (0 ∧ x̄) = 0 ∨ 0 = 0.
Second,
(x ∨ y) ∨ (x̄ ∧ ȳ) = (x ∨ y ∨ x̄) ∧ (x ∨ y ∨ ȳ) = (1 ∨ y) ∧ (x ∨ 1) = 1 ∧ 1 = 1.
By uniqueness of complements, we conclude that
the complement of x ∨ y is x̄ ∧ ȳ.
A similar argument establishes the second DeMorgan law.
Example 10.1.3 (Boolean Logic). Boolean logic served as a motivating example for this section.
We can view Boolean logic as a Boolean algebra in the following sense.
Let S be the set of all logical propositions, statements that are decidably true or false. Note that
this set is enormous. The conjunction AND, the disjunction OR, and the negation NOT, give the
quadruple (S, AND, OR, NOT) the structure of a Boolean algebra. This is precisely the algebraic
structure of Boolean logic with one slightly unnatural adjustment. A Boolean algebra must contain
identities 0 and 1, which, for the purpose of Boolean logic, are statements that are respectively always
false and always true. It is not uncommon to denote these respectively by F and T.
Boolean logic involves a variety of other symbols and notational habits. As in algebra, a propo-
sition is denoted with letters, such as p and q. For example, we might say that p=“Springfield is the
capital of Illinois” and q =“2+2=5.” Then p ∧ q, read “p and q” is the proposition
“Springfield is the capital of Illinois and 2 + 2 = 5.”
Among other notational habits, in the propositional calculus of Boolean logic, we typically use
the symbol ≡ for logical equivalence even though in a generic Boolean algebra we retain the use
of =. In this context, logicians often write ¬p instead of p̄. Also in Boolean logic, the implication
operation p → q is important as is the biconditional p ↔ q. These are defined algebraically by
p → q ≡ p̄ ∨ q, also written ¬p ∨ q,
p ↔ q ≡ (p ∧ q) ∨ (p̄ ∧ q̄), also written (p ∧ q) ∨ (¬p ∧ ¬q).
Recall that in propositional logic, a tautology is a propositional form that is always true, i.e., a
Boolean algebra expression that reduces to T. 4
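These algebraic definitions of → and ↔ can be tabulated directly. A small Python check (our illustration, not from the text) over the basic Boolean values 0 and 1:

```python
rows = []
for p in (0, 1):
    for q in (0, 1):
        implies = (1 - p) | q                    # p -> q as (NOT p) OR q
        iff = (p & q) | ((1 - p) & (1 - q))      # p <-> q
        rows.append((p, q, implies, iff))

for row in rows:
    print(row)
# (0, 0, 1, 1), (0, 1, 1, 0), (1, 0, 0, 0), (1, 1, 1, 1):
# p -> q fails only when p = 1 and q = 0; p <-> q holds exactly when p = q
```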
The definitions allow for a trivial Boolean algebra that consists of a single element 0 = 1. Then
the binary operations and the complement function are all trivial and all the axioms hold trivially. The basic
Boolean algebra is more important.
Example 10.1.4 (Basic Boolean Algebra). Apart from the trivial algebra, a Boolean algebra
contains at least the two distinct elements 0 and 1. Interestingly enough, there exists a Boolean algebra with exactly
two elements and the operations on the elements are completely determined by the axioms of a
Boolean algebra:
0 ∧ 0 = 0, 0 ∧ 1 = 0, 1 ∧ 0 = 0, 1 ∧ 1 = 1,
0 ∨ 0 = 0, 0 ∨ 1 = 1, 1 ∨ 0 = 1, 1 ∨ 1 = 1,
0 = 1, 1 = 0.
Because of its application to Boolean logic, it is also common to write the basic Boolean algebra
as the set B = {F, T}, where we read F as the English "false" or Boolean 0, and T as the English
"true" or Boolean 1. Then the operators ∧, ∨, and ¯ correspond to AND, OR, and NOT applied
to these logical values. 4
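The claim that the axioms completely determine the operations can be verified exhaustively; here is a brief Python check (ours, not the book's) that the basic Boolean algebra also satisfies the supplementary laws of Proposition 10.1.2:

```python
from itertools import product

AND = lambda x, y: x & y
OR  = lambda x, y: x | y
NOT = lambda x: 1 - x

for x, y in product((0, 1), repeat=2):
    assert NOT(AND(x, y)) == OR(NOT(x), NOT(y))   # DeMorgan
    assert NOT(OR(x, y)) == AND(NOT(x), NOT(y))   # DeMorgan
    assert AND(x, OR(x, y)) == x                  # absorption
    assert OR(x, AND(x, y)) == x                  # absorption
    assert AND(x, x) == x and OR(x, x) == x       # idempotent
print("all laws hold in the basic Boolean algebra")
```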
Example 10.1.5 (Set Theory). Another important example of a Boolean algebra is the set of
subsets of a set with the standard operations on them. Though we did not dwell on the algebraic properties
of the operations on sets in Section 1.1, it is not hard to show that for any set S, the quadruple
(P(S), ∩, ∪, ¯ ), with Ā = S − A, is a Boolean algebra in which the set S serves as the 1 and the empty set ∅ serves
as the 0. 4
Example 10.1.6 (Boolean Functions). Let B be any Boolean algebra and let X be any set.
Consider the set of functions from X to B denoted by Fun(X, B).
Define binary operations ∧, ∨ and the complement function on Fun(X, B) by
(f ∧ g)(x) = f (x) ∧ g(x) for all x ∈ X,
(f ∨ g)(x) = f (x) ∨ g(x) for all x ∈ X,
and f̄ (x) defined as the complement of f (x) in B, for all x ∈ X.
These operations make the quadruple (Fun(X, B), ∧, ∨, ) into another Boolean algebra. 4
Just as in high school algebra and elementary ring theory, a common exercise in Boolean algebras
involves simplifying an expression in a Boolean algebra as much as possible. Some of the section
exercises practice this skill but the reader can find examples of such simplification in the proofs of
various propositions and theorems in this section.
Definition 10.1.7
Let (B1 , ∧1 , ∨1 , ¯ ) and (B2 , ∧2 , ∨2 , ¯ ) be two Boolean algebras. A function ϕ : B1 → B2 is
called a Boolean algebra homomorphism if, for all x, y ∈ B1 ,
ϕ(x ∧1 y) = ϕ(x) ∧2 ϕ(y), ϕ(x ∨1 y) = ϕ(x) ∨2 ϕ(y), and ϕ maps the complement of x to the complement of ϕ(x).
Example 10.1.8. Let P be the set of logical propositions and let B = {F, T} be the basic Boolean
algebra. The function ϕ : P → B that associates to each proposition in Boolean logic its truth value
is a Boolean algebra homomorphism. 4
Example 10.1.9. Let S be a set and let B = {0, 1} be the basic Boolean algebra. Consider the
function ϕ : Fun(S, B) → P(S) defined by ϕ(f ) = {s ∈ S | f (s) = 1}.
The function ϕ is a Boolean algebra isomorphism. We might suspect this by viewing functions in
Fun(S, B) as deciding whether a given element is in a subset or not. We first point out that ϕ is a
bijection with ϕ−1 (A) = χA , where χA is the characteristic function defined by
χA (s) = 1 if s ∈ A and χA (s) = 0 if s ∉ A.
(See Exercise 1.1.28.) We show that ϕ is a homomorphism directly. Observe first that in the basic
Boolean algebra, x ∧ y = 1 if and only if x = y = 1. Let f, g ∈ Fun(S, B). Then
ϕ(f ∧ g) = {s ∈ S | f (s) ∧ g(s) = 1} = {s ∈ S | f (s) = 1 and g(s) = 1} = ϕ(f ) ∩ ϕ(g).
The property that ϕ(f ∨ g) = ϕ(f ) ∪ ϕ(g) follows in an identical manner from the fact that in the
basic Boolean algebra x ∨ y = 1 if and only if x = 1 or y = 1. Finally,
ϕ(f̄ ) = {s ∈ S | f (s) = 0} = S − ϕ(f ), which is the complement of ϕ(f ) in P(S). Hence, ϕ is a Boolean algebra isomorphism. 4
By now, having seen and worked with subobjects in the contexts of groups and rings, the reader
should be able to define the concepts of a Boolean subalgebra and the direct sum of two Boolean
algebras. We provide them for completeness.
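The isomorphism ϕ of Example 10.1.9 can also be confirmed exhaustively for a small set S. The following Python sketch (our illustration, not the book's) represents a function f : S → {0, 1} as the tuple of its values:

```python
from itertools import product

S = [0, 1, 2]

def phi(f):
    # phi sends f to the subset of S on which f takes the value 1
    return frozenset(s for s in S if f[s] == 1)

for f in product((0, 1), repeat=len(S)):
    for g in product((0, 1), repeat=len(S)):
        meet = tuple(a & b for a, b in zip(f, g))
        join = tuple(a | b for a, b in zip(f, g))
        assert phi(meet) == phi(f) & phi(g)    # wedge maps to intersection
        assert phi(join) == phi(f) | phi(g)    # vee maps to union
    comp = tuple(1 - a for a in f)
    assert phi(comp) == frozenset(S) - phi(f)  # complement to set complement
print("phi respects all three operations")
```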
Definition 10.1.10
Let (B, ∧, ∨, ¯ ) be a Boolean algebra. A Boolean subalgebra of B is a nonempty subset B′
such that B′ is closed under ∧, under ∨, and under complements. In other words, for all x, y ∈ B′, the
elements x ∧ y, x ∨ y, and x̄ are in B′.
Definition 10.1.11
Let (B1 , ∧1 , ∨1 , ¯ ) and (B2 , ∧2 , ∨2 , ¯ ) be two Boolean algebras. The direct sum of these
Boolean algebras, denoted by B1 ⊕ B2 , is the quadruple (B1 × B2 , ∧, ∨, ¯ ), where the operations are defined componentwise:
(x1 , x2 ) ∧ (y1 , y2 ) = (x1 ∧1 y1 , x2 ∧2 y2 ), (x1 , x2 ) ∨ (y1 , y2 ) = (x1 ∨1 y1 , x2 ∨2 y2 ), and the complement of (x1 , x2 ) is (x̄1 , x̄2 ).
As with other structures, it is possible to define the n-tuple direct sum of Boolean algebras. Also,
we denote by B n the n-tuple direct sum of a Boolean algebra B with itself.
Theorem 10.1.12
Let (B, ∧, ∨, ¯ ) be a Boolean algebra. Define the binary operation + : B × B → B by
x + y = (x ∧ ȳ) ∨ (x̄ ∧ y).
Then (B, +, ∧) is a commutative ring with identity 1 ≠ 0. The additive identity is 0 and,
for all x ∈ B, the additive inverse of x is x itself since x + x = 0.
Proof. The axioms of a Boolean algebra give us that ∧ is associative, commutative, and has the
identity of 1. We need to prove the other ring axioms.
Let x, y, z ∈ B. To prove commutativity,
y + x = (y ∧ x̄) ∨ (ȳ ∧ x) = (x̄ ∧ y) ∨ (x ∧ ȳ)
= (x ∧ ȳ) ∨ (x̄ ∧ y) = x + y.
For associativity, we first simplify, using the DeMorgan laws on the complement of x + y,
(x + y) + z = (((x ∧ ȳ) ∨ (x̄ ∧ y)) ∧ z̄) ∨ (((x̄ ∨ y) ∧ (x ∨ ȳ)) ∧ z)
= (x ∧ ȳ ∧ z̄) ∨ (x̄ ∧ y ∧ z̄) ∨ (((x̄ ∧ x) ∨ (x̄ ∧ ȳ) ∨ (y ∧ x) ∨ (y ∧ ȳ)) ∧ z)
= (x ∧ ȳ ∧ z̄) ∨ (x̄ ∧ y ∧ z̄) ∨ ((0 ∨ (x̄ ∧ ȳ) ∨ (x ∧ y) ∨ 0) ∧ z)
= (x ∧ ȳ ∧ z̄) ∨ (x̄ ∧ y ∧ z̄) ∨ (x̄ ∧ ȳ ∧ z) ∨ (x ∧ y ∧ z).
We see that this result is symmetric in x, y and z, so we deduce that (x + y) + z = (y + z) + x =
x + (y + z), by commutativity of +. Hence, + is associative.
It is easy to see that for all x ∈ B,
x + 0 = (x ∧ 0̄) ∨ (x̄ ∧ 0) = (x ∧ 1) ∨ 0 = x ∨ 0 = x,
so 0 is the identity for +. Furthermore, + has inverses since for all x ∈ B,
x + x = (x ∧ x̄) ∨ (x̄ ∧ x) = 0 ∨ 0 = 0.
Finally, we need to show that ∧ is distributive over +. Let x, y, z ∈ B. Then
(x + y) ∧ z = ((x ∧ ȳ) ∨ (x̄ ∧ y)) ∧ z = (x ∧ ȳ ∧ z) ∨ (x̄ ∧ y ∧ z)
whereas, using the DeMorgan laws,
(x ∧ z) + (y ∧ z) = ((x ∧ z) ∧ (ȳ ∨ z̄)) ∨ ((x̄ ∨ z̄) ∧ (y ∧ z))
= (x ∧ z ∧ ȳ) ∨ (x ∧ z ∧ z̄) ∨ (x̄ ∧ y ∧ z) ∨ (z̄ ∧ y ∧ z)
= (x ∧ z ∧ ȳ) ∨ (x ∧ 0) ∨ (x̄ ∧ y ∧ z) ∨ (y ∧ 0)
= (x ∧ z ∧ ȳ) ∨ 0 ∨ (x̄ ∧ y ∧ z) ∨ 0
= (x ∧ ȳ ∧ z) ∨ (x̄ ∧ y ∧ z).
This shows right-distributivity. Since ∧ is commutative, left-distributivity also holds.
Exercise 5.1.35 presented the concept of a Boolean ring as a ring R in which r2 = r for all r ∈ R.
In that exercise, one is asked to show that r + r = 0 for all r ∈ R and that R is commutative.
Consequently, the ring (B, +, ∧) in Theorem 10.1.12 is a Boolean ring. A converse is also true.
Theorem 10.1.13
Let (B, +, ∧) be a Boolean ring. Then, defining the operations
x ∨ y = (x ∧ y) + (x + y) and x̄ = 1 + x,
the quadruple (B, ∧, ∨, ¯ ) is a Boolean algebra.
It is not uncommon to use abstract ring notation and not to write the symbol ∧ so that the
expression xy means x ∧ y, or xyz means x ∧ y ∧ z. In this notation, the operator ∧ takes precedence
over + so that xy ∨ z stands for (x ∧ y) ∨ z. Some authors carry this habit over to Boolean algebras
so that, for example, the expression xȳz ∨ x̄yz̄ stands for
(x ∧ ȳ ∧ z) ∨ (x̄ ∧ y ∧ z̄).
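The passage from a Boolean algebra to a Boolean ring and back (Theorems 10.1.12 and 10.1.13) can be tested on P(S): the derived addition + is symmetric difference, and the ring operations recover ∨ and the complement. A brief Python check (our illustration, not from the text):

```python
from itertools import combinations

S = frozenset({0, 1, 2})
subsets = [frozenset(c) for r in range(len(S) + 1)
           for c in combinations(sorted(S), r)]

for x in subsets:
    for y in subsets:
        # x + y = (x AND NOT y) OR (NOT x AND y) is symmetric difference
        assert (x - y) | (y - x) == x ^ y
        # Theorem 10.1.13: x OR y = (x AND y) + (x + y) recovers union
        assert (x & y) ^ (x ^ y) == x | y
    assert x ^ x == frozenset()   # x + x = 0
    assert S ^ x == S - x         # complement is 1 + x
print("P(S) with symmetric difference and intersection is a Boolean ring")
```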
[AND gate diagrams: inputs x, y with output xy; inputs x, y, z with output xyz.]
The electronic circuit diagram for the OR gate with two inputs and also with three inputs is
[OR gate diagrams: inputs x, y with output x ∨ y; inputs x, y, z with output x ∨ y ∨ z.]
The electronic circuit diagram for the NOT gate (with a single input) is
[NOT gate diagram: input x with output x̄.]
Example 10.1.15. Typically, to represent a Boolean function with logic gates, we draw the inputs
to the left with lines emanating from them representing wires from the inputs. The wires may split
in the circuit as necessary. For example, a logic gate diagram for the function F (x, y) = xȳ ∨ x̄y
is the following.
[Diagram: circuit with inputs x and y feeding NOT and AND gates whose outputs feed an OR gate producing F (x, y).]
The bump on one wire is used to indicate that the two crossing wires do not intersect. 4
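The function F of this example is the "exclusive or" of x and y. A small Python sketch simulating the circuit gate by gate (the helper names AND, OR, NOT are ours):

```python
# Logic gates as functions on {0, 1}.
def AND(*xs): return int(all(xs))
def OR(*xs): return int(any(xs))
def NOT(x): return 1 - x

# The circuit of Example 10.1.15: F(x, y) = (x ∧ ȳ) ∨ (x̄ ∧ y).
def F(x, y): return OR(AND(x, NOT(y)), AND(NOT(x), y))

# F is 1 exactly when the two inputs differ (exclusive or).
for x in (0, 1):
    for y in (0, 1):
        assert F(x, y) == int(x != y)
```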
2. Use rules of Boolean algebra to simplify the following expression as much as possible:
(x ∧ y) ∨ (x ∧ (y ∨ x)).
3. Prove that x̄y ∨ ȳz ∨ z̄x = ȳx ∨ z̄y ∨ x̄z for all x, y, z in a Boolean algebra B.
4. Consider the expression in a Boolean algebra (x → y) → ¬(y → z), where we are borrowing symbols
from propositional (Boolean) logic.
(a) Rewrite only using ∧, ∨, and x̄ for negation.
(b) If we regard this expression as a Boolean function F (x, y, z), evaluate F (1, 0, 0).
5. Prove that the Boolean ring associated to the basic Boolean algebra B = {0, 1} is the ring (Z/2Z, +, ×).
6. On the interval of real numbers [0, 1], define the binary operators x ∧ y = min(x, y) and x ∨ y =
max(x, y), and the function x̄ = 1 − x, using 0 as the 0 and 1 as the 1. Prove that ([0, 1], ∧, ∨, ¯ ) satisfies
all the axioms of a Boolean algebra except the complement laws. Prove also that ([0, 1], ∧, ∨, ¯ ) satisfies
all the supplementary laws in Proposition 10.1.2.
7. Use the axioms of a Boolean algebra to prove that the following propositional forms from Boolean
logic are tautologies.
(a) p → (p ∨ q)
(b) ((p → q) ∧ ¬q) → ¬p
(c) ((p → q) ∧ (q → r)) → (p → r)
8. Prove that for all x, y in a Boolean algebra B, x ∧ y = y if and only if x ∨ y = x.
9. Let (B, ∧, ∨, ¯ ) be a Boolean algebra. Define the relation ≼ on B by x ≼ y when x ∧ y = x.
(a) Prove that ≼ defines a partial order on B.
(b) Show that this partial order ≼ on the Boolean algebra P(S), where S is a set, is precisely the
containment partial order ⊆.
10. Let S be a set and let A be a fixed subset of S. Consider the Boolean algebras P(S) and P(A). Prove
that the function ϕ : P(S) → P(A) defined by ϕ(X) = X ∩ A is a Boolean algebra homomorphism.
[Hint: Distinguish the complement in S from the complement in A.]
11. Let (B, ∧, ∨, ¯ ) be a Boolean algebra.
(a) Show that (B, ∨, ∧, ¯ ) is also a Boolean algebra.
(b) Prove that the function ϕ : B → B defined by ϕ(x) = x̄ is an isomorphism from (B, ∧, ∨, ¯ ) to
(B, ∨, ∧, ¯ ).
12. Let f : S → T be a function between sets. Prove that the function ϕ : P(T ) → P(S) defined by
ϕ(X) = f −1 (X) is a Boolean algebra homomorphism.
13. Let (B1 , ∧1 , ∨1 , ¯ ) and (B2 , ∧2 , ∨2 , ¯ ) be two Boolean algebras and let ϕ : B1 → B2 be a Boolean
algebra homomorphism. Define the kernel and image of ϕ.
19. Draw the logic gate diagrams corresponding to each of the following Boolean functions.
(a) x̄ ∨ y
(b) (x ∨ y) ∧ x
20. Draw the logic gate diagrams corresponding to each of the following Boolean functions.
(a) xyz ∨ x̄ȳz̄
(b) (x̄ ∨ z)(y ∨ z̄)
10.2 Vector Spaces
Many students of natural sciences and even some in social sciences must study linear algebra. Indeed,
any problem that involves some form of analysis of functions involving more than one variable—
whether they be prices, quantities of commodities, coordinates, populations of various species, or
any type of data—requires linear algebra. Just like any branch of mathematics that has many
applications in science and mathematics, linear algebra encompasses a variety of standard topics,
theorems, and algorithms that are specific to it. However, at the heart of linear algebra lies the
algebraic structure of vector spaces.
This textbook assumes linear algebra as a prerequisite. Vector spaces over R should be familiar
to the reader. Consequently, this section neither serves as an introduction to vector spaces nor
attempts to mention all the interesting applications. Instead, as another motivation for modules,
this section serves as a summary and reminder of various definitions and theorems of vector spaces,
with an emphasis on a structuralist view, following the Outline provided in the preface.
Definition 10.2.1
Let F be a field. A vector space over F (or F -vector space) is a nonempty set V equipped
with a binary operation + and a function F × V → V , called scalar multiplication, denoted
(c, v) 7→ cv, such that (V, +) is an abelian group with identity 0 and such that the scalar
multiplication satisfies the following identities:
(1) c(u + v) = cu + cv for all c ∈ F and all u, v ∈ V ;
(2) (c + d)v = cv + dv for all c, d ∈ F and all v ∈ V ;
(3) (cd)v = c(dv) for all c, d ∈ F and all v ∈ V ;
(4) 1v = v for all v ∈ V .
Elements of V are generically called vectors and the elements of F are called the scalars.
It is valuable to compare the axioms of this definition to those of a group action. The scalar
multiplication means that each scalar in F acts on the vectors (elements of the abelian group V )
as functions V → V . However, axiom (1) states that each scalar acts not just as a function but as
a group homomorphism on (V, +). Axiom (2) requires that the group (F, +) behave like a subgroup of
(Fun(V, V ), +). Finally, axioms (3) and (4) precisely require that scalar multiplication is a group
action of (U (F ), ×) on V .
From a historical perspective, it took time for the axioms of a vector space to emerge in this
form; in retrospect, they should seem natural. Vector spaces over a fixed field F are an action
structure in which the field F acts on the structure of an abelian group. As with group actions,
vector spaces over different fields are different algebraic structures.
Recall that the additive identity in V is denoted by 0. Also recall that one of the first propositions
in vector spaces shows that 0v = 0 for all vectors v ∈ V (where the 0 on the left is the additive
identity in (F, +) and the 0 on the right is the additive identity in V ). A corollary thereof is that
(−1)v = −v, where −1 ∈ F and −v is the additive inverse of the vector v.
Large portions of linear algebra courses usually serve as motivation for defining abstract vector
spaces. The ability to solve a system of linear equations (along with all the usual applications to
natural and social sciences), applications to transformation geometry, computer graphics, and many
others motivate the study of vectors and matrices. Introductions to real vector spaces show how not
only Rn satisfies the axioms of a vector space but so do polynomials of degree n or less with real
coefficients; all real polynomials, R[x]; n × m matrices; the set of complex numbers C; the set of real
functions Fun(R, R); the set C 0 ([a, b], R) of continuous real-valued functions over the interval [a, b];
the set `∞ (R) of real valued sequences; and many more.
Example 10.2.2 (Trivial Vector Space). The simplest vector space consists of the abelian group
V of a single element V = {0}. All the axioms of a vector space are trivially verified. 4
Example 10.2.3. For any field F , the first example of a vector space over F is V = F itself.
Note that (F, +) is an abelian group. The scalar multiplication is simply the
usual multiplication in the field F . Axioms (1) and (2) correspond to left- and right-distributivity,
respectively. Axiom (3) is associativity of multiplication in F and axiom (4) is the identity axiom in
(F, ×). For the purposes of showing that F is itself an F -vector space, that the multiplication has
inverses is irrelevant. 4
Proposition 10.2.4
Let F be a field and let V and W be vector spaces over F . The Cartesian product V × W equipped
with addition and scalar multiplication
(v1 , w1 ) + (v2 , w2 ) = (v1 + v2 , w1 + w2 ) and c(v, w) = (cv, cw)
is a vector space.
Definition 10.2.5
The vector space defined in Proposition 10.2.4 is called the direct sum of V and W and is
denoted V ⊕ W .
It is equally possible to define the direct sum of n different vector spaces. The direct sum of a
vector space V with itself n times is denoted by V n . In particular, for all positive integers n, the
Cartesian product F n has the structure of a vector space over F in which the addition of n-tuples
and scalar multiplication on n-tuples is performed component-wise.
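The component-wise operations on F n are easy to sketch in code; here F = Q, modeled exactly with Python's Fraction type (the helper names vadd and smul are ours):

```python
from fractions import Fraction

# Component-wise addition and scalar multiplication on n-tuples over F = Q.
def vadd(v, w): return tuple(a + b for a, b in zip(v, w))
def smul(c, v): return tuple(c * a for a in v)

v = (Fraction(1), Fraction(2), Fraction(3))
w = (Fraction(0), Fraction(1, 2), Fraction(-1))
assert vadd(v, w) == (Fraction(1), Fraction(5, 2), Fraction(2))
assert smul(Fraction(2), v) == (Fraction(2), Fraction(4), Fraction(6))
```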
As with groups, the concept of direct sum generalizes to an arbitrary collection of vector spaces.
Definition 10.2.6
Let V = {Vi }i∈I be a collection of vector spaces over the field F , indexed by the set I.
The collection of all choice functions from I into the collection {Vi }i∈I is called the direct
product of this collection and is denoted ∏i∈I Vi .
We remind the reader that the existence of choice functions is the content of the Axiom of Choice.
They are not functions in the usual sense, i.e., from a given set to another given set. It is convenient
to denote elements in ∏i∈I Vi with letters such as f or g, to designate choice functions.
Definition 10.2.7
The direct sum of the collection {Vi }i∈I is the subset of the direct product consisting of
choice functions that map to the 0 element in Vi for all but a finite number of indices i ∈ I.
The direct sum is denoted ⊕i∈I Vi .
If I is finite with I = {1, 2, . . . , n}, then the direct product and the direct sum are both V1 ⊕
V2 ⊕ · · · ⊕ Vn . If I is countable, then we can think of the direct product as sequences of elements,
with each one taken from a different Vi . However, if I is uncountable, the intuition of sequences no
longer applies.
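An element of a direct sum over an infinite index set can be modeled as a finitely supported mapping: store only the indices i with f (i) ≠ 0. A sketch with every Vi = Q (the dict representation and the name dsum_add are our choices):

```python
# An element f of the direct sum with each V_i = Q: a dict i -> f(i) listing
# only the finitely many nonzero values; every unlisted index maps to 0.
def dsum_add(f, g):
    h = {i: f.get(i, 0) + g.get(i, 0) for i in set(f) | set(g)}
    return {i: v for i, v in h.items() if v != 0}  # keep the support minimal

f = {0: 1, 5: 2}       # f(0) = 1, f(5) = 2, and f(i) = 0 otherwise
g = {5: -2, 7: 3}
assert dsum_add(f, g) == {0: 1, 7: 3}
```

The same representation with arbitrary (not necessarily finite) support would model the direct product instead.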
Proposition 10.2.8
Let V = {Vi }i∈I be a collection of vector spaces over the field F . The direct product and
the direct sum of V are vector spaces under the operations f + g and cf for f, g ∈ ∏i∈I Vi
and c ∈ F defined by
(f + g)(i) = f (i) + g(i) and (cf )(i) = c(f (i)) for all i ∈ I.
Proof. Proposition 9.3.2 tells us that ∏i∈I Vi is an abelian group with respect to the addition of
choice functions. To prove axiom (1) in Definition 10.2.1, let c ∈ F and let f, g be two choice
functions. For all i ∈ I, the choice function c(f + g) satisfies
(c(f + g))(i) = c ((f + g)(i)) = c(f (i) + g(i)) = c(f (i)) + c(g(i)) = (cf )(i) + (cg)(i).
Hence, c(f + g) = (cf ) + (cg) and (1) holds. The proofs of the remaining axioms are similar.
Definition 10.2.9
Let V be a vector space over a field F and let S be a subset of V . A linear combination of
elements in S is any vector of the form
c1 v1 + c2 v2 + · · · + cr vr
where ci ∈ F and vi ∈ S for i = 1, 2, . . . , r and where r is a positive integer. The set of all
linear combinations of elements in S is called the span of S and is denoted Span(S).
It is important to note that linear combinations only involve a finite number of vectors from S,
even if S is an infinite set. This requirement avoids concerns over convergence.
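When F = Q and the vectors lie in Qn, membership in Span(S) is a finite computation by Gaussian elimination: echelonize S and try to reduce v to zero against it. A sketch (the function names reduce_against and in_span are ours):

```python
from fractions import Fraction

def reduce_against(v, echelon):
    # Subtract multiples of the stored echelon rows to clear their pivot columns.
    v = list(v)
    for pivot_col, row in echelon:
        if v[pivot_col] != 0:
            c = v[pivot_col] / row[pivot_col]
            v = [a - c * b for a, b in zip(v, row)]
    return v

def in_span(v, S):
    echelon = []  # pairs (pivot column, row), built up from the vectors of S
    for s in S:
        r = reduce_against([Fraction(x) for x in s], echelon)
        piv = next((j for j, a in enumerate(r) if a != 0), None)
        if piv is not None:
            echelon.append((piv, r))
    return all(a == 0 for a in reduce_against([Fraction(x) for x in v], echelon))

assert in_span((2, 3, 5), [(1, 0, 1), (0, 1, 1)])
assert not in_span((1, 1, 1), [(1, 0, 1), (0, 1, 1)])
```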
Proposition 10.2.10
Let S be any subset of a vector space V over a field F . The span Span(S) is a vector
subspace of V .
Proof. Let v, w ∈ Span(S), say
v = c1 v1 + c2 v2 + · · · + cr vr and w = d1 w1 + d2 w2 + · · · + ds ws .
Then
v + w = c1 v1 + c2 v2 + · · · + cr vr + d1 w1 + d2 w2 + · · · + ds ws ,
which is again a linear combination of elements in S. Also, if λ ∈ F , then
λv = (λc1 )v1 + (λc2 )v2 + · · · + (λcr )vr ,
which is also a linear combination of elements in S. Hence, Span(S) is a subspace of V .
Definition 10.2.11
Let V and W be vector spaces over a field F . A function T : V → W is called a linear
transformation from V to W if
(1) T (v1 + v2 ) = T (v1 ) + T (v2 ) for all v1 , v2 ∈ V ;
(2) T (cv) = cT (v) for all v ∈ V and all c ∈ F .
Definition 10.2.12
If a linear transformation T : V → W is also a bijective function, then T is called an
isomorphism. If an isomorphism exists between two vector spaces V and W , then we say
they are isomorphic and we write V ∼= W .
Some linear algebra books define an isomorphism between vector spaces V and W as a bijection
T : V → W such that T and T −1 are both linear transformations. This is indeed correct. In fact,
an equivalent definition is valid for groups and rings as well. However, as with groups and rings, it
is easy to show that if T : V → W is a bijection and a linear transformation, then T −1 : W → V is
also a linear transformation. Consequently, Definition 10.2.12 suffices.
10.2.4 – Subclasses
The Outline given in the preface mentions subclasses, generally described as interesting objects
obtained by adding additional structure. For the algebraic structure of vector spaces, there are
many interesting and useful subclasses: vector spaces with bilinear forms, topological vector spaces,
Hilbert spaces, Banach spaces, and Lie algebras just to name a few. Many of these subclasses become
new algebraic structures of their own and deserve full books for a proper introduction.
Definition 10.2.13
Let V be a vector space over a field F . A nonempty subset S ⊆ V is called linearly
independent if for any finite subset {v1 , v2 , . . . , vr } ⊆ S,
c1 v1 + c2 v2 + · · · + cr vr = 0 with ci ∈ F implies c1 = c2 = · · · = cr = 0.
This is the most general definition of linear independence and it applies to subsets S that are
not necessarily finite. Note again that every linear combination is always assumed to be finite.
Proposition 10.2.14
Let V be a vector space over a field F and let S be a linearly independent set. Then for
any strict subset S 0 ⊊ S, the span Span(S 0 ) is a strict subspace of Span(S).
Proof. Let v ∈ S − S 0 and suppose that v ∈ Span(S 0 ). Then
v = c1 v1 + c2 v2 + · · · + cr vr
for some v1 , v2 , . . . , vr ∈ S 0 and c1 , c2 , . . . , cr ∈ F , so
v − c1 v1 − c2 v2 − · · · − cr vr = 0
with v, v1 , v2 , . . . , vr ∈ S. Since the coefficient in front of v is nonzero, this contradicts the assumption
that S is linearly independent. By contradiction, v ∉ Span(S 0 ).
Definition 10.2.15
Let V be a vector space over a field F . A subset B of V is called a basis of V if B is linearly
independent and if Span(B) = V .
The condition that Span(B) = V means that V is generated by the elements of B. In contrast,
by Proposition 10.2.14, the requirement that B be linearly independent means that no strict subset
of B generates all of V .
There are two crucial but sticky points concerning bases that, in introductory courses on linear
algebra, are treated cursorily. This is because to prove them in their full generality requires Zorn’s
Lemma, which is equivalent to the Axiom of Choice. Proofs for the finite-dimensional cases are
easier. These crucial properties are the content of the next two theorems.
Theorem 10.2.16
Let V be a nontrivial vector space over a field F . Let Γ ⊆ V with V = Span(Γ) and let
S ⊆ Γ be a linearly independent set. Then there exists a basis B of V such that S ⊆ B ⊆ Γ.
Proof. Let I be the set of subsets A such that S ⊆ A ⊆ Γ and A is linearly independent. The
collection I is not empty since it contains S. Subset containment ⊆ is a partial order on I. Let
{Aj }j∈J be a chain (a totally ordered subset of I).
We claim that
A = ⋃j∈J Aj
is an upper bound for the chain in I.
c1 v1 + c2 v2 + · · · + cr vr = 0
with c1 ≠ 0. Then
v1 = −c1−1 (c2 v2 + · · · + cr vr ),
Corollary 10.2.17
Every nontrivial vector space V over a field F has a basis.
Proof. Let v be a nonzero vector in V . The corollary follows from Theorem 10.2.16 with S = {v}
and Γ = V .
It is important to underscore that Theorem 10.2.16 establishes two useful results: (1) it is always
possible to complete a linearly independent set S in V to a basis B of V ; (2) it is always
possible to select a basis from any generating set Γ. Both of these arise often in proofs.
Proposition 10.2.18
Let V be a vector space over a field F and let B be a basis of V . Every element v ∈ V can
be written in a unique way as a linear combination of elements in B.
Proof. That v ∈ V is a linear combination of elements in B follows simply from the fact that B spans
V . Now suppose that
v = c1 u1 + c2 u2 + · · · + cr ur , (10.1)
v = d1 u01 + d2 u02 + · · · + ds u0s (10.2)
are two linear combinations of v with u1 , u2 , . . . , ur , u01 , u02 , . . . , u0s ∈ B. The sets {u1 , u2 , . . . , ur } and
{u01 , u02 , . . . , u0s } may or may not be disjoint, so write
{u1 , u2 , . . . , ur } ∪ {u01 , u02 , . . . , u0s } = {w1 , w2 , . . . , wt }
with all the wi distinct. Note that t ≤ r + s so t is finite. Let us now rewrite the linear combinations
(10.1) and (10.2) as
v = c1 w1 + c2 w2 + · · · + ct wt ,
v = d1 w1 + d2 w2 + · · · + dt wt
allowing for some of the ci and some di to be 0 so that these expressions are identical to those in
(10.1) and (10.2).
Then
0 = c1 w1 + c2 w2 + · · · + ct wt − d1 w1 − d2 w2 − · · · − dt wt
= (c1 − d1 )w1 + (c2 − d2 )w2 + · · · + (ct − dt )wt .
Since B is linearly independent and the wi are distinct, each ci − di = 0, that is, ci = di for all i.
Hence, the expression of v as a linear combination of elements of B is unique.
Anyone who has studied linear algebra knows that bases are not at all unique for a given vector
space. The following theorem establishes an important invariant property about bases.
Theorem 10.2.19
Let V be a vector space over a field F and let B and B 0 be two bases of V . Then B and B 0
have the same cardinality.
Proof. The case when one of the bases is finite is proved in most introductory linear algebra courses.
(A more general proof for free modules is also given in Theorem 10.5.11.) Consequently, we prove
the case when B and B 0 are both infinite.
Since B 0 is a basis, for each v ∈ B,
v = c1 w1 + c2 w2 + · · · + cr wr
for some positive integer r, for some ci ∈ F − {0}, and some wi ∈ B 0 . Furthermore, by Proposition 10.2.18,
this expression is unique. As v varies through B, each vector w ∈ B 0 must appear in at least one
of the linear combinations associated to elements in B. Otherwise, B 0 − {w} would span all of V and,
by Proposition 10.2.14, B 0 would be linearly dependent.
Let f : B 0 → B be a function that satisfies f (w) = v, where v is one of the vectors in B for which
w is used to represent it. If v ∈ f (B 0 ), then since the expression of v as a linear combination of
elements in B 0 is unique and finite, the fiber f −1 (v) is finite. Note that
B 0 = ⋃v∈f (B0 ) f −1 (v),
so, since each fiber is finite and B 0 is infinite, f (B 0 ) must be infinite. Moreover, f (B 0 ) and B 0 have
the same cardinality. Consequently, since f (B 0 ) ⊆ B,
|B 0 | = |f (B 0 )| ≤ |B|.
Interchanging the roles of B and B 0 , we also deduce that |B| ≤ |B 0 |. By the Schröder-Bernstein
Theorem, |B 0 | = |B|.
Definition 10.2.20
Let V be a vector space over a field F . If B is a basis of V , then the dimension of V ,
denoted by dimF V , is the cardinality |B|.
If a vector space V over a field F has a finite basis of n vectors, then V is said to be finite
dimensional with dimension n. We denote this briefly by dimF V = n. If the field F is understood
by context, then this is abbreviated to dim V . If V does not have a finite basis, we say that V is
infinite dimensional. Though we do not discuss the notion of a basis for the trivial vector space
V = {0}, for completeness, we say that dim{0} = 0.
10.2. VECTOR SPACES 481
Example 10.2.21. Let F be a field and let V = F n . The standard basis of V consists of vectors
{e1 , e2 , . . . , en }, where ei is the n-tuple in V that is 0 in all components except for 1 in the ith
component. Obviously, dim F n = n. 4
Example 10.2.22. Recall that the set Mm×n (R) of m×n matrices with entries in R is a real vector
space and the standard basis consists of the matrices Eij with 1 ≤ i ≤ m and 1 ≤ j ≤ n defined as 0
in all entries except for a 1 in the (i, j)th entry. Since a basis of Mm×n (R) consists of mn matrices,
dim Mm×n (R) = mn. 4
Example 10.2.23. Let F be a field. The polynomial ring F [x] is a vector space with basis B =
{1, x, x2 , x3 , . . .}. One of the key points in this example is that every polynomial has a term of
highest degree. Therefore, every polynomial consists of a finite number of terms. Similarly, elements
in Span(B) consist of finite linear combinations of the xi . Thus, B spans F [x]. The vector space
F [x] is infinite dimensional but we have presented a basis that is countable. By Theorem 10.2.19,
every basis of F [x] is countable and we can write dimF F [x] = |N|. 4
Example 10.2.24. Contrast the previous example with V = `∞ (R), the vector space of real
sequences. For each i, consider the sequence ei that consists of 1 in the ith output and 0 for all other
outputs. The set {ei | i ∈ N} is not a basis of `∞ (R) because not every real sequence is a (finite)
linear combination of the ei sequences. Indeed, any finite linear combination of the ei sequences
would be 0 for all but a finite number of terms in the sequence. The vector space V is also infinite
dimensional, but we have not presented a basis. 4
Definition 10.2.25
Let V be a vector space over a field F with dim V = n. Let B = (u1 , u2 , . . . , un ) be an
ordered basis of V . (By ordered basis we mean an n-tuple of distinct vectors, the set of
whose entries forms a basis of V .) The coordinates of a vector v ∈ V with respect to B
consist of the unique ci ∈ F such that
v = c1 u1 + c2 u2 + · · · + cn un .
The concept of coordinates defines an isomorphism T : V → F n via T (v) = [v]B . This implies
the important classification theorem about vector spaces that if dim V = dim W , then V ∼ = W.
Consequently, unlike groups or rings, there is a simple property that classifies all vector spaces: the
dimension.
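As a concrete instance of coordinates, here is the 2-dimensional case over Q, where the unique coefficients can be computed by Cramer's rule (the function coords_2d is ours):

```python
from fractions import Fraction

def coords_2d(v, b1, b2):
    # Coordinates [v]_B for the ordered basis B = (b1, b2) of Q^2.
    det = b1[0] * b2[1] - b1[1] * b2[0]
    assert det != 0, "b1, b2 do not form a basis"
    c1 = Fraction(v[0] * b2[1] - v[1] * b2[0], det)
    c2 = Fraction(b1[0] * v[1] - b1[1] * v[0], det)
    return (c1, c2)

b1, b2 = (1, 1), (1, -1)
c1, c2 = coords_2d((3, 1), b1, b2)
# The unique expression v = c1*b1 + c2*b2:
assert (c1, c2) == (2, 1)
assert (c1 * b1[0] + c2 * b2[0], c1 * b1[1] + c2 * b2[1]) == (3, 1)
```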
Coordinates also lead to the concept of matrices. Most linear algebra courses introduce matrices
as rectangular arrays of numbers and first define multiplication of matrices in a manner that seems
somewhat arbitrary. However, the purpose of defining matrix multiplication as it is done comes from
the theorem that every linear transformation T : F m → F n can be expressed as
T (v) = Av
for some n × m matrix A and where we have expressed a vector v ∈ F m as the m-tuple v =
(v1 , v2 , . . . , vm ). This theorem, along with the isomorphism between a vector space and its coordi-
nates with respect to a basis, lead to the following useful notion.
Definition 10.2.26
Let V and W be vector spaces over a field F and let T : V → W be a linear transformation.
Suppose that dim V = m and that dim W = n. Let B be a basis of V and let B 0 be a basis
of W . The (B, B 0 )-matrix of T is the unique n × m matrix [T ]B B0 such that, for all v ∈ V ,
[T (v)]B0 = [T ]B B0 [v]B .
This matrix representing T with respect to the bases makes it possible to perform all calculations
pertaining to linear transformations between finite-dimensional vector spaces using matrix algebra
and operations.
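Concretely, the jth column of the matrix of T consists of the B′-coordinates of T applied to the jth vector of B. A sketch with V = W = F 2, standard bases, and the shear T (x, y) = (x + 2y, y) as a sample map (all names are ours):

```python
# A sample linear transformation on F^2 (here with integer entries).
def T(v):
    x, y = v
    return (x + 2 * y, y)

B = [(1, 0), (0, 1)]            # standard ordered basis; vectors are their own coordinates
images = [T(b) for b in B]      # T(e1), T(e2)
# Place each image as a column of the matrix A = [T].
A = [[images[j][i] for j in range(2)] for i in range(2)]
assert A == [[1, 2], [0, 1]]

# Matrix-vector multiplication now reproduces T on coordinates.
v = (3, 4)
Av = tuple(sum(A[i][j] * v[j] for j in range(2)) for i in range(2))
assert Av == T(v)
```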
Proposition 10.2.27
The set of linear transformations HomF (V, W ), equipped with the operations as defined
in (10.3), is a vector space over F . Furthermore, if dim V = m and dim W = n, then
dim Hom(V, W ) = mn.
Proof. We must first show that the addition and scalar multiplication defined in (10.3) are actually a
binary operation and a scalar multiplication, in other words, that T1 + T2 and cT1 are again linear
transformations.
Because of Proposition 10.2.27, we often refer to Hom(V, W ) briefly as the hom-space between
V and W (where “hom” is reminiscent of homomorphism).
There are two particularly important instances of hom-spaces: the endomorphism ring of a vector
space and the dual of a vector space.
The set of linear transformations of V into itself (an endomorphism on V ) is denoted by End(V ).
This vector space has the additional structure of being a ring with composition as the multiplication.
(See Exercise 10.2.13.)
The group of units of End(V ) consists of invertible linear transformations of V into itself and
is called the general linear group of V , denoted GL(V ). Previously, we denoted by GLn (F ) the
group of n × n invertible matrices with entries in the field F . We see that GLn (F ) ∼ = GL(F n ).
The notation GL(V ) is more general than our previous notation as the data concerning dimension
and field is contained in the properties and data of the vector space V . However, GL(V ) refers to
invertible linear transformations without immediate reference to a given basis. In particular, any
isomorphism GLn (F ) ∼ = GL(F n ) is determined in reference to some basis on F n .
Definition 10.2.28
Let V be a vector space over a field F . The dual of V , denoted V ∗ , is the vector space of
homomorphisms from V to F , i.e., V ∗ = Hom(V, F ).
Example 10.2.29. Let V = C 0 ([a, b], R) be the vector space over R of continuous real-valued
functions over the interval [a, b]. For any c ∈ [a, b], the evaluation function evc : V → R defined
by evc (f ) = f (c) is an element of the dual. Also, by linearity properties of the integral, the function
F : V → R defined by
F (f ) = ∫ab f (t) dt
is a linear transformation and hence is an element of V ∗ . 4
Suppose that V is finite dimensional with ordered basis B = (e1 , e2 , . . . , en ). Then considering
coordinates of vectors in V with respect to B defines an isomorphism between V and F n . As usual,
we write the coordinates of vectors as column vectors. We know from linear algebra that
any linear transformation F n → F is represented by a 1 × n matrix, which corresponds to a row
vector. So row vectors have an interpretation as representing elements in V ∗ (with respect to the
ordered basis B). Though the elements of V ∗ are technically vectors, since V ∗ is a vector space in
its own right, we often call an element of V ∗ a covector of V .
Associated to an ordered basis B, there is a standard ordered basis B ∗ = (e∗1 , e∗2 , . . . , e∗n ) of the
dual vector space defined by
e∗j (ei ) = δji , where δji = 1 if i = j and δji = 0 if i ≠ j.
Since the e∗j form a basis of V ∗ , we see that if V is finite dimensional, then dim V = dim V ∗ and the
mapping ei ↔ e∗i sets up an isomorphism between V and V ∗ . However, it is proper to distinguish
between vectors and covectors because their coordinates do not change in the same way under a
change of basis.
Proposition 10.2.30
Let B = (e1 , e2 , . . . , en ) and B 0 = (f1 , f2 , . . . , fn ) be two ordered bases of a vector space V .
Let B ∗ and B 0∗ be the respective associated dual bases. If the coordinate change matrix
from B to B 0 coordinates is the n × n matrix M , then the coordinate change matrix from
B ∗ to B 0∗ coordinates is M −1 .
Proof. From linear algebra, we know that the matrix M = (mij ) is defined by
ei = ∑nj=1 mij fj . (10.4)
Consequently, for any vector a ∈ V , if [a]B = (ai ) and [a]B0 = (a0i ) as column vectors, then
[a]B0 = M [a]B ⇐⇒ a0i = ∑nj=1 mij aj .
Let us call N = (n`k ) the coordinate change matrix from B ∗ to B 0∗ coordinates. Then
e∗k = ∑n`=1 n`k f`∗ .
The order of writing is changed as compared to (10.4) to reflect the fact that coordinates of vectors
in V ∗ are row vectors. Then we have
δki = e∗k (ei ) = (∑n`=1 n`k f`∗ ) (∑nj=1 mij fj )
= ∑n`=1 ∑nj=1 mij n`k f`∗ (fj ) = ∑n`=1 ∑nj=1 mij n`k δ`j
= ∑nj=1 mij njk .
This last summation gives the (i, k) entry of the matrix product M N , so we have proven that
I = M N . Hence, we conclude that N = M −1 .
It is common, especially in applications to geometry and physics, to use
superscripts for the coordinates of vectors and subscripts for coordinates of covectors. More precisely,
we usually refer to an ordered basis of V as B = (e1 , e2 , . . . , en ) and the associated dual basis of
V ∗ as B ∗ = (e∗1 , e∗2 , . . . , e∗n ). For a vector v ∈ V , we denote its coordinates with superscripts as
(v 1 , v 2 , . . . , v n ) ∈ F n so that
v = v 1 e1 + v 2 e2 + · · · + v n en = ∑ni=1 v i ei .
We emphasize that these are superscripts and not powers on the symbol v. As a matrix, the
coordinates of v are depicted by a column vector (v 1 v 2 · · · v n )⊤ . For a covector µ ∈ V ∗ , we
denote the coordinates with subscripts as (µ1 , µ2 , . . . , µn ) ∈ F n so that
µ = µ1 e∗1 + µ2 e∗2 + · · · + µn e∗n = ∑ni=1 µi e∗i .
As a matrix, the coordinates of µ are depicted by a row vector (µ1 µ2 · · · µn ). It is useful to
note that in the above summations, superscript indices are paired up with subscript indices.
14. Let T : V → W be a linear transformation between vector spaces. Let U be a subspace of V . Prove
that T (U ) = {T (u) | u ∈ U } is a subspace of W . [This generalizes the fact that Im T is a subspace of
W since Im T = T (V ).]
15. Let T : V → W be a linear transformation between vector spaces. Let U be a subspace of W . Define
the set
T −1 (U ) = {v ∈ V | T (v) ∈ U }.
Prove that T −1 (U ) is a subspace of V . [This generalizes the fact that Ker T is a subspace of V because
Ker T = T −1 ({0}).]
16. Consider the vector space V = C 0 ([a, b], R) as in Example 10.2.29. Let g ∈ V .
(a) Prove that the function Ψg : V → R defined by
Ψg (f ) = ∫ab f (t)g(t) dt
is an element of V ∗ .
(b) Prove that the mapping g 7→ Ψg is an injective linear transformation from V to V ∗ .
(c) Let c ∈ [a, b]. Prove that the evaluation evc ∈ V ∗ is not equal to Ψg for any function g ∈ V .
10.3 Introduction to Modules
Vector spaces are a particular instance of a more general structure. As we saw in the previous
section, a vector space is an action structure in which a field acts on an abelian group. Modules are
“simply” an algebraic structure in which a ring acts on an abelian group.
Because modules generalize vector spaces, the reader should anticipate that modules have even
more uses than vector spaces. It is initially useful for intuition to think of modules as vector
spaces where the scalars come from a ring. However, since the ring of scalars could have zero
divisors, might not have inverses of nonzero elements, might not have unique factorization, might
be noncommutative and so on, not all the familiar and useful theorems in vector spaces have an
equivalent in the context of modules. Instead, other concepts, which are trivial (uninteresting) for
vector spaces, take on greater importance.
Definition 10.3.1
Let R be a ring. A left R-module is an abelian group (M, +) and a function R × M → M ,
called scalar multiplication, denoted (r, m) 7→ rm or r · m, that satisfies the following
identities:
(1) ∀r ∈ R, ∀m, n ∈ M, r(m + n) = rm + rn;
(2) ∀r, s ∈ R, ∀m ∈ M, (r + s)m = rm + sm;
(3) ∀r, s ∈ R, ∀m ∈ M, (rs)m = r(sm);
(4) if R has an identity 1, then ∀m ∈ M, 1m = m.
As with vector spaces, the elements in R are called scalars in relation to the elements in M (but
we do not call elements of M vectors).
The adjective “left” in this definition refers to the fact that the elements of R act on the left of
elements of M . We could change the axioms as appropriate to define a right R-module. If R is not
commutative, then a left R-module might not be a right R-module. Example 10.3.11 illustrates this
difference. However, if R is commutative, every left R-module is in one-to-one correspondence with
a right R-module. (See Exercise 10.3.10.) Consequently, whenever R is commutative, we simply
refer to an R-module without distinguishing between right or left.
Nearly all theorems for left R-modules carry over with appropriate changes to right R-modules.
Consequently, unless we specify otherwise, we will state theorems in reference to left R-modules with
the implicit understanding that they are equally valid for right R-modules with appropriate changes
of left- to right-actions.
Proposition 10.3.2
Let 0R be the additive identity in a ring R and let 0M be the additive identity in a left
R-module M .
(1) 0R · m = 0M for all m ∈ M .
(2) r0M = 0M for all r ∈ R.
(3) (−r)m = −(rm) = r(−m) for all r ∈ R and all m ∈ M .
For part (3), observe that
rm + (−r)m = (r + (−r))m = 0R · m = 0M ,
where the last equality holds by (1). Thus, (−r)m is the additive inverse to rm so (−r)m = −(rm).
Furthermore,
rm + r(−m) = r(m − m) = r0M = 0M ,
where the last equality holds by (2). Thus, r(−m) is the additive inverse to rm so r(−m) = −(rm).
Example 10.3.3 (Vector Spaces). Section 10.2 served as a motivation for modules. The reader
should notice how the axioms of vector spaces over a field F exactly match up to the axioms of a
module over the ring F . Hence, vector spaces are precisely modules over fields. 4
Example 10.3.4 (Trivial Module). Let R be any ring. The trivial abelian group of one element
{0} is a left R-module with the action defined by r · 0 = 0, and similarly a right R-module. 4
Example 10.3.5. Let R be a ring. If we set M = R, then R is a left- or right-module over itself,
depending on whether we consider the multiplication on the left or the multiplication on the right
of a given element. 4
Example 10.3.6 (Z-Modules). Consider the ring of integers Z and let M be a Z-module. (We
need not refer to a left- or right-module since Z is commutative.) If r is a positive integer, then the
axioms of the definition give
rm = m + m + · · · + m (with r terms).
By Proposition 10.3.2, 0m = 0 and, for any negative integer r, we have rm = −(|r|m). With
this notation, all the axioms of modules are simply the power rules for an abelian group (Propo-
sition 3.2.13). Consequently, with the ring Z, the axioms (1–4) in Definition 10.3.1 do not impose
any further conditions beyond the requirement that M is an abelian group. Hence, the algebraic
structure of Z-modules is precisely the collection of abelian groups. 4
The above example shows how Z-modules correspond to an algebraic structure that we have already
encountered, namely abelian groups. In a similar way, it is not uncommon that for a given ring R, the associated
structure of left R-modules corresponds to a different algebraic structure, first encountered in an-
other way. Because of this regular occurrence, the algebraic structures of modules provide a common
context for many different algebraic structures.
Proposition 10.3.7
Let R be a ring and let M and N be left R-modules. The Cartesian product M × N
equipped with addition and scalar multiplication
(m1 , n1 ) + (m2 , n2 ) = (m1 + m2 , n1 + n2 ) and r(m, n) = (rm, rn)
is a left R-module.
Definition 10.3.8
The left R-module constructed in Proposition 10.3.7 is called the direct sum of M and N
and is denoted M ⊕ N .
As with vector spaces, the notion of a direct sum of two modules generalizes to the direct sum
of a finite number of modules and also to the direct sum and the direct product of any collection of
modules.
Example 10.3.9. Let R be a ring and let M = Rk = R ⊕ R ⊕ · · · ⊕ R be the direct sum of R with
itself k times (where k is a positive integer). Any element m ∈ M is a k-tuple m = (r1 , r2 , . . . , rk ).
We define addition component-wise and the left-action of R on M by
r(r1 , r2 , . . . , rk ) = (rr1 , rr2 , . . . , rrk ).
This makes Rk into a left-module over R. Likewise, Rk also has the structure of a right R-module if
we take multiplication of the k-tuple on the right. If R is not commutative, the corresponding left-
and right-modules are not necessarily in a natural one-to-one correspondence with each other. 4
Thus, M is a free abelian group of rank 3 with basis {(6, 0, 0), (−1, 1, 0), (−2, 0, 1)}. 4
Example 10.3.11 (Matrix Multiplication). Let F be a field and let n be a positive integer.
Consider the ring Mn×n (F ) of n × n matrices with coefficients in F . Consider also the abelian group
10.3. INTRODUCTION TO MODULES 489
M = F n , with the operation of addition. Think of elements in M as column vectors. The usual
properties of matrix-vector multiplication satisfy axioms (1–4) in Definition 10.3.1. Therefore, the
ring Mn×n (F ) acts by left-multiplication on F n in such a way that F n is a left Mn×n (F )-module.
However, it is not possible to multiply an n × n matrix on the right of an n × 1 column vector. Hence,
in this setup F n is not a right-module. 4
Example 10.3.12 (Ideals; Quotient Rings). Let R be a ring and let I be a left ideal in R. Then
I satisfies all the axioms of a left R-module. If I is a right ideal in R, then I is a right R-module.
If I is a (two-sided) ideal, then I is both a left and right R-module. Furthermore, the quotient ring
R/I is also an R-module under the action
r · (s + I) = rs + I. 4
Example 10.3.13 (F [x]-modules). Let F be a field and let R be the polynomial ring R = F [x].
We propose to determine what a left F [x]-module may be. Let M be a left F [x]-module.
First note that F (as a subring of F [x]) must act on M according to the left R-module axioms.
Thus, M is a vector space over the field F . Now consider the action of the element x on M . The
axioms for a left-module show that the element x behaves as a linear transformation T on the F -
vector space M . Furthermore, the axioms impose no further restrictions on the linear transformation
T we use for x. Now for any polynomial p(x) = ad xd + · · · + a1 x + a0 ∈ F [x] and for any v ∈ M ,
p(x) · v = (ad xd + · · · + a1 x + a0 )v = ad T d (v) + · · · + a1 T (v) + a0 v, (10.5)
where T d (v) = T (T (· · · T (v) · · · )) denotes T applied d times to v.
Consequently, an F [x]-module consists of two data: an F -vector space V and a linear transformation
T : V → V . The above equation gives the action of any polynomial on a vector in V .
As a specific example, let F = R and consider the R[x]-module that consists of the vector space
V = R2 and the associated linear transformation T : R2 → R2 that, with respect to the
standard basis, has the matrix
A = [ 3 4 ; 2 3 ]
(rows separated by semicolons). Then, for example, with v = (4, −1) written as a column vector,
(3x2 − 2x + 4) · v = 3A2 v − 2Av + 4v
= [ 51 72 ; 36 51 ]v + [ −6 −8 ; −4 −6 ]v + (16, −4)
= (204 − 72, 144 − 51) + (−24 + 8, −16 + 6) + (16, −4)
= (132, 79). 4
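The computation above can be checked numerically. The following Python sketch (illustrative only; plain lists, no external libraries) evaluates the polynomial action and confirms the result:

```python
# Verifying the worked example numerically (a sketch, not from the text).
# A is the matrix of T with respect to the standard basis; v = (4, -1).

def mat_vec(A, v):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

A = [[3, 4], [2, 3]]
v = [4, -1]

Av = mat_vec(A, v)    # T(v)
AAv = mat_vec(A, Av)  # T(T(v)) = T^2(v)

# The action (10.5): (3x^2 - 2x + 4) . v = 3 T^2(v) - 2 T(v) + 4 v
result = [3 * AAv[i] - 2 * Av[i] + 4 * v[i] for i in range(2)]
assert result == [132, 79]
```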
Example 10.3.14 (Group Representations). Section 8.6 offered a brief introduction to group
representations. Let G be a finite group and let F be a field. Consider the group ring F [G] and
let V be a left F [G]-module. As in the previous example, F acts on V (as a subring of F [G]) and
hence V must be an F -module and therefore a vector space over the field F . Let g ∈ G. Axiom (1)
of Definition 10.3.1 shows that g(v1 + v2 ) = gv1 + gv2 for all v1 , v2 ∈ V . Furthermore, by axiom (3),
for any c ∈ F ,
c(gv) = (c1)(gv) = (c1g)v = (g(c1))v = g(c1v) = g(cv).
Hence, g acts on V as a linear transformation, which is invertible with inverse g −1 . Hence, an F [G]-
module defines a function ρ : G → GL(V ). Moreover, Axiom (3) in Definition 10.3.1 shows that ρ is
a homomorphism. A homomorphism ρ : G → GL(V ) is precisely a representation of G in the vector
space V . With this data, the remaining axioms for an F [G]-module are satisfied.
490 CHAPTER 10. MODULES AND ALGEBRAS
Conversely, any group representation ρ : G → GL(V ) defines an F [G]-module via the left action
(∑g∈G ag g) · v = ∑g∈G ag ρ(g)(v).
(The details of this proof are left as an exercise. See Exercise 10.3.7.)
In conclusion, representations of a group into vector spaces over a field F are in a one-to-one
correspondence with left F [G]-modules. 4
Though it is valuable to compare modules to vector spaces (since after all vector spaces are just
modules over a field), modules generalize vector spaces so much that they differ in many significant
ways. It may be instructive for a student to review the proofs of theorems about vector spaces with
a view of clearly noting what properties of fields are required at each step.
10.3.3 – Submodules
Definition 10.3.15
Let R be a ring and let M be a left R-module. A submodule of M is a nonempty subset
N such that (N, +) is a subgroup of (M, +) and rN ⊆ N for all r ∈ R.
Every R-module M (left or right) always contains at least two submodules, namely the trivial
submodule {0} and M itself.
Example 10.3.16. Example 10.3.5 pointed out that R is a left R-module over itself. A submodule
of R is a subgroup of the additive group (R, +) that is also closed under scalar multiplication. This
is precisely the definition of a left ideal. In particular, the submodules of R (as a left module over
itself) are precisely the left ideals of R; a subring of R need not be a submodule. 4
Example 10.3.17. More generally than the previous example, if M is a left R-module and I is a
left ideal of R, then the set
IM = {a1 m1 + a2 m2 + · · · + ak mk | ai ∈ I, mi ∈ M }
is a submodule of M . Indeed, the sum of two linear combinations in IM is again a linear combination
in IM . Furthermore, for all α = a1 m1 + a2 m2 + · · · + ak mk ∈ IM and all r ∈ R, we have
rα = (ra1 )m1 + (ra2 )m2 + · · · + (rak )mk ∈ IM,
since each rai ∈ I because I is a left ideal. 4
Similar to the One-Step Subgroup Criterion in the context of groups, the following proposition
is often convenient for proving that certain subsets of a module are submodules.
Proposition 10.3.18 (One-Step Submodule Criterion)
Let R be a ring and let M be a left R-module. A nonempty subset N of M is a submodule
of M if and only if x + ry ∈ N for all x, y ∈ N and all r ∈ R.
Proof. (=⇒) Suppose that N is a submodule of M . Then for all x, y ∈ N and for all r ∈ R, we
have ry ∈ N since N is closed under scalar multiplication and hence x + ry ∈ N .
(⇐=) Suppose that N is a nonempty subset of M such that for all x, y ∈ N and all r ∈ R,
we have x + ry ∈ N . Then setting r = −1, we deduce that x − y ∈ N . Hence, by the One-Step
Subgroup Criterion (N, +) is a subgroup of (M, +). In particular, N contains 0. Then for all y ∈ N
and all r ∈ R, ry = 0 + ry ∈ N . Hence, N is a submodule of M .
Example 10.3.19 (F [x]-Submodules). Let F be a field. Recall from Example 10.3.13 that F [x]-
modules consist of a vector space V over F along with a linear transformation T : V → V , where
the action of polynomials on V is given by (10.5). Let W be an F [x]-submodule. Since W is closed
under addition and under scalar multiplication by elements in F , which is a subring of F [x], then W
is a vector subspace of V . However, W must also be closed under the action of the element x. For
all w ∈ W , we have x · w = T (w). Thus, W must satisfy the additional condition that T (w) ∈ W
for all w ∈ W , i.e., W is invariant under T . Finally, the action (10.5) shows that any subspace invariant
under T satisfies p(x) · W ⊆ W .
Suppose that V is a finite-dimensional F -vector space with a basis B = {v1 , v2 , . . . , vk } such that
B 0 = {v1 , v2 , . . . , v` }, where ` ≤ k, is a basis of W . Then the matrix of T with respect to B has the
block upper-triangular form
[T ]B = [ A ∗ ; 0 ∗ ],
where A = (aij ) is the ` × ` matrix of the restriction of T to W with respect to B 0 , 0 is the
(k − `) × ` zero block, and ∗ represents unconstrained entries.
As a first specific example, consider the R[x]-module V = R2 equipped with the linear transfor-
mation T whose matrix with respect to the standard basis is
A = [ 0 −1 ; 1 0 ].
Suppose W is a nontrivial submodule of V . Then any nonzero vector w = (a, b) ∈ W (in coordinates
with respect to the standard basis) satisfies x · w = T (w) = (−b, a). Since these two vectors are
orthogonal and nonzero, they are linearly independent. But then Span(w, xw) = R2 , so W = R2 .
Consequently, the only submodules of this R[x]-module are {0} and V .
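The linear-independence argument can be spot-checked numerically; the following Python sketch (illustrative, not from the text) confirms that w and T(w) = (−b, a) always have nonzero determinant, hence span R2:

```python
# Sketch: for the rotation-like matrix A = [[0,-1],[1,0]], any nonzero
# w = (a, b) and x.w = T(w) = (-b, a) are linearly independent because
# det[[a, -b], [b, a]] = a^2 + b^2 > 0. We spot-check a few vectors.

def det2(u, v):
    """Determinant of the 2x2 matrix with columns u and v."""
    return u[0] * v[1] - u[1] * v[0]

for w in [(1, 0), (2, 3), (-1, 5)]:
    Tw = (-w[1], w[0])                          # T(w) = (-b, a)
    assert det2(w, Tw) == w[0] ** 2 + w[1] ** 2  # det is a^2 + b^2
    assert det2(w, Tw) != 0                      # so w, T(w) span R^2
```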
In contrast, consider the R[x]-module V = R2 equipped with the linear transformation T whose
matrix with respect to the standard basis is
B = [ 1 3 ; 3 1 ].
Suppose that W is a nontrivial submodule of V . The only nontrivial strict subspaces of V are
one-dimensional, so suppose that W = Span(w). Then W is an R[x]-submodule of V if and only if
T (w) ∈ W , so T (w) = λw for some constant λ. Hence, in this case where V is two-dimensional, the
nontrivial R[x]-submodules are the eigenspaces. With the matrix B, these are
W−2 = Span((−1, 1)) and W4 = Span((1, 1)),
where Wλ is the eigenspace for the eigenvalue λ. 4
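The eigenvector claims can be verified directly; a short Python sketch (illustrative only):

```python
# Checking that Span((-1,1)) and Span((1,1)) are T-invariant for
# B = [[1,3],[3,1]]: each spanning vector is an eigenvector (a sketch).

B = [[1, 3], [3, 1]]

def apply(M, v):
    """Apply a 2x2 matrix M to the vector v."""
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

w_minus2 = [-1, 1]
w_4 = [1, 1]
assert apply(B, w_minus2) == [-2 * c for c in w_minus2]  # eigenvalue -2
assert apply(B, w_4) == [4 * c for c in w_4]             # eigenvalue 4
```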
Example 10.3.20 (Fractional Ideals). Let R be an integral domain and let F be its field of
fractions. The field F is an R-module via the scalar multiplication r · f = rf in F for all r ∈ R and
all f ∈ F . A fractional ideal of R is defined as an R-submodule of F .
For example, if R = Z, then I = {a/7 | a ∈ Z} is a Z-submodule of Q. Following
the notation for usual ideals, we can write I = (1/7), indicating that I is generated by the set of
elements {1/7} as a Z-submodule of Q. The submodule I is not a ring, since it is not closed under
multiplication. Indeed, (1/7) · (1/7) = 1/49 ∈
/ I.
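Python's exact rational type gives a quick sanity check of these closure properties (an illustrative sketch; the helper name `in_I` is not from the text):

```python
# Sketch of the fractional ideal I = { a/7 : a in Z } inside Q, using
# Python's exact rationals. I is closed under addition and under the
# Z-action, but not under multiplication in Q.
from fractions import Fraction

def in_I(q):
    """q lies in I = (1/7) exactly when 7*q is an integer."""
    return (7 * q).denominator == 1

x, y = Fraction(3, 7), Fraction(-5, 7)
assert in_I(x + y)                                 # closed under addition
assert in_I(4 * x)                                 # closed under the Z-action
assert not in_I(Fraction(1, 7) * Fraction(1, 7))   # 1/49 is not in I
```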
It is interesting to note that the set of invertible fractional ideals associated to an integral domain R
forms a group under ideal multiplication; when R is a Dedekind domain, every nonzero fractional
ideal is invertible. As with usual ideals, define the multiplication of fractional
ideals by the set of linear combinations,
IJ = {a1 b1 + a2 b2 + · · · + ak bk | ai ∈ I, bi ∈ J}.
A direct calculation shows that
I(JK) = {a1 b1 c1 + a2 b2 c2 + · · · + ak bk ck | ai ∈ I, bi ∈ J, ci ∈ K}
and likewise for the ideal (IJ)K. Hence, multiplication of fractional ideals is associative. Since
integral domains have an identity, the ring R serves as the identity for ideal multiplication because
RI = IR = I. We claim that the inverse of a fractional ideal I is
I −1 = {x ∈ F | xI ⊆ R}.
First, we show that I −1 is a fractional ideal. Let x, y ∈ I −1 . Then for all a ∈ I, (x − y)a = xa − ya.
Since xa, ya ∈ R, then (x − y)a ∈ R as well, so x − y ∈ I −1 . If r ∈ R, then for all a ∈ I we
have (rx)a = r(xa), but since xa ∈ R, (rx)a is also in R so rx ∈ I −1 . This shows that I −1 is an
R-submodule of F . Furthermore,
I −1 I = {a1 b1 + a2 b2 + · · · + ak bk | ai ∈ I −1 , bi ∈ I} ⊆ R,
since each product ai bi lies in R by the definition of I −1 .
In Example 10.3.12 where the R-module is M = R/I, for all a ∈ I and all m ∈ M , we have
am = 0. This cancellation resembles the behavior of zero divisors but is different, since a is an
element of the ring R while m is an element of the module. This is an example of a more general
concept, and it works in both directions: elements of R can annihilate parts of M , and parts of M
can be annihilated by elements of R.
Definition 10.3.21
Let R be a ring and let M be a left R-module. Let I be a right ideal in R. The
(left-)annihilator of I in M is the set
Ann(I) = {m ∈ M | am = 0 for all a ∈ I}.
Similarly, for a submodule N of M , the annihilator of N in R is the set
Ann(N ) = {r ∈ R | rn = 0 for all n ∈ N }.
For example, if R = Z and M = Z/6Z, then the annihilator of M is the ideal 6Z.
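The annihilator computation for Z/6Z can be confirmed by brute force; a Python sketch (illustrative only):

```python
# Sketch: computing the annihilator of M = Z/6Z as a Z-module by brute
# force over a range of scalars. r annihilates M exactly when r*m = 0
# (mod 6) for every m, which happens exactly when 6 divides r.

def annihilates(r, modulus=6):
    """True when r kills every element of Z/modulus Z."""
    return all((r * m) % modulus == 0 for m in range(modulus))

ann = [r for r in range(-12, 13) if annihilates(r)]
assert ann == [-12, -6, 0, 6, 12]   # the multiples of 6 in this range
```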
Proposition 10.3.22
Let M be a left R-module.
(1) If I is a right ideal in R, then Ann(I) is a submodule of M .
(2) If N is a submodule of M , then Ann(N ) is a two-sided ideal of R.
Proof. (1) Let m1 , m2 ∈ Ann(I) and let r ∈ R. Let a ∈ I. Then a(m1 + rm2 ) = am1 + arm2 . Since
m1 ∈ Ann(I), then am1 = 0. Since I is a right ideal, ar ∈ I and thus (ar)m2 = 0 as well. Hence,
a(m1 + rm2 ) = 0 and by the One-Step Submodule Criterion, Ann(I) is a submodule of M .
(2) Let a, b ∈ Ann(N ) and let r ∈ R. For all n ∈ N , we have (a−b)n = an−bn = 0−0 = 0. Thus,
by the One-Step Subgroup Criterion, (Ann(N ), +) is a subgroup of (R, +). Also, (ra)n = r(an) = 0
so ra ∈ Ann(N ). Since N is a submodule of M , (ar)n = a(rn) = 0 because rn ∈ N . Thus,
ar ∈ Ann(N ), and we deduce that Ann(N ) is a two-sided ideal of R.
Also associated to the issue of zero divisors and annihilators are torsion elements. An element
m in a left R-module M is called a torsion element if rm = 0 for some nonzero r ∈ R. The subset
of torsion elements in M is denoted by
Tor(M ) = {m ∈ M | rm = 0 for some nonzero r ∈ R}.
The torsion subset is not always a submodule of M . See Exercise 10.3.20. We also say that a module
is a torsion module if every element in M is a torsion element, i.e., M = Tor(M ).
10.3.4 – Algebras
Occasionally, we encounter examples of modules that have additional structure. For example, R[x]
is an R-module with the usual multiplication of polynomials by scalars; but R[x] is also a ring, and
its ring multiplication interacts well with multiplication by scalars.
Definition 10.3.23
Let R be a commutative ring and let M, N, P be three R-modules. A function ϕ : M ×N →
P is called bilinear or R-bilinear if
(1) ϕ(m1 + m2 , n) = ϕ(m1 , n) + ϕ(m2 , n) for all m1 , m2 ∈ M and all n ∈ N ;
(2) ϕ(rm, n) = rϕ(m, n) for all r ∈ R, m ∈ M , and n ∈ N ;
(3) ϕ(m, n1 + n2 ) = ϕ(m, n1 ) + ϕ(m, n2 ) for all m ∈ M and all n1 , n2 ∈ N ;
(4) ϕ(m, rn) = rϕ(m, n) for all r ∈ R, m ∈ M , and n ∈ N .
The definition of a bilinear function generalizes in two ways. By changing the axioms (2) and
(4) to multiplication on the right, the definition can apply to right R-modules. Furthermore, it is
also possible to talk about multilinear functions but we delay discussing such functions until later.
Definition 10.3.24
Let R be a commutative ring. An R-algebra is a pair (M, [, ]) where M is an R-module
and where [, ] is an R-bilinear map [, ] : M × M → M .
By axioms (1) and (3) in Definition 10.3.23, the bilinear map behaves like a product in that it
distributes over the addition + in the module M . Intuitively speaking, axioms (2) and (4) require
that this bilinear map (product) on M behaves well with respect to the scalar multiplication.
We conclude this section with a few examples of R-algebras.
Example 10.3.25 (Trivial Algebra). Let R be a commutative ring. Every R-module can be
given a trivial R-algebra structure with [m1 , m2 ] = 0 for all m1 , m2 ∈ M . 4
Example 10.3.26 (Polynomial Algebra). The ring R[x], along with polynomial multiplication
as the bilinear map, has the structure of an R-algebra. This is why it is technically proper
to talk about the “polynomial algebra” over R. 4
Example 10.3.27 (Vector Cross Product). Obviously R3 is a vector space over the field R.
However, in analytic geometry, we encounter the cross product of vectors in R3 . The algebraic
properties of the vector cross product include
(u + v) × w = u × w + v × w,        (ru) × w = r(u × w),
u × (v + w) = u × v + u × w,        u × (rw) = r(u × w),
for all vectors u, v, w ∈ R3 and all r ∈ R. Consequently, equipped with the vector cross product, not
only is R3 an R-module (i.e., a vector space over R), it is also an algebra over R. It is interesting to
recall that (R3 , +, ×) is not a ring because × is not associative. 4
Example 10.3.28 (Matrix Algebra). Let R be a commutative ring and consider the set Mn (R)
of n × n matrices with coefficients in R. The scalar multiplication R × Mn (R) → Mn (R), given by
(r, A) 7→ rA, equips Mn (R) with the structure of an R-module. We already know that Mn (R) is a
ring with the addition and multiplication of matrices.
The distributivity axiom for the ring Mn (R) states that
A(B + C) = AB + AC and (A + B)C = AC + BC,
but scalar-matrix multiplication also satisfies c(AB) = (cA)B = A(cB). Hence, the ring Mn (R) is
an R-algebra. 4
Example 10.3.29 (Boolean Algebra). By Theorems 10.1.12 and 10.1.13, Boolean algebras are
Boolean rings and vice versa. It is useful to see how Boolean algebras are algebras by this new
definition. The property that x + x = 0 for all x in a Boolean ring, along with the other axioms,
implies that every Boolean ring is an F2 -algebra. Hence, Boolean algebras are F2 -algebras along
with some other axioms. Hence, not every F2 -algebra is a Boolean algebra, since for example, a
trivial F2 -algebra is not a Boolean algebra. 4
Polynomial algebras and matrix algebras are specific instances of another class of algebras.
Definition 10.3.24 only required the pairing on M to be bilinear. In particular, [, ] need not be
associative. If the pairing is associative, then M also carries the structure of a ring. We give this
situation a specific name.
Definition 10.3.30
Let R be a commutative ring and let (A, [, ]) be an R-algebra. If [, ] is associative, then
A is called an associative R-algebra. If in addition A contains an identity for [, ], then the
associative algebra is called unital.
An associative R-algebra A is a ring in its own right in which the ring R serves as a set of scalars.
Exercise 10.3.34 gives another construction for unital associative R-algebras. Polynomial algebras
over a commutative ring R and a matrix algebra over a commutative ring R are unital associative
R-algebras. In contrast, R3 equipped with the addition and the cross product is a nonassociative
R-algebra.
11. Let R and S be rings. Let M be a left R-module and let N be a left S-module. Prove that M × N is
a left R ⊕ S-module when equipped with the following addition and scalar multiplication:
(m1 , n1 ) + (m2 , n2 ) = (m1 + m2 , n1 + n2 ) and (r, s) · (m, n) = (rm, sn).
12. Let F be a field and consider the polynomial ring F [x, y].
(a) Prove that an F [x, y]-module consists of vector space V over F along with commuting linear
transformations T1 and T2 .
(b) Explicitly describe the action of a polynomial p(x, y) on a vector in V .
13. Let R be a commutative ring and let M be a left R-module. Suppose that D is a multiplicatively
closed subset of R not containing 0. Define the equivalence relation ∼ on D × M by
N1 + N2 = {m1 + m2 | m1 ∈ N1 , m2 ∈ N2 }.
Prove that this scalar multiplication equips M with the structure of an R-module.
(b) Determine the annihilator Ann(M ).
23. Let M be a left R-module. Let I be a right ideal of R. Prove that I ⊆ Ann(Ann(I)). Give an example
to show that this containment might be strict.
24. Let M be a left R-module. Let N be a submodule of M . Prove that N ⊆ Ann(Ann(N )). Give an
example to show that this containment might be strict.
25. Consider the R[x]-module V = R2 , equipped with the linear transformation on V corresponding to
projection onto the line y = x. Prove that the only nontrivial proper submodules of V are Span((1, 1))
and Span((−1, 1)).
26. Consider the C[x]-module V = C2 , equipped with the linear transformation T : V → V given by
T (z1 , z2 ) = (z2 , −z1 ),
i.e., by the matrix [ 0 1 ; −1 0 ] acting on column vectors. Determine all the R[x]-submodules of V .
[Hint: There are precisely eight of them.]
28. Consider the R[x]-module V = R3 , equipped with the linear transformation T : V → V given by
T (x, y, z) = (y, z, 0),
i.e., by the matrix with rows (0, 1, 0), (0, 0, 1), and (0, 0, 0) acting on column vectors.
Also define the 0 on End(M ) as the 0 function and the 1 in End(M ) as the identity function on M .
(a) Prove that (End(M ), +, ·) is a ring with additive identity 0 and multiplicative identity 1.
(b) Prove that any structure that makes M into a left R-module defines a ring homomorphism from
R into (End(M ), +, ·).
30. Let R be a ring with identity. Let M be the abelian group (R, +).
(a) Prove that the left-multiplication of R on itself makes M into a left R-module.
(b) Use Exercise 10.3.29 to show that R is isomorphic to a subring of End(M ).
[The result of this exercise shows that every ring is isomorphic to a subring of some endomorphism
ring of an abelian group.]
31. Section 6.7 introduced the notion of an algebraic integer and in particular the concept of algebraic
closure. (See Definition 6.7.7.) Let R be a subring of a commutative ring S and suppose that S has
an identity 1 that is also in R. Prove that the following three statements are equivalent:
(a) s ∈ S is integral over R;
(b) R[s] is a finitely generated R-module;
(c) s is an element in some subring T , where R ⊆ T ⊆ S, and such that T is a finitely generated
R-module.
[Hint: For showing (c) ⇒ (a), let {t1 , t2 , . . . , tk } be a finite generating subset of T as an R-module.
Then
s ti = ai1 t1 + ai2 t2 + · · · + aik tk for i = 1, 2, . . . , k,
for some aij ∈ R. Prove that s solves some associated characteristic equation whose polynomial is
monic and in R[x].]
32. Use Exercise 10.3.31 to prove the following facts about integrality in ring extensions. Let R and S be
as in Exercise 10.3.31.
(a) If s and s0 are in S and integral over R, then s ± s0 and ss0 are also integral over R.
10.4. HOMOMORPHISMS AND QUOTIENT MODULES 497
10.4 Homomorphisms and Quotient Modules
Following the Outline in the preface, the next logical topic in the presentation of the algebraic
structure of modules is that of homomorphisms. This section also includes quotient modules since
they are closely related.
Definition 10.4.1
Let R be a ring and let M and N be left R-modules. A function ϕ : M → N is called an
R-module homomorphism if
(1) ϕ(m1 + m2 ) = ϕ(m1 ) + ϕ(m2 ) for all m1 , m2 ∈ M ; and
(2) ϕ(rm) = rϕ(m) for all r ∈ R and all m ∈ M .
Example 10.4.2 (Vector Spaces). If F is a field, then F -modules are vector spaces over F . Fur-
thermore, the axioms for R-module homomorphisms are precisely those for a linear transformation
between vector spaces. 4
Example 10.4.3. Example 10.3.5 pointed out that a ring R is a left R-module by left-multiplication.
An R-module homomorphism ϕ : R → R satisfies different conditions than a ring homomorphism.
Axiom (2) in Definition 10.4.1 requires ϕ(rx) = rϕ(x) for all r, x ∈ R, whereas a ring homomorphism
ϕ : R → R requires ϕ(rx) = ϕ(r)ϕ(x). 4
Example 10.4.4 (Z-Modules). Example 10.3.6 established that Z-modules are precisely the al-
gebraic structure of abelian groups. Let M and N be abelian groups and let ϕ : M → N be
a function. Axiom (1) of Definition 10.4.1 requires that a Z-module homomorphism be a group
homomorphism between (M, +) and (N, +). Now let ϕ : M → N be a group homomorphism. Then
for all positive integers n,
ϕ(n · x) = ϕ(x + x + · · · + x) = ϕ(x) + ϕ(x) + · · · + ϕ(x) = n · ϕ(x),
where each sum contains n terms.
By other properties of group homomorphisms, we also have ϕ(0 · x) = 0ϕ(x) and ϕ((−n)x) =
(−n)ϕ(x). Hence, homomorphisms between abelian groups automatically satisfy the second axiom
of R-module homomorphisms. We conclude that Z-module homomorphisms are precisely group
homomorphisms between abelian groups. 4
The following two definitions are standard for algebraic structures. The set in the third definition
(Definition 10.4.10) has particular value in the theory of modules.
Definition 10.4.5
An R-module homomorphism ϕ : M → N is called an isomorphism if it is also a bijective
function. If there exists an isomorphism between two R-modules, then we say M and N
are isomorphic and we write M ∼ = N.
Example 10.4.6 (F [x]-modules). Let F be a field and let V and W be two (left) F [x]-modules.
Suppose that x acts on V according to a linear transformation T : V → V while x acts on W
according to a linear transformation S : W → W . By considering Axiom (2) for the subring F in
F [x], an F [x]-module homomorphism must be a linear transformation ϕ : V → W . However, we
also need that ϕ(xv) = xϕ(v). Hence, an F [x]-module homomorphism is a linear transformation
ϕ : V → W such that ϕ ◦ T = S ◦ ϕ. In other words, the following diagram of functions commutes.
    V --ϕ--> W
    |        |
    T        S
    v        v
    V --ϕ--> W

By Definition 10.4.5, two F [x]-modules are isomorphic if the linear transformation ϕ is a bijection.
Consequently, two F [x]-modules V with T and W with S are isomorphic as F [x]-modules if and only
if V ∼ = W as vector spaces and T and S are similar linear transformations with S = ϕ ◦ T ◦ ϕ−1 . 4
Definition 10.4.7
Let ϕ : M → N be an R-module homomorphism.
(1) The kernel of ϕ is Ker ϕ = {m ∈ M | ϕ(m) = 0}.
(2) The image of ϕ is Im ϕ = {n ∈ N | n = ϕ(m) for some m ∈ M }.
The proof of the following proposition is similar to proofs in other algebraic structures, so we omit it.
Proposition 10.4.8
Let ϕ : M → N be an R-module homomorphism.
(1) The kernel of ϕ is a submodule of M .
(2) The image of ϕ is a submodule of N .
Example 10.4.9. Consider the Z-module Z3 and consider the subset M = {(x, y, z) ∈ Z3 | x + 2y +
7z = 0}. It is easy to check by using the definition that M is a submodule. However, it is even
easier to observe that M is a submodule of Z3 because M = Ker ϕ for the Z-module homomorphism
ϕ(x, y, z) = x + 2y + 7z. 4
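The closure of Ker ϕ under the One-Step Submodule Criterion can be spot-checked numerically (an illustrative Python sketch; the sample vectors are chosen arbitrarily):

```python
# Sketch: M = Ker(phi) for phi(x,y,z) = x + 2y + 7z is closed under the
# one-step combination m1 + r*m2, illustrated on a few integer triples.

def phi(v):
    """The Z-module homomorphism Z^3 -> Z from the example."""
    x, y, z = v
    return x + 2 * y + 7 * z

m1 = (2, -1, 0)   # 2 - 2 + 0 = 0, so m1 is in the kernel
m2 = (7, 0, -1)   # 7 + 0 - 7 = 0, so m2 is in the kernel
assert phi(m1) == 0 and phi(m2) == 0

for r in range(-3, 4):
    combo = tuple(a + r * b for a, b in zip(m1, m2))
    assert phi(combo) == 0   # the kernel is closed under x + r*y
```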
Definition 10.4.10
Let M and N be left R-modules. The set of R-module homomorphisms from M to N is
denoted by HomR (M, N ). The set of R-module endomorphisms on M , i.e., homomorphisms
from M to itself, is denoted by EndR (M ).
Proposition 10.4.11
If R is a commutative ring, then HomR (M, N ) is a left R-module.
Proof. If X is a set and (N, +) is an abelian group, then the set of functions Fun(X, N ), equipped
with addition of functions, is easily seen to be an abelian group. We note that HomR (M, N ) is a
nonempty subset of Fun(M, N ). Let ϕ, ψ ∈ HomR (M, N ). Then consider the function ϕ − ψ defined
by (ϕ − ψ)(x) = ϕ(x) − ψ(x). For all x, y ∈ M and all r ∈ R,
(ϕ − ψ)(x + ry) = ϕ(x + ry) − ψ(x + ry) = ϕ(x) + rϕ(y) − ψ(x) − rψ(y) = (ϕ − ψ)(x) + r(ϕ − ψ)(y),
so ϕ − ψ ∈ HomR (M, N ) and HomR (M, N ) is an abelian group under addition. For r ∈ R, define
the scalar multiple rϕ by (rϕ)(x) = rϕ(x). Then for all s ∈ R and all x ∈ M ,
(rϕ)(sx) = rϕ(sx) = r(sϕ(x)) = s(rϕ(x)) = s((rϕ)(x)),
where the third equality requires the commutativity of R. Thus, rϕ is an R-module homomorphism
so the scalar multiplication is well-defined. The axioms for scalar multiplication follow from the
properties of the scalar multiplication on N .
The following proposition greatly generalizes the fact that the set of square matrices over a
commutative ring R is a unital associative R-algebra.
Proposition 10.4.12
Let R be a commutative ring with an identity 1 6= 0. The triple (End(M ), +, ◦) is a unital
associative R-algebra.
When discussing algebras in Section 10.3.4, we introduced the concept of a bilinear function
out of the Cartesian product of two R-modules. The reader should notice that a bilinear function
(Definition 10.3.23) is a function that is an R-module homomorphism in both entries separately.
We now turn to quotient modules. Let ∼ be an equivalence relation on a left R-module M . We say
that ∼ behaves well with respect to the operations if m1 ∼ m2 and n1 ∼ n2 imply m1 + n1 ∼ m2 + n2 ,
and m1 ∼ m2 implies rm1 ∼ rm2 for all r ∈ R. We need ∼ to behave well with respect to the
operations in order for the quotient structure to have the desired properties.
Proposition 10.4.13
Let ∼ be an equivalence relation on a left R-module M that behaves well with respect to
the operations. The quotient set M/ ∼, i.e., the set of equivalence classes on M , is a left
R-module with operations:
[m]∼ + [n]∼ = [m + n]∼ and r[m]∼ = [rm]∼ . (10.7)
Proof. All the algebraic properties required for a left R-module will hold in M/ ∼ from the definitions
in (10.7). The only concern is whether the operations are well-defined. Let m1 , m2 ∈ [m], let
n1 , n2 ∈ [n], and let r ∈ R. Since the equivalence relation behaves well with respect to the operations
on M , then
m1 + n1 ∼ m2 + n2 and rm1 ∼ rm2
so the addition and the scalar multiplication on M/ ∼ are well-defined regardless of the choice of
representative for [m] and [n].
As with groups and rings, there are only a certain number of equivalence relations that behave
well with respect to the operations on M .
Proposition 10.4.14
Let R and M be as above. Let ∼ be an equivalence relation on M that behaves well with
respect to the operations on M . Then the equivalence class [0]∼ is a submodule of M .
Proof. Call U = [0]∼ . Let m, n ∈ U . Then m ∼ 0 and n ∼ 0. Since ∼ behaves well with respect to
the operations, m + n ∼ 0 + 0 = 0. Hence, U is closed under addition. Similarly, rm ∼ r0 = 0 so U
is closed under scalar multiplication. Hence, U is a submodule of M .
Proposition 10.4.15
Let U be a submodule of the left R-module M . Define the relation ∼U on M by m1 ∼U m2
if and only if m2 − m1 ∈ U . Then ∼U is an equivalence relation on M that behaves well
with respect to the operations. Furthermore, the equivalence class of 0 is the submodule
U.
Proof. Since 0 ∈ U , we have m ∼U m for all m ∈ M , so ∼U is reflexive. If m2 − m1 ∈ U , then
m1 − m2 = −(m2 − m1 ) ∈ U , so ∼U is symmetric. If m2 − m1 ∈ U and m3 − m2 ∈ U , then
m3 − m1 = (m3 − m2 ) + (m2 − m1 ) ∈ U , so ∼U is transitive. Now suppose m1 ∼U m2 and
n1 ∼U n2 . Then (m2 + n2 ) − (m1 + n1 ) = (m2 − m1 ) + (n2 − n1 ) ∈ U , so m1 + n1 ∼U m2 + n2 ,
and for all r ∈ R, rm2 − rm1 = r(m2 − m1 ) ∈ U , so rm1 ∼U rm2 . Finally,
[0] = {m ∈ M | m ∼U 0} = {m ∈ M | 0 − m ∈ U }.
But this is the submodule U since −m is in a submodule if and only if m is in that submodule.
Definition 10.4.16
Let U be a submodule of a left R-module M and let ∼U be the equivalence relation given in
Proposition 10.4.15. The left R-module structure on M/ ∼U is called the quotient R-module
of M by U and is denoted by M/U .
Of key importance, Proposition 10.4.15 shows that it is possible to construct the quotient of M
from any submodule U . Furthermore, Proposition 10.4.14 establishes that the quotient module with
respect to a submodule is the only type of set quotient on M that induces a left R-module structure
from the operations on M .
As with quotient groups or quotient rings, we write elements in the quotient left R-module M/U
as m + U or more briefly as m, where the submodule U is understood by context. Addition and
scalar multiplication in M/U follow
(m1 + U ) + (m2 + U ) = (m1 + m2 ) + U and r(m + U ) = rm + U.
N1 /U + N2 /U ∼ = (N1 + N2 )/U and (N1 /U ) ∩ (N2 /U ) ∼ = (N1 ∩ N2 )/U.
Proposition 10.4.21
Let V be a finite-dimensional vector space over a field F and let U be a subspace. Then
dim(V /U ) = dim V − dim U.
Proof. Let {v1 , v2 , . . . , vr } be a basis of U and complete this set to a basis {v1 , v2 , . . . , vn } of V . We
show that B = {v r+1 , v r+2 , . . . , v n } is a basis of V /U . Obviously, {v1 , v2 , . . . , vn } spans V /U but
v 1 = v 2 = · · · = v r = 0 in V /U . Hence, B spans V /U . Now suppose that there exist coefficients
ci ∈ F such that
cr+1 v r+1 + cr+2 v r+2 + · · · + cn v n = 0.
Then cr+1 vr+1 + cr+2 vr+2 + · · · + cn vn ∈ U , so there exist c1 , c2 , . . . , cr ∈ F such that
c1 v1 + c2 v2 + · · · + cr vr − cr+1 vr+1 − · · · − cn vn = 0.
Now since {v1 , v2 , . . . , vn } is linearly independent, we deduce that all the coefficients are 0, and in
particular that cr+1 = cr+2 = · · · = cn = 0. Hence, B is linearly independent and thus a basis of
V /U .
We have shown that if dim V = n and dim U = r, then dim V /U = n − r. The proposition
follows.
This result, along with the First Isomorphism Theorem, recovers the Rank-Nullity Theorem, an
important theorem in linear algebra that shows that dimensions are not lost under the action of a
linear transformation.
dim Ker T + dim Im T = dim V, i.e.,
nullity(T ) + rank(T ) = dim V.
f (r1 , r2 , . . . , rm )T = A (r1 , r2 , . . . , rm )T , where A = (aij ) is a matrix with entries in R.
14. Let M1 , M2 be left R-modules and let Ni be a submodule of Mi for i = 1, 2. Prove that
(M1 ⊕ M2 )/(N1 ⊕ N2 ) ∼ = (M1 /N1 ) ⊕ (M2 /N2 ).
15. Consider the cyclic group Z4 = hz | z 4 = 1i and the group ring R[Z4 ]. Let V = R4 and consider the
action of R[Z4 ] on V by scalar multiplication for real numbers and z acting on the standard basis
vectors by shifting through them, i.e., z~ei = ~ei+1 for i = 1, 2, 3 and z~e4 = ~e1 .
(a) Show that the data equips V with the structure of an R[Z4 ]-module.
(b) Prove that the subspace W = {(x1 , x2 , x3 , x4 ) | x1 − x2 + x3 − x4 = 0} is a submodule.
(c) Determine the structure of V /W and state the action of R[Z4 ] on it.
17. Let F be a field, V be a finite-dimensional vector space over F , and U a vector subspace. Prove that
V ∼ = U ⊕ V /U .
10.5 Free Modules and Module Decomposition
One of the topics in the Outline that we have not presented so far is that of conveniently describing
elements in an R-module. We saw that vector spaces have bases and that each basis allows us to
describe a vector by its coordinates. If M is a module over an arbitrary ring R, many properties about
bases, rank, and coordinates do not hold generally and require careful attention. For example, not
all modules have a basis. However, like many other structures, we can define the notion of generating
subsets. With R-modules, the study of how to describe elements in the module leads naturally to
the question of internal structure within the module and then to decomposition of the module into
direct sums. This section studies these topics.
10.5. FREE MODULES AND MODULE DECOMPOSITION 505
Definition 10.5.1
Let R be a ring and let M be a left R-module. Let S be a subset of M . A linear combination
of elements of S is an expression
r1 m1 + r2 m2 + · · · + rk mk (10.8)
such that k is a positive integer, ri ∈ R and mi ∈ S for i = 1, 2, . . . , k. The set of all linear
combinations of S is called the span of S and is denoted by Span(S) or hSi or (S).
To emphasize that the coefficients ri come from the ring R, we sometimes call an expression like
(10.8) an R-linear combination.
As with vector spaces, again note that a linear combination consists of a finite expression of terms
from S. There are different notations for the set of linear combinations because modules generalize
a variety of algebraic structures and these notations come from these other structures.
It is sometimes convenient to refer to a linear combination of elements in S by the notation
$$\sum_{m\in S}^{\text{finite}} r_m\, m \tag{10.9}$$
where this notation assumes that the coefficients rm are 0 for all but a finite number of m ∈ S. We sometimes refer to such a collection {rm }m∈S as an almost zero family of elements in R whenever all but a finite number of the rm are 0. If S is finite with S = {m1 , m2 , . . . , mk }, then the notation in (10.9) is identical to (10.8).
Proposition 10.5.2
Let R be a ring and let S be a subset of a left R-module M . Then Span(S) is a submodule
of M .
Definition 10.5.3
A left R-module M is said to be generated by S if M = Span(S). A left R-module M
is said to be finitely generated if it has a finite subset S such that M = Span(S). A left
R-module is called cyclic if it can be generated by a single element.
If S is finite there are a few common alternate notations for the span. If S = {a}, some authors
denote Span(a) by Ra. If S = {a1 , a2 , . . . , ak }, then it is not uncommon to write Ra1 +Ra2 +· · ·+Rak
for Span(a1 , a2 , . . . , ak ).
The Z-module Z^3 is generated by {(1, 0, 0), (0, 1, 0), (0, 0, 1)}, so Z^3 is finitely generated. In contrast, let S = {p1 (x), p2 (x), . . . , pn (x)} be any finite subset of the Z-module Z[x]. The set of Z-linear combinations consists of polynomials of the form
a1 p1 (x) + a2 p2 (x) + · · · + an pn (x), with ai ∈ Z.
By properties of the degree, we see that for all p(x) ∈ Span(S), deg p(x) ≤ max{deg pi (x)}. Thus, Span(S) is never equal to Z[x]. Hence, Z[x] is not finitely generated as a Z-module.
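The degree bound in this argument is easy to check computationally. The sketch below (plain Python; the coefficient-list representation and helper names are illustrative, not from the text) confirms that a Z-linear combination never exceeds the maximum degree of the generators.

```python
# Polynomials over Z as coefficient lists: p[i] is the coefficient of x^i.

def deg(p):
    """Degree of a polynomial; the zero polynomial gets -1 by convention."""
    nonzero = [i for i, c in enumerate(p) if c != 0]
    return nonzero[-1] if nonzero else -1

def z_linear_combination(coeffs, polys):
    """Compute a1*p1 + a2*p2 + ... + an*pn with integer coefficients ai."""
    out = [0] * max(len(p) for p in polys)
    for a, p in zip(coeffs, polys):
        for i, c in enumerate(p):
            out[i] += a * c
    return out

S = [[1, 2], [0, 0, 3], [5]]       # the polynomials 1 + 2x, 3x^2, and 5
bound = max(deg(p) for p in S)     # max degree among the generators: 2
q = z_linear_combination([4, -1, 7], S)
assert deg(q) <= bound             # no combination ever reaches degree 3
```

Since the degree is bounded on Span(S), a polynomial such as x^3 witnesses Span(S) ≠ Z[x].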
506 CHAPTER 10. MODULES AND ALGEBRAS
Definition 10.5.4
A subset S of a left R-module M is said to be linearly independent if, for every finite subset {m1 , m2 , . . . , mk } of distinct elements of S and all r1 , r2 , . . . , rk ∈ R,
r1 m1 + r2 m2 + · · · + rk mk = 0 in M =⇒ r1 = r2 = · · · = rk = 0.
In linear algebra, we saw that linear independence implied the uniqueness of coefficients in a
linear combination. The same holds true for modules.
Proposition 10.5.5
If S is a linearly independent set in a left R-module M and two linear combinations of elements of S are equal,
$$\sum_{m\in S}^{\text{finite}} r_m\, m = \sum_{m\in S}^{\text{finite}} r'_m\, m,$$
then rm = r′m for all m ∈ S.
Proof. Note that in both summations, only a finite number of the coefficients rm and r′m are nonzero; hence the union of the subsets of S over which rm and r′m are nonzero is a finite subset. Thus, equality of the two linear combinations gives
$$\sum_{m\in S}^{\text{finite}} (r_m - r'_m)\, m = 0.$$
Since S is linearly independent, rm − r′m = 0 for all m ∈ S, so rm = r′m .
10.5.3 – Basis
Following the organization for vector spaces, it is natural now to define the concept of a basis.
Definition 10.5.6
A subset S of a left R-module M is called a basis of M if M is generated by S and if S is
linearly independent. A left R-module is called free if it has a basis.
In Section 10.2.5, we saw that all vector spaces have bases. This key property of vector spaces
can be restated by saying that every F -vector space is a free F -module.
Example 10.5.7. In contrast, consider (Z/5Z, +) as a Z-module. Any subset S of Z/5Z other than S = ∅ and S = {0} generates Z/5Z. However, 5m = 0 for all m ∈ Z/5Z, so no subset S of Z/5Z is linearly independent. Thus, Z/5Z does not have a basis, i.e., it is not a free module. △
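Both claims in this example (every nonzero element generates, yet the relation 5m = 0 forbids linear independence) can be verified by brute force; a quick sketch:

```python
# Z/5Z as a Z-module: every nonzero element m generates the whole module,
# yet 5*m = 0 with 5 a nonzero scalar in Z, so {m} is never independent.
for m in range(1, 5):
    multiples = {(k * m) % 5 for k in range(5)}
    assert multiples == {0, 1, 2, 3, 4}   # Z·m is all of Z/5Z
    assert (5 * m) % 5 == 0               # the torsion relation 5·m = 0
```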
In the above example, the failure to have a basis came from the presence of torsion elements in
the R-module. However, there are other reasons that may prevent the existence of a basis as the
following example shows.
Example 10.5.8. Let R be a commutative ring that is not a PID (principal ideal domain) and let I be a nonprincipal ideal. The left ideal I is a left R-module. Let S be any minimal generating subset of I. Since I is nonprincipal, S contains two distinct nonzero elements r1 , r2 ∈ S. Then
r2 r1 + (−r1 )r2 = 0,
while r2 ≠ 0 and −r1 ≠ 0. Hence, S is not linearly independent. Since no generating set of I is linearly independent, I does not have a basis.
As a specific example, Exercise 6.1.13 showed that I = (3, 2 + √−5) is not a principal ideal in R = Z[√−5]. Since Z[√−5] is an integral domain, it has no zero divisors (torsion elements in this case), so the phenomenon of not having a basis does not stem only from the presence of torsion elements. △
If M is a free left R-module with a basis B, then since B spans M , every element m ∈ M can be written as a linear combination
$$m = \sum_{b\in B}^{\text{finite}} r_b\, b \tag{10.10}$$
where {rb }b∈B is an almost zero family of ring elements. By Proposition 10.5.5, since B is linearly independent, this linear combination is unique. As in linear algebra, this balance between linear independence and spanning leads to the notion of coordinates.
Definition 10.5.9
The unique almost zero family {rb }b∈B in (10.10) is called the coordinates of m with respect
to B.
Suppose that M has a finite basis B = {m1 , m2 , . . . , mk }. The unique expression of an element m ∈ M as a linear combination
m = r1 m1 + r2 m2 + · · · + rk mk
defines a function f : M → R^k by f (m) = (r1 , r2 , . . . , rk ). By uniqueness of coordinates, f (m + m′ ) = f (m) + f (m′ ). Also, the coordinates of cm are (cr1 , cr2 , . . . , crk ), so f (cm) = (cr1 , cr2 , . . . , crk ) = c(r1 , r2 , . . . , rk ) = c(f (m)). Hence, the function f is an R-module homomorphism. The function is obviously surjective and, since a basis is linearly independent, Ker f = {0}, so f is injective. Therefore, f is an isomorphism. This gives us the following theorem.
Theorem 10.5.10
Let M be a free left R-module with a finite basis B such that |B| = k. Then M ≅ R^k .
Another key property of vector spaces proved in Section 10.2.5 is that any two bases of a vector
space have the same cardinality. In vector spaces, this gave us the notion of dimension. This result
is true for various classes of rings but not for arbitrary rings.
Theorem 10.5.11
Let R be a commutative ring with an identity 1 ≠ 0. Suppose that a free left R-module M has a basis with k elements. Then every other basis is finite and has k elements.

Proof. First suppose that one basis {ei | 1 ≤ i ≤ k} is finite and that S is another basis, not necessarily finite. Writing each element sj of S in terms of the ei and each ei in terms of the elements of S produces coefficients aji , bij′ ∈ R such that, by the linear independence of S, for all j, j′ ,
$$\sum_{i=1}^{k} a_{ji}\, b_{ij'} = \begin{cases} 1 & \text{if } j = j', \\ 0 & \text{if } j \neq j'. \end{cases} \tag{10.11}$$
Suppose, toward a contradiction, that S contains ℓ > k elements s1 , . . . , sℓ . Set
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1k} & 0 & \cdots & 0 \\ a_{21} & a_{22} & \cdots & a_{2k} & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots & \vdots & & \vdots \\ a_{\ell 1} & a_{\ell 2} & \cdots & a_{\ell k} & 0 & \cdots & 0 \end{pmatrix}
\quad\text{and}\quad
B = \begin{pmatrix} b_{11} & b_{12} & \cdots & b_{1\ell} \\ b_{21} & b_{22} & \cdots & b_{2\ell} \\ \vdots & \vdots & & \vdots \\ b_{k1} & b_{k2} & \cdots & b_{k\ell} \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix},$$
where A is padded with ℓ − k columns of 0s and B with ℓ − k rows of 0s so that both are ℓ × ℓ matrices. Then (10.11) in matrix multiplication is AB = Iℓ . By Proposition 5.3.11, since R is commutative, (det A)(det B) = 1. But this is a contradiction since det A = det B = 0 because of the columns (resp. rows) of 0s in A (resp. B). Hence S has at most k elements, and exchanging the roles of the two bases shows that every basis has exactly k elements.
The following example describes a noncommutative ring and a module (itself) that has bases of
different cardinalities.
Example 10.5.12. Let M = Fun(N, Z) be the set of sequences into Z. This is naturally a Z-
module. Proposition 10.4.12 shows that the set of endomorphisms of M , namely R = EndZ (M ), is
a unital associative algebra, i.e., a ring with an identity 1 that is the identity endomorphism.
Consider the ring elements ϕe , ϕo ∈ R defined by ϕe (a)i = a2i and ϕo (a)i = a2i+1 , for all sequences a = (ai )i∈N . The endomorphism ϕe (resp. ϕo ) takes a sequence a ∈ M and returns the sequence of even- (resp. odd-) indexed terms. Consider also the elements ψe , ψo ∈ R that map a sequence a ∈ M to the sequences with terms
$$\psi_e(a)_i = \begin{cases} a_{i/2} & \text{if } i \text{ is even} \\ 0 & \text{if } i \text{ is odd} \end{cases}
\qquad\text{and}\qquad
\psi_o(a)_i = \begin{cases} 0 & \text{if } i \text{ is even} \\ a_{(i-1)/2} & \text{if } i \text{ is odd.} \end{cases}$$
The terms of ψe (a) are the terms of a but spaced out every other term, with 0 in the odd positions. Consequently, ψe is a right inverse of ϕe , namely ϕe ◦ ψe = 1. Similarly, ϕo ◦ ψo = 1. It is also easy to check that ϕe ◦ ψo = 0 and ϕo ◦ ψe = 0. Finally, we claim that
ψe ◦ ϕe + ψo ◦ ϕo = id,
since applying the left-hand side to a sequence a re-interleaves the even- and odd-indexed subsequences of a back into a.
Hence, for any α ∈ R,
α = α ◦ id = α ◦ (ψe ◦ ϕe + ψo ◦ ϕo ) = (α ◦ ψe ) ◦ ϕe + (α ◦ ψo ) ◦ ϕo .
In particular, α ∈ Span(ϕe , ϕo ), and so the set {ϕe , ϕo } generates R as a left R-module. Furthermore, suppose that γ1 ◦ ϕe + γ2 ◦ ϕo = 0. Multiplying (composing) on the right by ψe gives
0 = γ1 ◦ ϕe ◦ ψe + γ2 ◦ ϕo ◦ ψe = γ1 ◦ 1 = γ1 ,
and similarly, γ2 = 0. Hence, {ϕe , ϕo } is a linearly independent set. Thus, {ϕe , ϕo } is a basis of R. Since {1} is also a basis, we have produced a basis of R consisting of 1 element and a basis consisting of 2 elements. By Theorem 10.5.10, we deduce that R satisfies the unusual property that R ≅ R² . △
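The identities ϕe ◦ ψe = 1, ϕe ◦ ψo = 0, and ψe ◦ ϕe + ψo ◦ ϕo = id can be tested on sample indices by modeling a sequence as a Python function N → Z (a sketch; checking finitely many indices is of course evidence, not a proof):

```python
# Sequences a: N -> Z modeled as callables; the four endomorphisms of
# Example 10.5.12 extract and re-interleave even/odd-indexed terms.
def phi_e(a): return lambda i: a(2 * i)
def phi_o(a): return lambda i: a(2 * i + 1)
def psi_e(a): return lambda i: a(i // 2) if i % 2 == 0 else 0
def psi_o(a): return lambda i: a((i - 1) // 2) if i % 2 == 1 else 0

a = lambda i: i * i - 3 * i + 7        # an arbitrary sample sequence
for i in range(50):
    assert phi_e(psi_e(a))(i) == a(i)  # phi_e ∘ psi_e = 1
    assert phi_o(psi_o(a))(i) == a(i)  # phi_o ∘ psi_o = 1
    assert phi_e(psi_o(a))(i) == 0     # phi_e ∘ psi_o = 0
    assert phi_o(psi_e(a))(i) == 0     # phi_o ∘ psi_e = 0
    # psi_e ∘ phi_e + psi_o ∘ phi_o = id (addition of endomorphisms is termwise)
    assert psi_e(phi_e(a))(i) + psi_o(phi_o(a))(i) == a(i)
```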
Definition 10.5.13
Let R be an arbitrary ring and let M be a free R-module. If every basis of M has the same
cardinality, we call this cardinality the rank of M and denote it by rankR (M ).
The rank of a vector space over a field F is precisely the dimension. Theorem 10.5.11 shows that
if R is a commutative, unital ring and M is an R-module with a finite basis, then every basis of M
has the same cardinality. In this case, the concept of rank applies. As Example 10.5.12 shows, it is
possible for a module to be free but for the rank not to be defined.
By a trivial application of this terminology, we will call the trivial module {0} free of rank 0.
Definition 10.5.14
Let R be a commutative ring and let A be an R-algebra. Then A is a finitely generated
R-algebra if there exists a finite subset S ⊆ A such that the smallest subalgebra of A that
contains S is all of A.
According to this terminology, F [x] (equipped with its usual addition, multiplication, and scalar multiplication) is not finitely generated as an F -module but is finitely generated as an F -algebra.
The same is true of F [x1 , x2 , . . . , xn ] for any positive integer n. In contrast, the polynomial ring
with a countably infinite number of variables F [x1 , x2 , x3 , . . .] gives an example of an F -algebra that
is not finitely generated.
Suppose that M is a left R-module with submodules N1 , N2 , . . . , Nk such that:
(1) M = N1 + N2 + · · · + Nk ;
(2) (N1 + N2 + · · · + Ni ) ∩ Ni+1 = {0} for all i = 1, 2, . . . , k − 1.
Then M ≅ N1 ⊕ N2 ⊕ · · · ⊕ Nk .
Proof. We prove the theorem by induction on k. If k = 1, then the theorem is trivial. Now suppose that the theorem holds for some positive integer k and suppose also that M has a collection N1 , N2 , . . . , Nk+1 that satisfies (1) and (2). By (1), every element m ∈ M can be written as a sum
m = n1 + n2 + · · · + nk+1
where ni ∈ Ni for all 1 ≤ i ≤ k + 1. Suppose that m = n′1 + n′2 + · · · + n′k+1 as well. Then n1 + n2 + · · · + nk+1 = n′1 + n′2 + · · · + n′k+1 , so
n′k+1 − nk+1 = (n1 + n2 + · · · + nk ) − (n′1 + n′2 + · · · + n′k ).
The left-hand side lies in Nk+1 and the right-hand side lies in N1 + N2 + · · · + Nk , so by condition (2) both sides of the above equality must be 0. Thus,
nk+1 = n′k+1 and n1 + n2 + · · · + nk = n′1 + n′2 + · · · + n′k .
The induction hypothesis then implies that ni = n′i for all 1 ≤ i ≤ k + 1. This shows that each m ∈ M can be written uniquely as a sum of elements in N1 , N2 , . . . , Nk+1 . This defines a function ϕ : M → N1 ⊕ N2 ⊕ · · · ⊕ Nk+1 by setting ϕ(m) = (n1 , n2 , . . . , nk+1 ) whenever m = n1 + n2 + · · · + nk+1 with ni ∈ Ni . This function is surjective, and we just showed that it is injective. Furthermore, it is easy to check that it is additive. Also, for all r ∈ R, since rni ∈ Ni for all ni ∈ Ni , we have ϕ(rm) = (rn1 , rn2 , . . . , rnk+1 ) = rϕ(m). Hence, ϕ is an R-module isomorphism.
Example 10.5.15 underscored the fact that in modules, a submodule need not have a complementary submodule. This observation leads us to distinguish a few possibilities.
Definition 10.5.18
Let R be a ring and let M be a nonzero R-module.
(1) If M contains a strict nonzero submodule N , i.e., 0 ⊊ N ⊊ M , then M is called reducible. Otherwise, M is called irreducible (or simple).
(2) If M = N1 ⊕ N2 for some nonzero submodules N1 and N2 , then M is called decomposable. Otherwise, M is called indecomposable.
(3) If M is a direct sum of irreducible submodules, then M is called completely reducible.
Every irreducible module is trivially indecomposable. However, Example 10.5.15 shows that Z
is indecomposable but not irreducible. Similarly, every module is a direct sum of indecomposable
submodules but those submodules need not be irreducible. Hence, every module is completely
decomposable (if we had bothered to define such an expression) but not every module is completely
reducible. In particular, Z as a module of itself, is not completely reducible.
With the terminology of Definition 10.5.18, we can rephrase our above observation about vector spaces by stating that if F is a field, then any nonzero F -vector space of dimension 2 or greater is decomposable. In particular, the indecomposable F -modules are the one-dimensional vector spaces (all isomorphic to F ). It is also true that the irreducible F -modules are the one-dimensional vector spaces. This implies that every finite-dimensional vector space is completely reducible.
With the examples given so far, the reader may suspect that the failure of a module to be completely reducible may come from the properties of the ring, like a discreteness property as in Z or the possibility that torsion elements arise in quotient modules. However, consider the left R[x]-module consisting of V = R² , where x acts on R² as multiplication by the matrix
$$A = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}.$$
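Since x acts as multiplication by A, a one-dimensional R[x]-submodule of V is the same thing as an eigendirection of A. A brute-force search over small integer directions (a sketch, not from the text) suggests that span{e1 } is the only invariant line, which is why V is reducible yet admits no decomposition into a direct sum of proper submodules:

```python
# Search for invariant lines of A = [[2, 1], [0, 2]]: the direction v spans an
# invariant line exactly when A·v is parallel to v (2x2 cross product vanishes).
A = ((2, 1), (0, 2))

def is_invariant_line(v):
    x, y = v
    w = (A[0][0] * x + A[0][1] * y, A[1][0] * x + A[1][1] * y)  # w = A·v
    return w[0] * y - w[1] * x == 0

directions = [(x, y) for x in range(-5, 6) for y in range(-5, 6)
              if (x, y) != (0, 0) and is_invariant_line((x, y))]
# every invariant direction found lies on the line y = 0, i.e., span{e1}
assert directions and all(y == 0 for (x, y) in directions)
```

Algebraically, A·(x, y) = (2x + y, 2y) is parallel to (x, y) exactly when y² = 0, confirming the search.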
Definition 10.5.19
If M1 is a submodule of M such that there exists another submodule M2 with M = M1 ⊕ M2 , then:
(1) M1 is called a summand of M ;
(2) M2 is called a complement of M1 in M ;
(3) we call M1 and M2 complementary submodules.
Proposition 10.5.20
Let M1 be a submodule of the R-module M . Then M1 is a summand of M if and only if there exists a projection of M onto M1 , i.e., an R-module homomorphism π : M → M with π² = π and π(M ) = M1 .

Proof. First suppose that M = M1 ⊕ M2 for some complementary submodule M2 . Then the function π : M1 ⊕ M2 → M1 ⊕ M2 defined by π(x, y) = (x, 0) has M1 for an image and is the identity function on the submodule M1 , which we identify with {(m1 , 0) | m1 ∈ M1 }. Hence, π is a projection homomorphism.
Conversely, suppose that there exists a projection π of M onto M1 . Let m ∈ M . Obviously, m = π(m) + (m − π(m)). Note that π(m) ∈ Im π = M1 and π(m − π(m)) = π(m) − π²(m) = 0, so m − π(m) ∈ Ker π. Hence M = M1 + Ker π. Moreover, if n ∈ M1 ∩ Ker π, then n = π(n′ ) for some n′ ∈ M , and 0 = π(n) = π²(n′ ) = π(n′ ) = n, so M1 ∩ Ker π = {0}. Therefore M = M1 ⊕ Ker π, and Ker π is a complement of M1 .
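As a concrete illustration of the proposition (a sketch with an illustrative map, not from the text), take M = Z² and π(x, y) = (x + y, 0): this is an idempotent Z-module homomorphism with image M1 = Z(1, 0) and kernel Z(1, −1), so it splits Z² as M1 ⊕ Ker π.

```python
# pi(x, y) = (x + y, 0) is an idempotent Z-module homomorphism on Z^2.
def pi(v):
    x, y = v
    return (x + y, 0)

for v in [(3, 4), (-2, 7), (10, -10), (0, 0)]:
    assert pi(pi(v)) == pi(v)                   # pi^2 = pi
    image_part = pi(v)                          # lies in M1 = Z(1, 0)
    kernel_part = (v[0] - image_part[0], v[1] - image_part[1])
    assert pi(kernel_part) == (0, 0)            # m - pi(m) lies in Ker(pi)
    assert (image_part[0] + kernel_part[0],
            image_part[1] + kernel_part[1]) == v  # m = pi(m) + (m - pi(m))
```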
7. Let R be a commutative ring. Prove that for any R-module homomorphism ϕ : R^n → R^m , there exists a unique matrix A ∈ Mm×n (R) such that (using column vectors)
$$\varphi(r_1, r_2, \ldots, r_n) = A\begin{pmatrix} r_1 \\ r_2 \\ \vdots \\ r_n \end{pmatrix}.$$
8. Let R be a commutative ring. Prove that an ideal I in R is a free R-module if and only if I is a
principal ideal generated by an element that is not a zero divisor.
9. Let R be an integral domain and let M be a finitely generated torsion module (i.e., Tor(M ) = M ). Prove that M has a nonzero annihilator. Give an example of a torsion module whose annihilator is the zero ideal.
10. Let R be an arbitrary ring and let M be a free module over R. Prove that if M has a finite basis,
then every basis of M is finite.
11. Show that the R-module HomR (Rm , Rn ) is free and of rank mn.
12. Let M, N1 , N2 be left R-modules. Prove the following R-module isomorphisms:
(a) HomR (N1 ⊕ N2 , M ) ≅ HomR (N1 , M ) ⊕ HomR (N2 , M ).
(b) HomR (M, N1 ⊕ N2 ) ≅ HomR (M, N1 ) ⊕ HomR (M, N2 ).
13. Let R be an arbitrary ring and let M and M ′ be free left R-modules. Let B be a basis of M and let B′ be a basis of M ′ . Prove that if |B| = |B′ |, then M ≅ M ′ . Do not assume that the bases are finite.
14. Find all the irreducible Z-modules.
15. Show that an R-module M is irreducible if and only if M ≠ {0} and M is cyclic with every nonzero element as a generator.
16. (Schur’s Lemma) Show that if M1 and M2 are irreducible R-modules, then every homomorphism between them is the 0 homomorphism or an isomorphism. Deduce that if M is an irreducible R-module, then EndR (M ) is a division ring. [Hint: See Proposition 10.4.12.]
17. Let M be an R-module and suppose that M = M1 ⊕ M2 , where M1 and M2 are nonisomorphic irreducible R-modules. Prove that EndR (M ) ≅ EndR (M1 ) ⊕ EndR (M2 ) as unital associative R-algebras.
18. An element in a ring R is called a central idempotent element if it is idempotent and in the center of
R.
(a) Prove that if e is a central idempotent element and M is a left R-module, then M = eM ⊕(1−e)M .
(b) Let M be a left R-module. Prove that the central idempotent elements in the ring EndR (M )
are the projection functions.
(c) Clearly explain how part (a) is tantamount to Proposition 10.5.20.
19. Show that any direct sum of free left R-modules is again free.
20. Prove that if R is a ring such that R ≅ R² , then R ≅ R^k for any positive integer k.
10.6 Finitely Generated Modules over PIDs, I
Section 4.5 presented the Fundamental Theorem of Finitely Generated Abelian Groups (FTFGAG). From the perspective of modules, abelian groups are simply Z-modules. Having discussed free modules and the generation of modules by subsets, it is natural to wonder whether FTFGAG generalizes to modules over other rings. The first key theorem in establishing all parts of FTFGAG was Theorem 4.5.9, which establishes that every submodule of a free Z-module is also free. Example 10.5.8 in the previous section shows that a generalization of FTFGAG cannot extend to rings that are not principal ideal domains. However, FTFGAG does have an analogue for modules over a PID.
Both fields and Z are principal ideal domains. Consequently, since finite-dimensional vector
spaces and finitely generated abelian groups both fall under the umbrella of this section, we will see
how the theorems of this section resemble theorems already encountered in these two contexts.
Let R be a PID and M a finitely generated R-module generated by elements x1 , x2 , . . . , xn . If (e1 , e2 , . . . , en ) is the standard (ordered) basis of R^n , then the function ϕ : R^n → M defined by
$$\varphi\!\left(\sum_{i=1}^{n} a_i e_i\right) = \sum_{i=1}^{n} a_i x_i \tag{10.13}$$
is a surjective R-module homomorphism, so M ≅ R^n /Ker ϕ.
Theorem 10.6.1
Let R be a PID, let L be a free R-module of rank n, and let K be a nontrivial submodule. Then K is free of rank m ≤ n, and there exist a basis {x1 , x2 , . . . , xn } of L and elements d1 , d2 , . . . , dm ∈ R such that di | dj if i ≤ j and {d1 x1 , d2 x2 , . . . , dm xm } is a basis of K. Furthermore, if {z1 , z2 , . . . , zn } is another basis of L such that {d′1 z1 , d′2 z2 , . . . , d′s zs } is a basis of K satisfying d′i | d′j when i ≤ j, then s = m and d′i is an associate of di for 1 ≤ i ≤ m.
This theorem directly generalizes Theorem 4.5.9 for abelian groups. However, the proof of that theorem used integer division. Consequently, that proof could be modified to work for modules over any Euclidean domain, but a slightly different approach is required to generalize it to modules over PIDs. Since a PID is a UFD, every nonzero, nonunit element has a unique factorization. For the purposes of this section, we define the length ℓ(a) of a nonzero element a in a PID by ℓ(a) = 0 if a is a unit and ℓ(a) = r if the unique factorization of a involves r (not necessarily distinct) prime factors.
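For R = Z the length function is simply the number of prime factors counted with multiplicity (the function Ω(n) of number theory); a sketch using trial division:

```python
# Length of a nonzero integer: l(±1) = 0 (units), and otherwise l(a) is the
# number of prime factors of |a| counted with multiplicity, e.g. l(12) = l(2·2·3) = 3.
def length(a):
    a = abs(a)
    assert a != 0, "length is defined only for nonzero elements"
    count, p = 0, 2
    while a > 1:
        while a % p == 0:    # divide out each prime factor, counting multiplicity
            a //= p
            count += 1
        p += 1
    return count

assert length(1) == 0 and length(-1) == 0   # units have length 0
assert length(7) == 1                       # primes have length 1
assert length(12) == 3 and length(-30) == 3
```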
Lemma 10.6.2
Let {f1 , f2 , . . . , fn } be a basis of a free module L of rank n. Suppose that s, t ∈ R are
relatively prime with ss0 + tt0 = 1 for some s0 , t0 ∈ R. There exist elements f10 , f20 ∈ L such
that f1 = sf10 − t0 f20 and f2 = tf10 + s0 f20 and {f10 , f20 , f3 , . . . , fn } is another basis of L.
Proof. Define f′1 = s′ f1 + t′ f2 and f′2 = −tf1 + sf2 . A direct check using ss′ + tt′ = 1 shows that f1 = sf′1 − t′ f′2 and f2 = tf′1 + s′ f′2 , so {f′1 , f′2 , f3 , . . . , fn } spans L. For linear independence, suppose
c1 f′1 + c2 f′2 + c3 f3 + · · · + cn fn = 0
=⇒ (s′ c1 − tc2 )f1 + (t′ c1 + sc2 )f2 + c3 f3 + · · · + cn fn = 0.
Thus, c3 = · · · = cn = 0 and
$$\begin{pmatrix} s' & -t \\ t' & s \end{pmatrix}\begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}
\implies
\begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = \begin{pmatrix} s & t \\ -t' & s' \end{pmatrix}\begin{pmatrix} 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix},$$
since the 2 × 2 coefficient matrix has determinant ss′ + tt′ = 1 and is therefore invertible.
Proof (Theorem 10.6.1). Let {f1 , f2 , . . . , fn } be any basis of L. All nonzero elements of K are expressed as linear combinations
c1 f1 + c2 f2 + · · · + cn fn , (10.14)
with ci ∈ R. Let d1 be a nonzero coefficient of least length ℓ(d1 ) appearing in an expression (10.14), taken over all elements of K and all bases of L. Reordering the basis if necessary, we may assume that there is an element z1 ∈ K with
z1 = d1 f1 + c2 f2 + · · · + cn fn .
For j = 2, . . . , n, consider the ideals (d1 , cj ). Since R is a PID, (d1 , cj ) = (aj ) for some nonzero aj ∈ R. Thus, aj | d1 and aj | cj . We claim that each aj arises as a coefficient of the first vector in some ordered basis of L. Write aj = sd1 + tcj . Then s and t are relatively prime, with ss′ + tt′ = 1 for some s′ , t′ ∈ R. By Lemma 10.6.2, L has a basis {f′1 , f2 , . . . , f′j , . . . , fn } where f1 = sf′1 − t′ f′j and fj = tf′1 + s′ f′j . With respect to this new basis,
d1 f1 + cj fj = (sd1 + tcj )f′1 + (s′ cj − t′ d1 )f′j .
In particular, aj = sd1 + tcj arises as a coefficient of the first basis vector in some basis of L. By the minimality of ℓ(d1 ), we deduce that ℓ(d1 ) ≤ ℓ(aj ). However, since aj | d1 , we deduce by unique factorization that aj is an associate of d1 . In particular, (d1 ) = (aj ) and d1 | cj . Setting cj = d1 qj for 2 ≤ j ≤ n, we have
z1 = d1 (f1 + q2 f2 + · · · + qn fn ).
Set x1 = f1 + q2 f2 + · · · + qn fn , so that z1 = d1 x1 and {x1 , f2 , . . . , fn } is a basis of L. Every element of K may be written as
z2 = k1 x1 + c2 f2 + · · · + cn fn ∈ K,
and repeating the above argument on the coefficients c2 , . . . , cn produces an element d2 of least length and a change of basis with
d2 f2 + c3 f3 + · · · + cn fn = d2 (f2 + q3 f3 + · · · + qn fn ).
Continuing inductively, we obtain a basis
{x1 , . . . , xm , fm+1 , . . . , fn }
of L such that {d1 x1 , d2 x2 , . . . , dm xm } is a basis of K for elements di ∈ R such that di | di+1 for 1 ≤ i ≤ m − 1.
Finally, suppose that {z1 , z2 , . . . , zn } is another basis of L such that {d′1 z1 , d′2 z2 , . . . , d′s zs } is a basis of K satisfying d′i | d′j when i ≤ j. Since R is commutative, s = m by Theorem 10.5.11. The definition of d1 by the minimality condition on (10.14) implies that (d′1 ) = (d1 ), and the subsequent definitions of the di for i ≥ 2 imply that (d′i ) = (di ), i.e., d′i is an associate of di , for all 1 ≤ i ≤ m.
Definition 10.6.3
In the result of Theorem 10.6.1, we call the list of factors d1 , d2 , . . . , dm (only defined up to multiplication by a unit) the invariant factors of the submodule K. We call the basis {x1 , x2 , . . . , xn } of L ≅ R^n a K-preferred basis.

Note that the rank and invariant factors of the submodule K are uniquely defined, whereas K usually has more than one preferred basis.
We can organize the data for the components of the yj as the columns of a matrix in Mn×ℓ (R),
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1\ell} \\ a_{21} & a_{22} & \cdots & a_{2\ell} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{n\ell} \end{pmatrix}.$$
According to Theorem 10.6.1, there exist another ordered basis (x1 , x2 , . . . , xn ) of L and a generating set {z1 , z2 , . . . , zm } of K such that zi = di xi for 1 ≤ i ≤ m, where di | dj whenever i ≤ j. This leads to the following proposition.
Proposition 10.6.4
For all matrices A ∈ Mn×ℓ (R), there exist S ∈ GLn (R) and T ∈ GLℓ (R) such that
$$SAT = \begin{pmatrix} d_1 & 0 & \cdots & 0 & \\ 0 & d_2 & \cdots & 0 & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots & \\ 0 & 0 & \cdots & d_m & \\ & \mathbf{0} & & & \mathbf{0} \end{pmatrix} \tag{10.15}$$
where the nonzero entries di satisfy di | dj whenever i ≤ j. Furthermore, if S ′ and T ′ are other invertible matrices such that S ′ AT ′ is a diagonal matrix with entries d′1 , . . . , d′s , 0, . . . , 0 with d′i | d′j whenever i ≤ j, then s = m and d′i is an associate of di .
Proof. Interpret A as the matrix in which the columns are the B-coordinates of elements for some
generating set X of a submodule K of a free module L. Multiplying A on the left by an invertible
matrix S ∈ Mn×n (R) corresponds to changing the basis of L and adjusting the coordinates of the
elements in the generating set X. Multiplying on the right by an invertible matrix T ∈ M`×` (R)
corresponds to replacing the generating set X with a set X 0 obtained as linear combinations of the
elements in X but defined by the matrix T . However, since T is invertible, Span(X) = Span(X 0 )
and hence multiplying A on the right by T does not change the corresponding submodule K.
The Proposition now follows by Theorem 10.6.1.
Definition 10.6.5
The diagonal matrix in (10.15) is called the Smith normal form of A. The matrix S is
called the left-reducing matrix, while T is called the right-reducing matrix.
In linear algebra, the reduced row echelon form of a matrix is obtained via the Gauss-Jordan elim-
ination algorithm. This algorithm involves three row operations applied to a matrix A ∈ Mn×m (F ),
where F is a field: (1) interchange two rows (Ri ↔ Rj ); (2) scale (multiply) a row by an invertible
(nonzero) constant (Ri → cRi ); (3) replace row Ri with row Ri + cRj where Rj is another row and
c is any scalar (Ri → Ri + cRj ).
In the algorithm to find the Smith normal form of a matrix, along with its left-reducing matrix and
right-reducing matrix, we use not only row operations but also column operations. If A ∈ Mn×` (R)
is a matrix, a row or column operation on A corresponds equivalently to a matrix operation, as
stated in the following list. Recall that the matrix Eij is the square matrix that is 0s everywhere
except for a 1 in the (i, j) entry.
(1) Row swap operation Ri ↔ Rj . Interchanges row Ri and row Rj . The corresponding ma-
trix operation is A 7→ M A, where M is the identity matrix but with the ith and jth row
interchanged.
(2) Scale row operation Ri → aRi , where a is a unit a ∈ U (R). Replaces the row Ri with the a
multiple of itself. The corresponding matrix operation is A 7→ M A, where M = I + (a − 1)Eii .
(3) Replace row operation Ri → Ri + aRj , where i 6= j. Replaces row Ri with the row vector
Ri + aRj . (Note that a can be an arbitrary element from the ring R.) The corresponding
matrix operation is A 7→ M A where M = I + aEij .
(4) Column swap operation Ci ↔ Cj . Interchanges the column Ci and the column Cj . The corresponding matrix operation is A 7→ AN , where N is the identity matrix but with the ith and jth columns interchanged.
(5) Scale column operation Ci → aCi , where a ∈ U (R). Replaces the column Ci with the a
multiple of itself. The corresponding matrix operation is A 7→ AN , where N = I + (a − 1)Eii .
(6) Replace column operation Ci → Ci + aCj , where i 6= j. Replaces column Ci with the column
Ci + aCj . The corresponding matrix operation is A 7→ AN , where N = I + aEji .
Note that for each row operation, M is an invertible n × n matrix, and for each column operation, N is an invertible ℓ × ℓ matrix. Consequently, a sequence of row operations on A corresponds to a product of invertible n × n matrices accumulating from right to left, and a sequence of column operations on A corresponds to a product of invertible ℓ × ℓ matrices accumulating from left to right.
Example 10.6.6. Let R = Z and consider the free module Z^3 equipped with the standard basis {f1 , f2 , f3 }. Consider the submodule K = Span(y1 , y2 , y3 , y4 ) in Z^3 , where y1 = (2, 0, −1), y2 = (9, −6, 6), y3 = (0, 3, 12), and y4 = (3, 6, 5), and let A ∈ M3×4 (Z) be the matrix with columns y1 , y2 , y3 , y4 . To find the Smith normal form of A, we perform the following sequence of row and column operations:
A = [2 9 0 3; 0 −6 3 6; −1 6 12 5]  (rows separated by semicolons)
R1 → R1 + R3:   [1 15 12 8; 0 −6 3 6; −1 6 12 5]
R3 → R3 + R1:   [1 15 12 8; 0 −6 3 6; 0 21 24 13]
C2 → C2 − 15C1: [1 0 12 8; 0 −6 3 6; 0 21 24 13]
C3 → C3 − 12C1: [1 0 0 8; 0 −6 3 6; 0 21 24 13]
C4 → C4 − 8C1:  [1 0 0 0; 0 −6 3 6; 0 21 24 13]
C2 ↔ C4:        [1 0 0 0; 0 6 3 −6; 0 13 24 21]
R3 → R3 − 2R2:  [1 0 0 0; 0 6 3 −6; 0 1 18 33]
R2 ↔ R3:        [1 0 0 0; 0 1 18 33; 0 6 3 −6]
R3 → R3 − 6R2:  [1 0 0 0; 0 1 18 33; 0 0 −105 −204]
C3 → C3 − 18C2: [1 0 0 0; 0 1 0 33; 0 0 −105 −204]
C4 → C4 − 33C2: [1 0 0 0; 0 1 0 0; 0 0 −105 −204]
C4 → C4 − 2C3:  [1 0 0 0; 0 1 0 0; 0 0 −105 6]
C3 → C3 + 18C4: [1 0 0 0; 0 1 0 0; 0 0 3 6]
C4 → C4 − 2C3:  [1 0 0 0; 0 1 0 0; 0 0 3 0]
This is the Smith normal form of A. We read that the invariant factors of K are 1,1,3.
By keeping track of the row operations, and multiplying in order the corresponding matrix operations from right to left, we calculate that
$$S = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -6 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -2 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 1 \\ 1 & -2 & 2 \\ -6 & 13 & -12 \end{pmatrix}.$$
By keeping track of the column operations, and multiplying in order the corresponding matrix operations from left to right, we calculate that
$$T = \begin{pmatrix} 1 & -8 & -138 & 261 \\ 0 & 0 & 18 & -35 \\ 0 & 0 & -35 & 68 \\ 0 & 1 & 36 & -69 \end{pmatrix}.$$
It is easy to check by direct computation that
$$S\begin{pmatrix} 2 & 9 & 0 & 3 \\ 0 & -6 & 3 & 6 \\ -1 & 6 & 12 & 5 \end{pmatrix}T = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 3 & 0 \end{pmatrix}.$$
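This check is quick to automate. The sketch below multiplies out S·A·T with plain Python lists; note that the middle row of S works out to (1, −2, 2), consistent with the inverse matrix S⁻¹ displayed in this example.

```python
# Verify S·A·T = diag(1, 1, 3) padded with a zero column, as claimed.
def matmul(X, Y):
    """Multiply two integer matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

S = [[1, 0, 1], [1, -2, 2], [-6, 13, -12]]
A = [[2, 9, 0, 3], [0, -6, 3, 6], [-1, 6, 12, 5]]
T = [[1, -8, -138, 261], [0, 0, 18, -35],
     [0, 0, -35, 68], [0, 1, 36, -69]]

assert matmul(matmul(S, A), T) == [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 3, 0]]
```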
Having found the matrix S, we can also find a K-preferred basis. As in linear algebra, we can think of the matrix A as representing a module homomorphism ϕ : Z^4 → Z^3 , given with respect to the standard bases on Z^4 and Z^3 , respectively. The matrices S and T correspond to changes of basis on Z^3 and Z^4 , respectively. The matrix S is the change of basis matrix from the standard basis to the K-preferred basis, so S⁻¹ is the change of basis matrix from the K-preferred basis to the standard basis. Consequently, the columns of S⁻¹ give the coordinates of the K-preferred basis vectors with respect to the standard basis.
In this example, in M3×3 (Z), we have
$$S^{-1} = \begin{pmatrix} 1 & 0 & 1 \\ 1 & -2 & 2 \\ -6 & 13 & -12 \end{pmatrix}^{-1} = \begin{pmatrix} 2 & -13 & -2 \\ 0 & 6 & 1 \\ -1 & 13 & 2 \end{pmatrix}.$$
In summary, the invariant factors of the submodule K are d1 = 1, d2 = 1, and d3 = 3. Furthermore,
with respect to the ordered basis (x1 , x2 , x3 ) of Z3 given by x1 = (2, 0, −1), x2 = (−13, 6, 13), and
x3 = (−2, 1, 2), a basis of K is {x1 , x2 , 3x3 }.
Looking forward to calculations that we will consider in the next section, we can now easily determine the quotient module Z^3 /K. By the result of Exercise 10.4.14, we deduce that
Z^3 /K = (Zx1 ⊕ Zx2 ⊕ Zx3 )/(Zx1 ⊕ Zx2 ⊕ Z3x3 )
≅ (Zx1 )/(Zx1 ) ⊕ (Zx2 )/(Zx2 ) ⊕ (Zx3 )/(Z3x3 )
≅ Z/3Z.
Observe that at the end of the above elimination process, we performed three consecutive column replacement operations on C3 and C4. These amount to three steps in the Euclidean algorithm to find the greatest common divisor of −105 and −204. If the numbers in this example had been larger, the procedure might have required similar back-and-forth steps at earlier stages. △
The strategy in this example involved: (1) finding the greatest common divisor of the entries across a row or column of the matrix; (2) producing this greatest common divisor by row or column operations; (3) moving it as far as possible toward the upper left corner (the pivot position); and, once a pivot position is created, (4) eliminating all nonzero terms in the row and column of the pivot by using replace row and column operations.
The strategy we just described applies to Z. For a general PID, we look for a greatest common divisor that has the least length as defined just before Lemma 10.6.2.
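The strategy above can be turned into a short program for R = Z. The sketch below (illustrative, not the book's algorithm verbatim) repeatedly moves an entry of least absolute value to the pivot position, clears its row and column with replace operations (retrying whenever a division leaves a remainder), and finally repairs the divisibility chain using the fact that diag(a, b) is equivalent to diag(gcd(a, b), lcm(a, b)):

```python
from math import gcd

def smith_invariant_factors(A):
    """Invariant factors of a nonzero integer matrix, via row/column operations."""
    A = [row[:] for row in A]
    n, m = len(A), len(A[0])
    t = 0
    while t < min(n, m):
        # steps (1)-(3): move a nonzero entry of least absolute value to (t, t)
        entries = [(abs(A[i][j]), i, j)
                   for i in range(t, n) for j in range(t, m) if A[i][j] != 0]
        if not entries:
            break
        _, i0, j0 = min(entries)
        A[t], A[i0] = A[i0], A[t]
        for row in A:
            row[t], row[j0] = row[j0], row[t]
        # step (4): clear the pivot row and column; a nonzero remainder leaves a
        # strictly smaller entry behind, so we retry with a new pivot
        clean = True
        for i in range(t + 1, n):
            if A[i][t] % A[t][t] != 0:
                clean = False
            q = A[i][t] // A[t][t]
            A[i] = [A[i][j] - q * A[t][j] for j in range(m)]
        for j in range(t + 1, m):
            if A[t][j] % A[t][t] != 0:
                clean = False
            q = A[t][j] // A[t][t]
            for i in range(n):
                A[i][j] -= q * A[i][t]
        if clean:
            t += 1
    d = [abs(A[i][i]) for i in range(min(n, m)) if A[i][i] != 0]
    # enforce d_i | d_{i+1}: diag(a, b) is equivalent to diag(gcd(a,b), lcm(a,b))
    for i in range(len(d)):
        for j in range(i + 1, len(d)):
            g = gcd(d[i], d[j])
            d[i], d[j] = g, d[i] * d[j] // g
    return d

# The matrix of Example 10.6.6 has invariant factors 1, 1, 3.
assert smith_invariant_factors([[2, 9, 0, 3], [0, -6, 3, 6],
                                [-1, 6, 12, 5]]) == [1, 1, 3]
```

Termination follows because each retry strictly decreases the least absolute value in the remaining submatrix, mirroring the minimal-length argument in the proof of Theorem 10.6.1.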
Maple Function
with(LinearAlgebra);  Loads the linear algebra package. (Many commands.)
ColumnOperation  Depending on the options, performs any desired column operation on a matrix.
RowOperation  Depending on the options, performs any desired row operation on a matrix.
SmithForm(A);  Calculates the Smith normal form of the matrix A. Maple makes an educated guess at the ring R based on the nature of the content of A. If the ring is a polynomial ring F [x], the variable x should appear as an option, as SmithForm(A,x);. The syntax U,V:=SmithForm(A,output=['U','V']); also calculates the left-reducing matrix U and the right-reducing matrix V .
10.7 Finitely Generated Modules over PIDs, II
In the previous section, we proved a theorem about the structure of submodules of free modules
over a PID. This theorem led naturally to the concept of the Smith normal form for a matrix
A ∈ Mn×` (R), where R is a PID. This section develops a few applications of Theorem 10.6.1 and of
the Smith normal form.
Suppose that ϕ : R^n → R^m is an R-module homomorphism, that B = (f1 , f2 , . . . , fn ) and B′ are ordered bases of R^n and R^m respectively, and that A is the matrix whose jth column gives the B′ -coordinates of ϕ(fj ). Write an element x ∈ R^n as
x = c1 f1 + c2 f2 + · · · + cn fn .
Thus, the components of ϕ(x) are given by matrix multiplication of A by the n-tuple (c1 , c2 , . . . , cn )
viewed as a column “vector.” As in linear algebra, the matrix A is called the matrix of ϕ with respect
to B and B 0 .
Proposition 10.7.1
Let R be a PID and let ϕ : R^n → R^m be an R-module homomorphism. Then there exist an ordered basis B of R^n and an ordered basis B′ of R^m such that, with respect to these bases, the matrix in Mm×n (R) of ϕ is
$$\begin{pmatrix} d_1 & 0 & \cdots & 0 & \\ 0 & d_2 & \cdots & 0 & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots & \\ 0 & 0 & \cdots & d_r & \\ & \mathbf{0} & & & \mathbf{0} \end{pmatrix}$$
where the nonzero entries di satisfy di | dj if i ≤ j.
Proof. By Theorem 10.6.1, Ker ϕ is a free submodule of R^n , and, possibly after relabeling, there is an ordered basis B = (x1 , x2 , . . . , xn ) of R^n such that {c1 xr+1 , c2 xr+2 , . . . , cn−r xn } is a basis of Ker ϕ, where ci | ci+1 for 1 ≤ i ≤ n − r − 1. Denote by N the submodule of R^n spanned by {x1 , x2 , . . . , xr }. Then ϕ restricts to an injective homomorphism ϕ|N : N → R^m . It is easy to check that {ϕ(x1 ), ϕ(x2 ), . . . , ϕ(xr )} is linearly independent, which implies that Im ϕ is free of rank r. By Theorem 10.6.1, there is a basis B′ = (y1 , y2 , . . . , ym ) of R^m such that {d1 y1 , d2 y2 , . . . , dr yr } is a basis of Im ϕ. For i = 1, . . . , r, define x′i ∈ N as the unique element such that ϕ(x′i ) = di yi . Then, with respect to the bases B = (x′1 , . . . , x′r , xr+1 , . . . , xn ) and B′ , the matrix of ϕ is given by (10.15).
The matrix in Proposition 10.7.1 is obviously the Smith normal form of the matrix of ϕ with
respect to the bases B and B 0 . By the uniqueness properties of the Smith normal form, we can call
this matrix the Smith normal form of ϕ.
Proposition 10.7.1 leads to the following interesting consequence for homomorphisms of free modules over a PID. Recall from linear algebra that for a linear transformation T : V → V on a finite-dimensional vector space V, only two possibilities occur: (1) T is invertible, in which case it is both surjective and injective; (2) T is not invertible, in which case T is neither injective nor surjective. With modules over a general PID, another possibility emerges.
Corollary 10.7.2
Let R be a PID and let ϕ : Rn → Rn be an R-module homomorphism between free modules
of rank n. Let A ∈ Mn×n (R) be the matrix of ϕ with respect to a basis B on the domain
and a basis B 0 on the codomain. One of the following mutually disjoint cases occurs:
(1) ϕ is not injective and not surjective if and only if det A = 0.
(2) ϕ is a bijection if and only if det A is a unit.
(3) ϕ is injective but not surjective if det A is a nonzero, nonunit element.
When F is a field, every element in F is either 0 or a unit. Consequently, case (3) in the above
corollary never occurs for linear transformations on a finite-dimensional vector space.
Theorem 10.7.3
Let R be a PID and let M be a finitely generated R-module. Then
M ≅ R^r ⊕ R/(d1) ⊕ R/(d2) ⊕ ··· ⊕ R/(dm)        (10.16)
for some integer r ≥ 0 and nonzero, nonunit elements d1, d2, . . . , dm ∈ R with di | dj whenever i ≤ j. Furthermore, this decomposition is unique in the following sense: if also
M ≅ R^s ⊕ R/(d′1) ⊕ R/(d′2) ⊕ ··· ⊕ R/(d′m)
with di | dj and d′i | d′j whenever i ≤ j, then r = s and d′i is an associate of di for 1 ≤ i ≤ m.
Proof. Since M is finitely generated, there is a surjective R-module homomorphism from Rn onto M for some n, so M ≅ Rn/K for a submodule K of Rn. By Theorem 10.6.1, there is a basis (y1, y2, . . . , yn) of Rn and elements d1 | d2 | ··· | dm in R such that {d1 y1, d2 y2, . . . , dm ym} is a basis of K. Hence,
M ≅ (Ry1 ⊕ Ry2 ⊕ ··· ⊕ Ryn)/(Rd1 y1 ⊕ Rd2 y2 ⊕ ··· ⊕ Rdm ym)
  ≅ (Ry1)/(Rd1 y1) ⊕ (Ry2)/(Rd2 y2) ⊕ ··· ⊕ (Rym)/(Rdm ym) ⊕ R^{n−m}
  ≅ R^{n−m} ⊕ R/(d1) ⊕ R/(d2) ⊕ ··· ⊕ R/(dm).
If di is a unit, then R/(di ) = {0} and we remove all such terms from the direct sum.
The uniqueness of the decomposition follows from the uniqueness of the invariant factors as
established in Theorem 10.6.1.
Definition 10.7.4
The decomposition of M given in Theorem 10.7.3 is called the invariant factor decomposition of M. The constant r is called the free rank of M and the elements d1, d2, . . . , dm (only defined up to multiplication by a unit) are called the invariant factors.
For a submodule of a free module, we allowed the invariant factors to include units. However,
taking the quotient module Rn / Ker ϕ eliminates the invariant factors of Ker ϕ that are units.
We observe that for a module M satisfying (10.16), the torsion submodule is
Tor(M ) ∼
= R/(d1 ) ⊕ R/(d2 ) ⊕ · · · ⊕ R/(dm ).
This follows from Exercise 10.3.21 and the fact that the torsion submodule of a free module over a
PID is trivial. (See Exercise 10.7.1.)
In the language of annihilators, there exist elements z1, z2, . . . , zm ∈ Tor(M) such that Tor(M) = Rz1 ⊕ Rz2 ⊕ ··· ⊕ Rzm with
Ann(z1) ⊇ Ann(z2) ⊇ ··· ⊇ Ann(zm).
These annihilators are called the invariant factor ideals because Ann(zi) = (di) for all i.
As with the fundamental theorem of finitely generated abelian groups, this corresponding theorem
for modules over principal ideal domains has an elementary divisors form.
Let a be a nonzero element in a PID R. Every PID is a unique factorization domain, so
a = u p1^α1 p2^α2 ··· ps^αs,
where u is a unit and the pi are distinct primes (also irreducible) in R. Since the factorization is unique in R, for i ≠ j the ideals (pi^αi) and (pj^αj) are comaximal. Hence, by the Chinese Remainder Theorem,
R/(a) ≅ R/(p1^α1 p2^α2 ··· ps^αs) = R/((p1^α1)(p2^α2) ··· (ps^αs))
      ≅ R/(p1^α1) ⊕ R/(p2^α2) ⊕ ··· ⊕ R/(ps^αs).        (10.17)
Returning to the decomposition (10.16), each R/(di ) decomposes according to (10.17). Recall
that if p and q are prime elements in a PID, then (p) = (q) if and only if p and q are associates of
each other. Hence, we can write (10.17) as
R/(a) ≅ R/P1^α1 ⊕ R/P2^α2 ⊕ ··· ⊕ R/Ps^αs,        (10.18)
where each Pi is a prime ideal. Furthermore, in the decomposition (10.18), the prime ideals Pi and the powers αi are again uniquely determined by unique factorization. So applying this decomposition to each R/(di) in the invariant factor decomposition leads to the following restatement of the fundamental theorem for finitely generated modules over a PID.
Theorem 10.7.5
Let R be a PID and let M be a finitely generated R-module. Then
M ≅ R^r ⊕ R/P1^α1 ⊕ R/P2^α2 ⊕ ··· ⊕ R/Pt^αt        (10.19)
for some integer r ≥ 0 and some (not necessarily distinct) prime ideal powers Pj^αj in R. Furthermore, this decomposition is unique in the following sense: if M also decomposes as in (10.19) with free rank s and prime ideal powers Q1^β1, Q2^β2, . . . , Qu^βu, then r = s, t = u, and the two lists of prime ideal powers agree up to reordering.
Proof. The Chinese Remainder Theorem along with the invariant factor decomposition gives the
decomposition (10.19). For the uniqueness, it suffices to recover a unique invariant factor form from
(10.19) and the uniqueness of this theorem will follow from the uniqueness proven in Theorem 10.7.3.
Let Q1 , Q2 , . . . , Qk be the complete set of distinct prime ideals appearing anywhere in the de-
composition (10.19). Let ` be the maximum number of times any Qi appears in the decomposition.
Then the torsion part of M in (10.19) can be written as
k `
α
M M
Tor(M ) = R/Qi ij
i=1 j=1
where some of the αij may be 0 but, for each i, we have αi1 ≤ αi2 ≤ ··· ≤ αiℓ. Now for each j, define the product ideal Ij = Q1^{α1j} Q2^{α2j} ··· Qk^{αkj}. Since the prime ideals Qi are distinct, by the Chinese Remainder Theorem,
R/Ij = ⊕_{i=1}^{k} R/Qi^{αij}.
Consequently,
Tor(M) = R/I1 ⊕ R/I2 ⊕ ··· ⊕ R/Iℓ.        (10.20)
Furthermore, since αij ≤ αi,j+1, for each i we have Qi^{αij} ⊇ Qi^{αi,j+1}. Hence, I1 ⊇ I2 ⊇ ··· ⊇ Iℓ, so the expression (10.20) is the invariant factor form of Tor(M).
Definition 10.7.6
We call the decomposition in (10.19) the elementary divisor form of a finitely generated module over a PID. If Qi = (pi) for each prime ideal Qi, then the elements pi^{αij} (only defined up to multiplication by a unit) are called the elementary divisors.
As we pointed out in the motivation to this section, the Fundamental Theorem of Finitely Generated Modules over a PID subsumes the Fundamental Theorem of Finitely Generated Abelian Groups. Consequently, the structure theorems we just presented should feel familiar from Section 4.5. However, as the following example illustrates, torsion modules over PIDs other than Z might not look as familiar.
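Over R = Z, passing from invariant factors to elementary divisors is just the prime factorization of each di, exactly as in (10.17). A small illustrative sketch (the helper name is hypothetical; trial division only, suitable for small integers):

```python
def elementary_divisors(invariant_factors):
    """Split each invariant factor d into prime-power parts, so that
    Z/(d) decomposes as a direct sum of the Z/(p^a) (CRT, as in (10.17)).
    Trial division only -- a sketch for small inputs, not a serious
    factoring routine."""
    result = []
    for d in invariant_factors:
        p = 2
        while p * p <= d:
            if d % p == 0:
                q = 1
                while d % p == 0:
                    d //= p
                    q *= p
                result.append(q)   # the exact prime-power p^a dividing d
            p += 1
        if d > 1:
            result.append(d)       # leftover prime factor
    return sorted(result)
```

For instance, the abelian group Z/(2) ⊕ Z/(12) has elementary divisors 2, 3, 4, since Z/(12) ≅ Z/(4) ⊕ Z/(3).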
Example 10.7.7. Let R = Q[x] and let K be the submodule of Q[x]2 with invariant factors d1 =
x^2 − 2 and d2 = (x^2 − 2)(x − 3). Then the invariant factor form of Q[x]^2/K is
Q[x]^2/K ≅ Q[x]/(x^2 − 2) ⊕ Q[x]/((x^2 − 2)(x − 3)). △
16. Let R = Z[x] and consider the free module M = R2 . Let f1 = (2, x), f2 = (x2 , x + 2), and f3 = (x, 3).
Prove that the submodule Span(f1 , f2 , f3 ) is not free.
17. Let R be a PID and let ϕ : Rn → Rm be an R-module homomorphism. Prove that the invariant
factors of Ker ϕ are all 1.
18. Give an example of an integral domain and a nonzero torsion module M such that Ann(M ) = 0.
Prove that if M is finitely generated, then Ann(M ) 6= 0. (Do not assume that R is a PID.)
19. Let R be a PID. Prove that if an R-module M is a summand of a free R-module, then M is free.
10.8
Applications to Linear Transformations
At various points in this textbook, we saw effective applications of the Fundamental Theorem of
Finitely Generated Abelian Groups. However, its more general counterpart for finitely generated
modules over a PID, Theorem 10.6.1, leads to other profound consequences. This section and the
next two analyze consequences for F [x]-modules and implications for linear transformations between
vector spaces over the field F .
Recall that if F is a field, then an F [x]-module consists of a vector space V along with a linear
transformation T : V → V and that the action of F [x] on V is determined by xv = T (v) for all
v ∈ V and F[x]-linearity of the module action. Note that F[x] and any free F[x]-module F[x]^n are infinite-dimensional vector spaces over F. If V is finite-dimensional, then it is generated by its basis elements
as an F [x]-module so it is finitely generated. By the Fundamental Theorem of Finitely Generated
Modules over a PID, the module V equipped with T has free rank 0 and is a torsion F [x]-module.
Proposition 10.8.1
Let T : V → V be a linear transformation. A value λ ∈ F is an eigenvalue of T if and only if it is a root of cT(x). If A is a matrix representing T, then for each eigenvalue λ, the associated eigenspace is Eλ = Ker(A − λI).
A particularly nice situation occurs when the matrix A is diagonalizable. In other words, there
exists an invertible matrix M such that A = M DM −1 , where D is a diagonal matrix. In this
case, the direct sum of all the eigenspaces of A is all of V . This is a restrictive situation since it
corresponds to a dilation by a factor of λ in each Eλ , with the linear transformation completed by
linearity on the rest of V .
There are two ways in which diagonalization fails to occur.
First, a linear transformation might fail to have eigenvalues if cT (x) has no roots in F . Consider
the linear transformation T of rotation in R3 by π/2 around the z-axis. The associated matrix with
respect to the standard basis is
        0  −1  0
A  =    1   0  0
        0   0  1
and cT (x) = det(xI −A) = (x−1)(x2 +1). The only root in R is 1 and the eigenspace E1 is the z-axis,
along which the rotation acts as the identity. (Over the field extension C of R, the characteristic
polynomial splits completely into cT (x) = (x − 1)(x + i)(x − i), but the geometric interpretation is
no longer the same as in the vector space R3 .)
Second, even if the characteristic polynomial splits completely in F [x], if an eigenvalue λ has
algebraic multiplicity 2 or greater, then the eigenspace may still only have a dimension of 1. For
example, consider the matrix
A  =  (  3  4
        −1  7 ).
The characteristic polynomial is cA (x) = (x − 3)(x − 7) + 4 = (x − 5)2 . So 5 is the only eigenvalue
and it has algebraic multiplicity 2. However,
E5 = Ker [[−2, 4], [−1, 2]] = Span((2, 1)),
so E5 is not all of R2. We call dim Eλ the geometric multiplicity of λ. For any eigenvalue λ, the geometric multiplicity is at least 1 and at most the algebraic multiplicity.
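The geometric multiplicity is computable by row reduction, since dim Eλ = n − rank(A − λI). A short exact-arithmetic sketch in Python (an illustration, not the book's algorithm; function names are our own):

```python
from fractions import Fraction

def rank(M):
    # Rank over Q by Gaussian elimination with exact arithmetic.
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        M[r] = [x / M[r][c] for x in M[r]]         # normalize the pivot row
        for i in range(len(M)):
            if i != r and M[i][c] != 0:            # clear the pivot column
                M[i] = [a - M[i][c] * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def geometric_multiplicity(A, lam):
    """dim E_lam = dim Ker(A - lam*I) = n - rank(A - lam*I)."""
    n = len(A)
    B = [[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    return n - rank(B)
```

Applied to the 2 × 2 example above, the eigenvalue 5 has geometric multiplicity 1 even though its algebraic multiplicity is 2.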
By Theorem 10.7.3, as an F[x]-module, V equipped with T decomposes as
V ≅ F[x]/(a1(x)) ⊕ F[x]/(a2(x)) ⊕ ··· ⊕ F[x]/(am(x))        (10.22)
for some polynomials ai (x) with a1 (x) | a2 (x) | · · · | am (x). Furthermore, these polynomials are
defined uniquely up to a unit. The units in F [x] are the nonzero constant polynomials. Therefore,
if we require that the polynomials ai (x) be monic, then they are uniquely defined. Note that not
only is V a torsion module but Ann(V ) = (am (x)).
Definition 10.8.2
The minimal polynomial of T : V → V , denoted by mT (x), is the monic polynomial am (x).
We use the term minimal polynomial (as opposed to largest invariant factor) because mT(x) is the unique monic polynomial of least degree such that mT(x) · V = {0}.
We consider an individual factor ai (x). Each summand F [x]/(ai (x)) in the invariant factor
decomposition (10.22) is a vector space over F . Suppose that deg ai (x) = k and that ai (x) = xk +
pk−1 xk−1 +· · ·+p1 x+p0 . Then the natural ordered basis for F [x]/(ai (x)) is Bi = (1, x, x2 , . . . , xk−1 ).
In particular, dimF F [x]/(ai (x)) = deg ai .
Now x acts on V according to the linear transformation T . So consider the action of x by left
multiplication on F[x]/(ai(x)). Obviously, x · x^j = x^{j+1} for 0 ≤ j ≤ k − 2, but
x · x^{k−1} = x^k = −p0 − p1 x − ··· − p_{k−1} x^{k−1},
since ai(x) = 0 in F[x]/(ai(x)). Hence, the matrix corresponding to the action of x is
0  0  0  ···  0  −p0
1  0  0  ···  0  −p1
0  1  0  ···  0  −p2
0  0  1  ···  0  −p3        (10.23)
⋮  ⋮  ⋮   ⋱   ⋮    ⋮
0  0  0  ···  1  −pk−1
Definition 10.8.3
If a(x) = xk + pk−1 xk−1 + · · · + p1 x + p0 , the matrix in (10.23) is called the companion
matrix of a(x) and is denoted by Ca(x) .
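Building Ca(x) from a coefficient list is mechanical. A sketch, assuming the convention coeffs = [p0, p1, . . . , pk−1] for the non-leading coefficients of the monic polynomial a(x) (the function name is our own):

```python
def companion(coeffs):
    """Companion matrix C_{a(x)} of the monic polynomial
    a(x) = x^k + p_{k-1} x^{k-1} + ... + p_1 x + p_0,
    with coeffs = [p_0, p_1, ..., p_{k-1}], laid out as in (10.23)."""
    k = len(coeffs)
    C = [[0] * k for _ in range(k)]
    for i in range(1, k):
        C[i][i - 1] = 1            # x sends the basis vector x^j to x^{j+1}
    for i in range(k):
        # last column: x * x^{k-1} = -p_0 - p_1 x - ... - p_{k-1} x^{k-1}
        C[i][k - 1] = -coeffs[i]
    return C
```

For a(x) = x^3 − 3x^2 + 5x − 6 this reproduces the 3 × 3 block that appears in the rational canonical form of Example 10.8.8 below.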
By the direct sum decomposition (10.22), there are vectors in V that correspond to the vectors
in each of the bases Bi. Furthermore, the union of all these bases constitutes a basis B of V and
dim V = Σ_{i=1}^{m} deg ai(x).
With respect to this basis B, the matrix of T is the n × n block diagonal matrix
Ca1(x)    0     ···    0
  0     Ca2(x)  ···    0
  ⋮       ⋮      ⋱     ⋮         (10.24)
  0       0     ···  Cam(x)
Definition 10.8.4
A matrix A ∈ Mn×n (F ) is in rational canonical form if it is in the block diagonal form in
(10.24) where ai (x) are monic polynomials in F [x] such that a1 (x) | a2 (x) | · · · | am (x). A
rational canonical form for a linear transformation T : V → V is any matrix representing
T that is in rational canonical form.
The uniqueness result of Theorem 10.7.3 along with the characterization of isomorphisms between
F [x]-modules given in Example 10.4.6 leads to the following classification theorem.
Theorem 10.8.5
Let V be a finite-dimensional vector space over the field F and let T : V → V be a linear
transformation. Then T has a rational canonical form, which is unique. Furthermore,
S : V → V and T : V → V are similar linear transformations if and only if S and T have
the same rational canonical form.
In terms of matrices, there exists a bijection between similarity classes of n × n matrices and
the set of matrices in rational canonical form. In other words, the set of n × n matrices in rational
canonical form is a complete set of distinct representatives of the similarity classes on Mn×n (F ).
It is a straightforward exercise in linear algebra to prove that the determinant of a block diagonal matrix is the product of the determinants of the blocks:
     A1  0  ···  0
     0   A2 ···  0
det  ⋮   ⋮   ⋱   ⋮    = (det A1)(det A2) ··· (det Am).        (10.25)
     0   0  ···  Am
Consequently, we deduce that the invariant factors associated to a linear transformation T and the characteristic polynomial are related as follows.
Proposition 10.8.6
Let T : V → V be a linear transformation on a vector space over a field F . Then the
characteristic polynomial cT (x) is the product of the invariant factors of T .
Proof. If A is a block diagonal matrix, then xI − A is also a block diagonal matrix. By (10.25),
det(xI − A) = det(xI − A1 ) det(xI − A2 ) · · · det(xI − Am ).
Consequently, to prove the proposition, it suffices to show that the characteristic polynomial of the
companion matrix Ca(x) of the monic polynomial a(x) is again a(x). Let Ca(x) be the companion matrix in (10.23). Then, the characteristic polynomial is
      x   0   0  ···  0   p0
     −1   x   0  ···  0   p1
      0  −1   x  ···  0   p2
det   ⋮   ⋮   ⋮   ⋱   ⋮    ⋮            (10.26)
      0   0   0  ···  x   pk−2
      0   0   0  ··· −1   x + pk−1
We perform Laplace expansion down the rightmost column. For all i with 1 ≤ i ≤ k, the (i, k)-minor of this determinant is the determinant of the ((i − 1) + (k − i)) × ((i − 1) + (k − i)) block diagonal matrix
( L  0 )
( 0  U ),
where L is the (i − 1) × (i − 1) lower bidiagonal matrix with x on the diagonal and −1 on the subdiagonal, and U is the (k − i) × (k − i) upper bidiagonal matrix with −1 on the diagonal and x on the superdiagonal.
By (10.25), the (i, k)-minor is (−1)^{k−i} x^{i−1}. Consequently, the Laplace expansion of (10.26) is
Σ_{i=1}^{k−1} (−1)^{i+k} pi−1 (−1)^{k−i} x^{i−1} + (x + pk−1) x^{k−1} = p0 + p1 x + ··· + pk−1 x^{k−1} + x^k = a(x).

Now let A ∈ Mn×n(F) be the matrix of T with respect to the standard basis of V = F^n, and consider the sequence of F[x]-module homomorphisms
F[x]^n --ψ--> F[x]^n --ϕ--> F^n,
where ψ is the F[x]-module homomorphism which, with respect to the standard basis on F[x]^n, has the matrix xI − A. The F[x]-module homomorphism ϕ maps standard basis elements of F[x]^n to similarly indexed standard basis elements of F^n. Hence,
ϕ : (p1(x), p2(x), . . . , pn(x)) ↦ p1(x) · e1 + p2(x) · e2 + ··· + pn(x) · en
                                 = p1(A)(e1) + p2(A)(e2) + ··· + pn(A)(en).
One can check that an element of F[x]^n lies in Ker ϕ if and only if it lies in the image of ψ, so Im ψ = Ker ϕ. In particular, the columns of xI − A, each viewed as an element of F[x]^n, give a generating set for Ker ϕ.
Proposition 10.8.7
The Smith normal form of xI − A as an element of Mn×n(F[x]) is the n × n diagonal matrix
In−m
      a1(x)
            a2(x)
                  ⋱
                    am(x) ,
where a1(x), a2(x), . . . , am(x) are the invariant factors of T : V → V as an F[x]-module.
Proof. Suppose the F [x]-module consisting of the vector space V and linear transformation T has
V ≅ F[x]/(a1(x)) ⊕ F[x]/(a2(x)) ⊕ ··· ⊕ F[x]/(am(x)).
Then since it is a torsion module, the normal form contains no 0s on the diagonal. Also, the nonzero,
nonunit terms on the diagonal of the normal form are precisely the invariant factors.
Since two similar matrices have the same Smith normal form, we can also speak of the Smith normal form of a linear transformation T : V → V, where V is finite-dimensional.
This shows that we can calculate the invariant factors of a matrix (or linear transformation) by finding the Smith normal form of xI − A. Furthermore, we can use the method of Example 10.6.6 to find a basis of V = F^n with respect to which the linear transformation T has rational canonical form.
Example 10.8.8. Consider the linear transformation T : R4 → R4 that with respect to the standard
basis has the matrix
       2  −4   6  −6
A  =   0   6  −6   6
      −1   2  −2   3
       0  −2   3  −1 .
A quick calculation shows that the characteristic polynomial is cT (x) = (x − 2)2 (x2 − x + 3), where
x^2 − x + 3 is irreducible over R. By Proposition 10.8.6, we deduce that one of two possibilities occurs: (1) T has only one invariant factor, namely cT(x), or (2) it has two invariant factors a1(x) = x − 2 and a2(x) = (x − 2)(x^2 − x + 3).
Denote by V the R[x]-module of R4 equipped with the linear transformation T . We use row and
column operations on the 4 × 4 matrix xI − A to get the Smith normal form. Keeping track of the
row operations allows us to calculate the matrix S defined in Proposition 10.6.4. Using technology
(or doing the calculations by hand) we find that the Smith normal form of xI − A is
1  0    0       0
0  1    0       0
0  0  x − 2     0                  (10.28)
0  0    0   x^3 − 3x^2 + 5x − 6
Note that x^3 − 3x^2 + 5x − 6 = (x − 2)(x^2 − x + 3), so we deduce that V ≅ R[x]/(x − 2) ⊕ R[x]/(x^3 − 3x^2 + 5x − 6). Furthermore, the left-reducing matrix S is
          1/4             0      0         0
S  =  (1/6)x − 1/2      −2/3     1         0
           1              0      0        −2
      −x^2 + x − 6    −2x + 2  3x − 6  3x^2 − 6x + 12
and so
           4          0                   0                0
S⁻¹  =   x − 6   (3/2)x − 3    −(3/4)x^2 + (3/2)x − 3    −1/2
          −2       x − 1         −(1/2)x^2 + x − 2       −1/3
           2          0                 −1/2               0   ,
which gives basis elements of Ker ϕ as a submodule of R[x]4 , with entries listed as column vectors.
Denote the columns of S⁻¹ by ξ1, ξ2, ξ3, ξ4. Then from the Smith normal form (10.28), we explicitly have the R[x]-module isomorphisms
V = (R[x])^4 / Ker ϕ
  ≅ (R[x]ξ1 ⊕ R[x]ξ2 ⊕ R[x]ξ3 ⊕ R[x]ξ4)/(R[x]ξ1 ⊕ R[x]ξ2 ⊕ R[x]a1(x)ξ3 ⊕ R[x]a2(x)ξ4)
  ≅ R[x]/(a1(x)) ⊕ R[x]/(a2(x)).
Notice that the module elements ξ1 and ξ2 become 0 in the quotient module. The summands
R[x]/(x − 2) and R[x]/(x3 − 3x2 + 5x − 6) are generated respectively by v1 = ϕ(ξ3 ) and v2 = ϕ(ξ4 ).
Note that we only use columns ξ3 and ξ4 of S because these correspond to the nonunit invariant
factors of xI − A. We calculate these vectors v1 , v2 ∈ R4 by
v1 = ϕ( (0, −(3/4)x^2 + (3/2)x − 3, −(1/2)x^2 + x − 2, −1/2) )
   = (−(3/4)A^2 + (3/2)A − 3I) e2 + (−(1/2)A^2 + A − 2I) e3 − (1/2)I e4
   = (3, −6, −3, 1)
and
v2 = ϕ( (0, −1/2, −1/3, 0) ) = −(1/2)I e2 − (1/3)I e3 = (0, −1/2, −1/3, 0).
The summand R[x]/(x − 2) is one-dimensional so it is Span(v1). The summand R[x]/(x^3 − 3x^2 + 5x − 6) is three-dimensional so it is Span(v2, x · v2, x^2 · v2). Setting
                                 3    0     0      2
                                −6  −1/2   −1     −4
M = ( v1  v2  x v2  x^2 v2 ) =  −3  −1/3  −1/3   −4/3 ,
                                 1    0     0      1
we calculate
            2  0  0   0
M⁻¹AM  =    0  0  0   6
            0  1  0  −5
            0  0  1   3 .
This is the rational canonical form of T , which is the matrix of T with respect to the ordered basis
of R4 listed as the columns of M. △
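As a sanity check on Proposition 10.8.6, the product of the invariant factors found above is cT(x) = (x − 2)^2(x^2 − x + 3) = x^4 − 5x^3 + 11x^2 − 16x + 12, and by the Cayley–Hamilton theorem cT(A) must be the zero matrix. A quick numerical verification by Horner's rule (our own helper functions, a sketch rather than the book's method):

```python
def mat_mul(A, B):
    # Product of two square matrices given as lists of rows.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def poly_at_matrix(coeffs, A):
    """Evaluate c_0 + c_1 x + ... + c_d x^d at the square matrix A by
    Horner's rule; coeffs = [c_0, c_1, ..., c_d]."""
    n = len(A)
    result = [[0] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = mat_mul(result, A)
        for i in range(n):
            result[i][i] += c       # add c * I
    return result

# The matrix of Example 10.8.8 and its characteristic polynomial
# c_T(x) = (x - 2)^2 (x^2 - x + 3) = x^4 - 5x^3 + 11x^2 - 16x + 12.
A = [[2, -4, 6, -6],
     [0, 6, -6, 6],
     [-1, 2, -2, 3],
     [0, -2, 3, -1]]
cT = [12, -16, 11, -5, 1]
```

Running `poly_at_matrix(cT, A)` returns the 4 × 4 zero matrix, as the theory predicts.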
The previous example illustrates that knowing the characteristic polynomial is not sufficient to
determine the form of the invariant factor decomposition. Depending on the factorization of cT (x)
into irreducible polynomials in F [x], there is a finite number of possibilities for the rational canonical
form. This situation is not unlike determining the possible abelian groups of a given finite cardinality.
Example 10.8.9. Suppose that T : R4 → R4 is a linear transformation such that cT (x) = (x − 2)4 .
By Proposition 10.8.6, the invariant factors must multiply to cT (x). We list them here below along
with the rational canonical form for each case. As always, the minimal polynomial is the largest (by degree) of the invariant factors.
a1(x) = (x − 2)^4:
    0  0  0  −16
    1  0  0   32
    0  1  0  −24
    0  0  1    8

(a1(x), a2(x)) = ((x − 2), (x − 2)^3):
    2  0  0    0
    0  0  0    8
    0  1  0  −12
    0  0  1    6

(a1(x), a2(x)) = ((x − 2)^2, (x − 2)^2):
    0  −4  0   0
    1   4  0   0
    0   0  0  −4
    0   0  1   4

(a1(x), a2(x), a3(x)) = ((x − 2), (x − 2), (x − 2)^2):
    2  0  0   0
    0  2  0   0
    0  0  0  −4
    0  0  1   4

(a1(x), a2(x), a3(x), a4(x)) = ((x − 2), (x − 2), (x − 2), (x − 2)):
    2  0  0  0
    0  2  0  0
    0  0  2  0
    0  0  0  2
This list shows that there are 5 similarity classes of matrices in M4×4 (R) with cT (x) = (x − 2)4 .
Only the last canonical form corresponds to matrices that are diagonalizable. △
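Counting the similarity classes in the example above amounts to counting the partitions of 4: each partition picks the multiset of exponents of the invariant factors (x − 2)^{ki}. A small recursive enumerator (illustrative only, our own naming) makes the correspondence concrete:

```python
def partitions(n, max_part=None):
    """Yield all partitions of n as non-increasing tuples.  When
    c_T(x) = (x - lam)^n, a partition (k_1, ..., k_m) corresponds to
    invariant factors (x - lam)^{k_i}, so partitions of n enumerate the
    similarity classes."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for k in range(min(n, max_part), 0, -1):
        for rest in partitions(n - k, k):
            yield (k,) + rest
```

For n = 4 this yields (4), (3, 1), (2, 2), (2, 1, 1), (1, 1, 1, 1): the five cases listed above.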
7. How many similarity classes are there for matrices in M6×6 (Q) that have the characteristic polynomial
of (x − 3)3 (x − 4)3 ? Give the rational canonical form for each one.
8. Find the number of conjugacy classes of GL3 (F2 ).
9. Find the number of conjugacy classes of GL2 (F3 ).
In Exercises 10.8.10 through 10.8.14, consider the linear transformation T : F n → F n whose matrix with
respect to the standard basis is the given matrix A. Find the rational canonical form of T and a basis B on
F n so that the matrix of T with respect to B is the rational canonical form.
10. F = R: a) A = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]; b) A = [[2, 1, 1], [1, 2, 1], [1, 1, 2]].
11. F = R: a) A = [[13, −25, 20], [−4, 13, −8], [−10, 25, −17]]; b) A = [[18, −76, 32], [−4, 21, −8], [−16, 76, −30]].
12. F = Q and A = [[3, −1, 0, 0], [4, 0, 0, 0], [7, −3, 5, 1], [−23, 13, −14, −2]].
13. F = F5 and A = [[0, 1, 4], [2, 1, 3], [3, 1, 3]].
14. F = R and A = [[−3, −10, 25, −10], [1, 4, −5, 2], [−2, −4, 12, −4], [−3, −6, 15, −4]].
15. Let n ≥ 2. Find the rational canonical form of the n × n matrix that consists of 1s in all entries except
for 2s down the diagonal.
10.9
Jordan Canonical Form
10.9.1 – The Jordan Canonical Form
By directly following the presentation in the previous section, it may seem reasonable to study a
canonical form for matrices induced from an elementary divisor expression
V ∼
= F [x]/(p1 (x)k1 ) ⊕ F [x]/(p2 (x)k2 ) ⊕ · · · ⊕ F [x]/(pm (x)km ),
where each pi(x) is a monic irreducible polynomial. Recall that the characteristic polynomial is cT(x) = a1(x) a2(x) ··· as(x), where a1(x), a2(x), . . . , as(x) are the nonunit monic invariant factors. The Jordan canonical form
takes this approach in the special case when the characteristic polynomial of the linear transformation
T : V → V splits completely in the field F . In particular, when F is algebraically closed, this occurs
for all linear transformations.
From now on, suppose that T is a linear transformation on a vector space V of dimension n over
a field F. Suppose further that cT(x) splits completely into
cT(x) = (x − λ1)^k1 (x − λ2)^k2 ··· (x − λs)^ks,
where λ1, λ2, . . . , λs are distinct. Counted with algebraic multiplicity, this means that T has n = deg cT(x) eigenvalues. From the elementary divisor form of the Fundamental Theorem of Finitely Generated Modules over a PID, as an
F [x]-module, V equipped with T has the decomposition
V ≅ F[x]/((x − λ1)^α(1)1) ⊕ F[x]/((x − λ1)^α(1)2) ⊕ ··· ⊕ F[x]/((x − λ1)^α(1)ℓ(1))
⊕ F [x]/((x − λ2 )α(2)1 ) ⊕ F [x]/((x − λ2 )α(2)2 ) ⊕ · · · ⊕ F [x]/((x − λ2 )α(2)`(2) )
(10.29)
⊕ ···
⊕ F [x]/((x − λs )α(s)1 ) ⊕ F [x]/((x − λs )α(s)2 ) ⊕ · · · ⊕ F [x]/((x − λs )α(s)`(s) ),
where for each i with 1 ≤ i ≤ s, the powers satisfy α(i)1 ≥ α(i)2 ≥ ··· ≥ α(i)ℓ(i) ≥ 1 and α(i)1 + α(i)2 + ··· + α(i)ℓ(i) = ki. Consequently, for each i, the finite sequence (α(i)1, α(i)2, . . . , α(i)ℓ(i)) is
a partition of ki , the algebraic multiplicity of the eigenvalue λi . (See Sections 4.5.3 and 4.5.4 for a
few comments on partitions of integers.) We will refer to the partition associated to an eigenvalue
λ as α(λ) so that the content |α(λ)| is the algebraic multiplicity of λ in cT (x). The summand
F [x]/((x − λi )α(i)1 ) ⊕ F [x]/((x − λi )α(i)2 ) ⊕ · · · ⊕ F [x]/((x − λi )α(i)`(i) )
is called the (x − λi )-primary component of V .
We point out that from knowing only the characteristic polynomial cT (x), we can only tell the
content |α(λ)| and not the actual partition α(λ).
The main difference between the Jordan canonical form and the rational canonical form consists
of the basis (as a vector space over F ) used for a summand F [x]/(x − λ)m . In the rational canonical
form, we used the ordered basis (1, x, x2 , . . . , xm−1 ); the Jordan canonical form uses the ordered
basis
B = ( (x − λ)^{m−1}, . . . , (x − λ)^2, x − λ, 1 ).        (10.30)
The linear transformation T corresponds to the F [x]-action of x. As usual, the action of x on these
elements of B is
x · (x − λ)^{m−1} = (x − λ + λ) · (x − λ)^{m−1} = (x − λ)^m + λ(x − λ)^{m−1} = λ(x − λ)^{m−1}
x · (x − λ)^{m−2} = (x − λ + λ) · (x − λ)^{m−2} = (x − λ)^{m−1} + λ(x − λ)^{m−2}
⋮                                                                        (10.31)
x · (x − λ) = (x − λ + λ) · (x − λ) = (x − λ)^2 + λ(x − λ)
x · 1 = (x − λ + λ) · 1 = (x − λ) + λ.
Therefore, with respect to the basis B, the matrix of T is the m × m matrix
λ 1 0 ··· 0 0
0
λ 1 ··· 0 0
0 0 λ ··· 0 0
Jλ,m = . .. .
.. .. .. ..
.. . . . . .
0 0 0 ··· λ 1
0 0 0 ··· 0 λ
Definition 10.9.1
Let m be a positive integer. A matrix of the form Jλ,m is called a Jordan matrix or a
Jordan block.
More generally, if α is a partition of length `(α) = r, then we denote by Jλ,α the block diagonal
matrix
         Jλ,α1    0    ···    0
           0    Jλ,α2  ···    0
Jλ,α  =    ⋮      ⋮     ⋱     ⋮   .
           0      0    ···  Jλ,αr
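Assembling Jλ,m and Jλ,α programmatically is straightforward; the following sketch (function names are our own) mirrors the definitions above:

```python
def jordan_block(lam, m):
    """The m x m Jordan block J_{lam,m}: lam on the diagonal and 1 on
    the superdiagonal (the action of x on F[x]/((x - lam)^m) in the
    basis (10.30))."""
    return [[lam if j == i else (1 if j == i + 1 else 0) for j in range(m)]
            for i in range(m)]

def jordan_matrix(lam, partition):
    """Block diagonal matrix J_{lam,alpha} with blocks J_{lam,alpha_i}."""
    n = sum(partition)
    J = [[0] * n for _ in range(n)]
    off = 0
    for m in partition:
        B = jordan_block(lam, m)
        for i in range(m):
            for j in range(m):
                J[off + i][off + j] = B[i][j]
        off += m
    return J
```

For instance, jordan_matrix(2, (3, 2)) produces the 5 × 5 matrix J2,(3,2) that appears in Example 10.9.10 below.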
Theorem 10.9.2
Let V be a finite-dimensional vector space over F and let T : V → V be a linear transformation whose characteristic polynomial splits completely in F[x], with distinct eigenvalues λ1, λ2, . . . , λs and associated partitions α(1), α(2), . . . , α(s). Then there exists a basis of V with respect to which the matrix of T is the block diagonal matrix
Jλ1,α(1)     0      ···     0
   0     Jλ2,α(2)   ···     0
   ⋮         ⋮       ⋱      ⋮    .
   0         0      ···  Jλs,α(s)
Furthermore, two linear transformations S, T : V → V are similar if and only if with respect to some bases, they are represented by the same above matrix.
Proof. The block diagonal matrix above arises from the union of the bases of the form B on each
summand F [x]/(x − λ)m . The result about similarity comes from the uniqueness result of Theo-
rem 10.7.5 and the fact that V equipped with T is F [x]-module isomorphic to V equipped with S if
and only if S and T are similar.
Definition 10.9.3
The form described in Theorem 10.9.2 is called the Jordan canonical form of T . Also, if A
is a matrix in Mn×n (F ), then the Jordan canonical form of A is the Jordan canonical form
of the linear transformation on F n defined by ~x 7→ A~x.
If F is an algebraically closed field (for example, F = C), Theorem 10.9.2 affirms that two matrices in Mn×n(F) are similar if and only if they have the same Jordan canonical form.
Any summand of the form F [x]/(x − λ)m in V as depicted in (10.29) is a T -stable subspace of
V . With respect to the ordered basis B given in (10.30) the restriction of T on M = F [x]/(x − λ)m
is the Jordan block Jλ,m, which has the single eigenvalue λ with algebraic multiplicity m. However, on this summand Ker(T − λI) = Span((x − λ)^{m−1}) and hence, for T restricted to M, the geometric multiplicity of λ is 1.
Example 10.9.4. To give some examples of what the Jordan canonical form can look like, consider
the following matrices. Most are in Jordan canonical form, though one is not. For those that are,
we list the eigenvalues λ and associated partitions α(λ), along with the eigenspaces.

2 1 0 0
0 2 1 0
0 0 2 1      λ = 2; α(2) = (4);         E2 = Span(e1)
0 0 0 2

2 1 0 0
0 2 1 0
0 0 2 0      λ = 2; α(2) = (3, 1);      E2 = Span(e1, e4)
0 0 0 2

2 1 0 0
0 2 0 0
0 0 2 1      λ = 2; α(2) = (2, 2);      E2 = Span(e1, e3)
0 0 0 2

2 1 0 0
0 2 0 0
0 0 2 0      λ = 2; α(2) = (2, 1, 1);   E2 = Span(e1, e3, e4)
0 0 0 2

2 0 0 0
0 2 0 0
0 0 2 0      λ = 2; α(2) = (1, 1, 1, 1);  E2 = Span(e1, e2, e3, e4)
0 0 0 2

3 0 0 0
0 3 0 0
0 0 2 1      α(3) = (1, 1); α(2) = (2);   E2 = Span(e3), E3 = Span(e1, e2)
0 0 0 2

1 0 0 0
0 2 0 0
0 0 3 1      eigenvalues 1, 2, 3, 4;      not in Jordan canonical form
0 0 0 4

Only the last matrix is not in Jordan canonical form. We can deduce that 1, 2, 3, and 4 are the eigenvalues but the 1 in position (3, 4) makes the lower right 2 × 2 submatrix not a Jordan block. △
Definition 10.9.5
Let T : V → V be a linear transformation and let λ be an eigenvalue of T. A nonzero vector v ∈ V is a generalized eigenvector of T for λ if (T − λI)^k v = 0 for some positive integer k. A sequence v1, v2, . . . , vk of vectors is a chain of generalized eigenvectors for λ if v1 ≠ 0, (T − λI)v1 = 0, and (T − λI)vj = vj−1 for 2 ≤ j ≤ k.
Proposition 10.9.6
If the sequence v1 , v2 , . . . , vk is a chain of generalized eigenvectors for a given eigenvalue,
then {v1 , v2 , . . . , vk } is linearly independent.
Proof. Suppose that (c1, c2, . . . , ck) is a nontrivial solution to
c1 v1 + c2 v2 + ··· + ck vk = 0.        (10.32)
Let m be the least index such that (c1, c2, . . . , ck) = (d1, d2, . . . , dm, 0, . . . , 0) is a solution to (10.32).
By the minimality, this requires that dm ≠ 0. However,
(T − λI)^{m−1}(d1 v1 + d2 v2 + ··· + dm vm) = 0
  ⟹ d1 (T − λI)^{m−1} v1 + d2 (T − λI)^{m−1} v2 + ··· + dm (T − λI)^{m−1} vm = 0.
All the terms cancel except for the last one. Hence, dm v1 = 0. Since v1 is nonzero, we deduce that dm = 0. This leads to a contradiction, so {v1, v2, . . . , vk} is linearly independent.
Remark 10.9.7. On the other hand, the opposite situation when V ∼ = (F [x]/(x − λ))k corresponds
to when the algebraic multiplicity of λ is equal to the geometric multiplicity. In this case, any basis
of the eigenspace of λ is a basis of V. △
Proposition 10.9.8
Suppose that for 1 ≤ i ≤ r, the lists (vλi ,1 , . . . , vλi ,ki ) are chains of generalized eigenvectors
of distinct eigenvalues λi . The union of the lists is a linearly independent set of vectors.
Proposition 10.9.9
For each eigenvalue λ of T, the associated partition α(λ) is the conjugate of the partition β = (β1, β2, . . . , βr), where
βj = dim Eλ,j − dim Eλ,j−1,   with Eλ,j = Ker(T − λI)^j and Eλ,0 = {0}.
(Note that βr is the last part of the partition β if dim(Eλ,r+1) = dim(Eλ,r).)
Proof. Let λ be an eigenvalue of T . Consider the restriction of T to the generalized eigenspace Eλ,∞ .
As an F[x]-submodule, Eλ,∞ is isomorphic to
Eλ,∞ ≅ F[x]/(x − λ)^{α1} ⊕ F[x]/(x − λ)^{α2} ⊕ ··· ⊕ F[x]/(x − λ)^{αℓ}.        (10.33)
This means that the partition associated to λ is α = (α1, α2, . . . , αℓ) with |α| = k, the algebraic
multiplicity of λ. The restriction of T to Eλ,∞ is the same as the action of x on the primary
component in (10.33). Furthermore, with respect to the Jordan basis on each summand of (10.33),
the matrix of T is the Jordan matrix Jλ,α .
Then by (10.31), the eigenspace Eλ is
Eλ = Span( ((x − λ)^{α1−1}, 0, 0, . . . , 0), (0, (x − λ)^{α2−1}, 0, . . . , 0), . . . , (0, 0, . . . , 0, (x − λ)^{αℓ−1}) ).
In particular, the geometric multiplicity of λ is equal to the length of the partition α, which is also
α10 , the first part of the conjugate partition to α. We also notice that
(T − λI)(Eλ,∞) ≅ F[x]/(x − λ)^{α1−1} ⊕ F[x]/(x − λ)^{α2−1} ⊕ ··· ⊕ F[x]/(x − λ)^{αℓ−1}.
If αi = 1, then the summand F[x]/(x − λ)^{αi−1} is the trivial module. Now the conjugate of the partition (α1 − 1, α2 − 1, . . . , αℓ − 1) is the partition α′ with its first part removed. For example, if α = (5, 2, 2, 1), then α′ = (4, 3, 1, 1, 1):

α =  • • • • •        α′ =  • • • •
     • •                    • • •
     • •                    •
     •                      •
                            •
Thus,
dim(T − λI)(Eλ,∞) = α′2 + α′3 + ··· = k − α′1.
By the first isomorphism theorem, (x−λ)(Eλ,∞ ) ∼ = Eλ,∞ / Ker(x−λ) as F [x]-modules. Furthermore,
x − λ acts on (x − λ)(Eλ,∞ ) as a linear transformation with Jordan canonical form of eigenvalue λ
and partition (α1 − 1, α2 − 1, . . . , αℓ − 1). By a repeated application of this result, we deduce that (T − λI)^j(Eλ,∞) ≅ Eλ,∞ / Ker(T − λI)^j, so
dim Eλ,j = dim Ker(T − λI)^j = α′1 + α′2 + ··· + α′j.
We deduce that α′j = dim Eλ,j − dim Eλ,j−1.
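Proposition 10.9.9 translates directly into a procedure: feed in the dimensions dim Ker(T − λI)^j, take successive differences to get the conjugate partition α′, and conjugate. A sketch (the helper name is our own):

```python
def jordan_partition(kernel_dims):
    """Given dims = [dim Ker(T - lam I), dim Ker(T - lam I)^2, ...] up to
    the point where the dimensions stabilize, return the partition
    alpha(lam): the successive differences form the conjugate partition
    alpha', and alpha is recovered by conjugating (Proposition 10.9.9)."""
    dims = [0] + list(kernel_dims)
    conj = [dims[j] - dims[j - 1] for j in range(1, len(dims))]
    alpha, i = [], 1
    while True:
        parts = sum(1 for p in conj if p >= i)  # conjugation: count parts >= i
        if parts == 0:
            break
        alpha.append(parts)
        i += 1
    return alpha
```

With the kernel dimensions 2, 4, 5 of Example 10.9.10 below, this returns the partition (3, 2).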
Example 10.9.10. We perform all the following calculations assisted by a computer algebra system.
Consider the 5 × 5 matrix
      −2  3  2  −2  2
      −2  3  1  −1  1
A  =  −7  5  5  −3  4 .
      −3  2  1   1  2
      −2  2  1  −1  3
We calculated the characteristic polynomial to be cA (x) = (x − 2)5 . Then the reduced row echelon
forms of A − 2I and (A − 2I)^2 are
1  0  0   0  −1         1  −1/2  −1/2  −1/2  −1/2
0  1  0   0   0         0    0     0     0     0
0  0  1  −1  −1   and   0    0     0     0     0
0  0  0   0   0         0    0     0     0     0
0  0  0   0   0         0    0     0     0     0
with (A − 2I)^3 = 0. We see that the generalized eigenspaces have dimensions dim E2,1 = 2, dim E2,2 = 4, and dim E2,3 = 5. Thus, by Proposition 10.9.9, α′ = (2, 2, 1), so the partition associated to λ = 2 is the conjugate α = (3, 2). Hence, the Jordan canonical form of A is
             2  1  0  0  0
             0  2  1  0  0
J2,(3,2)  =  0  0  2  0  0 .
             0  0  0  2  1
             0  0  0  0  2
△
10.10
Applications of the Jordan Canonical Form
10.10.1 – Finding a Basis for the Jordan Canonical Form
Though the Fundamental Theorem for Finitely Generated Modules over a PID allowed us to find
the Jordan canonical form of a linear transformation T : V → V (whose characteristic polynomial
splits completely over the base field), it did not provide a constructive method to find a basis B with
respect to which T has a matrix in Jordan canonical form. We consider this problem now.
Since we need to perform calculations, we must express vectors in coordinates with respect to
some basis. For the remainder of this section, suppose that V has a fixed basis E = {e1 , e2 , . . . , en }.
Suppose that with respect to E, the matrix of the linear transformation T in question is A. Also,
we consider vectors v ∈ V ∼ = F n as expressed in coordinates with respect to E.
Propositions 10.9.6 and 10.9.8 as well as Remark 10.9.7 offer a strategy to find a desired basis.
If Bλ is a basis of the generalized eigenspace of λ with respect to which T restricted to Eλ,∞ is the
Jordan submatrix of λ, then

    B = ⋃_λ Bλ
is a basis of V with respect to which the matrix of T is in Jordan canonical form.
We first calculate the characteristic polynomial of A and in particular determine the eigenvalues
and their algebraic multiplicities. Second, we need to calculate the geometric multiplicities of each
eigenvalue λ, i.e., the dimension of the eigenspaces dim Eλ .
We describe the algorithms in linear-algebraic terms according to a few cases, applied to each
eigenvalue λ.
Case 1: dim Eλ is equal to the algebraic multiplicity. In this case, the submatrix pertaining
to λ of the Jordan canonical form is Jλ,(1,1,...,1) = λI. Also, the eigenspace Eλ is equal to the
generalized eigenspace. Then any basis for Eλ will do, which is any basis of Ker(A − λI). This
is a straightforward calculation that uses the reduced row echelon form.
(We do not give an example for this case since it is a common exercise in linear algebra.)
Case 2: The geometric multiplicity of λ is 1. In this case, the submatrix pertaining to λ of
the Jordan canonical form is the single Jordan block Jλ,k , where k is the algebraic multiplicity
of λ. By Proposition 10.9.6, a basis of the generalized eigenspace of λ will consist of a single
chain (v1 , v2 , . . . , vk ) of generalized eigenvectors.
• The vector v1 may be chosen as any nonzero vector in Eλ , i.e., any nonzero solution to
(A − λI)u = 0.
• For all i ≥ 2, the vector vi satisfies T (vi ) = λvi + vi−1 . Hence, select vi as any solution to
(A − λI)u = vi−1 .
• The inductive algorithm terminates when i = k, in which case, (A−λI)u = vk has no solutions.
Example 10.10.1. Suppose that the matrix A is

        [−3  0  1]
    A = [ 1 −1 −1].
        [ 0  1 −2]

The characteristic polynomial is cA(x) = x^3 + 6x^2 + 12x + 8 = (x + 2)^3. Hence, the only eigenvalue
of A is −2 with an algebraic multiplicity of 3. We calculate the eigenspace E−2 by

                        [−1 0  1]
    (A + 2I)u = 0  ⟺  [ 1 1 −1] u = 0  ⟺  u ∈ Span((1, 0, 1)ᵀ) = E−2.
                        [ 0 1  0]
Since the geometric multiplicity is 1, we are in Case 2. The Jordan canonical form will be J−2,3 .
Call v1 the listed generating vector of E−2, that is, v1 = (1, 0, 1)ᵀ. To find v2, a generalized
eigenvector of rank 2, we solve

                         [−1 0  1]
    (A + 2I)u = v1  ⟺  [ 1 1 −1] u = (1, 0, 1)ᵀ  ⟺  u ∈ (−1, 1, 0)ᵀ + E−2.
                         [ 0 1  0]

Take v2 = (−1, 1, 0)ᵀ. Solving (A + 2I)u = v2 in the same way, we may take v3 = (1, 0, 0)ᵀ. The
list (v1, v2, v3) of vectors (expressed in coordinates with respect to E) is a chain of generalized
eigenvectors for −2 and must be a basis. Setting

                     [1 −1 1]
    M = [v1 v2 v3] = [0  1 0],
                     [1  0 0]
it is an easy calculation to check that

                    [−2  1  0]
    M⁻¹AM = J−2,3 = [ 0 −2  1].
                    [ 0  0 −2]
Hence, A = M J−2,3 M −1 as desired. 4
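As a check (again a sketch, not from the text; SymPy assumed), the chain relations (A + 2I)v1 = 0, (A + 2I)v2 = v1, and (A + 2I)v3 = v2 can be verified directly, and conjugating A by M = [v1 v2 v3] recovers J−2,3:

```python
# Sketch: verifying the generalized eigenvector chain of Example 10.10.1.
from sympy import Matrix

A = Matrix([[-3, 0, 1], [1, -1, -1], [0, 1, -2]])
v1 = Matrix([1, 0, 1])    # eigenvector: (A + 2I) v1 = 0
v2 = Matrix([-1, 1, 0])   # rank-2 generalized eigenvector: (A + 2I) v2 = v1
v3 = Matrix([1, 0, 0])    # rank-3 generalized eigenvector: (A + 2I) v3 = v2
N = A + 2 * Matrix.eye(3)
assert N * v1 == Matrix.zeros(3, 1)
assert N * v2 == v1 and N * v3 == v2
M = v1.row_join(v2).row_join(v3)
J = M.inv() * A * M       # the single Jordan block J_{-2,3}
```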
The following example involves a combination of Case 1 and Case 2, one for each eigenvalue.
Example 10.10.2. Consider the linear transformation T : V → V given by T(u) = Au with

        [−2 6 −3]
    A = [−2 6 −2].
        [ 0 2  1]
The characteristic polynomial is cA(x) = x^3 − 5x^2 + 8x − 4 = (x − 1)(x − 2)^2. We determine the
eigenspaces as

                            [−3 6 −3]
    E1: (A − I)u = 0   ⟹  [−2 5 −2] u = 0  ⟹  u ∈ Span((−1, 0, 1)ᵀ) = E1,
                            [ 0 2  0]

                            [−4 6 −3]
    E2: (A − 2I)u = 0  ⟹  [−2 4 −2] u = 0  ⟹  u ∈ Span((0, 1, 2)ᵀ) = E2.
                            [ 0 2 −1]
We call v1 the column vector given as the generator of E1 and v2 the generator of E2. Since the
algebraic multiplicity of the eigenvalue 1 is 1, the Jordan submatrix associated to 1 is J1,1. Since
the algebraic multiplicity of the eigenvalue 2 is 2, while the geometric multiplicity is 1, the
Jordan submatrix associated to 2 is J2,(2). Hence, the Jordan canonical form of A is

        [1 0 0]
    J = [0 2 1],
        [0 0 2]

(though we have the freedom to change the order of the Jordan blocks). We need to perform the Case 2
steps on the Jordan block for the eigenvalue 2, starting from v2, so we solve

                        [−4 6 −3]
    (A − 2I)u = v2  ⟹  [−2 4 −2] u = (0, 1, 2)ᵀ  ⟹  u ∈ (3/2, 1, 0)ᵀ + E2.
                        [ 0 2 −1]
Note that r1 and r2 must solve r1 + r2 = 0. This implies that there is only a subspace of E3 of
dimension 1 that involves generalized eigenvectors of rank 2. Hence, we will be able to find one chain
of length 1 and one of length 3, and therefore the Jordan canonical form of A is the first of the
two possibilities listed in (10.34). From (10.36), it is straightforward to deduce that a generalized
eigenvector of rank 2 has the form
    u = s1(−1, 0, 2, 0)ᵀ + s2(−1, 2, 0, 2)ᵀ + s3(0, −2, 0, 0)ᵀ
with s3 6= 0. (If s3 = 0, then the vector in E3,2 is actually in E3,1 , so is only of rank 1.)
We proceed to find a generalized eigenvector of rank 3. We need to solve
    (A − 3I)u = s1(−1, 0, 2, 0)ᵀ + s2(−1, 2, 0, 2)ᵀ + s3(0, −2, 0, 0)ᵀ.
From (10.37), it is straightforward to deduce that a generalized eigenvector of rank 3 has the form
    u = t1(−1, 0, 2, 0)ᵀ + t2(−1, 2, 0, 2)ᵀ + t3(−1, 0, 0, 0)ᵀ
with t3 6= 0.
Finally, to find the fourth basis vector, we simply need a chain of length 1 (i.e., an eigenvector) that
is not in Span(v1 , v2 , v3 ). In other words, we need an eigenvector v4 such that E3 = Span(v1 , v4 ).
The vector v4 = (−1, 0, 2, 0)ᵀ will do. Setting

    M = [v1 v2 v3 v4],

it is easy (with a computer) to verify that

             [3 1 0 0]
    M⁻¹AM = [0 3 1 0].
             [0 0 3 0]
             [0 0 0 3]
This confirms that (v1 , v2 , v3 , v4 ) is an ordered basis, with respect to which the linear transformation
T is in Jordan canonical form. 4
10.10.3 – Applications
Let A ∈ Mn(F), where F is a field. If A is diagonalizable, then there exists an invertible matrix
M ∈ Mn(F) such that M⁻¹AM is a diagonal matrix D consisting of the eigenvalues λ1, λ2, . . . , λn
down the diagonal. Then we can find a formula for powers of A with

    A^j = (MDM⁻¹)^j = MD^jM⁻¹ = M diag(λ1^j, λ2^j, . . . , λn^j) M⁻¹.
These formulas for the power and the exponential of a matrix are predicated on the fact that A is
diagonalizable. Not every matrix is diagonalizable. However, every square matrix does have a unique
Jordan canonical form when considered with coefficients in E, the splitting field of the characteristic
polynomial cA (x) over F . Exercises 10.9.13 and 10.9.15 establish formulas for the power and the
exponential function of Jordan block matrices. These formulas are useful in various applications
(e.g., difference equations and systems of linear ordinary differential equations).
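To illustrate the Jordan-block power formula the text refers to (Exercise 10.9.13 states that (Jλ,n)^k carries binomial(k, j)·λ^{k−j} on the j-th superdiagonal), here is a sketch, not from the text, checking it symbolically for a 3 × 3 block and k = 5:

```python
# Sketch: (J_{lambda,3})^5 versus the binomial formula of Exercise 10.9.13.
from sympy import Matrix, symbols, binomial, expand

lam = symbols('lambda')
J = Matrix([[lam, 1, 0], [0, lam, 1], [0, 0, lam]])
k = 5
# entry (i, j) of J^k is binomial(k, j - i) * lam^(k - (j - i)) for j >= i
F = Matrix(3, 3, lambda i, j: binomial(k, j - i) * lam**(k - (j - i)) if j >= i else 0)
assert all(expand(e) == 0 for e in (J**k - F))
```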
     n  | 0 | 1 | 2 | 3 | 4 |  5  | 6 |   7
    fn | 2 | 3 | 5 | 5 | 9 | −11 | 1 | −195

We can find a formula for this sequence in the following way. Define a new sequence {gn} in Z^3 by

    gn = (fn, fn+1, fn+2)ᵀ.
An explicit formula for gn, and hence for fn, comes from gn = A^n g0, where A is the 3 × 3 coefficient
matrix appearing above. In order to calculate this, we must find a formula for A^n. The characteristic
polynomial of A is x^3 − x^2 − 8x + 12 = (x − 2)^2(x + 3). Whether doing the work by hand or using
technology, we can find that A = MJM⁻¹, where
        [−3 0 0]            1  [  4  −30  21]
    J = [ 0 2 1]   and  M = —— [−12  −60  12].
        [ 0 0 2]            25 [ 36 −120 −36]
Using Exercise 10.9.13 and the block diagonal structure of J, we deduce that

    gn = MJ^nM⁻¹g0
       = (1/25)[4 −30 21; −12 −60 12; 36 −120 −36] · [(−3)^n 0 0; 0 2^n n2^{n−1}; 0 0 2^n]
         · (1/12)[12 −12 3; 0 −3 −1; 12 −2 −2] · (2, 3, 5)ᵀ
       = (1/25) ( (−3)^n − 20n2^{n−1} + 49 · 2^n,
                  −3(−3)^n − 40n2^{n−1} + 78 · 2^n,
                  9(−3)^n − 80n2^{n−1} + 116 · 2^n )ᵀ.
A formula for fn is the first entry of gn, so fn = (1/25)((−3)^n − 20n2^{n−1} + 49 · 2^n). 4
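The closed form can be sanity-checked by iterating the recurrence directly. The sketch below is not from the text, and the recurrence f_{n+3} = f_{n+2} + 8f_{n+1} − 12f_n is an assumption reconstructed from the characteristic polynomial x^3 − x^2 − 8x + 12; it reproduces the table of values above.

```python
# Sketch: iterate the (reconstructed) recurrence and compare with the
# closed form f_n = ((-3)^n - 20 n 2^(n-1) + 49 * 2^n) / 25.
f = [2, 3, 5]                       # f_0, f_1, f_2
for _ in range(5):
    f.append(f[-1] + 8 * f[-2] - 12 * f[-3])

def closed(n):
    # 20 n 2^(n-1) = 10 n 2^n keeps the arithmetic in integers
    return ((-3) ** n - 10 * n * 2 ** n + 49 * 2 ** n) // 25

assert f == [2, 3, 5, 5, 9, -11, 1, -195]
assert all(closed(n) == f[n] for n in range(8))
```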
In Exercises 10.10.1 through 10.10.6, for the given matrix A, find the Jordan canonical form J of A
along with a matrix M such that A = MJM⁻¹. If the field is not stated, assume that it is C.
         [−1 −2  2]
 1.  A = [ 2  4 −1].
         [−3 −2  4]
         [−1 −2  2]
 2.  A = [ 2  4 −1].
         [−3 −2  4]
         [1 1 0]
 3.  A = [1 1 0]  over the field F2.
         [0 1 1]
         [0 −1 1  0]
 4.  A = [1  0 0  1].
         [0  0 0 −1]
         [0  0 1  0]
         [λ 1 1 1]
 5.  A = [0 λ 1 1].
         [0 0 λ 1]
         [0 0 0 λ]
         [−3   7   9 −6]
 6.  A = [ 5 −10 −12  9].
         [−1   1   3 −1]
         [ 7 −16 −17 14]
 7.  Use Exercise 10.9.13 to give a formula for

         [6 −7 −4]^n
         [1  0 −1]  .
         [3 −5 −1]
10.  Suppose that a parametric curve x⃗(t) in R^n satisfies the system of ordinary differential equations
     x⃗′(t) = Ax⃗(t), where A is an n × n matrix with coefficients in R. Prove that x⃗(t) = e^{At}C⃗, where C⃗ is
     a constant vector, is a solution to the differential equation. Also observe that x⃗(0) = C⃗.
11.  Use Exercises 10.10.10 and 10.9.15 to solve the following system of differential equations

         x′(t) = −2x(t) + y(t) + 2z(t)
         y′(t) = −3x(t) + 4y(t) + 2z(t)
         z′(t) = −3y(t) + z(t)

     subject to the initial condition (x(0), y(0), z(0)) = (2, 1, 1). [Use a CAS to assist calculations.]
12.  Use Exercise 11.2.19 to deduce that if a matrix A ∈ M4(Q) has the characteristic polynomial (x^2 − 2)^2,
     then it can only have one of the following two Jordan canonical forms:

         [√2  0   0   0]        [√2  1   0   0]
         [ 0 √2   0   0]   or   [ 0 √2   0   0].
         [ 0  0 −√2   0]        [ 0  0 −√2   1]
         [ 0  0   0 −√2]        [ 0  0   0 −√2]
10.11 A Brief Introduction to Path Algebras
We conclude the chapter with an extended example that involves both algebras and modules. The
topic of path algebras has garnered interest in mathematical research in recent years.
In this section we only have room to describe the algebraic structures of a path algebra and
modules over path algebras. The reader should understand that, like other “brief introductions” in
this book, this section scratches the surface of this topic.
Definition 10.11.1
A quiver is a quadruple Q = (V, E, h, t), where V and E are sets and h and t are func-
tions h, t : E → V. The set V is called the set of vertices, the elements of E are called
(directed) edges or arrows, and the functions h and t are called the head and tail functions,
respectively.
A quiver is also known as a directed graph. Though the terminology of “directed graph” is more
common in the context of combinatorics, the term “quiver” is common in algebra.
Let K be a field and Q a quiver. We construct the path algebra K[Q] of the field K over Q as
follows. For each vertex v ∈ V , denote by ev a symbol representing the stationary path at the vertex
v. A path p is either ev for some v ∈ V or a finite expression p = an · · · a2 a1 , where a1 , a2 , . . . , an ∈ E
are arrows such that h(ai ) = t(ai+1 ) for i = 1, 2, . . . , n − 1. In intuitive terms, a path is a sequence of
arrows that are strung together head-to-tail. Caveat: Like functions, we read the arrows in a path
p = an · · · a2 a1 from right to left, so that a1 is the first arrow in p while an is the last arrow.
We extend the head and tail functions from E to all paths as follows. We set h(ev) = t(ev) = v for
all stationary paths with v ∈ V. If p = an · · · a2 a1, then h(p) = h(an) and t(p) = t(a1).
The elements in the path algebra K[Q] are symbolic linear combinations of the form
c1 p1 + c2 p2 + · · · + cn pn
where ci ∈ K and pi is a path of the quiver Q. Note that the set of paths forms a basis for the
vector space K[Q]. The product on K[Q] is defined on basis elements by the concatenation of paths,
namely

    (an · · · a2a1) · (bm · · · b2b1) = an · · · a2a1bm · · · b2b1   if h(bm) = t(a1),  and 0 otherwise;
    (an · · · a2a1) · ev = an · · · a2a1   if t(a1) = v,  and 0 otherwise;
    ev · (an · · · a2a1) = an · · · a2a1   if h(an) = v,  and 0 otherwise;
    ev · ew = ev   if v = w,  and 0 otherwise.
Finally, the product on K[Q] is determined by extending by distributivity and the above products
on paths.
[Figure 10.2: the quivers Q1, Q2, and Q3. In Q1, the arrows are a : 1 → 2, b : 2 → 3, c : 3 → 4,
and d : 3 → 5. Q2 contains a loop edge f, and Q3 contains a cycle formed by the arrows d, e, and f.]
Example 10.11.2. Consider the quiver Q1 as depicted in Figure 10.2. This quiver has 5 stationary
paths e1 , e2 , e3 , e4 , e5 . Each of the directed edges a, b, c, d is a path. The quiver also has three paths
of length 2 (ba, cb, and db) and two paths of length 3 (cba and dba). Recall that, like functions,
we read the sequence of arrows from right to left. So the path algebra K[Q1] is a vector space of
dimension 14 over K. Suppose that α, β ∈ R[Q1] are

    α = e2 + 3ba − 2c,
    β = 5b − a + ba + 3e4.

Then

    α + β = e2 + 3e4 − a + 5b − 2c + 4ba,
αβ = 5e2 b − e2 a + e2 ba + 3e2 e4 + 15bab − 3baa
+ 3baba + 9bae4 − 10cb + 2ca − 2cba − 2ce4
= 0 − a + 0 + 0 + 0 + 0 + 0 + 0 − 10cb + 0 − 2cba + 0
= −a − 10cb − 2cba,
βα = 5be2 + 15bba − 10bc − ae2 − 3aba + 2ac
+ bae2 + 3baba − 2bac + 3e4 e2 + 9e4 ba − 6e4 c
= 5b + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 − 6c
= 5b − 6c. 4
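This bookkeeping is mechanical and easy to automate. The following sketch is not from the text; the arrow directions a : 1 → 2, b : 2 → 3, c : 3 → 4, d : 3 → 5 are an assumption read off the description of Q1, and the code reproduces the products αβ and βα computed in the example.

```python
# Sketch: path-algebra arithmetic for Q1 (arrow directions are assumptions).
from collections import defaultdict

TAIL = {'a': 1, 'b': 2, 'c': 3, 'd': 3}
HEAD = {'a': 2, 'b': 3, 'c': 4, 'd': 5}

# A path is a tuple of arrow names, leftmost applied last;
# ('e', v) is the stationary path at vertex v.
def tail(p):
    return p[1] if p[0] == 'e' else TAIL[p[-1]]

def head(p):
    return p[1] if p[0] == 'e' else HEAD[p[0]]

def concat(p, q):
    """Concatenation pq, or None when head(q) != tail(p)."""
    if head(q) != tail(p):
        return None
    if p[0] == 'e':
        return q
    if q[0] == 'e':
        return p
    return p + q

def product(x, y):
    """Product in K[Q1] of elements given as dicts path -> coefficient."""
    out = defaultdict(int)
    for p, cp in x.items():
        for q, cq in y.items():
            pq = concat(p, q)
            if pq is not None:
                out[pq] += cp * cq
    return {p: c for p, c in out.items() if c != 0}

alpha = {('e', 2): 1, ('b', 'a'): 3, ('c',): -2}            # e2 + 3ba - 2c
beta = {('b',): 5, ('a',): -1, ('b', 'a'): 1, ('e', 4): 3}  # 5b - a + ba + 3e4
assert product(alpha, beta) == {('a',): -1, ('c', 'b'): -10, ('c', 'b', 'a'): -2}
assert product(beta, alpha) == {('b',): 5, ('c',): -6}
```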
Though the quiver Q1 in Figure 10.2 produces a path algebra that is finite dimensional, the
quiver Q2 has a path algebra K[Q2 ] that is infinite dimensional. The paths in the path algebra
K[Q2 ] are
    e1, e2, e3, a, b, c, d, da, db, dc, f^n d, f^n da, f^n db, f^n dc
for any positive integer n. The path f n intuitively means traveling around the f loop path n times.
In particular, a loop-edge is not idempotent.
In a similar manner, the path algebra K[Q3] is also infinite dimensional because of the cycle f ed.
Though there is no loop-edge, the path f ed is a loop, and hence for all positive n, the path
(f ed)^n consists of going around the loop, starting at vertex 4, n times.
Proposition 10.11.3
Let Q = (V, E, h, t) be a quiver where V is a finite set and K a field. The path algebra
K[Q] is a unital associative algebra with multiplicative unit
    1 = Σ_{v∈V} ev.    (10.38)
Proof. Note that for all paths p, the product ev p = 0 unless v = h(p), in which case e_{h(p)} p = p.
Similarly, p ev = 0 if v ≠ t(p) and is p if v = t(p). Let α ∈ K[Q] be a linear combination of paths
α = c1 p1 + c2 p2 + · · · + cn pn. Then

    (Σ_{v∈V} ev) α = Σ_{i=1}^{n} ci (Σ_{v∈V} ev) pi = Σ_{i=1}^{n} ci e_{h(pi)} pi = Σ_{i=1}^{n} ci pi = α.

Multiplication by α on the left also yields α. Thus, Σ_{v∈V} ev is an identity in K[Q].
That K[Q] is associative follows from how the product on paths consists of concatenation. Sec-
tion 3.8.3 discussed in detail how the operation of concatenation is associative.
Let Q be a quiver and let K be a field. The product on K[Q] equips the path algebra with the
structure of a ring that is generally not commutative. We saw in earlier sections what data is
necessary to describe modules for various rings, e.g., F [x]-modules (Example 10.3.13) or group ring
modules (Example 10.3.14). In a similar way, we explore the data for a left K[Q]-module.
Let Q = (V, E, h, t) be a quiver with V finite and let W be a left K[Q]-module. By definition,
W must be an abelian group. For all vertices v, denote by Wv the subset ev W . The subsets Wv are
not submodules but just subgroups of W. By (10.38), for all m ∈ W,

    m = 1 · m = (Σ_{v∈V} ev) · m = Σ_{v∈V} ev · m.

Since ev · ew = 0 for v ≠ w, the intersection of distinct Wv subgroups is the trivial subgroup {0},
so W is the direct sum

    W = ⊕_{v∈V} Wv.
Because ev is idempotent, Kev is a subring of K[Q] that is isomorphic to the field K. Hence,
since modules over fields are vector spaces, the action of the subring Kev on Wv equips Wv with
the structure of a vector space over K.
For each arrow a ∈ E, we have as a product of paths, a = aet(a) . Consequently, a · Wv = {0} for
all vertices v unless v = t(a). A priori, a · W might be any subset of W but since a · W = (eh(a) a) · W ,
then a · W is a subset of Wh(a) . Furthermore, by linearity and distributivity properties for modules,
a acts as a linear transformation ϕa : Wt(a) → Wh(a) .
The action of any path p = an · · · a2 a1 on W consists of the linear transformation ϕan ◦ · · · ◦ ϕa2 ◦ ϕa1.
Finally, the action of any element in K[Q] follows from the action of any path and extending by
linearity to all of W . We have shown the following.
Proposition 10.11.4
The data of a K[Q]-module W consists of a pair ({Wv }v∈V , {ϕa }a∈E ) where Wv is a vector
space over K for each vertex v ∈ V and ϕa : Wt(a) → Wh(a) is a linear transformation for
each arrow a ∈ E. The module W is the direct sum (as vector spaces over K) of the Wv
for all v ∈ V. The action of K[Q] is then determined by letting each ev act as the projection of W
onto Wv and each arrow a ∈ E act by ϕa on Wt(a) and by 0 on Wv for v ≠ t(a).
The data involving vector spaces as described by Proposition 10.11.4 motivates the alternate
terminology for K[Q]-modules: quiver representations.
Example 10.11.5. Consider the quiver Q1 shown in Figure 10.2. An R[Q1]-module consists of an
R-vector space Wv associated to each vertex v of the quiver and linear transformations ϕa : Wt(a) →
Wh(a). It is possible to depict a module with a diagram.
Consider the following R[Q1]-module W, in which we depict the linear transformations as given
with respect to the standard bases: W1 = R, W2 = R^2, W3 = R^2, W4 = R, and W5 = R, with

    ϕa = (−1, 1)ᵀ : W1 → W2,    ϕb = [3 6; 1 2] : W2 → W3,
    ϕc = (1, −3) : W3 → W4,     ϕd = (4, 5) : W3 → W5.

An element of the module is a 5-tuple of vectors in R ⊕ R^2 ⊕ R^2 ⊕ R ⊕ R, with each component being
a vector in the space attached to the given vertex. Take for example w = (2, (3, 2)ᵀ, (−1, 3)ᵀ, 3, 5)
and consider the element α ∈ R[Q1] given by α = 3e1 + 2e3 + b − 5db. Then the action α · w is

    α · w = ( 3 · 2, 0, 2(−1, 3)ᵀ + ϕb(3, 2)ᵀ, 0, −5ϕd(ϕb(3, 2)ᵀ) ) = ( 6, (0, 0)ᵀ, (19, 13)ᵀ, 0, −595 ).
It is important to observe in Proposition 10.11.4 how modules of path algebras generalize many
situations in linear algebra. For example, everything in linear algebra about properties of a linear
transformation T : V → W between vector spaces over a field K falls under the purview of modules
of the path algebra K[Q] where Q is the simple quiver.
    Qarrow :  1 --a--> 2
Similarly, the study of properties of a linear transformation T from a vector space V to itself
corresponds to studying K[Q]-modules for the simple loop quiver.
    Qloop :  one vertex 1 with a single loop arrow a
For this latter quiver Qloop , the set {e1 , a, a2 , a3 , . . .} is a basis of the path algebra K[Q]. In the
algebra K[Q], since there is only one vertex, the ring identity is just the stationary path e1 . Fur-
thermore, by how elements multiply, we see that K[Q] is isomorphic to the polynomial ring K[x]
simply by mapping a to x. This recovers the earlier result that a left K[x]-module is defined by the
data of a vector space V equipped with a linear transformation T : V → V .
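To make this dictionary concrete, here is a sketch (not from the text; NumPy assumed, with a hypothetical 2 × 2 transformation T) of how a polynomial p(x) ∈ K[x] ≅ K[Qloop] acts on a vector:

```python
# Sketch: p(x) acts on v as p(T) v, for a hypothetical T.
import numpy as np

T = np.array([[0.0, 1.0],
              [0.0, 0.0]])        # the loop arrow a acts as this T

def act(coeffs, v):
    """Action of p(x) = coeffs[0] + coeffs[1] x + ... : returns p(T) v."""
    out = np.zeros_like(v, dtype=float)
    power = np.array(v, dtype=float)   # holds T^i v as i increases
    for c in coeffs:
        out = out + c * power
        power = T @ power
    return out

v = np.array([0.0, 1.0])
assert np.allclose(act([1, 2, 3], v), [2.0, 1.0])   # (1 + 2T + 3T^2) v
```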
Now suppose f : U → W is a homomorphism of K[Q]-modules. Because K[Q] contains the field K as the subring K · 1, and since f (cu) = cf (u) for all c ∈ K · 1,
then f is also a linear transformation between the direct sums. However, for all v ∈ V and all
u ∈ U , the linear transformation f satisfies f (ev u) = ev f (u). Now ev u is the projection of u onto
the Uv component of U and ev f (u) is the projection of f (u) onto the Wv component of W . Hence,
f maps Uv components to Wv components. In other words, f consists of a collection of linear
transformations fv : Uv → Wv for all v ∈ V . For each edge a ∈ E, a K[Q]-module homomorphism
also satisfies f (au) = af (u). Now,

    a · u = ϕa(u) if u ∈ Ut(a),  and  a · u = 0 otherwise,
and similarly for the action in W . Hence, the identity f (au) = af (u) translates into fh(a) (ϕa (u)) =
ψa(ft(a)(u)) for all u ∈ Ut(a). If this identity holds, then by associativity of path concatenation and
by linearity of f, we get f(αu) = αf(u) for all u ∈ U. We have shown the following characterization of
K[Q]-module homomorphisms.
Proposition 10.11.6
Let U = ({Uv }v∈V , {ϕa }a∈E ) and W = ({Wv }v∈V , {ψa }a∈E ) be two K[Q] modules for a
quiver Q = (V, E, h, t). A K[Q]-module homomorphism from U to W consists of a collection
of linear transformations fv : Uv → Wv for all v ∈ V satisfying fh(a) ◦ ϕa = ψa ◦ ft(a) for all a ∈ E. (10.40)
The requirement in (10.40) for the linear transformations is the same as saying that for all arrows
a ∈ E, the following diagram is commutative.
    Ut(a) --ϕa--> Uh(a)
      |             |
    ft(a)         fh(a)
      v             v
    Wt(a) --ψa--> Wh(a)
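The commutativity condition can be checked mechanically. This sketch is not from the text; NumPy is assumed and the maps ϕa, ψa, f1 are hypothetical choices for the quiver Qarrow:

```python
# Sketch: checking f_{h(a)} . phi_a = psi_a . f_{t(a)} for hypothetical maps.
import numpy as np

phi_a = np.array([[1.0, 2.0], [0.0, 1.0]])  # phi_a : U_1 -> U_2
psi_a = np.array([[1.0, 0.0], [4.0, 1.0]])  # psi_a : W_1 -> W_2
f1 = np.eye(2)                              # f_1 : U_1 -> W_1
f2 = psi_a @ np.linalg.inv(phi_a)           # chosen so that the square commutes
# the homomorphism condition:
assert np.allclose(f2 @ phi_a, psi_a @ f1)
```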
With the quiver Q1 in Figure 10.2, the overall diagram of linear transformations for a K[Q1 ]-
module homomorphism f : U → W has the following diagram in which all squares of functions are
commutative.
[Diagram: the spaces U1, . . . , U5 with maps ϕa, ϕb, ϕc, ϕd, the spaces W1, . . . , W5 with maps
ψa, ψb, ψc, ψd, and the vertical maps f1, . . . , f5 connecting them, with every square commutative.]
The individual indecomposable modules are: K --0--> 0, which means the module K ⊕ {0} with the
trivial map from K to {0}; K --id--> K, which means the module K ⊕ K with the identity map from K to
K; and 0 --0--> K, which means the module {0} ⊕ K with the trivial map from {0} to K.
It is important to observe that though (K ⊕ {0}) ⊕ ({0} ⊕ K) ≅ K ⊕ K as vector spaces over K,

    (K --0--> 0) ⊕ (0 --0--> K) ≅ (K --0--> K),

which is not isomorphic to (K --id--> K). In particular, (K --id--> K) is indecomposable.
As another result, let K be an algebraically closed field and consider Theorem 10.9.2 and the
existence of the Jordan canonical form. Let V be a finite-dimensional vector space over K and let
T : V → V be a linear transformation. The theorem asserts that there exists a basis B of V such that
the matrix of T is a block diagonal matrix in which the blocks are Jordan blocks. To interpret this
in modules over path algebras, we use the quiver Qloop. Theorem 10.9.2 can be restated to say that
every finite-dimensional K[Qloop]-module decomposes into indecomposable modules of the form
(K^n, Jλ,n), i.e., K^n with the loop arrow a acting by the Jordan block Jλ,n.
Example 10.11.7. We propose to decompose the module W in Example 10.11.5 into indecompos-
able submodules. Consider first a submodule W′ that contains the subspace (Span(1), 0, 0, 0, 0).
Applying the edge a we get a(Span(1), 0, 0, 0, 0) = (0, Span(−1, 1)ᵀ, 0, 0, 0). Since W is closed under
the left action of the path algebra K[Q], this new subspace is in the submodule W′. Applying
b to this new subspace, we get b(0, Span(−1, 1)ᵀ, 0, 0, 0) = (0, 0, Span(3, 1)ᵀ, 0, 0). Applying c to this
gives c(0, 0, Span(3, 1)ᵀ, 0, 0) = (0, 0, 0, (1, −3) · (3, 1)ᵀ, 0) = (0, 0, 0, 0, 0). Then, applying d to
(0, 0, Span(3, 1)ᵀ, 0, 0) we obtain the subspace (0, 0, 0, 0, Span(17)). This shows that W′ is the
following submodule.
    W′ = ( R, Span(−1, 1)ᵀ, Span(3, 1)ᵀ, {0}, Span(17) ),

with the arrows acting by the restrictions of ϕa, ϕb, ϕc, ϕd.
Writing the action of the edges with respect to appropriate bases on each of the component vector
spaces, we see that W′ is isomorphic to the K[Q1]-module

    ( R, R, R, {0}, R )   with   ϕa = id, ϕb = id, ϕc = 0, ϕd = id.
In W2, the kernel of ϕb = [3 6; 1 2] is Span(2, −1)ᵀ. Thus, we have found another submodule W″:
    W″ = ( {0}, Span(2, −1)ᵀ, {0}, {0}, {0} ),

which is isomorphic to the module ( {0}, R, {0}, {0}, {0} ) in which every edge acts as zero.
Finally, the kernel of the action of d as a subspace of W3 is Span(5, −4)ᵀ, which is not contained in
Span(3, 1)ᵀ = Ker ϕc. The submodule W‴ generated by this subspace satisfies
c(0, 0, Span(5, −4)ᵀ, 0, 0) = (0, 0, 0, R, 0), while d acts on it as zero. Hence

    W‴ = ( {0}, {0}, Span(5, −4)ᵀ, R, {0} ),

which is isomorphic to the module ( {0}, {0}, R, R, {0} ) in which c acts as the identity and all
other edges act as zero.
We have shown that the R[Q1]-module depicted in Example 10.11.5 decomposes into the three
indecomposable submodules W = W′ ⊕ W″ ⊕ W‴. 4
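The linear-algebra steps in this example can be confirmed numerically. This sketch is not from the text; NumPy is assumed, and the matrices ϕb = [3 6; 1 2], ϕc = (1, −3), ϕd = (4, 5) are the maps reconstructed from the computations in this example:

```python
# Sketch: checking the chain computations of Example 10.11.7.
import numpy as np

phi_b = np.array([[3.0, 6.0], [1.0, 2.0]])
phi_c = np.array([[1.0, -3.0]])
phi_d = np.array([[4.0, 5.0]])

v = np.array([-1.0, 1.0])                 # a applied to the generator of W_1
assert np.allclose(phi_b @ v, [3.0, 1.0])               # b-step lands on (3, 1)
assert np.allclose(phi_c @ np.array([3.0, 1.0]), 0.0)   # c kills (3, 1)
assert np.allclose(phi_d @ np.array([3.0, 1.0]), 17.0)  # d sends (3, 1) to 17
assert np.allclose(phi_b @ np.array([2.0, -1.0]), 0.0)  # (2, -1) spans Ker phi_b
assert np.allclose(phi_d @ np.array([5.0, -4.0]), 0.0)  # (5, -4) spans Ker phi_d
```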
[Quiver diagram for the preceding exercise: vertices 1, 2, 3, 4, 5 with arrows a, b, c, d, e.]
    [Quiver: 1 --a--> 2 --b--> 3 --c--> 4.]

Let α = −e1 + 2e2 + 3b, β = 4e2 + a − 3b, and γ = a + b. Calculate a) αβ; b) βα; c) αγβ; d) γ^n (for
all positive integers n).
 4.  Consider the quiver Q from Exercise 10.11.2 and consider the R[Q]-module W depicted by

     [Module diagram: the vector spaces R^2, R, R^3, R^2 attached to the vertices, with the matrices
     shown in the original figure giving the action of the arrows.]

     Describe how the following R[Q] element α acts on the given element w ∈ W.

     (a) α = 3e1 + 2ba − cb + 2e4;
     (b) α = 2a + 3b + 4c;

     with the elements w ∈ R^2 ⊕ R ⊕ R^3 ⊕ R^2 as given in the original display.
 5.  Let Q be the quiver consisting of one vertex v and n arrows, each of which is a loop on the vertex v.
     Prove that the path algebra of Q with field K is the free associative K-algebra K⟨x1, x2, . . . , xn⟩ on
     n noncommuting variables. (For n = 1, this is the polynomial ring K[x].)
6. Let Q = (V, E, h, t) be a quiver with V and E both finite. Prove that the following conditions are
equivalent:
(a) dimK K[Q] is finite;
     (b) the element Σ_{a∈E} a is nilpotent;
(c) Q does not contain any cycles, i.e., paths (consisting possibly of a single arrow) p = an · · · a2 a1
such that h(an ) = t(a1 ).
 7.  For every quiver Q and any field K, prove that the group of units in K[Q] consists of elements c1,
     where c ∈ U(K).
 8.  Let Q = (V, E, h, t) be a quiver. Prove that a subset of the path algebra K[Q] is an ideal (two-sided)
     if and only if it is {0} or of the form K[Q′], where Q′ is a connected component of Q. (A connected
     component of Q consists of subsets V′ ⊆ V and E′ ⊆ E such that every arrow a ∈ E with t(a) ∈ V′
     or h(a) ∈ V′ lies in E′.) In particular, deduce that if Q has only one connected component, then K[Q]
     has only two ideals, the trivial one and itself.
 9.  Let Q be the following quiver.

     [Quiver diagram: vertices 1, 2, 3, 4 with arrows a, b, c meeting at the vertex 1.]

     [Module diagram: vector spaces K, K, K^2 with the maps indicated in the original figure.]
     [Quiver: 1 --a--> 2 --b--> 3.]

     Prove that M is isomorphic to the direct sum of the following 5 indecomposable modules:

         M ≅ (R --> 0 --> 0) ⊕ (R --id--> R --> 0) ⊕ (R --id--> R --id--> R)
             ⊕ (0 --> R --id--> R) ⊕ (0 --> 0 --> R).
     [Quiver: 1 --a--> 2 --b--> 3.]

     Prove that for any field K, the path algebra K[Q] has exactly 6 indecomposable modules and show
     what these modules are.
10.12 Projects
Project I. Modules over P(S). Study properties of modules over the ring (P(S), 4, ∩), where
S is a set. Can you determine a characterization of these modules? Discuss and give examples
of torsion and annihilators. Give examples of finitely and infinitely generated modules. Can
you imagine useful applications of this algebraic structure?
Project II. Smith Normal Form of an Integer Matrix. Write a program that obtains an
M -preferred basis and the invariant factors of a submodule M = Span(f1 , f2 , . . . , fm ) in Zn
where the module elements fi are given with respect to the standard basis. (Feel free to modify
the project to do the same problem over the PID Z[i].)
Project III. Decomposition of K[Q]-modules. Consider the quiver Q1 given in Figure 10.2
and let K be any field. Study the decomposition of K[Q1 ]-modules. Can you determine
whether there are a finite number of indecomposable modules? What are the indecomposable
modules? What are the irreducible modules? Given a particular K[Q]-module, can you provide
a procedure to determine its decomposition if it does decompose?
Project IV. Algebras and Structure Coefficients. Exercise 10.3.35 defined the notion of
structure constants of an algebra. Give some examples of algebras and their corresponding
structure coefficients. (For example, R^3 with vector addition and cross product; or M2×2(R)
with matrix addition and multiplication; or Q(∛2) over Q.) Find everything you can about
how the structure coefficients relate to the algebra. (What must the γ^k_{ij} satisfy if the algebra
is commutative, or is associative, or has an identity, or has inverses, or is a field? The structure
coefficients are given in reference to a basis; how do they change under a change of basis?)
Project V. Modules of Smooth Function Rings. Let X be an interval [a, b] ⊆ R, all of R,
or Rk . Consider the ring C ∞ (X, R) of smooth functions f : X → R. (A function is smooth
if its derivatives of all orders exist and are continuous.) Explore what a C ∞ (X, R)-module
is. Can you relate it to any mathematical object that you already know? If not, can you
describe properties of modules? What do elements of a C ∞ (X, R)-module look like? Discuss
what simple and indecomposable C ∞ (X, R)-modules might be.
Project VI. Shear-Rotations in R4 . As in Euclidean three-space, a rotation in R4 is a linear
transformation such that there is a basis of R4 with respect to which the rotation has the
matrix
    [cos θ −sin θ 0 0]
    [sin θ  cos θ 0 0]
    [  0      0   1 0].
    [  0      0   0 1]

In R^4, there is enough room to allow for a linear transformation T : R^4 → R^4 that is similar
to

        [cos θ −sin θ   1      0  ]
    A = [sin θ  cos θ   0      1  ].
        [  0      0   cos θ −sin θ]
        [  0      0   sin θ  cos θ]
Find the Jordan canonical form of A. Find a formula for An for all positive n. Can there be a θ
for which T is periodic? Discuss properties of T . Study the action of T on the unit hypercube.
(Is the image still a hypercube of side length 1?) This project proposes the name "shear-
rotation" for T. Is this a good name? Why or why not? Is there any linear transformation in
R3 that has similar properties? Why or why not?
11. Galois Theory
As mentioned at the beginning of Chapter 3, the axioms of group theory were first written down in
their present form by Évariste Galois. His purpose was not to study combinatorial properties of sets
equipped with a binary operation. Instead, he used groups to study symmetries within the roots
of a polynomial. Like many other mathematicians before him, the big open problem he hoped to
address was how to solve arbitrary polynomials using radicals.
This approach to studying polynomial equations uncovered a deep connection between field
extensions and groups. The study of this relationship became known as Galois theory. To fully
appreciate it, one needs not only group theory but also field theory, as well as the concepts of
composition series and group actions. Many of the profound theorems in this theory come from understanding
the interplay between the structure of field extensions and the structure of groups.
Sections 11.1 and 11.2 introduce Galois theory, along with the Fundamental Theorem of Galois
theory. Sections 11.3 and 11.4 present a number of applications of Galois theory, including the
Fundamental Theorem of Algebra and fully answering the question of geometric constructibility of
regular n-gons. Then Sections 11.5 through 11.8 focus on studying the Galois groups of polynomials
over fields of characteristic 0 or p. The final section presents the landmark result that it is impossible
to solve arbitrary equations with radicals.
Though it is possible to define a Galois theory on field extensions of infinite degree, this book
restricts its study of Galois extensions to finite extensions. For further study of Galois theory, we
recommend [15, 48].
11.1 Automorphisms of Field Extensions
Instead of directly attacking the problem of studying symmetries among roots of a polynomial,
Galois theory takes a step into abstraction and considers groups of transformations on field exten-
sions. After all, if F is a field and p(x) ∈ F [x] is an irreducible polynomial, then F [x]/(p(x)) is a
field extension of F . Furthermore, if char F = 0 and K is a finite field extension of F , then by the
Primitive Element Theorem (Theorem 7.6.13), K = F [x]/(p(x)) for some irreducible polynomial.
Definition 11.1.1
An automorphism σ ∈ Aut(K) is said to fix an element α if σα = α and σ is said to fix a
subfield F ⊆ K if σ|F = idF . If K/F is a field extension, we denote by Aut(K/F ) the set
of automorphisms of K that fix F .
Proposition 11.1.2
If F is a prime field, then Aut(F ) = {1}. In particular Aut(Q) = {1} and Aut(Fp ) = {1}.
Proof. Since 1² = 1 in F, then any ring homomorphism σ : F → F satisfies σ(1)² = σ(1). Since
F is a field, σ(1) is equal to 1 or 0. If σ(1) = 0, then σ(a) = σ(a)σ(1) = 0 so σ is the trivial
homomorphism. Thus, since σ is an automorphism, σ(1) = 1.
Then σ(n · 1) = n · 1 for all positive integers n. The elements of the prime field consist of elements
of the form f = (a · 1)/(b · 1) = (a · 1)(b · 1)−1 . Thus, the image of σ(f ) satisfies
a · 1 = σ(a · 1) = σ(f (b · 1)) = σ(f )σ(b · 1).
Thus, σ(f ) = (a · 1)(b · 1)−1 = f . Consequently, every automorphism of a prime field is trivial.
Proposition 11.1.3
If K is a field and F a subfield, then Aut(K) is a group and Aut(K/F ) is a subgroup.
Proof. The set Aut(K/F) is not empty since it contains the identity function on K. It is clear that
if σ, τ ∈ Aut(K/F), then στ fixes the subfield F. Hence, στ ∈ Aut(K/F). Suppose now that σ ∈
Aut(K/F ). By definition σ(a) = a for all a ∈ F . The inverse σ −1 satisfies a = σ −1 (σ(a)) = σ −1 (a)
for all a ∈ F and so σ −1 also fixes F . Thus, Aut(K/F ) is closed under taking inverses and the
proposition follows.
The following proposition is the key to connecting properties of the group Aut(K/F ) with the
study of symmetries among roots of a polynomial.
Proposition 11.1.4
Let K/F be a field extension and let α ∈ K be algebraic over F . For any σ ∈ Aut(K/F ),
σα is also a root of the minimal polynomial mα,F (x) ∈ F [x].
and, therefore, is determined uniquely by the action on the element ∛13. This is all essentially the
same as in Example 11.1.5. However, σ(∛13) must be a root of x³ − 13 = 0. This polynomial has 3
roots, two of which are complex. The field extension K is a subfield of R, so the only root of x³ − 13
in K is ∛13. Therefore, σ(∛13) = ∛13 for all σ ∈ Aut(K/Q), and so Aut(K/Q) = {1}. △
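As a quick sanity check (an illustrative sympy sketch, not part of the original text), one can confirm that x³ − 13 has exactly one real root, so only one root lies in the real field K:

```python
from sympy import Symbol, Poly

x = Symbol('x')
p = Poly(x**3 - 13, x)

# Numerical roots of x^3 - 13: one real root (the real cube root of 13)
# and a complex-conjugate pair.
roots = p.nroots()
real_count = sum(1 for r in roots if abs(complex(r).imag) < 1e-9)
assert real_count == 1
```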
Example 11.1.7. Following the pattern of the previous two examples, consider the field extension
K = Q(∜13) over Q. An automorphism σ ∈ Aut(K/Q) is uniquely determined by how it acts
on ∜13. The image σ(∜13) must be a root of x⁴ − 13. The roots of this polynomial are
∜13, i∜13, −∜13, and −i∜13, but only two of these are in the real field K. Hence, again
Aut(K/Q) ≅ Z2. △
Example 11.1.8. Consider the field extension F = Q(√2 + √3) over Q. (See Example 7.2.7.) The
minimal polynomial of α = √2 + √3 is mα,Q(x) = x⁴ − 10x² + 1 and the four roots of this polynomial
are
α1 = √2 + √3,  α2 = √2 − √3,  α3 = −√2 + √3,  α4 = −√2 − √3.
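The claims in this example can be verified with a computer algebra system. The following sympy sketch (an illustration, not part of the text) checks the minimal polynomial of α = √2 + √3 and the fact that √2 and √3 are polynomial expressions in α:

```python
from sympy import sqrt, Symbol, minimal_polynomial, expand

x = Symbol('x')
alpha = sqrt(2) + sqrt(3)

# The minimal polynomial of alpha over Q is x^4 - 10x^2 + 1.
m = minimal_polynomial(alpha, x)
assert m == x**4 - 10*x**2 + 1

# sqrt(2) and sqrt(3) lie in Q(alpha), so all four roots
# ±sqrt(2) ± sqrt(3) already lie in F = Q(alpha).
assert expand((alpha**3 - 9*alpha) / 2) == sqrt(2)
assert expand((11*alpha - alpha**3) / 2) == sqrt(3)
```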
Let σ ∈ Aut(F/Q). Then according to Proposition 11.1.4, σ must permute the roots of mα,Q(x).
The permutation group S4 has order 4! = 24. However, Aut(F/Q) is not all of S4. In Example 7.2.7
we observed that √2, √3 ∈ F , so all the roots of mα,Q(x) are in F . It is straightforward to check
that √2 = ½(α³ − 9α) and √3 = ½(11α − α³). Hence,
Proposition 11.1.9
Let σ1 , σ2 , . . . , σn be distinct embeddings (injective homomorphisms) of a field K into a
field L. Then {σ1 , σ2 , . . . , σn } is a linearly independent set in Fun(K, L).
Proof. Assume that the set {σ1, σ2, . . . , σn} is linearly dependent. Then there exists a nontrivial
linear combination of the σi that gives the 0 function. Let m be the least positive integer such
that there exists a nontrivial linear combination of m of the σi that gives the 0 function. Possibly after
relabeling the σi, suppose that
c1σ1 + c2σ2 + · · · + cmσm = 0 (11.1)
for some nonzero ci ∈ L. Since σ1 ≠ σm as functions, there exists some a ∈ K such that
σ1(a) ≠ σm(a). Then for all x ∈ K,
c1σ1(ax) + c2σ2(ax) + · · · + cmσm(ax) = 0
=⇒ c1σ1(a)σ1(x) + c2σ2(a)σ2(x) + · · · + cmσm(a)σm(x) = 0.
Proposition 11.1.11
Let H ≤ Aut(K). The subset Fix(K, H) of elements in K fixed by H is a subfield of K.
Definition 11.1.12
The field Fix(K, H) in the above proposition is called the fixed subfield of K by H.
Proposition 11.1.13
Let K be a field. The association between subgroups of Aut(K) and subfields of K via
H −→ Fix(K, H) and F −→ Aut(K/F )
is inclusion-reversing in both directions, and the association H ↦ Fix(K, H) is injective.
Proof. Suppose first that F1 ⊆ F2 ⊆ K. Let σ be an automorphism in Aut(K) that fixes F2 . Then
since F1 ⊆ F2 , σ fixes every element in F1 so σ ∈ Aut(K/F1 ). Thus, Aut(K/F2 ) ≤ Aut(K/F1 ).
Suppose that H1 ≤ H2 ≤ Aut(K). Let a ∈ Fix(K, H2 ). Since H1 ≤ H2 , the field element a is
fixed by all σ ∈ H1 . This implies that a ∈ Fix(K, H1 ). Thus, Fix(K, H2 ) ≤ Fix(K, H1 ).
Finally, suppose that Fix(K, H1 ) = Fix(K, H2 ). Then H1 fixes Fix(K, H2 ) and since Fix(K, H2 )
is the fixed field of H2 , then H1 ≤ H2 . Since H2 fixes Fix(K, H1 ), we also deduce that H2 ≤ H1 .
Thus, H1 = H2 .
Theorem 11.1.14
Let G be a finite subgroup of Aut(K) and let F = Fix(K, G). Then [K : F ] = |G|.
However, {ω1, ω2, . . . , ωm} forms a basis of K over F , so since the aj are arbitrary, α is an arbitrary
element of K. Thus, we have shown that {σ1, σ2, . . . , σn} is linearly dependent, which contradicts
Proposition 11.1.9. By contradiction, we conclude that m ≥ n.
Suppose now that n < m. We can find n + 1 linearly F -independent elements α1, α2, . . . , αn+1
in K. Then the system
σi(α1)x1 + σi(α2)x2 + · · · + σi(αn+1)xn+1 = 0,  i = 1, 2, . . . , n,
has n equations in n + 1 unknowns. Therefore, the system has a nonzero solution (β1, β2, . . . , βn+1)
in K. Furthermore, not all βi can be in F because otherwise, since σ1 is the identity, the first equation
in the system would produce a linear dependence of the αj over F . Thus, at least one βi ∉ F .
From the nontrivial solutions, choose one that has the least number of nonzero entries βi. Without
loss of generality, assume that β1 ∈ K − F and that for some 2 ≤ r ≤ n + 1 the entries β1, . . . , βr−1
are nonzero, βr = 1, and βi = 0 for i > r. Then our system of equations becomes
σi(α1)β1 + · · · + σi(αr−1)βr−1 + σi(αr) = 0 (11.2)
for i = 1, 2, . . . , n.
Since β1 ∉ F , there exists i0 such that σi0(β1) ≠ β1; otherwise, β1 would be in the fixed field of
G, which is F . Applying σi0 to the above equations (indexed by i) gives
(σi0σi)(α1)σi0(β1) + · · · + (σi0σi)(αr−1)σi0(βr−1) + (σi0σi)(αr) = 0. (11.3)
However, since {σ1, σ2, . . . , σn} is a group, as i runs through 1, 2, . . . , n the products σi0σi run
through all the elements σj of G. Therefore, the system in (11.3) consists of the equations of (11.2),
merely reordered. Subtracting the equations with corresponding σi, we see that
(β1 − σi0(β1), . . . , βr−1 − σi0(βr−1), 0, . . . , 0)
is a solution to the original system. Furthermore, this solution is nontrivial since β1 − σi0(β1) ≠ 0,
and it has fewer than r nonzero entries. This contradicts the minimality of r. We conclude that
n < m is impossible, and thus n = m.
This theorem implies the following important corollary that offers an upper bound and a sharp-
ness condition on the size of the automorphism group Aut(K/F ).
Corollary 11.1.15
Let K/F be any field extension. Then | Aut(K/F )| ≤ [K : F ] with equality if and only if
F is the fixed field of Aut(K/F ).
Definition 11.1.16
A finite field extension K/F is called a Galois extension if | Aut(K/F )| = [K : F ]. If
K/F is a Galois extension, then the automorphism group Aut(K/F ) is called the Galois
group of the extension and is denoted by Gal(K/F ).
Galois theory hinges on Galois extensions and the relationship between subgroups of Gal(K/F )
and fields L satisfying F ⊆ L ⊆ K. At present, it may seem like an intractable problem to determine
if a field extension is Galois. Corollary 11.1.15 can be restated to give the following criterion for a
Galois extension.
Corollary 11.1.17
An extension K/F is Galois if and only if F is the fixed field of Aut(K/F ).
Example 11.1.18. All quadratic extensions of Q are Galois. Recall that a quadratic extension
of Q is a field Q(α), where α is a root of some irreducible quadratic polynomial m(x) =
x² + bx + c ∈ Q[x]. The polynomial m(x) has the two roots
(−b + √(b² − 4c))/2  and  (−b − √(b² − 4c))/2,
and if we label one of the roots as α, then the other one is −b − α. Consequently, there are two
automorphisms in Aut(Q(α)/Q): the identity function and σ such that σ(m + nα) = m + n(−b − α)
for m, n ∈ Q. Hence, Aut(Q(α)/Q) has order 2, which is equal to [Q(α) : Q]. We write
Gal(Q(α)/Q) ≅ Z2. △
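The symmetry σ(α) = −b − α can be checked generically: substituting −b − x for x leaves m(x) unchanged, so σ maps roots of m(x) to roots of m(x). A small sympy sketch (illustrative only):

```python
from sympy import symbols, expand

x, b, c = symbols('x b c')
m = x**2 + b*x + c

# The substitution x -> -b - x fixes m(x), so if alpha is a root,
# then -b - alpha is also a root.
assert expand(m.subs(x, -b - x)) == m
```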
Example 11.1.19. The observations in Examples 11.1.6 and 11.1.7 show that neither Q(∛13)
nor Q(∜13) is a Galois extension of Q. In Example 11.1.10, we considered the field extension
Q(∛13, √−3) over Q, which is the splitting field of x³ − 13. We had already determined directly
that Q(∛13, √−3)/Q is a Galois extension, though Theorem 11.2.1 proves it in general. △
Galois extensions and their groups of automorphisms are the central themes of this chapter.
11.2
Fundamental Theorem of Galois Theory
The definition of a Galois extension does not lend itself to readily identifying which field extensions
are Galois. This section first gives a full characterization of Galois extensions and then explores in
further detail the relationship between subgroups of Aut(K/F ) and fields L satisfying F ⊆ L ⊆ K.
A more detailed analysis of the association described in Proposition 11.1.13 leads to the Fundamental
Theorem of Galois Theory.
Theorem 11.2.1
Let E be the splitting field over F of a separable polynomial p(x) ∈ F [x]. Then the
extension E/F is Galois, i.e., | Aut(E/F )| = [E : F ].
Proof. It is convenient to prove a stronger result. We prove that (*) the number of ways of extending
an isomorphism ϕ : F → F′ to the splitting fields E and E′ of p(x) and p′(x) = ϕ(p(x)) is equal to
[E : F ] = [E′ : F′]. The theorem follows once we apply this result to F′ = F .
We proceed by induction on [E : F ]. If [E : F ] = 1, then E = F and any extension E → E′ of the
isomorphism ϕ : F → F′ involves a field E′ such that [E′ : F′] = 1. Hence, E′ = F′ and the only
extension of ϕ to E is ϕ itself.
Now suppose that (*) holds for all separable polynomials q(x) ∈ F [x] with a splitting field Eq
satisfying [Eq : F ] < n, for some integer n ≥ 2. If p(x) ∈ F [x] has a splitting field E with
[E : F ] = n ≥ 2, then p(x) has at least one irreducible factor f (x) ∈ F [x] of degree greater than 1,
and correspondingly p′(x) = ϕ(p(x)) has the irreducible factor f′(x) = ϕ(f (x)). Let α be a root of
f (x). If σ : E → E′ is an extension of the isomorphism ϕ, then σ restricts to F (α) as
σ|F (α) = τ : F (α) → F′(β), where β is a root of f′(x).
E −−σ−→ E′
F (α) −−τ−→ F′(β),  τ = σ|F (α)
F −−ϕ−→ F′
To count the number of extensions σ of the isomorphism ϕ, we count the number of such diagrams.
The number of extensions of ϕ to isomorphisms τ : F (α) → F′(β) is equal to the number of distinct
roots β of f′(x). Since deg f′(x) = [F′(β) : F′] = [F (α) : F ] and since p′(x) is separable, the number
of extensions τ is exactly [F (α) : F ].
Since E and E′ are splitting fields of p(x) over F (α) and of p′(x) over F′(β), and since [E : F (α)]
is strictly less than n, we can use the induction hypothesis on extending the isomorphism
τ : F (α) → F′(β) to E → E′. There are exactly [E : F (α)] extensions of τ to maps σ : E → E′. Since
[E : F ] = [E : F (α)][F (α) : F ], the number of extensions of ϕ to maps σ : E → E′ is equal to
[E : F ]. This establishes (*).
Applying the result (*) to the isomorphism idF : F → F proves the theorem.
Theorem 11.2.1 gives a particularly easy way of finding Galois extensions: Find a splitting field
of a separable polynomial. This situation is so common that we give it its own notation.
Definition 11.2.2
If f (x) ∈ F [x] is a separable polynomial over F , then the Galois group of the splitting field
of f (x) over F is called the Galois group of f (x) and is denoted by GalF (f (x)) or simply
Gal(f (x)), whenever the field F is understood by context.
Theorem 11.2.1 gives one way to produce Galois extensions; it also has a converse, which yields a
complete characterization of Galois extensions.
Theorem 11.2.4
A finite extension K/F is Galois if and only if K is the splitting field of a separable
polynomial over F . Furthermore, if K/F is Galois, then every irreducible polynomial in
F [x] which has a root in K is separable and has all its roots in K.
Proof. Theorem 11.2.1 shows that if K is the splitting field of a separable polynomial in F [x], then
K/F is Galois. We need to prove the converse.
Suppose that K/F is Galois and call G = Gal(K/F ). Let p(x) ∈ F [x] be a polynomial that
has a root α in K. Call G · α the orbit of α under the action of G on K and write G · α = {α1 =
α, α2 , . . . , αr }. Any automorphism τ ∈ G acts as a permutation on G · α. Therefore, the coefficients
of
f (x) = (x − α)(x − α2 ) · · · (x − αr )
are fixed by all the elements of G, since any τ ∈ G simply permutes these linear factors. Because K/F is Galois, F is the fixed field of G (Corollary 11.1.17), so f (x) ∈ F [x].
If p(x) ∈ F [x] is irreducible and α is a root of p(x) in K, then p(x) is a constant multiple of the
minimal polynomial mα,F (x), which divides f (x) because f (α) = 0 and f (x) ∈ F [x]. However,
f (x) must also divide p(x) since every αi is a root of p(x). Thus, adjusting multiplicative constants,
we can assume that p(x) = f (x). Since f (x) has distinct roots, all of which lie in K, this proves the
second part of the theorem.
Now suppose that β1, β2, . . . , βm is a basis of K over F . We just proved that each minimal
polynomial mβi,F (x) is separable with all its roots in K. Define g(x) to be the product of the
distinct polynomials among mβ1,F (x), . . . , mβm,F (x).
Again, this polynomial splits completely in K[x]. Furthermore, it contains all the roots of every
mβi ,F (x) so the action of any τ ∈ G simply permutes the linear factors. Thus, g(x) ∈ F [x]. Since all
the roots of g(x) are in K, then the splitting field E of g(x) over F is a subfield of K. On the other
hand, the splitting field of g(x) contains {β1 , β2 , . . . , βm }, which is a basis of K over F . Hence, K
is a subfield of E. We deduce that K = E and thus that K is the splitting field of some separable
polynomial.
Because of this theorem, Galois extensions become the natural context in which to study roots
of polynomials.
Definition 11.2.5
Let K/F be a Galois extension and let α ∈ K. The (Galois) conjugates of α are the distinct
elements σ(α) for σ ∈ Gal(K/F ). If E is a subfield of K containing F and σ ∈ Gal(K/F ), then
σ(E) is called a conjugate field of E over F .
In other words, the conjugates of α consist of the elements in the orbit of α under the action of
Gal(K/F ) on K. By Proposition 11.1.4 and Theorem 11.2.4, the conjugates of α are precisely the
set of roots of the minimal polynomial mα,F (x).
Example 11.2.6. Consider K = Q(∛13, ζ3). This is the splitting field over Q of the polynomial
p(x) = x³ − 13. Recall from Example 11.1.10 that G = Gal(K/Q) ≅ D3, generated by ρ and τ as
described in that example. On the generators ∛13 and ζ3, the automorphisms ρ and τ act as
ρ(∛13) = ∛13 ζ3, ρ(ζ3) = ζ3  and  τ(∛13) = ∛13, τ(ζ3) = ζ3².
Consider the element α = 1 − 3∛13 + 2(∛13)². The conjugates of α are the distinct elements
one obtains by applying the elements of G. In this case:
α1 = 1 − 3∛13 + 2(∛13)²,  α2 = 1 − 3∛13 ζ3 + 2(∛13)² ζ3²,  α3 = 1 − 3∛13 ζ3² + 2(∛13)² ζ3.
Note there are only 3 conjugates even though |G| = 6. We can get some mileage out of this
observation. First note that the minimal polynomial of α is given by
mα,Q(x) = (x − α1)(x − α2)(x − α3).
A calculation gives mα,Q(x) = x³ − 3x² + 237x − 1236. We can use this result to calculate the inverse of
α in a similar way to the technique taught in college algebra called rationalizing the denominator:
1/α = (α2 α3)/(α α2 α3) = ((1 − 3∛13 ζ3 + 2∛169 ζ3²)(1 − 3∛13 ζ3² + 2∛169 ζ3))/1236 = (79 + 55∛13 + 7∛169)/1236. △
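These computations can be reproduced with sympy (an illustrative check, not part of the original example):

```python
from sympy import Rational, Symbol, minimal_polynomial, simplify

x = Symbol('x')
c = 13 ** Rational(1, 3)          # the real cube root of 13
alpha = 1 - 3*c + 2*c**2

# The minimal polynomial of alpha over Q.
m = minimal_polynomial(alpha, x)
assert m == x**3 - 3*x**2 + 237*x - 1236

# The inverse found by "rationalizing" with alpha_2 * alpha_3.
inverse = (79 + 55*c + 7*c**2) / 1236
assert simplify(alpha * inverse) == 1
```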
Theorem 11.2.7 (Fundamental Theorem of Galois Theory)
Let K/F be a Galois extension and set G = Gal(K/F ). Then the maps
E −→ Aut(K/E) and H −→ Fix(K, H)
are inverse bijections between Sub(K/F ), the set of subfields of K containing F , and Sub(G), the
set of subgroups of G.
Proof. By Theorem 11.2.4, we can suppose that K is the splitting field of some separable polynomial
f (x) ∈ F [x].
Proposition 11.1.13 already established injectivity from right to left as well as part (1).
We now prove surjectivity from right to left. For any E ∈ Sub(K/F ), we can view f (x) as a
polynomial in E[x]. Then K is the splitting field of f (x) over E and by Theorem 11.2.4, K/E is
Galois. By Corollary 11.1.17, E is the fixed field of Aut(K/E). Now, Aut(K/E) is a subgroup of
Aut(K/F ) and we have now proved that E = Fix(K, Aut(K/E)). This gives surjectivity from right
to left. This also establishes part (3).
For part (2), let E ∈ Sub(K/F ) with H = Gal(K/E) and thus also E = Fix(K, H). Then since
K/E is Galois, |H| = | Gal(K/E)| = [K : E]. It follows that
|G : H| = |G|/|H| = [K : F ]/[K : E] = [E : F ].
For part (4), first consider two subfields E1, E2 ∈ Sub(K/F ). If σ ∈ Aut(K/E1) ∩ Aut(K/E2),
then σ fixes every element of the composite field E1E2. Thus,
Aut(K/E1) ∩ Aut(K/E2) ≤ Aut(K/(E1E2)).
Conversely, if σ ∈ Aut(K/(E1E2)), then σ fixes every element of E1 and also every element of E2.
Therefore,
Aut(K/(E1E2)) ≤ Aut(K/E1) ∩ Aut(K/E2),
which shows that these two subgroups are equal. Second, consider subgroups H1, H2 of Gal(K/F ).
If a ∈ Fix(K, H1) ∩ Fix(K, H2), then a is fixed by every element of H1 and every element of H2, so
a ∈ Fix(K, ⟨H1, H2⟩). Thus,
Fix(K, H1) ∩ Fix(K, H2) ⊆ Fix(K, ⟨H1, H2⟩).
Conversely, if a ∈ Fix(K, ⟨H1, H2⟩), then a is certainly in both Fix(K, H1) and Fix(K, H2),
thereby establishing the reverse inclusion.
[Figure 11.1: The subgroup lattice of D3 ≅ Gal(Q(∛13, ζ3)/Q) — subgroups {1}; ⟨τ⟩, ⟨τρ⟩, ⟨τρ²⟩; ⟨ρ⟩; D3 — paired under H ↦ Fix(K, H) with the subfield lattice of K = Q(∛13, ζ3): K; Q(∛13), Q(∛13 ζ3), Q(∛13 ζ3²); Q(ζ3); Q.]
For part (5), let E = Fix(K, H) for H ≤ G. Every σ ∈ G gives an isomorphism σ|E : E → σ(E),
where σ(E) is a conjugate subfield of E in K. On the other hand, consider any isomorphism of
fields τ : E → E′ that fixes F , where E′ ⊆ F̄ . Then τ (E) is in fact contained in K. We see
this because if α ∈ E with minimal polynomial mα,F (x), then τ (α) is another root of mα,F (x), and
K contains all these roots. Since K is the splitting field of f (x) over E, then it is also the splitting
field of τ (f (x)). By Theorem 7.6.10 τ : E → τ (E) extends to an isomorphism σ : K → K.
Since σ fixes F (since τ does), we conclude that every isomorphism τ of E with another field
extension of F is the restriction of some σ ∈ Aut(K/F ).
Now two automorphisms σ1, σ2 ∈ G restrict to the same embedding τ of E fixing F if and only if
σ2⁻¹σ1 is the identity map on E, which means that σ2⁻¹σ1 ∈ H, which in turn means that σ1H = σ2H as
cosets in G. Consequently, there is a bijection between the cosets of H and the isomorphisms of
E with subfields of K fixing F . In particular, we again see the result of part (2): the number of
such isomorphisms of E is |G : H| = [E : F ].
The extension E/F is Galois if and only if | Aut(E/F )| = [E : F ]. This is the case exactly when
all the isomorphisms of E with a subfield of K fixing F are in fact automorphisms of E. Thus, E/F
is Galois if and only if σ(E) = E for all σ ∈ G. However, since H = Aut(K/E), we have
σHσ⁻¹ = Aut(K/σ(E)). We conclude from previous parts of the theorem that σ(E) = E
if and only if σHσ⁻¹ = H for all σ ∈ G. This concludes part (5).
The Fundamental Theorem of Galois Theory can be restated in the following way. The function
Ψ : Sub(K/F ) ←→ Sub(Aut(K/F )) defined by Ψ(E) = Aut(K/E) is a monotonic bijection between
the lattices (Sub(K/F ), ⊇) and (Sub(Aut(K/F )), ≤) in which Galois extensions E/F correspond to
normal subgroups N = Aut(K/E) in Aut(K/F ). The statement that Ψ is a bijection between
lattices (as opposed to just posets) means that it must preserve greatest lower bounds and least
upper bounds. (Part (4).)
As a first example, let us revisit the running example we have considered regularly in this and
the previous section.
Example 11.2.8. Consider the Galois extension K = Q(∛13, ζ3) over Q. In Example 11.1.10, we
saw that Aut(K/Q) ≅ D3, generated by ρ and τ defined by
ρ(∛13) = ∛13 ζ3, ρ(ζ3) = ζ3  and  τ(∛13) = ∛13, τ(ζ3) = ζ3².
The subgroup lattice of D3 is easy to draw. (See the top part of Figure 11.1.)
We see explicitly that the generators of Gal(K/F ) satisfy the same relations as those of D4. The
subgroup lattice of D4 is the following:
[Subgroup lattice of Gal(K/F ) ≅ D4: Gal(K/F ) at the top; the order-4 subgroups ⟨ρ², τ⟩, ⟨ρ⟩, ⟨ρ², τρ⟩; the order-2 subgroups ⟨τ⟩, ⟨τρ²⟩, ⟨ρ²⟩, ⟨τρ⟩, ⟨τρ³⟩; and {1} at the bottom.]
According to the Fundamental Theorem of Galois Theory, the Hasse diagram of the lattice of
subextensions in Sub(K/F ) is:
[Lattice of subfields in Sub(K/F ): K at the top; the degree-4 subfields Q(α), Q(β), Q(√2, √3), Q(α − β), Q(α + β); the quadratic subfields Q(√3), Q(√2), Q(√6); and Q at the bottom.]
Again, note that the above two diagrams are reflections of each other through a horizontal line. In
the above diagram, we took care to calculate the fixed subfields of K by determining elements that
remained invariant under the action of the corresponding subgroup H. A useful result for the above
calculations is that
τ (β) = τ (√2 (α² − 3)/α) = τ (√2)(τ (α)² − 3)/τ (α) = (−√2)(α² − 3)/α = −β.
For example, if H = ⟨ρ², τ⟩, then Fix(K, H) is a subfield of K of degree 2 over Q. Consequently,
Fix(K, H) = Q(√a) for some a ∈ Q. Since √3 = α² − 3, it is easy to see that √3 is fixed by ρ²,
since ρ²(α² − 3) = (ρ²(α))² − 3 = (−α)² − 3 = √3. Furthermore, τ (α² − 3) = α² − 3 = √3. Thus,
Q(√3) ⊆ Fix(K, H), so knowing that [Fix(K, H) : Q] = 2, we deduce that Fix(K, H) = Q(√3).
By the Fundamental Theorem of Galois Theory, knowing the normal subgroups of Gal(K/F ) =
D4, we deduce that the only nontrivial Galois extensions of Q inside K are K, Q(√2, √3), Q(√2),
Q(√3), and Q(√6). △
By virtue of the close connection between Galois extensions and group theory, it is common to adapt
terms heretofore applied only to groups so that they also describe field extensions.
Definition 11.2.10
A field extension K/F is called
10. Find the Galois group of x4 − 13 and draw the subfield lattice of its splitting field over Q.
11. Let p1, p2, . . . , pn be distinct prime numbers and let K = Q(√p1, √p2, . . . , √pn).
(a) Show that K/Q is Galois.
(b) Show that Gal(K/Q) is generated by the automorphisms σi defined by
σi(√pj) = −√pj if i = j,  and  σi(√pj) = √pj if i ≠ j.
13. Let F be a field of characteristic char F ≠ 2. Suppose that c ∈ F is such that √c ∉ F and let K = F (√c).
Let α = a + b√c with a, b not both zero and call L = K(√α). Set α′ = a − b√c. Prove that L is
Galois over F if and only if either αα′ or cαα′ is a square in F . Prove also that Gal(L/F ) is cyclic of
order 4 if and only if cαα′ is a square in F .
14. Let K = F (α) be a finite separable extension of a field F with [K : F ] prime. Let α = α1, α2, . . . , αp
be the conjugates of α over F . Prove that if α2 ∈ K, then K/F is Galois and that Gal(K/F ) ≅ Zp.
15. Let L/F be a Galois extension and let p be the smallest prime dividing [L : F ]. Prove that if K is a
subfield of L containing F such that [L : K] = p, then K/F is a Galois extension.
16. Let E/F be a normal extension of finite degree. Let K and L be fields with F ⊆ K, L ⊆ E. Assume
that E is separable over K and over L. Prove that E is separable over K ∩ L.
17. Find an example of fields F ⊆ K ⊆ E such that K/F is Galois and E/K is Galois but such that E/F
is not Galois.
18. The relation of being a normal subgroup is not transitive. In other words, if N1 and N2 are subgroups
of G such that N1 ⊴ N2 and N2 ⊴ G, then it is not necessarily true that N1 ⊴ G. Express this
nontransitivity in terms of Galois extensions under the Galois correspondence.
19. Let F be a field and let A ∈ Mn (F ). Let E be the splitting field of the characteristic polynomial
cA (x) over F . Let σ ∈ Gal(E/F ) be a field automorphism. Let Gal(E/F ) act on the vector space E n
by σ(v) = (σ(v1 ), σ(v2 ), . . . , σ(vn )) for all v = (v1 , v2 , . . . , vn ) ∈ E n . Prove that if w is a generalized
eigenvector of rank k with respect to the eigenvalue λ, then σ(w) is also a generalized eigenvector of
rank k with respect to the eigenvalue σ(λ).
11.3
First Applications of Galois Theory
The Galois correspondence established in Theorem 11.2.7 opens doors to many pathways of investi-
gation that relate properties of group theory and field extensions. This section explores a few initial
applications.
Proposition 11.3.1
Let K1 and K2 be Galois extensions of a field F . Then the intersection K1 ∩ K2 is Galois
over F and the composite field K1 K2 is Galois over F .
Proposition 11.3.2
Let K1 and K2 be Galois extensions of a field F with K1 ∩ K2 = F . Then
Gal(K1K2/F ) ≅ Gal(K1/F ) ⊕ Gal(K2/F ).
Proposition 11.3.3
Let E be a finite separable extension of a field F . Then E is contained in an extension
K that is Galois over F and is minimal in the sense that in a fixed algebraic closure of K
every Galois extension of F containing E also contains K.
Proof. Since E is a finite extension of F , we have E = F (α1, . . . , αn) for some finite set of elements
αi algebraic over F . The splitting field K′ of the collection {mαi,F (x)} is Galois over F and contains
E. However, K′ might not be the smallest field extension that satisfies the desired property.
By the Fundamental Theorem of Galois Theory, K′ has only a finite number of subfields, and by
Proposition 11.3.1, the intersection of two Galois subfields of K′ is again a Galois extension of F .
The desired field K is therefore the intersection of all Galois subfields of K′ that contain E, since
this intersection is again Galois over F and contains E.
Definition 11.3.4
The field K in the above proposition is called the Galois closure of E over F .
As a simple example, we observe that K = Q(∛13) is not Galois over Q. The field K′ in the
proof of Proposition 11.3.3 is Q(∛13, ζ3). No proper subfield of K′ containing K is Galois over Q,
so K′ is the Galois closure of K. If K is already Galois, then it is its own Galois closure.
In a variety of applications, when considering the properties of a field extension K/F , it is useful
to embed K in the Galois closure of K over F . In the Galois closure, we can use Galois theory and
then restrict the study to the extension K/F . The proofs of the following two propositions give a
first example where this strategy is effective.
Proposition 11.3.5
Let F be a field, K a Galois extension of F , and L any extension of F . Then the composite
field KL is Galois over L with
Gal(KL/L) ≅ Gal(K/K ∩ L).
Proof. This is an application of the Second Isomorphism Theorem. Recall that if B is a normal
subgroup of a group G and A is any subgroup of G, then B ⊴ AB, A ∩ B ⊴ A, and AB/B ≅ A/(A ∩ B).
Let L′ be an extension of L that is Galois over F , say the Galois closure of L over F . Then KL′
is a Galois field extension of F . Call G = Gal(KL′/F ). Let A be the subgroup of G corresponding
to L and let B be the subgroup corresponding to K under the Galois correspondence between subfields of
KL′ over F and subgroups of G. Since K is Galois over F , by the fundamental theorem, B is
a normal subgroup of G.
Again by the fundamental theorem, Fix(KL′, AB) = K ∩ L and Fix(KL′, A ∩ B) = KL. Thus,
we conclude that K is Galois over K ∩ L and KL is Galois over L, and that
Gal(KL/L) ≅ Gal(K/K ∩ L).
Proposition 11.3.6
Suppose that K is a Galois extension of F and that L is any finite extension of F . Then
[KL : F ] = ([K : F ][L : F ])/[K ∩ L : F ].
Proof. Let L′ be the Galois closure of L over F . Let G = Gal(KL′/F ). Under the Galois corre-
spondence, let A correspond to L and B correspond to K. In particular, A = Gal(KL′/L) and
B = Gal(KL′/K). By Proposition 4.1.16,
|AB| = |A||B|/|A ∩ B|.
However, by definition any Galois extension E of F has | Gal(E/F )| = [E : F ]. Hence,
| Gal(KL′/(K ∩ L))| = [KL′ : K ∩ L] = ([KL′ : L][KL′ : K])/[KL′ : KL].
But for any subfield E of KL′ containing F , we have [KL′ : E] = [KL′ : F ]/[E : F ]. The
proposition follows after simplification.
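The degree formula can be illustrated with K = Q(√2) and L = Q(√3), where K ∩ L = Q; a sympy sketch (a hypothetical example, not from the text):

```python
from sympy import sqrt, Symbol, minimal_polynomial, degree

x = Symbol('x')

# [K : Q] = [L : Q] = 2 for K = Q(sqrt(2)) and L = Q(sqrt(3)).
K_deg = degree(minimal_polynomial(sqrt(2), x), x)
L_deg = degree(minimal_polynomial(sqrt(3), x), x)

# KL = Q(sqrt(2), sqrt(3)) = Q(sqrt(2) + sqrt(3)), so [KL : Q] is the
# degree of the minimal polynomial of sqrt(2) + sqrt(3).
KL_deg = degree(minimal_polynomial(sqrt(2) + sqrt(3), x), x)

# [KL : Q] = [K : Q][L : Q] / [K ∩ L : Q] = (2 * 2) / 1 = 4.
assert (K_deg, L_deg, KL_deg) == (2, 2, 4)
```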
Definition 11.3.7
Let K be a finite separable extension of F , let L be a Galois extension of F containing K, and
let H = Gal(L/K). We define the norm of α ∈ K from K to F as the product
NK/F (α) = ∏σ σ(α),
where the product is taken over a complete set of distinct coset representatives of H in
Gal(L/F ).
This product is well-defined because if σ1 and σ2 are two different representatives of the same
coset, then σ2⁻¹σ1 ∈ H. Therefore, σ2⁻¹σ1(α) = α, so σ1(α) = σ2(α). Hence, no matter the choice of
coset representatives taken in the product, the value of the product remains the same.
Note that if K is Galois, then we can take L = K and the product in Definition 11.3.7 runs over
all elements in Gal(K/F ).
Proposition 11.3.8
The norm is a function NK/F : K → F satisfying NK/F (αβ) = NK/F (α)NK/F (β) for all
α, β ∈ K.
However, since left multiplication by τ permutes the cosets of the subgroup H, as σ runs
through a complete set of distinct coset representatives, so does τ σ. Hence,
τ (NK/F (α)) = ∏σ (τ σ)(α) = NK/F (α).
The norm of an algebraic element α over a base field F provides a direct way to calculate the
inverse of α. In the product defining the norm, the representative of the coset H itself can be taken
to be the identity, which fixes α, so α appears explicitly in the product. Consequently,
1/α = (1/NK/F (α)) ∏σ∉H σ(α).
We understand the product to run over all conjugates of α over F except α itself.
Example 11.3.9. Consider the element α = 2 − 3(∛13)² in K = Q(∛13). We know that K/Q is not
Galois but that L = Q(∛13, ζ3) is a Galois extension of Q. Using the notation of Example 11.2.6,
K = Fix(L, ⟨τ⟩), so a set of distinct coset representatives of H = ⟨τ⟩ is 1, ρ, ρ². Hence,
NK/Q(α) = α ρ(α) ρ²(α) = (2 − 3∛169)(2 − 3∛169 ζ3²)(2 − 3∛169 ζ3) = 2³ − (3∛169)³ = 8 − 27 · 169 = −4555. △
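Since the three conjugates of α are the roots of its degree-3 minimal polynomial, the norm equals (−1)³ times the constant coefficient; a sympy check of the value −4555 (illustrative only):

```python
from sympy import Rational, Symbol, minimal_polynomial, Poly

x = Symbol('x')
alpha = 2 - 3 * 13 ** Rational(2, 3)

# alpha has degree 3 over Q, so its Galois conjugates are exactly the
# roots of its minimal polynomial; their product is -(constant term).
m = Poly(minimal_polynomial(alpha, x), x)
norm = -m.nth(0)
assert norm == -4555
```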
Definition 11.3.10
We define the trace of α from K to F as the sum
TrK/F (α) = ∑σ σ(α),
where the sum is taken over a complete set of distinct coset representatives of H in
Gal(L/F ).
The sum is well-defined, despite the use of coset representatives, for the same reason that the
norm is well-defined. We omit the proof of the following proposition since it is nearly identical to
the corresponding proof for the norm.
Proposition 11.3.11
The trace is a function TrK/F : K → F satisfying TrK/F (α + β) = TrK/F (α) + TrK/F (β)
for all α, β ∈ K.
The definition of the norm and the trace of elements from a field K to a base field F illustrate
a rather common strategy in field theory. Since the automorphism groups of Galois extensions of
a base field have nice group properties, when studying an arbitrary finite field extension K/F it is
convenient to simply work in the Galois closure of K over F .
Proof. To show that C is algebraically closed, we must show that every polynomial p(z) ∈ C[z] has
a root in C.
Let p(z) ∈ C[z] and consider the polynomial q(z) = p(z)p̄(z), where p̄(z) is the polynomial whose
coefficients are the complex conjugates of those of p(z). If
p(z) = cn zⁿ + · · · + c1 z + c0,
then p̄(z) = c̄n zⁿ + · · · + c̄1 z + c̄0. For all m with 0 ≤ m ≤ 2n, the mth coefficient of q(z) is
unchanged under complex conjugation, so we deduce that q(z) ∈ R[z]. Furthermore, if z0 is a root
of q(z), then either p(z0) = 0 or p̄(z0) = 0; in the latter case, taking conjugates gives p(z̄0) = 0. So
either z0 or z̄0 is a root of p(z). Hence, in order to prove that every polynomial
p(z) ∈ C[z] has a root in C, it suffices to prove that every polynomial q(x) ∈ R[x] has a root in C.
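For instance, with the (hypothetical) polynomial p(z) = z² + iz + (1 + i), the product with its coefficient-wise conjugate has real coefficients, as the argument requires; a sympy sketch:

```python
from sympy import I, Symbol, expand, Poly

z = Symbol('z')
p = z**2 + I*z + (1 + I)
pbar = z**2 - I*z + (1 - I)       # coefficient-wise complex conjugate

# q = p * pbar has real coefficients, so roots of p can be found among
# the roots (and conjugates of roots) of a real polynomial.
q = expand(p * pbar)
assert all(coeff.is_real for coeff in Poly(q, z).all_coeffs())
```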
We now prove by induction on ord2 (deg q(x)) that every polynomial q(x) ∈ R[x] has a root in C.
For the base case, assume that ord2(deg q(x)) = 0, i.e., that deg q(x) is odd. Then
lim_{x→−∞} q(x) = − lim_{x→∞} q(x) = ±∞.
Hence, for a ∈ R negative enough and b ∈ R positive enough, q(a) and q(b) have opposite signs.
Consequently, since polynomials are continuous functions, by the Intermediate Value Theorem there
exists c ∈ [a, b] such that q(c) = 0. Thus, every real polynomial of odd degree has a real root.
Now let k ≥ 1 and suppose that every polynomial in R[x] whose degree has ord2 less than k has a
root in C. Let q(x) ∈ R[x] have degree n = 2ᵏm with m odd, and let F be a splitting field of q(x)
over C, so that in F we write
q(x) = (x − z1)(x − z2) · · · (x − zn).
For a parameter t ∈ R, consider the polynomial
ft(z) = ∏_{1≤i<j≤n} (z − (zi + zj + t zi zj)).
A priori, this polynomial is in the ring F [x], where F is an algebraic and Galois extension of C and
also of R. The degree of ft(z) is ½n(n − 1) = 2ᵏ⁻¹m(n − 1). Since n is even, n − 1 is odd
and so ord2(deg ft(z)) = k − 1. Any σ ∈ Gal(F/R) permutes the terms in the product defining ft(z), so in
fact the coefficients of ft(z) are invariant under σ. Thus, ft(z) ∈ R[x]. By the induction hypothesis,
ft(z) has a root in C.
Every root of ft(z) is of the form zi + zj + t zi zj. There are only n(n − 1)/2 pairs (i, j) but there are
infinitely many parameters t ∈ R, so by the pigeonhole principle there is a specific pair (i, j) such
that there exist two distinct parameter values t1 ≠ t2 such that
zi + zj + t1 zi zj = c1 ∈ C and zi + zj + t2 zi zj = c2 ∈ C.
Subtracting these two equations gives (t1 − t2)zi zj = c1 − c2, so zi zj ∈ C and hence also zi + zj ∈ C.
Thus zi and zj are the two roots of the quadratic polynomial z² − (zi + zj)z + zi zj, which is in C[z].
The quadratic formula gives explicit solutions in C for any quadratic polynomial over C. Hence, both
zi and zj are in C. Consequently, we have proven that q(x) has a root in C. This completes the
induction and the theorem follows.
The proof given above is due to Laplace in 1795. However, at the time that Laplace offered this
argument, it was incomplete because the theorem on the existence of splitting fields of a polynomial
had not been established. The essay by Remmert in [52] gives an excellent summary of the history
behind various proofs of the Fundamental Theorem of Algebra (FTA).
8. Let n ∈ Z be cube-free and consider the extension K = Q(∛n). Give formulas for the norm to Q, the
trace to Q, and the inverse in K of the generic element α = a + b∛n + c(∛n)².
9. Consider the ring R = Z[∛2] inside its field of fractions K = Q(∛2).
(a) Prove that the norm NK/Q, when restricted to R, is a multiplicative function from Z[∛2] to Z.
(b) Prove that the group of units is U (R) = {α ∈ R | NK/Q(α) = ±1}.
(c) Find one unit in R = Z[∛2] that is neither 1 nor −1.
10. Using Galois conjugates, find the minimal polynomial over Q of 3 − ∛7 + 2(∛7)².
11. Suppose that K/F is any field extension with [K : F ] = n. Let α ∈ K. Suppose that the minimal
polynomial of α over F is mα,F (x) = xd + ad−1 xd−1 + · · · + a1 x + a0 .
(a) Prove that d | n.
(b) Prove that there are d distinct Galois conjugates of α, each repeated n/d times in the product
defining the norm NK/F (α).
(c) Deduce that NK/F (α) = (−1)ⁿ a0^(n/d).
(d) Deduce also that TrK/F (α) = −(n/d) ad−1.
12. Let K/F be a Galois extension and let σ ∈ Gal(K/F ).
(a) Prove that if α = β/σ(β), for some β ∈ K, then NK/F (α) = 1.
(b) Prove that if α = β − σ(β), for some β ∈ K, then TrK/F (α) = 0.
13. Let K/F be a finite extension. Prove that K is a simple extension of F (with K = F (α)) if and only
if there exist only finitely many subfields of K containing F .
11.4
Galois Groups of Cyclotomic Extensions
As we continue to develop the theory of Galois groups of polynomials, it is very useful to understand
the Galois groups of x^n − a and more precisely of x^n − 1. In Section 7.5, we saw that x^n − 1 can
be factored into cyclotomic polynomials, so we propose to study the Galois groups of cyclotomic
polynomials Φn(x) ∈ Q[x].
Theorem 11.4.1
The Galois group of Q(ζn)/Q is isomorphic to U(n) = U(Z/nZ), given explicitly by the map a ↦ σa, where σa(ζn) = ζn^a.
With this result and the Fundamental Theorem of Galois Theory, we can construct primitive
field elements for all the subfields of Q(ζp ), where p is an odd prime. Recall that [Q(ζp ) : Q] = p − 1
and that the cyclotomic polynomial Φp (x) is

Φp(x) = x^(p−1) + x^(p−2) + · · · + x + 1.
Though it is natural to use 1, ζp, ζp^2, . . . , ζp^(p−2) as a basis of Q(ζp) over Q, we can alternatively use
the elements ζp, ζp^2, . . . , ζp^(p−2), ζp^(p−1) because

1 = −ζp − ζp^2 − · · · − ζp^(p−2) − ζp^(p−1).

Furthermore, by Proposition 7.5.2, U(Z/pZ) = U(Fp) is a cyclic group, so Gal(Q(ζp)/Q) ≅ Z_{p−1}. A
generator of U(Z/pZ) is called a primitive root modulo p.
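Primitive roots can be found by brute force using the standard order test (a short illustrative sketch; the helper names are our own):

```python
# Find the smallest primitive root modulo an odd prime p, i.e. a generator
# of the cyclic group U(Z/pZ).  Illustrative sketch, standard order test.

def prime_factors(n):
    """Return the set of prime divisors of n by trial division."""
    factors, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors

def primitive_root(p):
    """Smallest g with multiplicative order p - 1 modulo the prime p."""
    order = p - 1
    for g in range(2, p):
        # g generates U(Z/pZ) iff g^((p-1)/q) != 1 for every prime q | p-1
        if all(pow(g, order // q, p) != 1 for q in prime_factors(order)):
            return g

print(primitive_root(7))   # 3
```

For p = 7 the generator is 3 and not 2, since 2^3 ≡ 1 (mod 7); this matches the fact, used below, that ⟨2⟩ is a proper subgroup of U(Z/7Z).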
Now let H be any subgroup of G = Gal(Q(ζp)/Q). Define the element

αH = Σ_{σ∈H} σ(ζp).

[Diagram: the subgroup lattice of Gal(Q(ζ7)/Q) ≅ U(Z/7Z), with proper nontrivial subgroups ⟨2⟩ and ⟨6⟩ between U(Z/7Z) and {1}.]

With H = ⟨2⟩ = {1, 2, 4}, a generator of Fix(Q(ζ7), H) is ζ7 + ζ7^2 + ζ7^4. Similarly, a generator of
the subfield Fix(Q(ζ7), ⟨6⟩) is ζ7 + ζ7^6. Hence, by the Fundamental Theorem of Galois Theory, the
subfield structure of Q(ζ7) is

[Diagram: the subfield lattice of Q(ζ7), with intermediate fields Q(ζ7 + ζ7^2 + ζ7^4) and Q(ζ7 + ζ7^{−1}) between Q and Q(ζ7).]
It is not hard to find minimal polynomials for α = ζ7 + ζ7^2 + ζ7^4 and β = ζ7 + ζ7^6. Note that

α^2 = ζ7^2 + ζ7^4 + ζ7 + 2(ζ7^3 + ζ7^5 + ζ7^6).

However, 1 + ζ7 + ζ7^2 + · · · + ζ7^6 = 0 since these roots of unity are the distinct roots of x^7 − 1 = 0.
Thus, ζ7^3 + ζ7^5 + ζ7^6 = −1 − α and we notice that α^2 = α + 2(−1 − α) = −2 − α. Hence, α is a root of x^2 + x + 2.
For β we calculate that β^2 = ζ7^2 + 2 + ζ7^5 and β^3 = ζ7^3 + 3ζ7 + 3ζ7^6 + ζ7^4, so that
β^3 + β^2 − 2β − 1 = 1 + ζ7 + ζ7^2 + · · · + ζ7^6 = 0. Hence, β is a root of x^3 + x^2 − 2x − 1.
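These minimal polynomials are easy to confirm numerically (a floating-point sanity check, not a proof):

```python
# Numerically confirm that alpha = zeta7 + zeta7^2 + zeta7^4 is a root of
# x^2 + x + 2 and that beta = zeta7 + zeta7^6 is a root of x^3 + x^2 - 2x - 1.
import cmath

zeta7 = cmath.exp(2j * cmath.pi / 7)
alpha = zeta7 + zeta7**2 + zeta7**4
beta = zeta7 + zeta7**6          # = 2*cos(2*pi/7), a real number

assert abs(alpha**2 + alpha + 2) < 1e-12
assert abs(beta**3 + beta**2 - 2*beta - 1) < 1e-12
print("both relations hold to machine precision")
```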
The following example illustrates how the process of defining αH does not necessarily work for
Q(ζn ) where n is not a prime number.
Example 11.4.3. Consider the field Q(ζ9). This is Galois over Q with Galois group G = U(Z/9Z) ≅ Z6.
We can write G = {σ1, σ2, σ4, σ5, σ7, σ8}, where σa(ζ9) = ζ9^a. This group has two nontrivial
subgroups H1 = {σ1, σ4, σ7} and H2 = {σ1, σ8}.
For either H ≤ G, define αH ∈ Q(ζ9) as the sum of the Galois conjugates of ζ9 in Q(ζ9), namely

αH = Σ_{σ∈H} σ(ζ9).
Since the sum of the primitive 9th roots of unity is 0, it is not hard to show, as we did in the
previous example, that α2 is a root of x^3 − 3x + 1 and that this polynomial is irreducible. So
(as expected), [Q(α2) : Q] = 3. However, we observe that

α1^2 = ζ9^2 + ζ9^8 + ζ9^5 + 2ζ9^5 + 2ζ9^8 + 2ζ9^2 = 3(ζ9^2 + ζ9^5 + ζ9^8) = 3ζ9 α1.

This implies that α1 solves x(x − 3ζ9) = 0. By the triangle inequality |α1| < 3, while |3ζ9| = 3.
Hence, α1 = 0. In particular, α1 is fixed by all of G and not just by H1. Consequently, the element
α1 is not a generator for Fix(Q(ζ9), H1). On the other hand, by the same reasoning as above,
ζ9 · ζ9^4 · ζ9^7 = ζ9^12 = ζ9^3 is in Fix(Q(ζ9), H1). But ζ9^3 = ζ3 is not in Q and has degree 2 over Q. Hence, we find that
Fix(Q(ζ9), H1) = Q(ζ3). △
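A quick numerical check of this example (floating-point, illustrative only):

```python
# Check Example 11.4.3 numerically: alpha1 = zeta9 + zeta9^4 + zeta9^7
# vanishes, while alpha2 = zeta9 + zeta9^8 is a root of x^3 - 3x + 1.
import cmath

zeta9 = cmath.exp(2j * cmath.pi / 9)
alpha1 = zeta9 + zeta9**4 + zeta9**7
alpha2 = zeta9 + zeta9**8        # = 2*cos(2*pi/9)

assert abs(alpha1) < 1e-12                        # alpha1 = 0
assert abs(alpha2**3 - 3*alpha2 + 1) < 1e-12      # minimal polynomial of alpha2
print("alpha1 = 0 and alpha2 satisfies x^3 - 3x + 1")
```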
The reason why αH might not be a primitive element of a simple extension of Q in Q(ζn) when
n is composite follows from the fact that, though {ζn^a | 0 ≤ a ≤ φ(n) − 1} is a basis of Q(ζn), the
set {ζn^a | a ∈ U(n)} need not be a basis of Q(ζn). For example, with n = 9, the element ζ9^3 = ζ3 is not in
{ζ9^a | a ∈ U(9)}.
Corollary 11.4.4
If n has the prime decomposition n = p1^(a1) p2^(a2) · · · pk^(ak), then

Gal(Q(ζn)/Q) ≅ Gal(Q(ζ_{p1^a1})/Q) × Gal(Q(ζ_{p2^a2})/Q) × · · · × Gal(Q(ζ_{pk^ak})/Q).
Corollary 11.4.5
The degree of the extension is [Q(ζn) : Q] = φ(n), where φ is the Euler totient
function.
Proposition 11.4.6
A regular n-gon is constructible if and only if [Q(ζn) : Q] = 2^k for some nonnegative integer
k. Consequently, a regular n-gon is constructible if and only if φ(n) = 2^k.
Proof. The discussion motivating the proposition showed that if a regular n-gon is constructible then
[Q(cos(2π/n)) : Q] is a power of 2, which in turn implies that [Q(ζn ) : Q] is a power of 2. However,
the converse is not at all obvious.
Suppose that [Q(ζn) : Q] = 2^k. Then Gal(Q(ζn)/Q) ≅ U(n) has an order that is a power of 2.
Since the group U(n) is abelian, the subgroup of Gal(Q(ζn)/Q) corresponding to complex conjugation
σc is a normal subgroup. Hence, by the Fundamental Theorem of Galois Theory, Q(cos(2π/n)) is
Galois over Q and Gal(Q(cos(2π/n))/Q) is a quotient group of Gal(Q(ζn)/Q) and hence is an abelian
group whose order is a power of 2.
By the Fundamental Theorem of Finitely Generated Abelian Groups, an abelian group G whose
order is a power of 2, say 2^k, has the form

G ≅ Z_{2^a1} ⊕ Z_{2^a2} ⊕ · · · ⊕ Z_{2^as}.

Since a cyclic group Zm has a subgroup of order d for all d | m, it is possible to find a subgroup
G1 of G of order 2^(k−1). By induction, there exists a sequence of subgroups

{1} = Gk ≤ Gk−1 ≤ · · · ≤ G1 ≤ G0 = G

with [Gi : Gi+1] = 2 for each i. By the Fundamental Theorem of Galois Theory, the corresponding
tower of fixed fields rises from Q to Q(ζn) by successive extensions of degree 2, so the regular n-gon
is constructible with a compass and a straightedge. This establishes the first if
and only if statement.
Finally, the second if and only if statement holds because [Q(ζn ) : Q] = φ(n).
Before we give the (almost) final word on constructible regular polygons, we must introduce the
notion of a Fermat prime.
Definition 11.4.7
A Fermat prime is a prime number of the form 2^ℓ + 1.
Proposition 11.4.8
If 2^ℓ + 1 is a prime number, then ℓ itself is a power of 2: ℓ = 2^r for some nonnegative integer r.
The only known Fermat primes are 3, 5, 17, 257, and 65537. It is an unsolved problem whether there are
any other Fermat primes, let alone an infinite number of Fermat primes. Note that

2^(2^5) + 1 = 4294967297 = 641 × 6700417,

which gives a counterexample to the hypothesis that 2^(2^ℓ) + 1 is prime for all positive integers ℓ.
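A few lines of brute-force arithmetic confirm the numbers quoted here (illustrative sketch):

```python
# The Fermat numbers F_r = 2^(2^r) + 1 are prime for r = 0..4 and
# composite for r = 5, as noted above.  Brute-force trial division.

def is_prime(n):
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

fermat = [2**(2**r) + 1 for r in range(6)]
print(fermat[:5])              # [3, 5, 17, 257, 65537]
print(is_prime(fermat[5]))     # False
assert fermat[5] == 641 * 6700417
```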
Theorem 11.4.9
A regular n-gon is constructible with a compass and a straightedge if and only if n is a
power of 2 times a product of distinct Fermat primes.
Proof. Suppose that n has the prime factorization n = p1^(a1) p2^(a2) · · · pk^(ak). The regular n-gon is con-
structible if and only if φ(n) is a power of 2. But

φ(n) = (p1^(a1) − p1^(a1−1))(p2^(a2) − p2^(a2−1)) · · · (pk^(ak) − pk^(ak−1)).

Suppose that p1 = 2. Then any a1 will do because 2^(a1) − 2^(a1−1) = 2^(a1−1). On the other hand, if pi is
an odd prime, then pi^(ai) − pi^(ai−1) = pi^(ai−1)(pi − 1) is a power of 2 if and only if ai = 1 and
pi = 2^m + 1 for some positive integer m. Hence, pi must be a Fermat prime and can occur at
most once in the prime decomposition of n.
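The two characterizations can be cross-checked for small n by brute force (an illustrative sketch; `is_fermat_product` is our own helper and hard-codes the five known Fermat primes):

```python
# Cross-check Theorem 11.4.9 for 3 <= n <= 100: phi(n) is a power of 2
# exactly when n is 2^m times a product of distinct Fermat primes.
from math import gcd

def phi(n):
    """Euler totient by direct count (fine at this size)."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def is_fermat_product(n):
    """True iff n = 2^m * (product of distinct known Fermat primes)."""
    while n % 2 == 0:
        n //= 2
    for p in (3, 5, 17, 257, 65537):
        if n % p == 0:
            n //= p
            if n % p == 0:     # a repeated Fermat prime is not allowed
                return False
    return n == 1

constructible = [n for n in range(3, 101) if phi(n) & (phi(n) - 1) == 0]
print(constructible[:10])      # [3, 4, 5, 6, 8, 10, 12, 15, 16, 17]
assert all(is_fermat_product(n) for n in constructible)
```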
The problem of constructibility of regular n-gons compellingly illustrates the deep interconnectedness
of mathematics. The original problem came out of classical geometry and drew attention
from mathematicians all over the world. Theorem 11.4.9, which answers the problem almost to complete
satisfaction, uses advanced techniques of algebra, including group theory, field theory, and the
combination of the two that results from Galois theory. Now, a complete solution appears to hide
in the realm of number theory and properties of prime numbers.
Using Figure 11.2, show that η2 < 0 < η1 . Prove also that η1 + η2 = −1 and that η1 η2 = −4.
Deduce that η1 and η2 solve the equation x2 + x − 4 = 0. Using the inequalities for η1 and η2 ,
give explicit formulas (using radicals) for both.
(c) Now call

ε1 = ζ + ζ^4 + ζ^13 + ζ^16 = Σ_{a∈H2} ζ^a,    ε3 = ζ^3 + ζ^5 + ζ^12 + ζ^14,
ε2 = ζ^2 + ζ^8 + ζ^9 + ζ^15,    ε4 = ζ^6 + ζ^7 + ζ^10 + ζ^11.
Using Figure 11.2, show that 0 < ε2 < ε1 . Prove also that ε1 + ε2 = η1 and that ε1 ε2 = −1.
Deduce that ε1 and ε2 solve the equation x2 − η1 x − 1 = 0. Using the inequalities for ε1 and ε2 ,
give explicit formulas (using radicals) for both.
(d) Repeat and change the previous part as needed to find explicit formulas (using radicals) for ε3
and ε4 .
[Figure 11.2: the 17th roots of unity ζ, ζ^2, . . . , ζ^16 on the unit circle.]
Using Figure 11.2, show that 0 < γ2 < γ1. Prove also that γ1 + γ2 = ε1 and that γ1γ2 = ε3.
Deduce that γ1 and γ2 solve x^2 − ε1x + ε3 = 0. Using the inequalities for γ1 and γ2, give explicit
formulas (using radicals) for both.
(f) Conclude the exercise by using the previous part to show that

cos(2π/17) = (1/16) ( −1 + √17 + √(2(17 − √17)) + 2 √( 17 + 3√17 − √(2(17 − √17)) − 2 √(2(17 + √17)) ) ).
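The nested radical can be checked numerically against cos(2π/17) (a floating-point sanity check):

```python
# Numerically verify Gauss's closed form for cos(2*pi/17).
from math import cos, pi, sqrt

inner = sqrt(2 * (17 - sqrt(17)))
rhs = (-1 + sqrt(17) + inner
       + 2 * sqrt(17 + 3 * sqrt(17) - inner
                  - 2 * sqrt(2 * (17 + sqrt(17))))) / 16

assert abs(rhs - cos(2 * pi / 17)) < 1e-12
print(rhs)    # 0.93247...
```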
(a) Prove that Q(α) is the unique subfield of Q(ζp ) of degree 2 over Q.
(b) Prove that α ᾱ = p, where ᾱ denotes the complex conjugate of α.
[The element α is called a Gauss sum.]
11.5
Symmetries among Roots; The Discriminant
The beginning of this chapter posed the study of symmetries among roots of polynomials as one of
the main motivations for developing Galois theory. Our approach jumped straight to the study of
automorphisms of fields and their relevance to the motivating problem. In this section, we approach
from a different direction to find the largest possible Galois group of a given polynomial.
Definition 11.5.1
Let F be a field. A multivariable polynomial p(x1 , x2 , . . . , xn ) ∈ F [x1 , x2 , . . . , xn ] is called
symmetric if p(xσ(1) , xσ(2) , . . . , xσ(n) ) = p(x1 , x2 , . . . , xn ) for all σ ∈ Sn .
For example, the polynomial f(x1, x2, x3) = 2x1^3 + 2x2^3 + 2x3^3 − 7x1x2x3 is symmetric in the
variables. The polynomial g(x1, x2, x3) = x1^2 x2 + x2^2 x3 + x3^2 x1 is not symmetric because, if σ = (1 2),
then

g(xσ(1), xσ(2), xσ(3)) = g(x2, x1, x3) = x2^2 x1 + x1^2 x3 + x3^2 x2 ≠ g(x1, x2, x3).
Definition 11.5.1 extends to rational expressions. With the concept of automorphisms at our
disposal, we observe that a permutation σ of the variables in a rational expression
p(x1 , x2 , . . . , xn )
q(x1 , x2 , . . . , xn )
is an automorphism ωσ ∈ Aut(F (x1 , x2 , . . . , xn )/F ). This gives an embedding of Sn into the auto-
morphism group Aut(F (x1 , x2 , . . . , xn )/F ). We would like to determine and to study properties of
Fix(F (x1 , x2 , . . . , xn ), Sn ).
In the ring F[x1, x2, . . . , xn][x], consider the polynomial

q(x) = (x − x1)(x − x2) · · · (x − xn). (11.4)

This polynomial possesses n distinct roots, namely the indeterminates x1, x2, . . . , xn. Note that
if f(x) ∈ F[x] is any monic polynomial of degree n with roots α1, α2, . . . , αn ∈ F listed with
multiplicity, then f(x) is the image of q(x) under the evaluation homomorphism that maps each
indeterminate xi to αi. Expanding the factored expression in (11.4) gives

q(x) = x^n − (x1 + x2 + · · · + xn) x^(n−1) + ( Σ_{1≤i<j≤n} xi xj ) x^(n−2) − · · · + (−1)^n (x1 x2 · · · xn).
The coefficients of q(x), as polynomials in x1, x2, . . . , xn, play an important role. For the following
definition, recall that if U is a set, then Pk(U) = {A ∈ P(U) | |A| = k}.
Definition 11.5.2
For any k ∈ {1, 2, . . . , n}, the kth elementary symmetric polynomial in x1, x2, . . . , xn is

sk(x1, x2, . . . , xn) = Σ_{A∈Pk({1,2,...,n})} xa1 xa2 · · · xak,

where A = {a1, a2, . . . , ak}.
With the elementary symmetric polynomials and the convention s0 = 1, the expression in (11.4) can be written as

(x − x1)(x − x2) · · · (x − xn) = Σ_{k=0}^{n} (−1)^k sk(x1, x2, . . . , xn) x^(n−k). (11.5)
We see that the polynomials sk (x1 , x2 , . . . , xn ) are symmetric polynomials in two different ways.
First, for all σ ∈ Sn,

(x − xσ(1))(x − xσ(2)) · · · (x − xσ(n)) = (x − x1)(x − x2) · · · (x − xn).
Hence, upon expanding the product on the left-hand side, we deduce that
sk (xσ(1) , xσ(2) , . . . , xσ(n) ) = sk (x1 , x2 , . . . , xn )
for all k and all σ ∈ Sn. For a second way to see that the sk are symmetric, consider the action of
σ on Pk({1, 2, . . . , n}) via σ · {a1, a2, . . . , ak} = {σ(a1), σ(a2), . . . , σ(ak)}. Then

sk(xσ(1), xσ(2), . . . , xσ(n)) = Σ_{A∈Pk({1,2,...,n})} xσ(a1) xσ(a2) · · · xσ(ak)
                             = Σ_{σ^(−1)·A∈Pk({1,2,...,n})} xa1 xa2 · · · xak
                             = Σ_{A∈Pk({1,2,...,n})} xa1 xa2 · · · xak
                             = sk(x1, x2, . . . , xn).
As an explicit example, the four elementary symmetric polynomials in x1 , x2 , x3 , x4 are
s1 (x1 , x2 , x3 , x4 ) = x1 + x2 + x3 + x4 ,
s2 (x1 , x2 , x3 , x4 ) = x1 x2 + x1 x3 + x1 x4 + x2 x3 + x2 x4 + x3 x4 ,
s3 (x1 , x2 , x3 , x4 ) = x1 x2 x3 + x1 x2 x4 + x1 x3 x4 + x2 x3 x4 ,
s4 (x1 , x2 , x3 , x4 ) = x1 x2 x3 x4 .
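Identity (11.5) and these formulas can be sanity-checked by expanding a product of linear factors with exact rational arithmetic (an illustrative sketch; the helper names are our own):

```python
# Exact check of identity (11.5): expanding (x - x1)...(x - xn) produces
# the signed elementary symmetric polynomials of the roots as coefficients.
from fractions import Fraction
from functools import reduce
from itertools import combinations

def elem_sym(k, xs):
    """k-th elementary symmetric polynomial evaluated at the tuple xs."""
    return sum(reduce(lambda u, v: u * v, combo)
               for combo in combinations(xs, k))

def poly_from_roots(roots):
    """Coefficients of (x - r1)...(x - rn), highest degree first."""
    coeffs = [Fraction(1)]
    for r in roots:
        new = [Fraction(0)] * (len(coeffs) + 1)
        for j, c in enumerate(coeffs):
            new[j] += c           # contribution of c * x
            new[j + 1] -= r * c   # contribution of c * (-r)
        coeffs = new
    return coeffs

roots = (Fraction(1), Fraction(2), Fraction(3), Fraction(5, 2))
coeffs = poly_from_roots(roots)
for k in range(1, len(roots) + 1):
    # coefficient of x^(n-k) is (-1)^k * s_k(roots)
    assert coeffs[k] == (-1)**k * elem_sym(k, roots)
print(coeffs)
```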
Theorem 11.5.3
Every symmetric rational expression f(x1, x2, . . . , xn) is a rational expression in the ele-
mentary symmetric polynomials; that is,
f(x1, x2, . . . , xn) = g(s1, s2, . . . , sn)
for some rational expression g.
Corollary 11.5.4
Every symmetric polynomial in F [x1 , x2 , . . . , xn ] is a polynomial in the elementary sym-
metric polynomials.
Theorem 11.5.3 along with the above corollary gives an interesting application for symmetric
expressions of roots of polynomials.
Let F be a field. If a polynomial p(x) ∈ F[x] splits completely in a field extension K of F, then

p(x) = pn(x − α1)(x − α2) · · · (x − αn)

for some α1, α2, . . . , αn ∈ K, where pn = LC(p(x)) is the leading coefficient of p(x). Then

sk(α1, α2, . . . , αn) = (−1)^k p_{n−k} / pn.

In particular, the elementary symmetric polynomials applied to the roots of p(x) are in the field F.
Corollary 11.5.4 gives the following proposition.
Proposition 11.5.5
Let p(x) ∈ F [x]. Let α1 , α2 , . . . , αn be the roots of p(x) (counted with multiplicity). Then
any symmetric polynomial of the roots is in F .
Proof. Every symmetric polynomial in the roots of p(x) is of the form r(s1 , s2 , . . . , sn ), with the sk
evaluated at the roots of p(x), where r ∈ F [s1 , s2 , . . . , sn ]. Hence, the evaluation of r on the roots is
in F .
Example 11.5.6. Consider the polynomial p(x) = x^3 − 3x^2 + 7x + 5. Suppose that the three roots of
p(x) are α1, α2, α3 (not necessarily distinct). Without knowing the roots explicitly, we can calculate
α1^3 + α2^3 + α3^3. We expand

s1^3 = α1^3 + α2^3 + α3^3 + 3(α1^2α2 + α1^2α3 + α2^2α1 + α2^2α3 + α3^2α1 + α3^2α2) + 6α1α2α3

and note that s2s1 − 3s3 = α1^2α2 + α1^2α3 + α2^2α1 + α2^2α3 + α3^2α1 + α3^2α2, so that

α1^3 + α2^3 + α3^3 = s1^3 − 3(s2s1 − 3s3) − 6s3 = s1^3 − 3s2s1 + 3s3.

From the coefficients of p(x), we see that s1 = 3, s2 = 7, and s3 = −5. Thus, by (11.5),

α1^3 + α2^3 + α3^3 = 27 − 63 − 15 = −51. △
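The value −51 can be double-checked by actually approximating the roots, for instance with the Durand-Kerner iteration (a numerical sketch; the root-finder is a generic standard method, not part of the text):

```python
# Verify Example 11.5.6 numerically: for p(x) = x^3 - 3x^2 + 7x + 5,
# the power sum a1^3 + a2^3 + a3^3 equals s1^3 - 3*s1*s2 + 3*s3 = -51.

def roots_durand_kerner(coeffs, iters=200):
    """All complex roots of a monic polynomial (coeffs highest first)."""
    n = len(coeffs) - 1
    def p(x):
        return sum(c * x**(n - i) for i, c in enumerate(coeffs))
    zs = [(0.4 + 0.9j)**k for k in range(n)]   # standard starting points
    for _ in range(iters):
        new = []
        for i, z in enumerate(zs):
            denom = 1
            for j, w in enumerate(zs):
                if j != i:
                    denom *= z - w
            new.append(z - p(z) / denom)
        zs = new
    return zs

a1, a2, a3 = roots_durand_kerner([1, -3, 7, 5])
power_sum = a1**3 + a2**3 + a3**3
s1, s2, s3 = 3, 7, -5
assert abs(power_sum - (s1**3 - 3*s1*s2 + 3*s3)) < 1e-8
print(power_sum.real)    # approximately -51
```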
Though we study multivariable polynomial rings in more depth in Chapter 12, some terminology
is valuable here. In the polynomial ring F[x1, x2, . . . , xn], a term x1^(a1) x2^(a2) · · · xn^(an) is said to have
total degree a if a = a1 + a2 + · · · + an. A polynomial p(x1, x2, . . . , xn) is called homogeneous of
degree k if it consists entirely of terms of total degree k. We observe that the elementary symmetric
polynomials sk(x1, x2, . . . , xn) are homogeneous of degree k.
An algebraic relation among the roots α1, α2, . . . , αn of a polynomial is a polynomial g ∈ F[x1, x2, . . . , xn] such that

g(α1, α2, . . . , αn) = 0.

Since g can have a constant term in F, we can equivalently consider algebraic relations to include
conditions of the form g(α1, α2, . . . , αn) ∈ F.
If p(x) ∈ F [x] is a polynomial with deg p(x) = n, then any automorphism σ ∈ GalF (p(x))
permutes the roots of p(x). This gives the fundamental result.
Proposition 11.5.7
If p(x) ∈ F [x] is a separable polynomial with deg p(x) = n, then GalF (p(x)) ≤ Sn .
This proposition is more precise than Theorem 7.6.3, which simply put the upper bound of n!
on the degree of the splitting field of p(x) over F.
Algebraic relations among the roots put conditions on the automorphisms in Gal(p(x)). For
example, suppose that p(x) is the product of two relatively prime irreducible polynomials p1(x) and
p2(x) of degrees n1 and n2. Then any automorphism σ ∈ Gal(p(x)) can only permute the roots
of p1(x) among themselves and the roots of p2(x) among themselves. Hence, Gal(p(x)) is contained
in a subgroup of Sn that is isomorphic to Sn1 ⊕ Sn2. This observation can be restated as follows.
Proposition 11.5.8
A Galois extension K of F is such that Gal(K/F ) is a transitive subgroup of Sn if and only
if K is the splitting field of some irreducible polynomial f (x) ∈ F [x] with deg f (x) = n.
As a particular case of this, suppose that the roots of a separable polynomial p(x) satisfy the
relation α1 ∈ F. We consider Gal(p(x)) as a subgroup of Sn, the group of bijections on
{α1, α2, . . . , αn}, the set of roots of p(x), with σ(αi) = ασ(i). The subgroup H = {σ ∈ Sn | σ(1) = 1}
of Sn fixes the element α1. Furthermore, for all σ ∈ Sn, we have σ(α1) = αi if and only if σ ∈ (1 i)H.
As i runs through 1, 2, . . . , n, the subsets (1 i)H run through the cosets of H in Sn, which partition
Sn. Since α1 ∈ F, every σ ∈ Gal(p(x)) fixes α1, so Gal(p(x)) ≤ H. By observing that
H ≅ Sn−1, we can state that Gal(p(x)) is isomorphic to a subgroup of Sn−1.
The following lemma gives a general principle.
Lemma 11.5.9
Let p(x) be a separable polynomial in F[x] and suppose that the roots {α1, α2, . . . , αn}
satisfy g(α1, α2, . . . , αn) ∈ F for some multivariable polynomial g ∈ F[x1, x2, . . . , xn]. In
the action of Sn on F[x1, x2, . . . , xn], let H be the stabilizer of g. Suppose that representatives
of distinct cosets of H in Sn give distinct values when applied to g(α1, α2, . . . , αn). (11.7)
Then Gal(p(x)) ≤ H.
Proof. The cosets of H partition Sn. Since the polynomial g(x1, x2, . . . , xn) is fixed by any σ ∈ H,
then

g(α1, α2, . . . , αn) = g(ασ(1), ασ(2), . . . , ασ(n))

for all σ ∈ H. The condition (11.7) that representatives from different cosets of H give different
values when applied to g(α1, α2, . . . , αn) implies that σ fixes g(α1, α2, . . . , αn) if and only if σ ∈ H.
Since all the automorphisms in Gal(p(x)) fix F, the condition g(α1, α2, . . . , αn) ∈ F implies that
Gal(p(x)) ≤ H.
Example 11.5.10. Suppose that a separable polynomial p(x) ∈ F [x] has deg p(x) = 4 and suppose
that the roots α1 , α2 , α3 , α4 satisfy the relation
α1 α2 + α3 α4 ∈ F.
This algebraic relation puts a condition on automorphisms σ ∈ GalF(p(x)). It is not hard to see
that the subgroup

H = ⟨(1 3 2 4), (1 2)⟩ ≤ S4

leaves the polynomial x1x2 + x3x4 fixed. Conversely, if σ ∈ S4 − H, then σ · (x1x2 + x3x4) is either x1x3 +
x2x4 or x1x4 + x2x3. It is an easy algebra exercise to show that x1x2 + x3x4 = x1x3 + x2x4 implies
(x1 − x4)(x2 − x3) = 0. Since p(x) is separable, these three polynomials applied to α1, α2, α3, α4
give distinct values. It is easy to calculate that

σ(α1α2 + α3α4) = α1α2 + α3α4 if σ ∈ ⟨(1 3 2 4), (1 2)⟩,
σ(α1α2 + α3α4) = α1α3 + α2α4 if σ ∈ (2 3)⟨(1 3 2 4), (1 2)⟩,
σ(α1α2 + α3α4) = α1α4 + α2α3 if σ ∈ (2 4)⟨(1 3 2 4), (1 2)⟩.

Each of the above three cases is distinct, corresponding to the three cosets of ⟨(1 3 2 4), (1 2)⟩. Con-
sequently, σ ∈ S4 fixes α1α2 + α3α4 if and only if σ ∈ ⟨(1 3 2 4), (1 2)⟩. This illustrates Lemma 11.5.9 and we
deduce that Gal(p(x)) ≤ ⟨(1 3 2 4), (1 2)⟩. It is easy to check that this subgroup of S4 is isomorphic
to D4. △
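The subgroup and coset claims in this example are small enough to verify by brute force (0-indexed permutation tuples; the helper functions are our own):

```python
# Check Example 11.5.10: the group H generated by (1 3 2 4) and (1 2)
# has order 8 and is exactly the stabilizer in S4 of x1*x2 + x3*x4.
from itertools import permutations

def compose(s, t):
    """(s o t)(i) = s(t(i)) on 0-indexed permutation tuples."""
    return tuple(s[t[i]] for i in range(len(t)))

def generate(gens):
    """Closure of the generators under composition (finite, so a group)."""
    group, frontier = set(gens), list(gens)
    while frontier:
        g = frontier.pop()
        for h in list(group):
            for new in (compose(g, h), compose(h, g)):
                if new not in group:
                    group.add(new)
                    frontier.append(new)
    return group

# 0-indexed: (1 3 2 4) sends 1->3, 3->2, 2->4, 4->1
cycle = (2, 3, 1, 0)
swap = (1, 0, 2, 3)        # the transposition (1 2)
H = generate([cycle, swap])
assert len(H) == 8         # H is isomorphic to D4

def pattern(sigma):
    """The unordered partition {{s(1),s(2)},{s(3),s(4)}} determined by
    sigma . (x1*x2 + x3*x4)."""
    return frozenset([frozenset(sigma[0:2]), frozenset(sigma[2:4])])

base = pattern((0, 1, 2, 3))
stab = {s for s in permutations(range(4)) if pattern(s) == base}
assert stab == H
print("H has order 8 and equals the stabilizer of x1*x2 + x3*x4")
```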
Example 11.5.11. As a nonexample, suppose that p(x) ∈ F[x] is a separable polynomial and
suppose that the product of two of the roots (without loss of generality α1 and α2) is in F. Let
H = {σ ∈ Sn | σ · x1x2 = x1x2}. This subgroup has order 2(n − 2)! and is isomorphic to S2 ⊕ Sn−2.
For various σ ∈ Sn, the product ασ(1)ασ(2) can take on up to n(n − 1)/2 values. However, even if p(x) is
separable, there is no guarantee that αiαj ≠ α1α2 for {i, j} ≠ {1, 2}. Hence, we cannot deduce that
Gal(p(x)) ≤ H. △
Also recall from Section 3.4.3 that σ ∈ Sn maps a pair (i, j) ∈ Tn to a pair (σ(i), σ(j)), which is
either another pair in Tn or a pair in Tn with the entries inverted. By Definition 3.4.10,
the number of times the entries are inverted is called the number of inversions inv(σ). Hence,

∏_{1≤i<j≤n} (xσ(i) − xσ(j)) = (−1)^inv(σ) ∏_{1≤i<j≤n} (xi − xj) = sign(σ) ∏_{1≤i<j≤n} (xi − xj).
Definition 11.5.12
Let p(x) = pn x^n + · · · + p1x + p0 ∈ F[x] with roots α1, α2, . . . , αn (listed with multiplicity)
in some field extension K of F. The element in F defined by

∆(p) = pn^(2n−2) ∏_{1≤i<j≤n} (αi − αj)^2

is called the discriminant of p(x).

The term pn^(2n−2) may seem superfluous, but we will explain its value as we present more properties
of the discriminant.
Proposition 11.5.13
A polynomial p(x) ∈ F [x] has a double root if and only if ∆(p) = 0.
Proof. If a polynomial p(x) has degree n, then pn ≠ 0. Hence, ∆(p) = 0 if and only if
∏_{1≤i<j≤n} (αi − αj)^2 = 0. In turn, this is equivalent to αi = αj for some pair (i, j).
This proposition is particularly interesting because it puts an equation on the set of polynomials
with a double root. In other words, the equation ∆ = 0, which is expressed as an equation in the
coefficients {a0 , a1 , . . . , an−1 } of the polynomial p(x) = xn + an−1 xn−1 + · · · + a1 x + a0 , gives a locus
(a “hypersurface” in R^n) of the polynomials with double roots.
Example 11.5.14. Consider the general quadratic ax^2 + bx + c. This has two roots α1 and α2, not
necessarily distinct. According to Definition 11.5.12, the discriminant is a^2(α1 − α2)^2. However,
α1 + α2 = −b/a and α1α2 = c/a, so

a^2(α1 − α2)^2 = a^2((α1 + α2)^2 − 4α1α2) = a^2(b^2/a^2 − 4c/a) = b^2 − 4ac,

recovering the familiar discriminant of a quadratic polynomial.
Proposition 11.5.15
Let p(x) ∈ F[x] be a separable polynomial with discriminant ∆. Then ∆ ∈ F. Furthermore,
√∆ ∈ F if and only if GalF(p(x)) is a subgroup of An.
Proof. Since ∆ is a symmetric function in the roots of p(x), then by Proposition 11.5.5, ∆ ∈ F. Call
K the splitting field of p(x) over F.
Up to multiplication by a unit, √∆ is pn^(n−1) times the Vandermonde polynomial evaluated at the roots of
p(x). The subgroup of Sn that preserves the Vandermonde polynomial is An. Since σ(√∆) takes on
distinct values corresponding to distinct cosets of An, by Lemma 11.5.9, √∆ ∈ F if and only if Gal(p(x)) ≤ An.
for any q(x) ∈ F[x]. In particular, the remainder r(x) of the polynomial division of b(x) by f(x)
is such that there exists a polynomial r2(x) satisfying r(x)a(x) = r2(x)b(x). Either r(x) = 0 or
deg r(x) < deg f(x). However, by the minimality of the degree of f(x), we deduce that r(x) = 0
and hence that f(x) divides b(x). Then d(x) = b(x)/f(x) has degree 1 or greater, and

(b(x)/d(x)) a(x) = g(x)b(x) =⇒ b(x)a(x) = d(x)g(x)b(x) =⇒ a(x) = d(x)g(x).

Thus, d(x) is a common divisor of a(x) and b(x) of degree at least 1, and hence a(x) and b(x) are
not relatively prime. We have proven the following result.
Proposition 11.5.16
Two polynomials a(x) and b(x) in F[x], of degrees m and n respectively, are not relatively
prime if and only if there exist nonzero polynomials f(x) and g(x) satisfying deg f(x) ≤ n − 1,
deg g(x) ≤ m − 1, and f(x)a(x) = g(x)b(x).
Interestingly enough, given two polynomials a(x) and b(x), the problem of finding the coefficients
of two polynomials f (x) and g(x) described in Proposition 11.5.16 is a linear algebra problem.
Writing f(x) = f_{n−1}x^(n−1) + · · · + f1x + f0 and g(x) = g_{m−1}x^(m−1) + · · · + g1x + g0, the polynomial
equation f(x)a(x) − g(x)b(x) = 0 becomes, upon collecting powers of x, a homogeneous linear system
in the coefficients of f(x) and g(x). All the coefficients of powers of x must be 0. Consider the system of m + n equations corresponding
to the powers x^(m+n−1), x^(m+n−2), . . . , 1 in the m + n variables f_{n−1}, . . . , f1, f0, −g_{m−1}, . . . , −g1, −g0.
The coefficient matrix of this system is the (m + n) × (m + n) matrix

    [ am                 | bn                 ]
    [ am−1  am           | bn−1  bn           ]
    [  ...  am−1   ...   |  ...  bn−1   ...   ]
    [ a0     ...    am   | b0     ...    bn   ]    (11.8)
    [        a0    am−1  |        b0    bn−1  ]
    [         ...   ...  |         ...   ...  ]
    [               a0   |               b0   ]

in which the first n columns contain downward-shifted copies of the coefficients am, am−1, . . . , a0 of
a(x), the last m columns contain downward-shifted copies of the coefficients bn, bn−1, . . . , b0 of b(x),
and all other entries are 0.
Definition 11.5.17
The resultant of two polynomials a(x) and b(x) of degree m and n respectively, written
R(a, b), is the determinant of the matrix in (11.8).
Example 11.5.18. Let a(x) = 2x^2 + 3x + 1 and b(x) = 5x^2 − x + 2. Then

             | 2   0   5   0 |
    R(a, b) = | 3   2  −1   5 | = 120.    △
             | 1   3   2  −1 |
             | 0   1   0   2 |
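This determinant can be reproduced in a few lines of exact rational arithmetic (an illustrative sketch with our own helper names; `sylvester` builds the matrix (11.8) for a(x) = 2x^2 + 3x + 1 and b(x) = 5x^2 − x + 2):

```python
# Compute R(a, b) as the determinant of the matrix (11.8).
from fractions import Fraction

def sylvester(a, b):
    """Matrix (11.8) for coefficient lists given highest degree first."""
    m, n = len(a) - 1, len(b) - 1
    rows = [[Fraction(0)] * (m + n) for _ in range(m + n)]
    for col in range(n):                  # n shifted columns holding a
        for i, coeff in enumerate(a):
            rows[col + i][col] = Fraction(coeff)
    for col in range(m):                  # m shifted columns holding b
        for i, coeff in enumerate(b):
            rows[col + i][n + col] = Fraction(coeff)
    return rows

def det(mat):
    """Determinant by Gaussian elimination over the rationals."""
    mat = [row[:] for row in mat]
    size, sign, result = len(mat), 1, Fraction(1)
    for j in range(size):
        pivot = next((i for i in range(j, size) if mat[i][j] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != j:
            mat[j], mat[pivot] = mat[pivot], mat[j]
            sign = -sign
        result *= mat[j][j]
        for i in range(j + 1, size):
            factor = mat[i][j] / mat[j][j]
            for k in range(j, size):
                mat[i][k] -= factor * mat[j][k]
    return sign * result

R = det(sylvester([2, 3, 1], [5, -1, 2]))
print(R)    # 120
```

The same helper also illustrates the next proposition: polynomials with a common root, such as x^2 − 3x + 2 and x − 1, have vanishing resultant.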
Since the polynomial equation f(x)a(x) − g(x)b(x) = 0 leads to a homogeneous linear system in the
coefficients of f(x) and g(x), there is always the trivial solution. Furthermore, the system has
a nontrivial solution if and only if R(a, b) = 0. We have proven the following proposition.
Proposition 11.5.19
Two polynomials a(x), b(x) ∈ F [x] have a common root (possibly in a field extension of F )
if and only if R(a, b) = 0.
We now return to the study of the discriminant and apply the resultant techniques to a polynomial
and its derivative. Proposition 11.5.19 gives the following corollary.
Corollary 11.5.20
Let p(x) ∈ F [x]. The following are equivalent:
(1) p(x) is not separable;
(2) ∆(p) = 0;
(3) R(p, Dx (p)) = 0, where Dx (p) is the derivative of p(x) with respect to x.
Both the discriminant of p(x) and the resultant R(p, Dx (p)) are multivariable polynomials in
the coefficients of p(x). The last two equivalent conditions of the corollary show that, with the
assumption pn 6= 0, one of these multivariable polynomials is 0 if and only if the other one is 0. In
fact, in the exercises we will show the following more efficient way of calculating the discriminant.
Proposition 11.5.21
Let p(x) = pn x^n + · · · + p1x + p0 ∈ F[x] be a polynomial. Then

∆(p) = (−1)^(n(n−1)/2) (1/pn) R(p, Dx(p)). (11.9)
Relationship (11.9) offers a motivation for the factor of pn^(2n−2) in the definition of the discrimi-
nant. From the resultant matrix (11.8) for R(p, Dx(p)), we see that, when performing the Laplace
expansion to calculate the determinant, R(p, Dx(p)) is a homogeneous polynomial in the coefficients
p0, p1, p2, . . . , pn. Furthermore, the top row of the matrix (11.8) for R(p, Dx(p)) has only two nonzero
entries, namely pn and npn. Then, by the linearity property of the determinant, R(p, Dx(p)) is di-
visible by pn. So the importance of the pn^(2n−2) factor in Definition 11.5.12 is encapsulated in the
following proposition.
Proposition 11.5.22
For a generic polynomial p(x) = pn xn + · · · + p1 x + p0 , the discriminant is a homogeneous
polynomial in the coefficients p0 , p1 , . . . , pn of degree 2n − 2.
Maple Functions
discrim(a,x);  Implements (11.9) to calculate the discriminant of the polynomial a with variable x.
resultant(a,b,x);  Calculates the resultant of a(x) and b(x).
convert(p, 'elsymfun');  Converts the symmetric polynomial into an expression in the elementary symmetric polynomials in the relevant variables.
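For readers without Maple, the root-product definition 11.5.12 is easy to exercise directly; the sketch below compares it with the classical closed formula for the discriminant of a cubic (that closed formula is the standard one, stated here for comparison rather than taken from the text):

```python
# Check Definition 11.5.12 on a factored cubic: for p(x) = a*x^3 + b*x^2
# + c*x + d with roots 1, 2, 3, the product a^(2n-2) * prod (ai - aj)^2
# agrees with the classical formula
#   Delta = 18abcd - 4b^3*d + b^2*c^2 - 4a*c^3 - 27a^2*d^2.

roots = [1, 2, 3]
a, b, c, d = 1, -6, 11, -6     # (x-1)(x-2)(x-3) = x^3 - 6x^2 + 11x - 6
n = len(roots)

from_roots = a**(2*n - 2)
for i in range(n):
    for j in range(i + 1, n):
        from_roots *= (roots[i] - roots[j])**2

formula = 18*a*b*c*d - 4*b**3*d + b**2*c**2 - 4*a*c**3 - 27*a**2*d**2
print(from_roots, formula)     # 4 4
assert from_roots == formula
```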
4. Let α1, α2, α3 be the roots of x^3 − 3x^2 + 4x + 1 ∈ Q[x]. Find the value of α1^3 + α2^3 + α3^3 in Q.
5. Let α1, α2, α3 be the roots of 2x^3 + 3x^2 + 4x − 1 ∈ Q[x]. Find the value of α1^2α2^2 + α1^2α3^2 + α2^2α3^2 in Q.
6. Consider the usual action of Sn on F [x1 , x2 , . . . , xn ] defined by permuting the variables according to
σ. Define Hp as the stabilizer in Sn of p, namely Hp = {σ ∈ Sn | σ · p = p}. Prove that both
Σ_τ τ · p   and   ∏_τ τ · p

are symmetric polynomials, where both the sum and the product run over a complete set of coset
representatives of Hp in Sn.
7. Consider the action of Sn on F [x1 , x2 , . . . , xn ] as described in the previous exercise. Set n = 4 and
consider p(x1 , x2 , x3 , x4 ) = x1 x2 + x3 x4 .
(a) Calculate the stabilizer of p and determine its isomorphism type.
(b) Deduce that Q(x1 , x2 , x3 , x4 ) = (x1 x2 + x3 x4 )(x1 x3 + x2 x4 )(x1 x4 + x2 x3 ) is a symmetric poly-
nomial.
(c) Prove that Q(x1, x2, x3, x4) = s1^2 s4 + s3^2 − 4s2s4.
(d) Suppose that f (x) = x4 − 3x3 + 2x + 5 ∈ Q[x]. Calculate Q(α1 , α2 , α3 , α4 ), where αi are the four
roots (possibly listed with multiplicity) of f (x).
8. This exercise finds a cyclic extension of degree 3 over Q.
(a) Prove that the function f : C − {0, 1} → C − {0, 1} defined by f(x) = 1/(1 − x) has order 3.
(b) Suppose that the roots of a polynomial are α, f (α), f (f (α)). Find s1 , s2 , and s3 of these roots.
(c) Deduce that for all q ∈ Q, a polynomial of the form x3 − qx2 + (q − 3)x + 1 has three distinct
real roots and that the splitting field is a cyclic extension of Q of degree 3.
9. Calculate the resultant of the following pairs of polynomials:
(a) a(x) = 5x2 + 4x − 3 and b(x) = x2 + x + 3;
(b) a(x) = x2 − 3x + 2 and b(x) = 2x3 − 3x2 + 2x − 1.
10. Let p(x) be an arbitrary polynomial. Prove that R(p(x), x − α) = (−1)n p(α), where n = deg p.
11. Let p(x) = pn x^n + · · · + p1x + p0 be a polynomial in F[x]. Suppose that, listed with multiplicity, the
roots of p(x) in its splitting field are α1, α2, . . . , αn. Writing p(x) = pn(x − α1) · · · (x − αn), prove that

∏_{i=1}^{n} p′(αi) = (−1)^(n(n−1)/2) pn^n ∏_{1≤i<j≤n} (αi − αj)^2.
12. In the polynomial ring F(x1, x2, . . . , xm, y1, y2, . . . , yn)[t], consider the polynomials

a(t) = A(t − x1)(t − x2) · · · (t − xm)   and   b(t) = B(t − y1)(t − y2) · · · (t − yn).
(a) Show that R(a, b) is An B m multiplied by a polynomial expression that is symmetric in the
symbols x1 , x2 , . . . , xm and symmetric in y1 , y2 , . . . , yn .
(b) Show that R(a, b) is a polynomial that is homogeneous of degree mn in the variables
x1 , x2 , . . . , xm , y1 , y2 , . . . , yn .
(c) Since R(a, b) = 0 whenever a(t) and b(t) have a common root, show that R(a, b) is divisible by
every polynomial xi − yj for 1 ≤ i ≤ m and 1 ≤ j ≤ n and deduce that

R(a, b) = A^n B^m ∏_{i=1}^{m} ∏_{j=1}^{n} (xi − yj).
13. Apply the previous two exercises to the situation a(x) = p(x) and b(x) = p′(x) to deduce Proposi-
tion 11.5.21.
14. Calculate the discriminant of x3 + 3x2 − 7 ∈ Q[x] using Proposition 11.5.21.
15. Calculate the discriminant of x4 + 2x + 1 ∈ Q[x] using Proposition 11.5.21.
16. Prove that the discriminant of x^n + a ∈ Q[x] is (−1)^(n(n−1)/2) n^n a^(n−1).
17. Prove that the discriminant of x^n + cx + d is (−1)^(n(n−1)/2) n^n d^(n−1) + (−1)^((n−1)(n−2)/2) (n − 1)^(n−1) c^n.
18. Use (11.9) to prove that the discriminant of the general cubic p(x) = ax^3 + bx^2 + cx + d is
∆(p) = 18abcd − 4b^3d + b^2c^2 − 4ac^3 − 27a^2d^2.
19. Let p be an odd prime. Use Exercises 11.4.11 and 11.5.16 to prove that the discriminant of the
cyclotomic polynomial Φp(x) is ∆(Φp) = (−1)^((p−1)/2) p^(p−2).
20. Let p(x) = pn x^n + · · · + p1x + p0 be a generic polynomial. Prove that if a term C ∏_{k=0}^{n} pk^(ik) appears in
the discriminant ∆(p), then the powers ik satisfy both of the following conditions:

Σ_{k=0}^{n} ik = 2n − 2   and   Σ_{k=0}^{n} k · ik = n(n − 1).

[Hint: Consider homogeneity in the coefficients of p(x) and homogeneity in the roots of p(x).]
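Several of these discriminant formulas are easy to spot-check numerically. For instance, the cyclotomic formula ∆(Φp) = (−1)^((p−1)/2) p^(p−2) can be tested at p = 5 by computing the discriminant directly from the primitive 5th roots of unity (floating-point, illustrative only):

```python
# Spot-check: Delta(Phi_5) = (-1)^2 * 5^3 = 125, computed from the roots
# of Phi_5 (the primitive 5th roots of unity; leading coefficient 1).
import cmath

p = 5
roots = [cmath.exp(2j * cmath.pi * k / p) for k in range(1, p)]
delta = 1
for i in range(len(roots)):
    for j in range(i + 1, len(roots)):
        delta *= (roots[i] - roots[j])**2

expected = (-1)**((p - 1) // 2) * p**(p - 2)   # = 125
assert abs(delta - expected) < 1e-8
print(delta.real)    # approximately 125
```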
11.6
Computing Galois Groups of Polynomials
We are now in a position to start a systematic study of the Galois groups of polynomials. Without
loss of generality, in this section we work with monic polynomials.
By Proposition 11.3.2,

Gal(K1K2/K1 ∩ K2) ≅ Gal(K1/K1 ∩ K2) ⊕ Gal(K2/K1 ∩ K2).

Therefore, Gal(K1K2/F) is a group that contains Gal(K1/K1 ∩ K2) ⊕ Gal(K2/K1 ∩ K2) as a normal
subgroup such that the quotient group thereof is Gal(K1 ∩ K2/F).
Example 11.6.1. As a relatively simple example, consider the polynomial p(x) = (x^3 − 2)(x^3 − 3).
The splitting field of p(x) is K1K2, where K1 = Q(∛2, ζ3) and K2 = Q(∛3, ζ3). It is not hard to
determine that K1 is not a subfield of K2 and vice versa. Consequently, by field degree considerations,
we deduce that K1 ∩ K2 = Q(ζ3). Now Gal(K1/K1 ∩ K2) ≅ Z3, generated by the automorphism σ
that satisfies σ(∛2) = ∛2 ζ3. Similarly, Gal(K2/K1 ∩ K2) ≅ Z3, generated by the automorphism τ
that satisfies τ(∛3) = ∛3 ζ3. Finally, Gal(K1 ∩ K2/Q) ≅ Z2, generated by complex conjugation ρ,
which satisfies ρ(ζ3) = ζ̄3 = 1/ζ3. We can give a presentation of Gal(K1K2/Q) as

Gal(K1K2/Q) ≅ ⟨ρ, σ, τ | ρ^2 = σ^3 = τ^3 = 1, στ = τσ, ρσ = σ^(−1)ρ, ρτ = τ^(−1)ρ⟩. △
Proposition 11.6.2
Let f (x) ∈ F [x] be an irreducible cubic, where char F 6= 2. Two mutually exclusive cases
occur:
(1) If √∆(f) ∈ F, then Gal(f(x)) ≅ Z3.
(2) If √∆(f) ∉ F, then Gal(f(x)) ≅ S3.
Example 11.6.3. Consider the polynomial f(x) = x^3 − 2x^2 + x − 1. Using the strategy in Section 7.3,
replace x with y + 2/3, so f(y + 2/3) = y^3 − (1/3)y − 25/27. The discriminant of the polynomial is

∆ = −27 (25/27)^2 − 4 (−1/3)^3 = −621/27 = −23.

Since −23 is not a square in Q, we deduce that GalQ(f(x)) ≅ S3.
Corollary 11.6.4
Let f (x) be an irreducible cubic in F [x] and let β be a root of f (x) in the splitting field of
f (x) over F . Then f (x) splits completely in F (β) if and only if ∆(f ) is a square in F .
Proposition 11.6.5
The elements β1 , β2 , β3 are the roots of the resolvent cubic θf (t).
Proposition 11.6.6
If θf (t) is the resolvent cubic of a quartic polynomial f , then ∆(θf ) = ∆(f ). In particular,
θf (t) is separable if and only if f (x) is separable.
Proof. Since all the roots of the depressed polynomial g(y) differ from the roots of f(x) by a fixed constant, ∆(f) = ∆(g).
Now

∆(θf) = (β1 − β2)^2 (β1 − β3)^2 (β2 − β3)^2.
We also have
β1 − β2 = α1 α2 + α3 α4 − α1 α3 − α2 α4 = (α1 − α4 )(α2 − α3 ),
β1 − β3 = α1 α2 + α3 α4 − α1 α4 − α2 α3 = (α1 − α3 )(α2 − α4 ),
β2 − β3 = α1 α3 + α2 α4 − α1 α4 − α2 α3 = (α1 − α2 )(α3 − α4 ).
We deduce that

∏_{1≤i<j≤3} (βi − βj)^2 = ∏_{1≤i<j≤4} (αi − αj)^2,

and the proposition follows. The concluding remark follows from Corollary 11.5.20.
The Galois group Gal(f (x)) is a subgroup of S4 . A renumbering of the roots of f (x) corresponds
to conjugation by the permutation that defines the renumbering. The labeling of the roots is not
intrinsically important. Consequently, we do not care as much about the specific subgroup of S4 that
is equal to Gal(f (x)) as we care about the isomorphism type of Gal(f (x)). Indeed, two conjugate
subgroups in a group are isomorphic. So we only need to compute Gal(f (x)) up to conjugacy in S4 .
Theorem 11.6.7
Let F be a field with char F ≠ 2 and let f(x) = x⁴ + ax³ + bx² + cx + d ∈ F[x] be a monic irreducible quartic polynomial with resolvent cubic θf(t). Three mutually exclusive cases occur:
(1) If θf(t) is irreducible over F, then Gal(f(x)) ≅ A4 if √∆(f) ∈ F and Gal(f(x)) ≅ S4 otherwise.
(2) If θf(t) has exactly one root β in F, then Gal(f(x)) ≅ D4 or Z4; it is isomorphic to D4 if and only if √(∆(f)(β² − 4d)) ∉ F.
(3) If θf(t) splits completely over F, then Gal(f(x)) ≅ Z2 ⊕ Z2.
Proof. Since f(x) is a quartic, we view Gal(f(x)) as a subgroup of S4, where S4 acts on the set of roots of f(x). Call K the splitting field of f(x) over F.
For part (1), let α1 be a root of f (x) and, by Proposition 11.6.5, let β1 be a root of θf (t).
Obviously, α1 and β1 are in K. Since f (x) is irreducible, then [F (α1 ) : F ] = 4. Since θf (t) is
irreducible, then [F (β1 ) : F ] = 3. Thus, [K : F ] is divisible by both 3 and 4. Hence, [K : F ] is
divisible by 12. Now A4 is the only subgroup of S4 of index 2. (If there were another subgroup H
of index 2, then A4 ∩ H would be a subgroup of A4 of order 6. In Example 3.6.7, the lattice of A4
shows there is no such subgroup.) Part (1) follows by Proposition 11.5.15.
For parts (2) and (3), suppose that θf (t) has a root β in F . Without loss of generality (by
relabeling the roots of f (x) if necessary), suppose that β = β1 = α1 α2 + α3 α4 . In Example 11.5.10,
we saw that the condition β ∈ F implies that Gal(f (x)) ≤ h(1 3 2 4), (1 2)i. Since f (x) is irreducible,
we know that 4 = [F (α1 ) : F ] divides | Gal(f (x))| = [K : F ]. Thus, Gal(f (x)) is a subgroup of
h(1 3 2 4), (1 2)i of order 4 or 8. It is not hard to show that the possibilities are
h(1 2), (3 4)i, h(1 2)(3 4), (1 3)(2 4)i, h(1 3 2 4)i, h(1 3 2 4), (1 2)i. (11.10)
However, by Proposition 11.5.8, Gal(f (x)) must be a transitive subgroup of S4 . This rules out the
first of the four possibilities, leaving only the latter three.
We deal first with case (3). By Corollary 11.6.4, the cubic θf splits completely over F if and only if ∆(f) = ∆(θf) is a square in F. Since ∆(f) is then a square in F, Proposition 11.5.15 implies that Gal(f(x)) ≤ A4. The only subgroup listed in (11.10) that lies in A4 is ⟨(1 2)(3 4), (1 3)(2 4)⟩, which is isomorphic to Z2 ⊕ Z2.
Finally, we address part (2). By Corollary 11.6.4, this time we deduce that √∆(f) ∉ F, which also implies, by Proposition 11.5.15, that Gal(f(x)) ≰ A4. Consequently, Gal(f(x)) is either ⟨(1 3 2 4), (1 2)⟩ or ⟨(1 3 2 4)⟩. We proceed to distinguish between these two cases.
11.6. COMPUTING GALOIS GROUPS OF POLYNOMIALS 597
Since (1 3 2 4) is an odd permutation, (1 3 2 4) · √∆(f) = −√∆(f), while

(1 3 2 4) · (α1 + α2 − α3 − α4) = α3 + α4 − α2 − α1 = −(α1 + α2 − α3 − α4).

Thus, (1 3 2 4) · √(∆(f)(4β + a² − 4b)) = √(∆(f)(4β + a² − 4b)), and therefore, if Gal(f(x)) = ⟨(1 3 2 4)⟩, then √(∆(f)(4β + a² − 4b)) ∈ F. As for the transposition (1 2), since it is odd and fixes α1 + α2 − α3 − α4, it is easy to see that

(1 2) · √∆(f) (α1 + α2 − α3 − α4) = −√∆(f) (α1 + α2 − α3 − α4).
Assume both of the following relations hold simultaneously among the roots:

α1 + α2 − α3 − α4 = 0   and   α1α2 − α3α4 = 0.

Then α1 + α2 = α3 + α4 and α1α2 = α3α4, so {α1, α2} and {α3, α4} would be root sets of the same quadratic polynomial, contradicting the separability of f(x). Hence, at least one of these two quantities is nonzero.
However,

(1 2) · √(∆(f)(β² − 4d)) = (1 2) · √∆(f) (α1α2 − α3α4) = −√(∆(f)(β² − 4d)).

So (1 2) ∈ Gal(f(x)) if and only if Gal(f(x)) = ⟨(1 3 2 4), (1 2)⟩ if and only if √(∆(f)(β² − 4d)) ∉ F.
This covers all the cases and completes the proof.
Example 11.6.8. Consider the polynomial f(x) = x⁴ + 2x³ + 2x² + 2 in Q[x]. By Eisenstein's Criterion, this polynomial is irreducible. First we effect a shift x = y − 1/2 and get

g(y) = f(y − 1/2) = y⁴ + (1/2)y² − y + 37/16.

Proceeding according to Ferrari's method, the resolvent of g(y) is

θf(t) = t³ − (1/2)t² − (37/4)t + 29/8.
In Theorem 11.6.7, the first thing we need to test is whether θf (t) is irreducible. Since it is a cubic,
we simply need to test whether it has a root in Q. By the Rational Root Theorem (applied to 8θf(t), which has integer coefficients), the only possible rational roots are of the form

± (divisor of 29)/(divisor of 8).
This gives us 16 possibilities. Testing them all shows that none of these possibilities is a root. As a
cubic, since θf(t) has no root in Q, it is irreducible. We are in part (1) of Theorem 11.6.7. Using the general formula for the discriminant of the cubic calculated in Exercise 11.5.18, we calculate that

∆(f) = ∆(θf) = 3136 = 56².

Hence, √∆(f) ∈ Q, so by Theorem 11.6.7, Gal(f(x)) ≅ A4. △
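The irreducibility of the resolvent and the value of the discriminant are easy to confirm with a CAS. In this SymPy sketch, we clear denominators in θf(t) so that the polynomial has integer coefficients (clearing denominators does not affect irreducibility over Q):

```python
from sympy import symbols, Poly, sqrt

x, t = symbols('x t')
f = Poly(x**4 + 2*x**3 + 2*x**2 + 2, x)

# 8*theta_f(t): the resolvent cubic with denominators cleared.
theta = Poly(8*t**3 - 4*t**2 - 74*t + 29, t)
assert theta.is_irreducible         # a cubic with no rational root

d = f.discriminant()
assert d == 3136 and sqrt(d) == 56  # a perfect square, giving Gal(f) ≅ A4
```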
where r is a primitive root modulo p, i.e., a generator of U(Fp). It is not too hard to check that this group has the presentation

⟨σ, τ | σ^p = τ^(p−1) = 1, τστ⁻¹ = σ^r⟩.

More precisely, the Galois group of x^p − a is the holomorph (see Exercise 9.3.17) of Zp. Equivalent notations are

Gal(f(x)) = Fp ⋊ U(Fp) = Zp ⋊ Aut(Zp) = Hol(Zp).
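The holomorph can be modeled concretely as the group of invertible affine maps x ↦ ax + b on Zp. The sketch below (the choice p = 5 is an illustrative assumption) checks the order p(p − 1) and the failure of commutativity:

```python
from itertools import product

# Hol(Z_5) = Z_5 ⋊ Aut(Z_5) modeled as invertible affine maps x -> a*x + b mod 5.
p = 5
G = [(a, b) for a, b in product(range(1, p), range(p))]
assert len(G) == p * (p - 1)   # |Hol(Z_p)| = p*(p-1), here 20

def compose(f, g):
    # (f o g)(x) = a1*(a2*x + b2) + b1  mod p
    (a1, b1), (a2, b2) = f, g
    return ((a1 * a2) % p, (a1 * b2 + b1) % p)

# Closed under composition, and nonabelian:
assert all(compose(f, g) in G for f in G for g in G)
assert compose((2, 0), (1, 1)) != compose((1, 1), (2, 0))
```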
Since the roots of f (x) are distinct, then su (y) is also separable.
We consider the action of two different groups on E[u1, u2, . . . , un, y]: the automorphism group Gal(E/F), which acts on the roots α1, α2, . . . , αn, and the symmetric group Sn, which acts by permuting the variables u1, u2, . . . , un. We investigate how these two actions interact.
Let ρ ∈ G = Gal(E/F ). Then ρ permutes the roots of f (x) so that ρ(αi ) = αψ(ρ)(i) , where
ψ : G → Sn is a homomorphic embedding of G into Sn . Thus,
ρ · su(y) = ∏_{σ∈Sn} (y − (u1 ρ(ασ(1)) + u2 ρ(ασ(2)) + · · · + un ρ(ασ(n))))
          = ∏_{σ∈Sn} (y − (u1 αψ(ρ)(σ(1)) + u2 αψ(ρ)(σ(2)) + · · · + un αψ(ρ)(σ(n)))).
Since the product defining su(y) runs over all permutations in Sn, we have the equality {ψ(ρ)σ | σ ∈ Sn} = Sn for all ρ ∈ G. Hence, ρ · su(y) = su(y), so all the coefficients of su(y) are fixed by every ρ ∈ Gal(E/F) and hence are in F. Thus, su(y) ∈ F[u1, u2, . . . , un, y].
Now consider the Sn action. Any τ ∈ Sn acts on the y-linear terms of su(y) as

τ · (u1 ασ(1) + u2 ασ(2) + · · · + un ασ(n)) = uτ(1) ασ(1) + uτ(2) ασ(2) + · · · + uτ(n) ασ(n)
                                            = u1 ασ(τ⁻¹(1)) + u2 ασ(τ⁻¹(2)) + · · · + un ασ(τ⁻¹(n)).
The new polynomial τ · h(y) is also irreducible in F [u1 , u2 , . . . , un ][y] because if it were not then
h(y) = τ −1 · (τ · h(y)) would be reducible. However, τ · h(y) consists of a product of linear terms
that appear in the product for su (y), so τ · h(y) is an irreducible factor of su (y). Since Sn acts
transitively on the linear terms of su (y), we have proven the following useful lemma.
Lemma 11.6.9
The orbit of h(y) under this action of Sn on F [u1 , u2 , . . . , un ][y] consists precisely of the
irreducible factors of su (y).
Galois resolvents in Kronecker’s analysis gives the following formula for the Galois group of a
polynomial.
Theorem 11.6.10
Let f (x) ∈ F [x] be monic and separable of degree n. Let su (y) and h(y) be as above. Then
GalF (f (x)) is isomorphic to the stabilizer Gh of h(y) under the action of Sn permuting
u1 , u2 , . . . , un . In other words,
G = GalF(f(x)) ≅ {τ ∈ Sn | τ · h(y) = h(y)} = Gh.
Proof. Let h(y) be any irreducible factor of su (y) and let ω ∈ Sn be a permutation such that
where H is a subset of Sn that contains 1. Suppose that τ ∈ ω −1 Hω, say with τ = ω −1 σ0 ω. Then
by the above discussion, σ0 · h(y) is another irreducible factor of su (y). However, because σ0 ∈ H,
the y-linear term indexed by σ0 ω is a factor of h(y) in E[u1 , u2 , . . . , un , y], whereas the y-linear term
indexed by σ0ωτ⁻¹ = σ0ω(ω⁻¹σ0ω)⁻¹ = ω is a factor of τ · h(y). However, the y-linear term indexed by ω is a factor of h(y). Hence, τ · h(y) = h(y). This shows that ω⁻¹Hω ⊆ Gh.
Conversely, suppose that τ ∈ Gh. Then h(y) is a product of the y-linear terms indexed by σωτ⁻¹ with σ ∈ H. However, one of these y-linear factors corresponds to the permutation ω. Hence,
there exists σ0 ∈ H such that σ0 ωτ −1 = ω. Then τ = ω −1 σ0 ω, which shows that Gh ⊆ ω −1 Hω.
Thus, H = ωGh ω −1 . In particular, H is a subgroup of Sn , not merely a subset.
Now consider the polynomial
h̃(y) = ∏_{ρ∈G} (y − (u1 ρ(αω(1)) + u2 ρ(αω(2)) + · · · + un ρ(αω(n))))
      = ∏_{ρ∈G} (y − (u1 αψ(ρ)(ω(1)) + u2 αψ(ρ)(ω(2)) + · · · + un αψ(ρ)(ω(n)))).   (11.13)
The action of G on h̃(y) simply permutes the linear factors in the product expression so h̃(y) is fixed
by the Galois group G and hence h̃(y) ∈ F [u1 , u2 , . . . , un , y]. The term in (11.11) corresponds to
ρ = 1 ∈ G so this term divides both h(y) and h̃(y) in E[u1 , u2 , . . . , un , y]. Since ρ · h(y) = h(y), then
all the linear factors of h̃(y) divide h(y) so h̃(y) divides h(y) in E[u1 , u2 , . . . , un ][y]. This means that
h(y) = h̃(y)q(y) in E[u1 , u2 , . . . , un , y]. However, for all ρ ∈ G,
Thus, ρ · q(y) = q(y) for all ρ ∈ G, and hence q(y) ∈ F [u1 , u2 , . . . , un ][y], which means that h̃(y)
divides h(y) in F [u1 , u2 , . . . , un ][y]. Since h(y) is irreducible, we deduce that h(y) = h̃(y).
Identifying (11.13) with (11.12) we deduce that ψ(G) = H = ωGh ω −1 . Since ψ(G) is conjugate
to Gh in Sn , then ψ(G) is isomorphic to Gh .
Example 11.6.11. Consider the polynomial f (x) = x3 + 2x2 − x − 1 ∈ Q[x]. By the Rational Root
Theorem, f (x) has no roots and since it is a cubic f (x) is irreducible. Let s1 , s2 , and s3 be the
elementary symmetric functions evaluated on the roots α1 , α2 , α3 of f (x). Numerically s1 = −2,
s2 = −1, and s3 = 1.
Calculating the Galois resolvent su (y) by hand is particularly onerous. Expanding it as a poly-
nomial expression in y, u1 , u2 , u3 , α1 , α2 , α3 produces a multivariable polynomial with 4096 terms
before collecting like terms. However, computer algebra systems can simplify the work. In Chap-
ter 12, we discuss powerful computation techniques to work with ideals in multivariable polynomial
rings. Furthermore, computer algebra systems combine a number of algorithms to be able to factor
multivariable polynomials. Using these techniques, we can show that
su(y) = (y³ + (2u1 + 2u2 + 2u3)y² + (5u1u2 + 5u1u3 + 5u2u3 − u1² − u2² − u3²)y
        + (8u1u2u3 − u1³ − u2³ − u3³ − 3u1u2² − 3u2u3² − 3u3u1² + 4u1²u2 + 4u2²u3 + 4u3²u1))
      × (y³ + (2u1 + 2u2 + 2u3)y² + (5u1u2 + 5u1u3 + 5u2u3 − u1² − u2² − u3²)y
        + (8u1u2u3 − u1³ − u2³ − u3³ + 4u1u2² + 4u2u3² + 4u3u1² − 3u1²u2 − 3u2²u3 − 3u3²u1)).
Choose h(y) as the first factor in the above product. The coefficients for y 2 and y are symmetric
in u1, u2, u3. However, the last coefficient is not. Instead, it is stabilized by the cyclic subgroup ⟨(1 2 3)⟩. By Theorem 11.6.10, GalQ(f(x)) ≅ Z3 = A3. △
Obviously, it is easier to find this result by calculating the discriminant ∆(f) = 49, which implies the same result by Proposition 11.6.2. However, the method of Kronecker analysis lends itself better to algorithmic methods for computing the Galois group of a polynomial. In fact, some computer algebra
systems implement this method and can calculate the Galois group of polynomials up to degree 9
or more. (In Maple, the command is simply galois. See the help files on how to use it.)
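For Example 11.6.11, the discriminant shortcut really is a one-liner in a CAS. This SymPy sketch (illustrative; the text mentions Maple) confirms ∆(f) = 49 = 7²:

```python
from sympy import symbols, Poly, sqrt

x = symbols('x')
f = Poly(x**3 + 2*x**2 - x - 1, x)
assert f.is_irreducible          # a cubic with no rational roots
d = f.discriminant()
assert d == 49 and sqrt(d) == 7  # square discriminant: Gal ≅ Z3 by Prop. 11.6.2
```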
11.7 Fields of Finite Characteristic
11.7.1 – Galois Groups of Finite Fields
The characterization of finite fields given in Section 7.7.2 allows us to calculate the group of automorphisms of finite extensions of finite fields. In fact, these calculations turn out to be particularly easy.
Recall that if F is a finite field, then |F| = pⁿ for some prime p and some positive integer n. Such a field has Fp as its prime subfield with [F : Fp] = n. Furthermore, for each prime p and each positive integer n, there exists a unique (up to isomorphism) field F of order pⁿ, denoted Fpn, namely the splitting field of x^(pⁿ) − x ∈ Fp[x]. Consequently, F is a Galois extension of Fp.
If f(x) ∈ Fp[x] is an irreducible polynomial of degree n, then Fp[x]/(f(x)) is an extension of Fp of degree n. By the uniqueness of finite fields of a given order, Fp[x]/(f(x)) ≅ Fpn. We deduce that f(x) splits completely in Fpn and f(x) must divide x^(pⁿ) − x.
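This divisibility is easy to test in a CAS. As an illustration (the specific cubic x³ + x + 1 over F2 is our choice, not the text's), SymPy confirms that an irreducible polynomial of degree n over Fp divides x^(pⁿ) − x:

```python
from sympy import symbols, Poly

x = symbols('x')
p, n = 2, 3
f = Poly(x**3 + x + 1, x, modulus=p)       # an irreducible cubic over F_2
assert f.is_irreducible

big = Poly(x**(p**n) - x, x, modulus=p)    # x^(p^n) - x = x^8 - x over F_2
assert big.rem(f).is_zero                  # f(x) divides x^(p^n) - x
```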
The Frobenius automorphism σp : Fpn → Fpn defined by σp (α) = αp fixes Fp . Therefore,
σp ∈ Gal(Fpn /Fp ). However, a stronger result holds.
Proposition 11.7.1
The Galois group Gal(Fpn /Fp ) is the cyclic group of order n and generated by the Frobenius
automorphism σp .
Proof. We know that |Gal(Fpn/Fp)| = [Fpn : Fp] = n, so the order of σp divides n. Assume that σp^d = id for some strict divisor d of n. Then for all α ∈ Fpn,

σp^d(α) = α ⟺ α^(p^d) = α ⟺ α^(p^d) − α = 0.

This condition would mean that every α ∈ Fpn lies in Fpd, but Fpd is a strict subfield of Fpn, so we have arrived at a contradiction. Consequently, we conclude that σp has order n and thus Gal(Fpn/Fp) is generated by σp. The proposition follows.
Since Gal(Fpn/Fp) is abelian, every subgroup is a normal subgroup. By the Galois correspondence, every field extension K of Fp in Fpn is Galois. After all, such a K must satisfy K ≅ Fpd, where d | n. All subgroups of Gal(Fpn/Fp) are cyclic of the form ⟨σp^d⟩ for some divisor d of n. Under the Galois correspondence, the subgroup ⟨σp^d⟩ of Gal(Fpn/Fp) corresponds to the subfield Fpd = {α ∈ Fpn | α^(p^d) = α}. Thus,

Gal(Fpn/Fpd) = ⟨σp^d⟩ ≅ Zn/d.
Note that n/d = [Fpn : Fpd ]. This proves the following generalization of Proposition 11.7.1.
Proposition 11.7.2
Let F be any finite field of order q and let K be an extension of F of degree m. Then K/F
is a Galois extension with Gal(K/F) ≅ Zm.
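A small computational illustration (our own construction, not from the text): represent F9 as F3[i] with i² = −1, which is a field because x² + 1 has no root in F3, and check that the Frobenius map α ↦ α³ has order 2 = [F9 : F3] with fixed field exactly F3:

```python
from itertools import product

# Elements of F_9 = F_3[i], i^2 = -1, stored as pairs (a, b) meaning a + b*i.
P = 3
elements = [(a, b) for a, b in product(range(P), range(P))]

def mul(u, v):
    # (a + bi)(c + di) = (ac - bd) + (ad + bc)i, coefficients mod 3
    a, b = u
    c, d = v
    return ((a*c - b*d) % P, (a*d + b*c) % P)

def frobenius(u):
    # sigma_3(u) = u^3
    r = (1, 0)
    for _ in range(P):
        r = mul(r, u)
    return r

# sigma_3 is not the identity, but its square is: it has order 2.
assert any(frobenius(u) != u for u in elements)
assert all(frobenius(frobenius(u)) == u for u in elements)

# Its fixed field is exactly the prime field F_3.
fixed = [u for u in elements if frobenius(u) == u]
assert fixed == [(a, 0) for a in range(P)]
```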
Definition 11.7.3
Let F be any field, not necessarily finite. The splitting field of xⁿ − 1 over F is the nth cyclotomic field over F and is denoted F^(n). The roots of xⁿ − 1 in F^(n) are called the nth roots of unity and we denote this subset by µn.
Proposition 11.7.4
Let F be a field of characteristic p. Suppose that n = p^k m where p ∤ m. Then µn = µm and is a cyclic group of order m, with the operation being multiplication in F^(n).
Proof. First suppose that k = 0 so that p ∤ n. Then xⁿ − 1 and its derivative Dx(xⁿ − 1) = nx^(n−1) have no common roots. Hence, xⁿ − 1 is separable. Thus, |µn| = n. To see that µn is a subgroup of U(F^(n)), first note that µn is nonempty since 1 ∈ µn. Also, for any α, β ∈ µn,

(αβ⁻¹)ⁿ = αⁿβ⁻ⁿ = 1,

so αβ⁻¹ ∈ µn.
In a field F of characteristic p (whether F is finite or not), there are no primitive nth roots of unity when p | n. Consequently, we only consider nth roots of unity if p ∤ n.
In any field F of characteristic p, if p ∤ n we define the nth cyclotomic polynomial ΦF,n(x) in a manner analogous to cyclotomic polynomials over Q. If F = Fp, then the recursive definition in Z[x],

xⁿ − 1 = ∏_{d|n} Φd(x),

reduces modulo p. Thus, ΦFp,n(x) is the polynomial in Fp[x] obtained by reducing the coefficients of Φn(x) ∈ Z[x] modulo p.
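SymPy's built-in cyclotomic polynomials make both the recursive identity and its reduction modulo p easy to check; the choices n = 12 and p = 5 below are illustrative:

```python
from sympy import symbols, Poly, cyclotomic_poly, divisors, prod

x = symbols('x')
n, p = 12, 5   # illustrative choices with p not dividing n

# Over Z: x^n - 1 = product over d | n of Phi_d(x).
lhs = Poly(x**n - 1, x)
rhs = Poly(prod(cyclotomic_poly(d, x) for d in divisors(n)), x)
assert lhs == rhs

# Phi_{F_p, n}(x) is obtained by reducing Phi_n(x) modulo p.
phi_mod_p = Poly(cyclotomic_poly(n, x), x, modulus=p)
assert phi_mod_p.degree() == 4   # deg Phi_12 = phi(12) = 4
```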
Proof. For part (1), note that the discriminant ∆ of a polynomial in Z[x] is an integer polynomial in the coefficients. Since f(x) ∈ Z[x], ∆(f) is an integer and ∆(f̄) equals the reduction of ∆(f) in Fp. If p ∤ ∆(f), then ∆(f̄) ≠ 0 and so f̄ is separable.
The polynomial f̄ splits completely in the finite field Fpm if and only if f̄i splits completely in Fpm for all i. This is equivalent to f̄i dividing x^(p^m) − x and therefore, by Theorem 7.7.12, to di dividing m. Hence, f̄ splits completely in the finite field Fpm if and only if lcm(d1, d2, . . . , dr) | m.
For part (2), consider the universal Galois resolvent for a monic polynomial of degree n,

Su(y) = ∏_{σ∈Sn} (y − (u1 xσ(1) + · · · + un xσ(n))) ∈ Z[x1, . . . , xn, u1, . . . , un, y].
Dedekind’s Theorem allows us to obtain a lower bound on the Galois group of a polynomial
f (x). By showing that the Galois group contains permutations of a given cycle type, it is sometimes
possible to conclude that Gal(f (x)) contains a certain subgroup of Sn .
Example 11.7.6. Consider the quintic polynomial f (x) = x5 − 3x2 + 3x + 3 ∈ Z[x]. By Eisenstein’s
Criterion, we see that f (x) is irreducible. Since f (x) is irreducible, the Galois group is a transitive
subgroup of S5 (acting on the set of roots of f (x)). All transitive subgroups of S5 have a 5-cycle.
Using factorization algorithms in a computer algebra system, we can factor this polynomial over
various finite fields. In particular, reduced modulo 23, f(x) factors as (x + 3)(x + 4)(x + 12)(x² + 4x + 12), where the quadratic factor is irreducible in F23[x],
so GalF23(f(x)) is not just isomorphic to Z2 but, under its injection into S5, is conjugate to a
subgroup generated by a single transposition. By Dedekind’s Theorem, GalQ (f (x)) contains both a
5-cycle and a 2-cycle. By Exercise 3.5.42, GalQ(f(x)) ≅ S5. △
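The factorization pattern modulo 23 can be reproduced with SymPy; the degrees [1, 1, 1, 2] of the irreducible factors are exactly the cycle type (a transposition, plus fixed points) supplied to Dedekind's Theorem:

```python
from sympy import symbols, Poly

x = symbols('x')
f = Poly(x**5 - 3*x**2 + 3*x + 3, x)
assert f.is_irreducible                   # Eisenstein's Criterion at p = 3

# Reduce modulo 23 and factor over F_23.
f23 = Poly(x**5 - 3*x**2 + 3*x + 3, x, modulus=23)
_, factors = f23.factor_list()
degrees = sorted(g.degree() for g, _ in factors)
assert degrees == [1, 1, 1, 2]            # three distinct linears, one quadratic
```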
Using Dedekind’s Theorem requires factoring f (x) in the finite field Fp . Berlekamp’s Algorithm
provides an efficient method to factor polynomials over finite fields. Many computer algebra systems
implement this algorithm, making Dedekind’s Theorem efficient for finding information about the
Galois group of a monic polynomial with integer coefficients. In particular, Berlekamp's Algorithm efficiently determines when f̄(x) is irreducible in Fp[x], which in turn implies that f(x) is irreducible in Z[x].
11.8. SOLVABILITY BY RADICALS 605
4. Consider the polynomial f(x) = x⁵ + 20x − 32 ∈ Z[x]. Use a CAS to show that it is irreducible and to calculate its discriminant. Deduce that the Galois group of f(x) over Q is A5.
5. Give a table that lists the number of elements of a given cycle type for each of the transitive subgroups
of S5 . Deduce that if we know a transitive subgroup of S5 contains both a 4-cycle and a 3-cycle then
it is all of S5 .
6. Use the previous exercise to show that x⁵ + 3x⁴ + 1 ∈ Q[x] has Galois group S5.
7. Consider the polynomial f (x) = x7 + 14x4 − 24 ∈ Z[x]. Use a CAS to calculate its discriminant.
Find the factorization modulo 5 to deduce that f (x) is irreducible. Consider f (x) modulo 41 and use
Exercise 3.5.44 to show that the Galois group is A7 .
8. The rings Z[x1 , x2 , . . . , xn ] and Q[x1 , x2 , . . . , xn ] are UFDs.
(a) Prove that if f ∈ Z[x1 , x2 , . . . , xn ] is irreducible and nonconstant, then it is irreducible as an
element of Q[x1 , x2 , . . . , xn ].
(b) Suppose that g ∈ Z[x1 , x2 , . . . , xn ] and that f is an irreducible factor of g as an element in
Q[x1 , x2 , . . . , xn ]. Prove that there exists c ∈ Q∗ such that cf is an irreducible factor of g in
Z[x1 , x2 , . . . , xn ].
11.8 Solvability by Radicals
Much of the early development of modern algebra arose out of the effort to understand solutions
to polynomial equations. One of the central goals involved finding explicit formulas for roots of
a polynomial equation and, in particular, a formula involving radicals. From antiquity until the
16th century, scholars only knew how to solve the quadratic equation (and some equations that
quickly reduced to it). The Cardano-Tartaglia-Ferrari method to solve the cubic and the quartic
(Section 7.3), first published in 1545, offered hope that similar methods might exist to solve higher
degree polynomials with radicals.
Many mathematicians subsequently attempted to find a formula for the roots of the general
quintic equation. Mathematicians used countless techniques, many of which are beyond the natural scope of most undergraduate programs (elliptic functions, hypergeometric series, etc.). Some
methods met with success but formulas using radicals remained elusive.
As we have seen in earlier sections of this book, field theory and Galois theory solved many
problems in mathematics that had remained open for centuries. Field theory also offers a framework
to describe a real number obtained from the rationals by a combination of radicals. Here again,
Galois theory establishes a startling and unexpected result: There does not exist a formula using
radicals for solutions to a general polynomial p(x) ∈ C[x] of degree deg p ≥ 5.
Definition 11.8.1
A field extension K/F is called radical if there is a chain of fields

F = K0 ⊆ K1 ⊆ K2 ⊆ · · · ⊆ Kn = K   with Ki = Ki−1(γi),   (11.14)

where γi^(mi) ∈ Ki−1 for some positive integer mi, for all 1 ≤ i ≤ n.
We point out that cyclotomic extensions of any field are radical extensions. Indeed, a cyclotomic extension of F is F(ζ), where ζ is a primitive root of unity, which means that ζⁿ = 1 for some n.
Example 11.8.2. The field extension Q(⁵√(1 − √3 + √2))/Q is a radical extension because of the following chain of subfields:

K0 = Q,
K1 = K0(γ1) with γ1² = 2 ∈ K0,
K2 = K1(γ2) with γ2² = 3 ∈ K1,
K3 = K2(γ3) with γ3⁵ = 1 − √3 + √2 ∈ K2. △
Definition 11.8.3
Let F be a field. We say that an element α (in some field extension of F ) is solvable by
radicals over F if α is in a radical extension of F .
This definition makes precise the notion that α is obtained by successive additions, subtractions,
multiplications, divisions, and nth roots, starting from elements in F .
Example 11.8.4. Consider the polynomial f (x) = x3 − 4x2 + x + 1 ∈ Q[x]. It is easy to check that
f (x) has no roots in Q so, since it is a cubic, it is irreducible. The discriminant is ∆(f ) = 169 = 132 .
By Theorem 7.3.2, since ∆(f ) > 0, then f (x) has three real roots, so the splitting field K of f (x)
over Q is a subfield of R. Furthermore, since ∆(f ) is a square, then by Proposition 11.6.2, K = Q(α),
where α is one of the roots of f(x). Hence, [K : Q] = 3.
It turns out that K is not a radical extension. Assume K is radical. Then K = Q(ᵐ√a) for some a ∈ Q and some m ≥ 3. Since K is Galois over Q, the polynomial xᵐ − a would split completely in K so that, in particular, ᵐ√a ζm ∈ K, where ζm is a primitive mth root of unity and hence is not real. However, we already saw that K ⊆ R. Hence, the assumption that K is a radical extension leads to a contradiction. △
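The numerical facts used in this example check out quickly in SymPy (an illustrative sketch):

```python
from sympy import symbols, Poly, sqrt, real_roots

x = symbols('x')
f = Poly(x**3 - 4*x**2 + x + 1, x)
assert f.is_irreducible              # no roots in Q
d = f.discriminant()
assert d == 169 and sqrt(d) == 13    # a positive square
assert len(real_roots(f)) == 3       # ∆ > 0: three real roots
```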
This example shows that if K is radical over F and L is a field such that F ⊆ L ⊆ K, then L is
not necessarily radical over F . However, every element of L is solvable by radicals. This motivates
the following more refined definition.
Definition 11.8.5
Let F be a field. An extension L/F is called solvable if F ⊆ L ⊆ K, where K is a radical
extension of F .
The splitting field K introduced in the previous example is a solvable extension of Q even though
it is not a radical extension. We extend this definition to a single polynomial.
Definition 11.8.6
Let f (x) ∈ F [x]. If the splitting field of f (x) is a solvable extension over F , then f (x) is
said to be solvable by radicals.
The main theorem of this section, Galois’ Theorem, depends on properties of radical and solvable
extensions, so we develop those properties here.
Proposition 11.8.7
Let K/F be a radical extension.
(1) If L/K is a radical extension, then L/F is radical.
(2) For any field extension L of F, the extension KL/L is radical.
(3) If L/F is also a radical extension, then the composite extension KL/F is radical.
For each i ≥ 1, the composite field KiL is Ki−1(γi)L = Ki−1L(γi). Furthermore, γi^(mi) ∈ Ki−1 ⊆ Ki−1L, so (11.15) is a chain that makes KL/L a radical extension.
For part (3), let L/F be a radical extension. By part (2), KL/L is a radical extension. However,
since L/F is radical, then by part (1) KL/F is also a radical extension.
Proposition 11.8.8
If K/F is a separable and radical extension, then the Galois closure of K is also a radical
extension over F .
Recall that Ki = Ki−1(γi) with γi^(mi) ∈ Ki−1. Thus, σ(Ki) = σ(Ki−1)(σ(γi)) with σ(γi)^(mi) ∈ σ(Ki−1). Thus, the chain in (11.16) shows that σ(K) is another radical extension of F.
Now E is the composite of σ(K) for all σ ∈ Gal(E/F). A repeated application of Proposition 11.8.7 establishes that E is a radical extension of F.
Corollary 11.8.9
If a finite and separable extension L/F is solvable, then so is the Galois closure of L.
Proof. If L/F is finite and separable, then L is contained in a separable and radical extension K of
F . By Proposition 11.8.8, the Galois closure E of K is a radical extension of F . The Galois closure
of L is the composite of all fields σ(L) for σ ∈ Gal(E/F) and hence is a subfield of E. Since E/F is
a radical extension, then the Galois closure of L is a solvable extension of F .
Corollary 11.8.11
Suppose that char F = 0. The polynomial p(x) ∈ F [x] can be solved by radicals if and only
if GalF (p(x)) is solvable.
Though this textbook defined a solvable group in Section 9.1, from a historical perspective, it is
the property of solvable field extensions that inspired the group theoretic concept. Corollary 11.8.11
establishes a complete characterization of polynomials for which the roots can be expressed by a
combination of radicals. If F = Q, there exist polynomials with Galois group Sn for all positive
integers n. Consequently, Corollary 11.8.11 immediately leads to the following theorem, whose
historical importance cannot be overstated.
Theorem 11.8.12
The general equation of degree n in Q[x] cannot be solved by radicals for n ≥ 5.
This theorem closed the book on the search for a formula for the solutions of general polynomial equations using radicals by affirming that such a formula does not exist for polynomials of degree 5 or more. That
formulas do exist for polynomials of degree 2, 3, and 4 follows from the fact that S2 , S3 , and S4
(along with all their subgroups) are solvable groups.
This is extremely interesting because many students of algebra tend to hold the intuition that
algebraic elements consist of everything that can be expressed with radicals. Theorem 11.8.12
establishes that this is not enough. The set of elements that are solvable by radicals is itself a field. (See Exercise 11.8.7.) However, it is a strict subfield of the algebraic closure Q̄. In Exercise 11.7.6,
we showed that the Galois group of f(x) = x⁵ + 3x⁴ + 1 is S5. Hence, the roots of f(x) cannot be written using additions, subtractions, multiplications, divisions, and nth roots, starting from rational numbers.
In order to complete the proof of Theorem 11.8.10, we need to apply properties of the Galois
correspondence to solvable extensions. This begins with a characterization of cyclic extensions.
Proposition 11.8.13
Let F be a field of characteristic not dividing n that contains the nth roots of unity. Then F(γ), where γⁿ ∈ F, is a cyclic extension of F of degree dividing n.
Proof. Let µn denote the group of the nth roots of unity. For each automorphism σ ∈ G = Gal(F(γ)/F), there exists ζσ ∈ µn such that σ(γ) = ζσγ. Now for any two σ, τ ∈ G,

(στ)(γ) = σ(ζτγ) = ζτσ(γ) = ζσζτγ,

so the map σ ↦ ζσ is an injective group homomorphism from G into the cyclic group µn. The proposition follows.
Proposition 11.8.14
Let F be a field of characteristic not dividing n that contains the nth roots of unity. Any
cyclic extension of F of degree n is of the form F (γ), where γ n ∈ F .
Proof. Let K be a cyclic extension of F of degree n with Gal(K/F) generated by the automorphism σ, and let ζ be a primitive nth root of unity. Let α ∈ K and consider the element

γ = α + ζ⁻¹σ(α) + ζ⁻²σ²(α) + · · · + ζ^(−(n−1))σ^(n−1)(α).   (11.17)

Applying σ and using σⁿ = id shows that σ(γ) = ζγ. We observe now that σ(γⁿ) = ζⁿγⁿ = γⁿ. Hence, γⁿ is fixed by σ and therefore, γⁿ ∈ F.
By Proposition 11.1.9, since σ^i : K → K for 0 ≤ i ≤ n − 1 are distinct automorphisms that fix F, they are linearly independent over F as functions. Hence, the function

id + ζ⁻¹σ + ζ⁻²σ² + · · · + ζ^(−(n−1))σ^(n−1)

is nonzero, so there exists an α ∈ K such that the associated γ defined in (11.17) is nonzero. Since σ^i(γ) = ζ^iγ, no power σ^i with 1 ≤ i ≤ n − 1 fixes γ. By the Galois correspondence, γ is not in any strict subfield of K containing F. Hence, K = F(γ). Since γⁿ ∈ F, the proposition follows.
for some α ∈ K and some primitive root of unity. This quantity is called the Lagrange resolvent of
α with σ. It is particularly useful in the situation when the base field F contains the nth roots of
unity.
Proof (of Galois’ Theorem). Let L be a Galois extension over a field F of characteristic 0.
First, suppose that L/F is a solvable extension. Then F ⊆ L ⊆ K, where K is a radical extension
as in (11.14). Expand this chain to the chain
where Ki′ = Ki−1(ζmi) with ζmi an mith root of unity and Ki = Ki′(γi) with γi^(mi) ∈ Ki−1. By the Galois correspondence, we have
Gal(K/Ki−1)/Gal(K/Ki′) ≅ Gal(Ki′/Ki−1)

is an abelian group. Also, by Proposition 11.8.13, Ki is a cyclic extension of Ki′. Again by the
Galois correspondence, Gal(K/Ki) ⊴ Gal(K/Ki′) and

Gal(K/Ki′)/Gal(K/Ki) ≅ Gal(Ki/Ki′)
is a cyclic group. Hence, the chain (11.18) shows that Gal(K/F ) is a solvable group.
Since L/F is a Galois extension, Gal(K/L) is a normal subgroup of Gal(K/F) and

Gal(L/F) ≅ Gal(K/F)/Gal(K/L).

By Exercise 9.1.8, quotient groups of solvable groups are solvable, so Gal(L/F) is solvable.
Now suppose that Gal(L/F ) is a solvable group. Let n = [L : F ] and let ζ be a primitive nth
root of unity over F. Adjoin ζ to both L and F to obtain a diagram of fields in which L(ζ) lies above both L and F(ζ), which in turn lie above F.
Since L(ζ) = LF(ζ) and both F(ζ) and L are Galois over F, then by Proposition 11.3.5,

Gal(L(ζ)/F(ζ)) ≅ Gal(L/L ∩ F(ζ)),

which is isomorphic to a subgroup of Gal(L/F) and hence is a solvable group. Thus, there exists a chain of subgroups

{1} = G0 ≤ G1 ≤ · · · ≤ Gs = Gal(L(ζ)/F(ζ))

such that Gi−1 ⊴ Gi and Gi/Gi−1 is a cyclic group of prime order for all 1 ≤ i ≤ s. Set Li = Fix(L(ζ), Gi). The Galois correspondence produces a chain of subfields

L(ζ) = L0 ⊇ L1 ⊇ · · · ⊇ Ls = F(ζ).
The Galois correspondence also implies that Li−1 is a Galois extension of Li with Gal(Li−1/Li) ≅ Gi/Gi−1.
The extension Li−1/Li is a cyclic extension of prime degree pi, which divides |Gal(L/F)|. (See Exercise 11.8.4.) Since ζ ∈ Li, then ζ raised to an appropriate power is a primitive pith root of unity. Hence, we can use Proposition 11.8.14 and deduce that Li−1 = Li(γi) with γi^(pi) ∈ Li. Consequently,
L(ζ)/F (ζ) is a radical extension. Furthermore, F (ζ)/F is obviously a radical extension so L(ζ)/F
is a radical extension. However, F ⊆ L ⊆ L(ζ) so L/F is a solvable extension.
11.9. PROJECTS 611
Galois’ Theorem motivated the definition of solvable groups. Because the problem of solving
polynomial by radicals had such historical importance, Galois’ Theorem channeled attention toward
deciding properties of solvable groups. Consequently, theorems about solvable groups imply results
about solvability of polynomials by radicals. For example, consider the Feit-Thompson Theorem
that states that if a group has odd order, then it is solvable. Combined with Corollary 11.8.11,
the Feit-Thompson Theorem implies that every polynomial p(x) ∈ F [x], where char F = 0, whose
splitting field E has [E : F ] odd, is solvable by radicals.
(a) Prove that f(x) reduces to x^p − x + 1 in Fp[x] and reduces to (x² + x + 1)(x + 1)^(p−2) in F2[x].
(b) Deduce that GalQ(f(x)) ≅ Sp.
6. Suppose that xⁿ − a ∈ Q[x] is irreducible. Prove that the splitting field E of xⁿ − a has degree [E : Q] equal to nφ(n) or (1/2)nφ(n).
7. Let Q̄ be the algebraic closure of Q. Let S be the subset of elements in Q̄ that are solvable by radicals over Q. (This is a strict subset by Theorem 11.8.12.) Prove that S is a subfield of Q̄.
8. Let C be the field of constructible numbers over Q. Prove that C is a subfield of S, the subfield of elements in Q̄ that are solvable by radicals. (See the previous exercise.)
9. Suppose that we have a chain of fields F ⊆ L ⊆ K ⊆ R, where the extension L/F is Galois and the
extension K/F is radical. Prove that [L : F ] ≤ 2.
10. Let D be a square-free integer and let a ∈ Q − {1}. Prove that Q(√(a√D)) cannot be a cyclic extension of degree 4 over Q.
11. Let f(x) ∈ Q[x] be of degree n and let G = GalQ(f(x)). Define q(x) as the polynomial q(x) = f(x²). Prove that GalQ(q(x)) is a subgroup of the wreath product Z2 ≀ρ G, where ρ : G → Sn corresponds to G acting on the set of roots of f(x).
11.9 Projects
Project I. Galois Groups of Bicubics. Study the Galois groups of polynomials in Q[x] of the
form x6 + ax4 + bx2 + c. Can you determine the possible orders of some Galois groups for
certain values of a, b, or c? Can you determine the group structure of the Galois groups?
Project II. Quaternion Galois Groups. Exercise 11.6.15 shows that there is no polynomial p(x) ∈ Q[x] of degree less than 8 such that GalQ(p(x)) ≅ Q8. Try to find a polynomial of
degree 8 in Q[x] that has a Galois group isomorphic to Q8 . Try to extend your result to
characterize all irreducible polynomials of degree 8 whose Galois group is Q8 .
Project III. Lagrange Resolvents. Example 11.8.16 gave a calculation of a Lagrange resolvent
associated to a root of a nontrivial cubic polynomial in Q[x]. The Lagrange resolvent for the
root of a cubic is generally not a symmetric polynomial in the roots. Can you justify the
result of the numerical calculation for γ³ of Example 11.8.16 on theoretical grounds? Can you provide a similar formula for γ³ for an arbitrary cubic p(x) ∈ Q[x] such that GalQ(p(x)) ≅ Z3?
Can you generalize to higher degree polynomials?
Project IV. Nested Polynomials. Let p(x), q(x) ∈ Q[x]. Can you determine anything about
the Galois group of p(q(x)) from the Galois groups Gal(p(x)) and Gal(q(x))? Start by calcu-
lating some examples. (Feel free to use a CAS to assist with calculations.)
Project V. Dynatomic Polynomials. In the exercises of Section 7.5, we introduced the concept
of dynatomic polynomials. The study of Galois groups of dynatomic polynomials ΦP,n (x) given
a polynomial P (x) ∈ Q[x] is challenging. Try to discover what you can about the Galois groups
Gal(ΦP,n ) for various P and for various n. Can you find an upper bound on | Gal(ΦP,n )|? Can
you determine any internal structure to Gal(ΦP,n )?
12. Multivariable Polynomial Rings
Linear algebra encompasses many topics but, at the introductory levels, it studies solution sets to
systems of linear equations in multiple variables. This theory has applications in many branches of
mathematics, in computer science, and in the natural and social sciences. The structures introduced
in solving systems of linear equations were generalized to the theory of vector spaces, and motivated the
concepts of linear transformation, kernel, image, subspaces, and so on. However, the study of solving
linear equations in multiple variables could also be generalized in a different direction, namely the
study of systems of polynomial equations in multiple variables.
In relatively recent years (recent for the history of mathematics), mathematicians have discovered
a number of algorithms that make the study of systems of polynomial equations far more tractable
than they might appear at first glance. The natural context in which to study systems of polynomial
equations is the context of multivariable polynomial rings or modules over such rings. This chapter
introduces methods used in applications of multivariable polynomial rings.
Sections 12.1 through 12.3 provide the theoretical underpinnings to studying systems of poly-
nomial equations. First, we introduce the concept of Noetherian modules and Noetherian rings,
which describe certain finiteness conditions present in multivariable polynomial rings. We then in-
troduce the notion of affine space and an affine variety, the set of solutions of a system of polynomial
equations. We also provide a complete correspondence between ideals and affine varieties.
In Sections 12.4 through 12.7, we present algorithms related to polynomial rings and introduce
the concept of a Gröbner basis of an ideal, a generating set that is optimal for the application of
many algorithms. We also illustrate the value of these algorithms for solving systems of polynomial
equations. Finally, Section 12.8 gives a brief introduction to algebraic geometry, a vast field that
stems from applying the theory of multivariable polynomial rings to study geometric concepts.
The ability to solve systems of polynomial equations has found innumerable applications in
computation. Consequently, mathematicians and scientists have developed many computer imple-
mentations of polynomial division, Buchberger’s algorithm, and other algorithms related to solving
systems of polynomial equations. Besides the implementations in commercial computer algebra
systems (e.g., Maple, Mathematica), some other freely available packages include CoCoA (Computational
Commutative Algebra), GAP (Groups, Algorithms, Programming), Macaulay2, Magma, and Sage.
A variety of recent books address applications of Gröbner bases and computational algebra. See for
example [1, 16, 17, 26].
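To preview the kinds of computations these packages perform, here is a minimal sketch using SymPy, a freely available Python library (the polynomials f1, f2, and p below are our own illustrative choices, not examples from the text).

```python
from sympy import symbols, groebner

x, y = symbols('x y')

# An ideal I = (f1, f2) of Q[x, y]; the generators are illustrative choices.
f1 = x**2 + y**2 - 1
f2 = x*y - 1

# Buchberger's algorithm produces a Groebner basis of I with respect
# to the lexicographic order x > y.
G = groebner([f1, f2], x, y, order='lex')

# Multivariable polynomial division with remainder: reduce a polynomial
# modulo the basis; a zero remainder certifies membership in the ideal.
p = x**3 + y**3
quotients, remainder = G.reduce(p)
print(G.exprs, remainder)
```

Here the remainder is 0, so p lies in the ideal (f1, f2); Sections 12.4 through 12.7 develop the theory behind these computations.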
12.1 Introduction to Noetherian Rings
In order to study polynomial rings, we take a step back in abstraction and present a notion that is
restrictive enough to capture the finiteness conditions that give multivariable polynomial rings some
of their valuable properties but broad enough to include other rings besides F [x1 , x2 , . . . , xn ], where
F is a field.
In this section, the ring of coefficients R always denotes a commutative ring.
Definition 12.1.1
Let (S, ≼) be a poset. A sequence x1 ≼ x2 ≼ · · · of elements in S is called an increasing
sequence; it is called stationary if there exists an integer N such that xn = xN for all
n ≥ N . The poset (S, ≼) satisfies the ascending chain condition if every increasing sequence
in S is stationary. Decreasing sequences and the descending chain condition are defined
similarly, with ≽ in place of ≼.
Proposition 12.1.2
A poset (S, ≼) satisfies the ascending (resp. descending) chain condition if and only if every
nonempty subset of S has a maximal (resp. minimal) element.
Proof. (⇐=) Suppose that every nonempty subset of S has a maximal element. Let
x1 ≼ x2 ≼ · · · be an increasing sequence in S. The elements of this sequence form a set, which must
have a maximal element a. Suppose that xn = a. Since a = xn ≼ xi for all i ≥ n and since a is
maximal, xi = a for all i ≥ n. Thus, the sequence is stationary.
(=⇒) Suppose there exists a nonempty subset T of S that does not have a maximal element.
Define a sequence as follows. Let x1 be any element in T . Given a term xi of the sequence,
since T does not have a maximal element, there exists some element xi+1 ∈ T such that xi ≼ xi+1 and
xi+1 ≠ xi . This inductive definition creates an increasing sequence that is not stationary.
The proof of the equivalence for the descending chain condition is identical.
Example 12.1.3. Consider the poset (N∗ , |) of positive integers with the partial order of divisi-
bility. This poset does not satisfy the ascending chain condition since, for example, 2, 2², 2³ , . . . is
an ascending chain that is not stationary. On the other hand, (N∗ , |) satisfies the
descending chain condition. If a ∈ N∗ , then a has only a finite number of divisors, so any
descending chain that includes a can decrease strictly only a finite number of times. Hence, a
chain containing any positive integer a must become stationary. △
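The finiteness in this example can be checked computationally: each strict step down a divisibility chain removes at least one prime factor, so a strictly descending chain starting at a has at most Ω(a) + 1 terms, where Ω(a) counts the prime factors of a with multiplicity. A small sketch in Python (the function names are ours):

```python
def omega(a):
    """Number of prime factors of a, counted with multiplicity."""
    count, d = 0, 2
    while d * d <= a:
        while a % d == 0:
            a //= d
            count += 1
        d += 1
    if a > 1:  # whatever remains is itself a prime factor
        count += 1
    return count

def max_chain_length(a):
    """Maximum number of terms in a chain a = d0, d1 | d0, d2 | d1, ...
    where each term is a strict divisor of the previous one."""
    return omega(a) + 1

# 12 = 2 * 2 * 3, so the longest strict chain 12, 6, 3, 1 has 4 terms.
print(max_chain_length(12))
```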
Properties of ascending chains or descending chains can be applied in many algebraic contexts.
However, for Noetherian rings and modules, we consider chains of submodules.
Definition 12.1.4
Let R be any ring. An R-module M is called Noetherian (resp. Artinian) if its poset of
submodules (Σ, ⊆) satisfies the ascending (resp. descending) chain condition.
Example 12.1.5. Consider the ring of integers Z as a Z-module. Submodules of Z consist of its
ideals. Since Z is a PID, every ideal has the form (a) for some nonnegative a ∈ Z. For ideal
containment, (a) ⊆ (b) if and only if b | a. Hence, given any ideal, (a), the only ideals that can
contain it are generated by the divisors of a and hence there is only a finite number of them. Thus,
Z satisfies the ascending chain condition, so Z is a Noetherian module. However, for a ≥ 2, the chain of ideals
(a) ⊇ (a²) ⊇ · · · ⊇ (aᵏ) ⊇ · · ·
is never stationary. Hence, Z is not an Artinian module over Z. △
Example 12.1.6. As a Z-module, the abelian group Q is neither Noetherian nor Artinian. From
the previous example, we already see that Q is not Artinian. However, the chain of submodules
Z ⊆ (1/2)Z ⊆ (1/4)Z ⊆ · · · ⊆ (1/2ᵏ)Z ⊆ · · ·
is not stationary. Thus, Q, as a Z-module, does not satisfy the ascending chain condition. 4
Example 12.1.7. The previous example inspires us to find a Z-module that satisfies the descending
chain condition but not the ascending chain condition. Let p be a prime number and, inside the
group Q/Z, define the subgroups
Gn = { a/pⁿ + Z | 0 ≤ a < pⁿ }.
Consider the submodule G = G0 ∪ G1 ∪ G2 ∪ · · · of Q/Z, called the Prüfer p-group. The ascending chain
{0} = G0 ⊆ G1 ⊆ G2 ⊆ · · ·
is never stationary, so G is not Noetherian. On the other hand, every proper submodule of G is one
of the finite cyclic groups Gn . Consequently, a strictly descending chain of submodules of G consists,
after its first term, of finite groups of strictly decreasing order, so it must be finite. Hence, G satisfies
the descending chain condition, so is Artinian as a Z-module. △
Example 12.1.8. Every finite abelian group G, as a Z-module, satisfies both the ascending and
the descending chain condition. This is simply because G has a finite set of subgroups, so every
nonempty subset of Sub(G) has a maximal element. △
Proposition 12.1.9
Let M be a module and N a submodule. M is Noetherian (resp. Artinian) if and only if
N and M/N are Noetherian (resp. Artinian).
Proof. We only provide a proof for Noetherian modules since the proof for Artinian modules is
similar.
Suppose that M is Noetherian. Any ascending chain of submodules in N is also an ascending
chain of submodules in M so is stationary. Thus, N is Noetherian. By the Fourth Isomorphism
Theorem, a chain of submodules in M/N corresponds uniquely to a chain of submodules of M but
that contain N . Any such chain in M is stationary, so the chain in M/N is stationary. Thus, M/N
is Noetherian.
Conversely, suppose that N and M/N are Noetherian. Let
M0 ⊆ M1 ⊆ M2 ⊆ · · ·
be an ascending chain of submodules in M . Then the chain {Mi ∩ N }i≥0 is an ascending chain of
submodules in N , while {π(Mi )}i≥0 is an ascending chain of submodules in M/N , where π : M →
M/N is the canonical projection. Both of these chains are constant after a large enough index k.
Suppose that i ≥ k. Let a ∈ Mi+1 . Since π(Mi ) = π(Mi+1 ), then π(a) ∈ π(Mi ) so there exists
b ∈ Mi such that π(b) = π(a). Hence, π(a − b) = 0 so a − b ∈ ker π = N . Setting n = a − b, we
see that n ∈ Mi+1 ∩ N . Since Mi ∩ N = Mi+1 ∩ N , it follows that n ∈ Mi ∩ N . In particular, both
n and b are in Mi , so a = b + n ∈ Mi . Consequently, Mi = Mi+1 , which shows that the ascending chain {Mi } is
stationary. So M is Noetherian.
Corollary 12.1.10
Let M1 , M2 , . . . , Mr be Noetherian (resp. Artinian) R-modules. The direct sum M1 ⊕M2 ⊕
· · · ⊕ Mr is a Noetherian (resp. Artinian) R-module.
As we have seen, many theorems concerning Noetherian modules also hold for Artinian modules
and vice versa. However, this is not always the case. There is no equivalent to the following
proposition for Artinian modules. It is precisely the equivalent condition in the following proposition
that makes Noetherian modules so interesting.
Proposition 12.1.11
Let R be a commutative ring. An R-module M is Noetherian if and only if every submodule
of M is finitely generated.
Proof. First, suppose that M is Noetherian. Let N be a submodule of M . Let (Σ, ⊆) be the poset
of all finitely generated submodules of N ; note that Σ is nonempty since {0} ∈ Σ. Since M is
Noetherian, by Proposition 12.1.2, Σ has a maximal element N′ . Assume that N′ is a strict submodule
of N and consider the module N′ + Rx for some element x ∈ N − N′ . Then N′ + Rx is still finitely
generated and a submodule of N , which contradicts the maximality of N′ . Thus, N′ = N , so N is
finitely generated.
Conversely, suppose that every submodule of M is finitely generated. Consider an ascending
chain of submodules M1 ⊆ M2 ⊆ · · · of M . The set
N = M1 ∪ M2 ∪ M3 ∪ · · ·
is a submodule of M , so by hypothesis it is finitely generated, say by x1 , x2 , . . . , xs . Each generator
xi lies in some Mni , so with k = max{n1 , . . . , ns } all of the generators lie in Mk . Hence, N = Mk
and Mj = Mk for all j ≥ k, which shows that the chain is stationary. Thus, M is Noetherian.
Definition 12.1.12
A commutative ring R is said to be Noetherian (resp. Artinian) if it is Noetherian (resp.
Artinian) as an R-module.
Recall that when considering R as an R-module, the R-submodules of R are precisely the ideals
of R. Hence, a ring R is Noetherian (resp. Artinian) if it satisfies the ascending (resp. descending)
chain condition on ideals. By Proposition 12.1.11, a ring R is Noetherian if and only if every ideal
is finitely generated.
The following examples illustrate some of the finiteness conditions on Noetherian and Artinian
rings.
Example 12.1.13. Since a field F has only two ideals, namely (0) and F , then every chain of ideals
is stationary. Thus, every field is both Artinian and Noetherian. △
Example 12.1.14. Every principal ideal domain (PID) R is Noetherian. This is because, by defi-
nition, every ideal I in R is generated by a single element. Since every ideal is finitely generated, by
Proposition 12.1.11, R is Noetherian. In this light, the condition for a ring to be Noetherian
is a direct generalization of the principal ideal condition. △
Example 12.1.15. An example of a non-Noetherian ring is the polynomial ring R = F [x1 , x2 , . . .]
in a countable number of variables, where F is a field. The ascending chain of ideals
(x1 ) ⊆ (x1 , x2 ) ⊆ (x1 , x2 , x3 ) ⊆ · · ·
never terminates, so F [x1 , x2 , . . .] is not Noetherian. It is not Artinian either because (x1 ) ⊇ (x1 ²) ⊇
· · · is a descending chain that never becomes stationary.
Notice that the ring R itself is a finitely generated R-module, generated by 1, whereas the
submodule consisting of polynomials without a constant term is not finitely generated. Hence,
this gives an example where the R-module is finitely generated but not every submodule is finitely
generated. △
Though Proposition 12.1.9 guarantees that ideals of a Noetherian ring must be Noetherian, an
arbitrary subring of a Noetherian ring need not be Noetherian. Consider the previous example.
Note that F [x1 , x2 , . . .] is an integral domain so we can construct its field of fractions F (x1 , x2 , . . .).
Since this is a field, it is Noetherian. Consequently, this gives an example where a subring of a
Noetherian ring is not Noetherian.
Proposition 12.1.9 has many consequences for Noetherian and Artinian rings. The following
proposition is a first one. A few other similar results appear in the exercises.
Proposition 12.1.16
Let R be a Noetherian (resp. Artinian) ring and let M be a finitely generated R-module.
Then M is a Noetherian (resp. Artinian) R-module.
Proof. Suppose that M is generated by x1 , x2 , . . . , xn . Define the R-module homomorphism
π : Rn → M by
π(r1 , r2 , . . . , rn ) = r1 x1 + r2 x2 + · · · + rn xn .
Since the xi generate M , the homomorphism π is surjective, so M ≅ Rn / ker π. By Corollary 12.1.10,
Rn is a Noetherian (resp. Artinian) R-module, and by Proposition 12.1.9 so is the quotient M .
We now consider some properties of Noetherian rings that do not hold for Artinian rings.
Proposition 12.1.17
Let R be a subring of a ring S. Suppose that R is a Noetherian ring and that S is finitely
generated as an R-module. Then S is a Noetherian ring.
For the following proposition we recall the following notation for polynomials. If p(x) ∈ R[x],
then LC(p(x)) or more simply LC(p) denotes the leading coefficient of p(x) and LT(p(x)) or LT(p)
is the leading term. So if
p(x) = an xn + · · · + a1 x + a0 ,
then deg p(x) = n, LC(p) = an and LT(p) = an xn . If I is an ideal in R[x], we also define LC(I) as
the set of leading coefficients of polynomials that occur in I. It is easy to see that with I an ideal
in R[x], then LC(I) is an ideal in the coefficient ring R.
Theorem 12.1.18 (Hilbert's Basis Theorem)
If R is a Noetherian ring, then the polynomial ring R[x] is Noetherian.
Proof. Let I be an arbitrary ideal in R[x]. Since R is Noetherian, LC(I) is finitely generated, say
by a1 , a2 , . . . , ak . Let f1 , f2 , . . . , fk be polynomials in I such that LC(fi (x)) = ai for i = 1, . . . , k.
Define mi = deg fi (x) and let m = max{m1 , m2 , . . . , mk }.
Let J = (f1 , f2 , . . . , fk ) be the ideal in R[x] generated by these polynomials. Note that J ⊆ I.
Let f ∈ I. Then LC(f ) ∈ LC(I), so there exist r1 , r2 , . . . , rk ∈ R such that LC(f ) = r1 a1 + r2 a2 +
· · · + rk ak . If n = deg f (x) ≥ m, then the polynomial
f (x) − (r1 x^(n−m1) f1 (x) + r2 x^(n−m2) f2 (x) + · · · + rk x^(n−mk) fk (x))
has degree strictly less than deg f (x), since the powers of x and the constants ri are chosen so that
the subtraction cancels the leading term of f (x). Notice that the subtracted sum is an element
of J. By repeating this process, we can subtract from f (x) an element of J and obtain a polynomial
g(x) of degree strictly less than m. In other words, there exists a polynomial g(x) with deg g(x) < m
such that f (x) − g(x) ∈ J.
As R-submodules of R[x], we have shown that I = (I ∩ M ) + J, where M = R + Rx + · · · +
Rxm−1 . Now M is obviously finitely generated as an R-module so it is a Noetherian R-module by
Proposition 12.1.16. The intersection I ∩ M is a submodule of M so by Proposition 12.1.11, I ∩ M
is finitely generated, say by the set {g1 , g2 , . . . , gℓ }, as an R-module. It is clear that
{f1 , f2 , . . . , fk , g1 , g2 , . . . , gℓ }
generates I as an ideal of R[x]. Hence, every ideal of R[x] is finitely generated, and by Proposition
12.1.11, R[x] is Noetherian.
Hilbert’s Basis Theorem leads to the following corollaries, which are essential for the subsequent
study of multivariable polynomial rings.
Corollary 12.1.19
Let R be Noetherian. Then R[x1 , x2 , . . . , xn ] is Noetherian. More generally, every finitely
generated R-algebra is Noetherian.
Proof. The first part of the corollary follows by a repeated application of Hilbert’s Basis Theorem.
The second half follows because a finitely generated R-algebra is a quotient of some R[x1 , x2 , . . . , xn ]
and, by Proposition 12.1.9, a quotient of a Noetherian ring is Noetherian.
Corollary 12.1.20
Let F be a field. Every ideal in F [x1 , x2 , . . . , xn ] is finitely generated.
Though the study of solution sets to systems of polynomial equations is more general and more
involved than linear equations, Hilbert’s Basis Theorem, and in particular Corollary 12.1.20, provides
a key result that makes subsequent useful algorithms possible.
7. A composition series of an R-module M is a chain of submodules
{0} = M0 ⊆ M1 ⊆ · · · ⊆ Mn−1 ⊆ Mn = M
where Mi is a submodule of Mi+1 such that Mi+1 /Mi is an irreducible R-module. Prove that M has
a composition series if and only if M satisfies both chain conditions.
8. Let R be a Noetherian ring and let D be a multiplicatively closed subset of R that does not contain 0.
Prove that D−1 R is a Noetherian ring. [Hint: Prove that the ideals in D−1 R are of the form D−1 I,
where I is an ideal of R.]
9. Let {Ij }j∈J be an arbitrary collection of ideals in a Noetherian ring R. Prove that if I is the least
ideal containing all Ij , then there exists a finite subset {j1 , j2 , . . . , jr } ⊆ J such that
I = Ij1 + Ij2 + · · · + Ijr .
10. Let R be Noetherian. Prove that R[[x]] is Noetherian. [Hint: Modify the proof of the Hilbert Basis
Theorem but instead of LC(I), consider the ideal of coefficients of the terms of least degree of each
power series f ∈ R[[x]].]
11. An ideal I in a ring R is said to be irreducible if whenever I = I1 ∩I2 , then I = I1 or I = I2 . Prove that
in a Noetherian ring, every ideal is a finite intersection of irreducible ideals. [Hint: Assume otherwise
and consider the poset of ideals that are not finite intersections of irreducible ideals.]
12.2 Multivariable Polynomials and Affine Space
12.2.1 – Terminology for Multivariable Polynomials
We can think of the multivariable polynomial ring in two different ways.
We have encountered some theorems that affirm that if a ring R satisfies some property then the
ring R[x] satisfies that same property. For example, Theorem 6.5.5 states that if R is a UFD then
R[x] is as well; Hilbert’s Basis Theorem states that if R is Noetherian, then so is R[x]. In each case,
the immediate corollary is that R[x1 , x2 , . . . , xn ] satisfies that property. Such corollaries involve a
recursive application of the associated theorem and viewing R[x1 , . . . , xn ] as R[x1 , . . . , xn−1 ][xn ]. In
this view of the multivariable polynomials, one writes a polynomial f ∈ R[x1 , . . . , xn ] as
f (x1 , x2 , . . . , xn ) = g0 + g1 xn + g2 xn ² + · · · + gm xn ᵐ ,
where each gi = gi (x1 , . . . , xn−1 ) ∈ R[x1 , . . . , xn−1 ].
The above perspective has merit but, when F is a field, rings of the form F [x1 , x2 , . . . , xn ] have
additional desirable properties. In particular, F [x1 , x2 , . . . , xn ] is a unital associative algebra over F ,
a vector space equipped with a multiplication that is associative and has an identity. The standard
basis of F [x1 , x2 , . . . , xn ] consists of all monomials
x1^α1 x2^α2 · · · xn^αn ,
where αi ∈ N for 1 ≤ i ≤ n. To simplify notation, we often write the above monomial as xα , where
α ∈ Nn . Hence, a polynomial f in F [x1 , x2 , . . . , xn ] is a (finite) linear combination
f = Σα cα xα ,
where cα ∈ F . The product on F [x1 , x2 , . . . , xn ] follows from properties of distributivity and the
product xα · xβ = xα+β , where the addition α + β occurs in the monoid Nn . (We point out that the
product in F [x1 , x2 , . . . , xn ] is a convolution product on Fun(Nn , F ) as described in Section 5.4.3.)
Definition 12.2.1
(1) The n-tuple α ∈ Nn is called the multidegree of the monomial xα . We write mdeg xα = α.
(2) The integer α1 +α2 +· · ·+αn is called the total degree of the monomial and is denoted
|α|.
Definition 12.2.3
A polynomial in F [x1 , x2 , . . . , xn ] is called homogeneous if all of the monomials have the
same total degree. For any polynomial f , the homogeneous component of degree d is the
sum of the terms of total degree d that appear in f .
The polynomial in Example 12.2.2 is obviously not homogeneous. The homogeneous component
of degree 5 is 7x2 y 3 + 2y 3 z 2 . Note that the elementary symmetric polynomials sk (x1 , x2 , . . . , xn )
introduced in Section 11.5.1 are homogeneous of total degree k.
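Decomposing a polynomial into homogeneous components is straightforward with a computer algebra system. The following SymPy sketch (the helper function and the sample polynomial are ours) groups the terms of a polynomial by total degree; the sample polynomial was chosen so that its degree-5 component is 7x²y³ + 2y³z².

```python
from collections import defaultdict
from sympy import Poly, symbols, expand

x, y, z = symbols('x y z')

def homogeneous_components(f, *gens):
    """Return a dict mapping each total degree d to the homogeneous
    component of degree d of the polynomial f."""
    comps = defaultdict(lambda: 0)
    for monom, coeff in Poly(f, *gens).terms():
        # Rebuild the term coeff * x1^a1 * ... * xn^an from its exponents.
        term = coeff
        for g, e in zip(gens, monom):
            term *= g**e
        comps[sum(monom)] += term
    return dict(comps)

# A sample polynomial (ours) with components in degrees 5, 2, 1, and 0.
f = 7*x**2*y**3 + 2*y**3*z**2 - 3*x*y + x - 5
comps = homogeneous_components(f, x, y, z)
print(comps[5])
```

The components sum back to f, mirroring the fact that F [x1 , x2 , . . . , xn ] is the direct sum of its subspaces of homogeneous polynomials.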
Multivariable polynomials serve a dual purpose. We can and do treat them purely as algebraic
objects. However, we can also consider them as functions from F n → F . Let c = (c1 , c2 , . . . , cn ) ∈ F n
and f ∈ F [x1 , x2 , . . . , xn ]. In the functional perspective, as usual, f (c) denotes f evaluated at c
and is an element of F . Though F n has the structure of a vector space over F , the function
f : F n → F is not generally a linear transformation. It is not uncommon to consider the solution
set of f (x1 , x2 , . . . , xn ) = 0, but again this is generally just a subset of F n and not a subspace.
In the more algebraic perspective, the evaluation evc : F [x1 , x2 , . . . , xn ] → F defined by evc (f ) =
f (c) is a ring homomorphism. Hence, we can consider the kernel ker evc . This is an ideal in
F [x1 , x2 , . . . , xn ], the ideal of all polynomials that evaluate to 0 at c.
Definition 12.2.4
Let F be a field. The set F n is called the n-dimensional affine space over F . It is alternately
denoted by AnF .
From a set-theoretic perspective, there is no need for this terminology. However, in classical
algebraic geometry, additional geometric structure is imposed on F n and the terminology of affine
space refers to that additional structure, which differs from the vector space structure.
In the affine space, we care about solution sets to systems of polynomial equations.
Definition 12.2.5
Let F be a field and let S ⊆ F [x1 , x2 , . . . , xn ] be a subset of polynomials, not necessarily
finite. Then we define the affine variety associated to S as
V(S) = {c ∈ F n | f (c) = 0 for all f ∈ S}.
Example 12.2.6. Consider the polynomial p(x, y) = 4x2 − x4 + 4y 2 − y 4 − 3 ∈ R[x, y]. As a subset
of affine space R2 , the affine variety V(p) is depicted below.
Example 12.2.7. Consider the polynomial p(x, y, z) = (x2 + y 2 − z 3 )2 − (x2 + y 2 + 3z 2 ) ∈ R[x, y, z].
As a subset of affine space R3 , the affine variety V(p) is depicted below.
Example 12.2.8. In the ring R[x, y, z] consider the two polynomials f1 (x, y, z) = x2 + y 2 − z 2 and
f2 (x, y, z) = y 2 + z − 4. The affine variety V(f1 , f2 ) is depicted on the left below.
Since V(f1 , f2 ) is the set of points in R3 that satisfy both f1 (x, y, z) = 0 and f2 (x, y, z) = 0,
then V(f1 , f2 ) = V(f1 ) ∩ V(f2 ). The diagram on the right shows the varieties V(f1 ) and V(f2 )
separately to illustrate their intersection. △
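Anticipating the Gröbner basis methods of Sections 12.4 through 12.7, a computer algebra system can transform this system into a triangular form reminiscent of Gaussian elimination. A SymPy sketch:

```python
from sympy import symbols, groebner

x, y, z = symbols('x y z')
f1 = x**2 + y**2 - z**2
f2 = y**2 + z - 4

# A Groebner basis of (f1, f2) for the lexicographic order x > y > z.
G = groebner([f1, f2], x, y, z, order='lex')

# The basis generates the same ideal, and at least one basis element
# involves only the later variables y and z: the variable x has been
# eliminated, just as in the row reduction of a linear system.
eliminated = [g for g in G.exprs if x not in g.free_symbols]
print(G.exprs)
```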
Definition 12.2.9
An affine variety V in AnF is called a hypersurface if V = V(f ), where f is an irreducible
polynomial. (If n = 2 a hypersurface is called a curve and if n = 3, a hypersurface is called
a surface.)
Example 12.2.10. As an example that illustrates the importance of the field, consider the poly-
nomial p(x, y) = x2 + y 2 − 1 ∈ F [x, y].
If F = R, then the hypersurface V(p) in R2 is the usual unit circle in the affine real plane.
If F = Q, then V(p) in Q2 consists of rational Pythagorean pairs. It is possible to show that all
solutions to x2 + y 2 − 1 = 0 in Q2 are of the form
x = (a² − b²)/(a² + b²) and y = 2ab/(a² + b²)
for integers a and b, not both zero. The unit circle in R2 might give some sense of V(p) in Q2 , but
the betweenness (or continuity) axioms of Euclidean geometry ensure that the unit circle in R2 has
no holes. This is not the case for geometry in Q2 .
If F = F7 , it is not particularly easy to visualize the affine variety V(p) in the affine space F7 ².
The set of points that are solutions to x2 + y 2 = 1 is
(0, 1), (0, 6), (1, 0), (6, 0), (2, 2), (2, 5), (5, 2), and (5, 5).
These points can be plotted in a 7 × 7 grid, using either {0, 1, . . . , 6} or {−3, −2, −1, 0, 1, 2, 3} as
the set of representatives of F7 .
If F = C, then we would need two complex dimensions, or four real dimensions, to fully depict the
solutions to p(z, w) = 0. If we write z = x1 + iy1 and w = x2 + iy2 , then the equation z 2 + w2 − 1 = 0
separates into real and imaginary parts as
x1 ² − y1 ² + x2 ² − y2 ² − 1 = 0 and 2x1 y1 + 2x2 y2 = 0.
In this perspective, we can view (though visualizing is another story) V(p) as a variety described
by two polynomial equations in the four-dimensional real affine space R4 . △
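For a finite field, the variety can simply be enumerated. A quick brute-force check in Python confirms the eight points of V(x² + y² − 1) over F7 listed above:

```python
p = 7  # working over the finite field F_7

# Enumerate all pairs (a, b) in F_7 x F_7 with a^2 + b^2 congruent to 1.
circle = sorted((a, b) for a in range(p) for b in range(p)
                if (a * a + b * b) % p == 1)
print(circle)
```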
The construction V is in fact a function from the set of subsets of F [x1 , x2 , . . . , xn ] to the set
of subsets of F n . This function, which corresponds to finding solutions to systems of polynomials,
satisfies many ring-theoretic properties that make ring theory the best context in which to study
systems of polynomial equations.
Suppose that S ⊆ S′ are subsets of F [x1 , x2 , . . . , xn ]. If c ∈ V(S′ ), then f (c) = 0 for all f ∈ S′ ,
and in particular for all f ∈ S. Thus, c ∈ V(S). This implies that the affine variety construction V
is an inclusion-reversing function from the poset of subsets (P(F [x1 , x2 , . . . , xn ]), ⊆) to the poset of
subsets (P(F n ), ⊆). In other words,
S ⊆ S′ =⇒ V(S′ ) ⊆ V(S). (12.1)
Given a subset S ⊆ F [x1 , x2 , . . . , xn ], consider the ideal I = (S) and the variety V(I) associated to
the ideal I. Since S ⊆ I, from (12.1), V(I) ⊆ V(S). Now if c ∈ V(S), then f (c) = 0 for all f ∈ S.
On the other hand, every polynomial p ∈ I is of the form
p = g1 f1 + g2 f2 + · · · + gr fr
for some gi ∈ F [x1 , x2 , . . . , xn ] and fi ∈ S, so p(c) = 0 and hence c ∈ V(I). Thus, V(S) ⊆ V(I).
We have proven the following important proposition.
Proposition 12.2.11
For all subsets S ⊆ F [x1 , x2 , . . . , xn ], the affine variety V(S) is equal to V(I), where I = (S).
Hilbert’s Basis Theorem asserts that every ideal I ⊆ F [x1 , x2 , . . . , xn ] is generated by a finite
number of elements f1 , f2 , . . . , fs . Along with Proposition 12.2.11, this leads to the surprising result
that every affine variety V(S) is of the form V(f1 , f2 , . . . , fs ), where (f1 , f2 , . . . , fs ) = (S), or in
other words, that every affine variety is the solution set to a finite system of polynomials
f1 (x1 , x2 , . . . , xn ) = 0
⋮
fs (x1 , x2 , . . . , xn ) = 0.
We point out that the study of affine varieties directly generalizes the introductory linear algebra
topic of solving systems of linear equations. Systems of linear equations simply involve polynomials
fi of the form
fi (x1 , x2 , . . . , xn ) = ai1 x1 + ai2 x2 + · · · + ain xn − bi ,
Definition 12.2.12
Let Z ⊆ F n be a subset of points in the affine space. Then we define
I(Z) = {f ∈ F [x1 , x2 , . . . , xn ] | f (c) = 0 for all c ∈ Z}.
The subset of polynomials I(Z) is nonempty since 0 ∈ I(Z). If f, g ∈ I(Z), then f (c) − g(c) = 0
for all c ∈ Z, so f − g ∈ I(Z). Furthermore, if f ∈ I(Z) and p ∈ F [x1 , x2 , . . . , xn ], then p(c)f (c) = 0
for all c ∈ Z, so pf ∈ I(Z). We have shown the following proposition.
Proposition 12.2.13
Let Z ⊆ F n be a subset of the affine space. The subset I(Z) is an ideal in F [x1 , x2 , . . . , xn ].
As we will see, the functions I and V share many parallel properties. A first similarity is that
I is an inclusion-reversing function from the poset of subsets (P(F n ), ⊆) to the poset of subsets
(P(F [x1 , x2 , . . . , xn ]), ⊆). In other words,
Z ⊆ Z 0 =⇒ I(Z 0 ) ⊆ I(Z).
Indeed, if f ∈ I(Z 0 ), then f (c) = 0 for all c ∈ Z 0 . In particular, f (c) = 0 for all c ∈ Z since Z ⊆ Z 0 .
Hence, if f ∈ I(Z 0 ), then f ∈ I(Z).
Since the I and V functions have opposite domains and codomains, as depicted below,
we might wonder if they are inverse functions. This cannot be the case because V is not injective.
We can see this from the fact that V(S) = V(I), where I is the ideal generated by S. Even if we
restrict V to the set of ideals in F [x1 , x2 , . . . , xn ], the variety function V still fails to be injective. For
example, in R[x, y] with corresponding affine space A2R , the ideals (x, y) and (xn , y m ), where n and
m are any positive integers, give the same variety, namely the origin {(0, 0)}. On the other hand,
I({(0, 0)}) = (x, y), so I(V(xn , y m )) = (x, y).
Having defined the term “affine variety,” it is important to note that not every subset of F n can
arise as an affine variety. For example, consider the set of integers Z as a subset of the one-dimensional
affine space R1 . Affine varieties in the affine space R are solution sets to ideals of polynomials in R[x].
Since R[x] is a PID, affine varieties in R correspond to solution sets of a single polynomial. Hence,
the varieties in R are precisely ∅ (for the polynomial 1), R (for the polynomial 0), and the finite
subsets of R (for the zero sets of nonzero polynomials). Since Z is infinite but not all of R, it is not
an affine variety.
As we continue to develop the algebraic-geometric structure of affine space AnF , we point out
that we are no longer interested in arbitrary subsets of F n and of F [x1 , x2 , . . . , xn ]. Instead, we care
about the correspondence between ideals I ⊆ F [x1 , x2 , . . . , xn ] and affine varieties V ⊆ F n given by
the functions V and I. (12.2)
Maple Function
with(plots); : Imports a library of plotting commands, among which are the following three procedures.
implicitplot : Plots the solution set to an algebraic equation in two variables in a specified domain of those two variables.
implicitplot3d : Plots the solution set to an algebraic equation in three variables in a specified domain of those three variables.
intersectplot : Plots the intersection of two surfaces in R3 , specified either as function graphs, parametric surfaces, or solutions to equations.
with(PolynomialIdeals); : The package PolynomialIdeals contains a data structure and a variety of commands that are useful for manipulating ideals in polynomial rings of multiple variables.
(a1 x + b1 y + c1 z − d1 , a2 x + b2 y + c2 z − d2 ) = (a3 x + b3 y + c3 z − d3 , a4 x + b4 y + c4 z − d4 ).
Prove that the solution set of this system is precisely 4 points. (In particular, it is not a curve.)
10. Let F be any field. Consider the set of n × n matrices Mn×n (F ) as the affine space F n² .
(a) Prove that SLn (F ) is an affine variety in Mn×n (F ).
(b) Prove that the set of orthogonal matrices On (F ) is an affine variety.
(c) Find an explicit set of equations that define O3 (F ).
12.3 The Nullstellensatz
The correspondence between affine varieties and ideals in F [x1 , x2 , . . . , xn ] (12.2) is not a bijection.
The main theorems concerning this correspondence are called the Nullstellensatz, which comes in a
so-called weak form and a strong form. The German term “Nullstellensatz” literally means the
theorem (“Satz”) of the locations (“Stellen”) of zeros (“Null”). These theorems turn out to be
profound generalizations of the Fundamental Theorem of Algebra.
Proposition 12.3.1
Let A be a Noetherian ring. Let A ⊆ B ⊆ C be rings such that C is finitely generated
as an A-algebra and such that C is finitely generated as a B-module. Then B is finitely
generated as an A-algebra.
for constants βij ∈ B. Furthermore, since C is a ring, the products of the generators of C over B
lead to structure constants γij k ∈ B satisfying
vi vj = γij 1 v1 + γij 2 v2 + · · · + γij n vn . (12.4)
Proposition 12.3.2
Let F be a field and let E be a finitely generated F -algebra. If E is a field then it is a finite
extension of F .
Proof. Suppose that the field E is given by E = F [α1 , α2 , . . . , αn ]. Assume that E is not algebraic
over F . Then at least one of the generators is transcendental over F . In fact, we can renumber
the generators of E so that for some r ≥ 1, the generators α1 , . . . , αr are algebraically independent
over F and that αr+1 , . . . , αn are algebraic over K = F (α1 , α2 , . . . , αr ). Then E is a finite extension
of K, that is a finite-dimensional vector space over K, and also finitely generated as a K-module.
Since F ⊆ K ⊆ E, by Proposition 12.3.1, K is finitely generated as an F -algebra, so in fact K =
F [β1 , β2 , . . . , βs ], where βi ∈ F (α1 , α2 , . . . , αr ).
Now each βi is of the form
βi = fi (α1 , α2 , . . . , αr ) / gi (α1 , α2 , . . . , αr )
for polynomials fi , gi ∈ F [x1 , x2 , . . . , xr ]. Recall that F [α1 , α2 , . . . , αr ] ≅ F [x1 , x2 , . . . , xr ] is a UFD.
There are a variety of ways to see that F [x1 , x2 , . . . , xr ] has an infinite number of prime (irreducible)
elements. Consequently, there exists an irreducible polynomial h that does not divide any of the gi .
By properties of addition and multiplication of fractions, every element of K = F [β1 , β2 , . . . , βs ],
when written as a fraction in reduced form, has a denominator whose irreducible factors divide the
product g1 g2 · · · gs . However, the element 1/h, which lies in F (α1 , α2 , . . . , αr ) = K, does not satisfy
this property. This is a
contradiction. Therefore, E is algebraic over F . Since E is algebraic and generated over F by a
finite number of elements, then E is a finite extension of F .
Theorem 12.3.3 (Weak Nullstellensatz)
Let F be a field and let m be a maximal ideal in F [x1 , x2 , . . . , xn ]. Then the field
F [x1 , x2 , . . . , xn ]/m is a finite extension of F .
When F is algebraically closed, the Weak Nullstellensatz has many important equivalent formu-
lations.
Corollary 12.3.4
Let F be an algebraically closed field. The maximal ideals in F [x1 , x2 , . . . , xn ] are of the
form (x1 − c1 , . . . , xn − cn ) for some point (c1 , c2 , . . . , cn ) ∈ F n .
Corollary 12.3.5
Let F be an algebraically closed field. Suppose that f1, f2, . . . , fs ∈ F[x1, x2, . . . , xn] are
such that the system of equations
f1(x1, x2, . . . , xn) = 0
..
.
fs(x1, x2, . . . , xn) = 0    (12.5)
has no solution in F n. Then (f1, f2, . . . , fs) = F[x1, x2, . . . , xn].
Proof. We prove the contrapositive. Suppose that I = (f1, f2, . . . , fs) is a proper ideal of
F[x1, x2, . . . , xn]. Then I is contained in a maximal ideal, which by Corollary 12.3.4 has the form
(x1 − c1, . . . , xn − cn) for some point c = (c1, c2, . . . , cn) ∈ F n. Hence, every f ∈ I can be written as
f = p1(x1 − c1) + p2(x2 − c2) + · · · + pn(xn − cn)
for some polynomials pi ∈ F[x1, x2, . . . , xn]. But then f(c) = 0. Hence, c ∈ V(I) and thus the
system (12.5) has c = (c1, c2, . . . , cn) as a solution.
Another way of stating this corollary is that if F is an algebraically closed field and I is an ideal
of F[x1, x2, . . . , xn], then
V(I) = ∅ =⇒ I = F[x1, x2, . . . , xn].
The version of the Nullstellensatz given in Corollary 12.3.5 shows why the Nullstellensatz serves the
role of the Fundamental Theorem of Algebra for multivariable polynomials. It says that any ideal I
of polynomials that is strictly smaller than C[x1, x2, . . . , xn] has some common zeros in Cn.
We point out that the property of the field being algebraically closed is a necessary requirement.
For example, the polynomial x2 + y 2 + 1 ∈ R[x, y] has no real zeros and yet the ideal (x2 + y 2 + 1)
is not all of R[x, y].
Definition 12.3.6
An ideal I in a commutative ring R is called a radical ideal if √I = I.
Theorem 12.3.7 (Strong Nullstellensatz)
Let F be an algebraically closed field and let I = (f1, f2, . . . , fs) be an ideal in
F[x1, x2, . . . , xn]. Then I(V(I)) = √I.
Proof. Since V(I) = V(√I), the inclusion √I ⊆ I(V(I)) is immediate. For the reverse inclusion,
let f ∈ I(V(I)) be nonzero. We must show that f m ∈ I for some integer m ≥ 1, that is, that
f m = p1 f1 + p2 f2 + · · · + ps fs. (12.6)
Consider the ideal Ĩ = (f1, . . . , fs, 1 − yf) in the new ring F[x1, . . . , xn, y]. We prove that V(Ĩ) = ∅.
Let (b1, . . . , bn, bn+1) ∈ F n+1.
Case 1: (b1, . . . , bn) ∈ V(I). Then f(b1, . . . , bn) = 0 and, evaluated at (b1, . . . , bn, bn+1), the polynomial 1 − yf is 1. In particular, (b1, . . . , bn, bn+1) ∉ V(Ĩ).
Case 2: (b1, . . . , bn) ∉ V(I). Then for at least one of the generating polynomials fi, we must have
fi(b1, . . . , bn) ≠ 0. Viewing fi as a polynomial in x1, . . . , xn, y, though a constant with respect
to y, gives fi(b1, . . . , bn, bn+1) ≠ 0. This shows that (b1, . . . , bn, bn+1) ∉ V(Ĩ).
The two cases cover all points in the affine space, so V(Ĩ) = ∅.
By the Weak Nullstellensatz (Corollary 12.3.5), we deduce that Ĩ = F[x1, . . . , xn, y]. In particular, 1 ∈ Ĩ. Thus, there exist polynomials q1, . . . , qs, r ∈ F[x1, . . . , xn, y] such that
1 = r(x1, . . . , xn, y)(1 − yf) + q1(x1, . . . , xn, y)f1 + · · · + qs(x1, . . . , xn, y)fs.
Considering this expression as an element in F(x1, . . . , xn)[y], set y = 1/f. This gives the identity of
rational expressions
1 = q1(x1, . . . , xn, 1/f)f1(x1, . . . , xn) + · · · + qs(x1, . . . , xn, 1/f)fs(x1, . . . , xn).
Let m be the maximum power of y appearing in any polynomial qi. Then multiplying this rational
expression by f m clears the denominators and returns an element in F[x1, . . . , xn], namely
f m = f m q1(x1, . . . , xn, 1/f)f1(x1, . . . , xn) + · · · + f m qs(x1, . . . , xn, 1/f)fs(x1, . . . , xn).
Hence, setting pi = f m qi(x1, . . . , xn, 1/f(x1, . . . , xn)) establishes the desired result (12.6). Thus,
I(V(I)) ⊆ √I and the theorem follows.
Example 12.3.8. Consider the polynomials f1 (x, y) = (x − 1)2 + y 2 − 1 and f2 (x, y) = x. The
solutions separately of these polynomials correspond respectively to a circle of radius 1 centered at
(1, 0) and the y-axis. If we consider the solution set of both polynomials, we can start with the ideal
I = (f1 , f2 ) and consider the variety V(I). Geometrically, V(I) corresponds to the intersection of the
circle described above and the y-axis. Hence, V(I) = {(0, 0)}. It is easy to see that I(V(I)) = (x, y).
On the other hand, note that f1(x, y) = x2 − 2x + y 2. Since x2 − 2x = (x − 2)x, we see that
I ⊆ (x, y 2). But x = f2(x, y) ∈ I and y 2 = f1(x, y) − (x − 2)f2(x, y), so (x, y 2) ⊆ I. This shows that
I = (x, y 2), which is the simplest expression of I. Note that this ideal differs from (x, y) since y ∉ I.
However, it does give an example of the Strong Nullstellensatz: √I = (x, y) = I(V(I)).
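The distinction between an ideal and its radical in this example can be checked computationally. The following sketch uses SymPy's Gröbner-basis routines (our choice of tool, not the text's) to verify that y ∉ I = (x, y2) while y2 ∈ I, so y lies in √I:

```python
from sympy import symbols, groebner

x, y = symbols('x y')

# I = (x, y^2), the simplified form of the ideal in Example 12.3.8
G = groebner([x, y**2], x, y, order='lex')

assert not G.contains(y)        # y is not in I ...
assert G.contains(y**2)         # ... but y^2 is, so y is in the radical of I
assert G.contains(x + 3*y**2)   # a typical element of I
```

Membership here is decided by reduction modulo a Gröbner basis, the topic taken up in Section 12.5.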
We summarize the results in this section to give a complete correspondence between affine vari-
eties and ideals.
Theorem 12.3.9
The functions V, from ideals in F[x1, x2, . . . , xn] to affine varieties in AnF, and I, from
affine varieties in AnF to ideals in F[x1, x2, . . . , xn], are inclusion-reversing functions. The
function I is an injection with V(I(V )) = V for all affine varieties V in AnF.
Proof. All that is left to show is that I is injective. Let V be an affine variety V(f1, f2, . . . , fs).
If f ∈ I(V ), then by definition f(c) = 0 for all c ∈ V. Thus, c ∈ V(I(V )) for all c ∈ V and so
V ⊆ V(I(V )). Conversely, f1, f2, . . . , fs ∈ I(V ) by definition of I(V ). Thus, (f1, f2, . . . , fs) ⊆ I(V ).
Since V is inclusion-reversing, we deduce that V(I(V )) ⊆ V(f1, f2, . . . , fs) = V.
Theorem 12.3.10
Let F be an algebraically closed field. The functions V and I are inverse bijections between
the set of radical ideals in F[x1, x2, . . . , xn] and the set of affine varieties in AnF.
Proof. Let V be the affine variety V = V(f1, f2, . . . , fs). Then V = V(I), where I is the ideal
I = (f1, f2, . . . , fs). By the Strong Nullstellensatz, I(V ) = √I. In particular, the image of I from
the set of affine varieties in AnF is the set of radical ideals. By Theorem 12.3.9, V(I(V )) = V. Again
by the Strong Nullstellensatz, if I is a radical ideal, then I(V(I)) = √I = I.
Together, the above theorems give strong results about affine varieties and, by extension, about
solution sets of systems of polynomial equations. These results are in some sense stronger than
properties of systems of linear equations studied in linear algebra. For example, a system of the
form
a11 x + a12 y + a13 z − b1 = 0
a21 x + a22 y + a23 z − b2 = 0
corresponds to the intersection of two planes in R3. If the planes are in general position, their
intersection is a line L. The pair of equations does not correspond uniquely to the solution set since
there are many pairs of planes that intersect in L. According to Theorem 12.3.10, if we work over
C, though the system of equations will not uniquely correspond to the solution set, the ideal
(a11 x + a12 y + a13 z − b1, a21 x + a22 y + a23 z − b2),
which is a radical ideal, does correspond uniquely to the affine variety (solution set) L.
12.4 Polynomial Division; Monomial Orders
The study of systems of polynomial equations in F [x1 , x2 , . . . , xn ], where F is a field, generalizes two
areas of algebra studied in previous courses: (1) polynomial equations in one variable; (2) systems
of linear equations. Techniques used in each of these areas inspire algorithms that are relevant for
solving systems of polynomial equations.
Recall that F[x] is a Euclidean domain with the degree serving as a Euclidean function. Consequently, F[x] is a PID and also a UFD. Because F[x] is a Euclidean domain, it is easy to tell when a
polynomial f(x) ∈ F[x] is in an ideal I = (p(x)) of F[x]: f(x) ∈ I if and only if the remainder of f(x),
when divided by p(x), is 0. Furthermore, the Euclidean Algorithm gives a method to calculate the greatest common divisor of two polynomials. In contrast, though the ring of multivariable polynomials
F[x1, x2, . . . , xn] is a UFD (Theorem 6.5.5), for n > 1, it is not a PID and hence it is not a Euclidean
domain. Consequently, it is much harder to tell when a polynomial f ∈ F[x1, x2, . . . , xn] is in an
ideal. Hence, among the problems we would like to solve in the study of F[x1, x2, . . . , xn] is the
Ideal Membership Problem: deciding whether f ∈ I.
The Gauss-Jordan elimination algorithm on a system of linear equations gives the solutions to the
system as a parametrization. Hence, as a part of solving systems of polynomial equations, we consider
the Problem of Parametrizing Varieties: Given a system of polynomial equations in x1, x2, . . . , xn,
find parameters t1, t2, . . . , tm and rational functions g1, g2, . . . , gn such that the equations
xi = gi(t1, t2, . . . , tm)
for all (t1, t2, . . . , tm) in some set give the solutions to the system of equations. As a reverse and related
problem, we will consider the Implicitization Problem: Given a parametrization
x1 = g1(t1, t2, . . . , tm)
..
.
xn = gn(t1, t2, . . . , tm),
where the gi are rational functions, find polynomials f1, f2, . . . , fs in F[x1, x2, . . . , xn] such that the set
parametrized by the functions gi is the solution set to fi(x1, x2, . . . , xn) = 0 for 1 ≤ i ≤ s.
In this section and the next two, we introduce a variety of algorithms associated to multivariable
polynomial rings over a field.
The Gauss-Jordan algorithm performs certain operations first on x1, then on x2,
and so forth. The choice of monomial order in the case of a single variable is natural, especially
because of Euclidean division of polynomials. The choice of ordering of the variables in the Gauss-Jordan elimination algorithm, however, is arbitrary.
To define a partial order on monomials xα in the variables x1, x2, . . . , xn is tantamount to choosing
a partial order ≼ on Nn so that xα ≼ xβ if and only if α ≼ β. From now on, when we discuss partial
orders on monomials, we view them equivalently as orders on Nn.
Not every partial order on the monomials of x1, x2, . . . , xn will be useful for algorithms. Often, the
partial order on monomials is needed to decide the leading term of a polynomial, i.e., the greatest
monomial with respect to a specified order. To ensure that an algorithm can always proceed, two
monomials should always be comparable. This means that for all α, β ∈ Nn we would like α ≼ β
or β ≼ α. Hence, we will only consider total orders on Nn.
As another requirement, we need algorithms to terminate. By Proposition 1.4.21, this requirement translates into the property that with respect to ≼ on Nn every nonempty subset S ⊆ Nn
contains a least element, namely that ≼ is a well-ordering.
Finally, for many algorithms on monomials it turns out to be convenient that an order on the
monomials is preserved under multiplication. In other words,
xα ≼ xβ =⇒ xα xγ ≼ xβ xγ for all γ ∈ Nn.
Definition 12.4.1
A partial order ≼ on Nn is called a monomial order if
(1) ≼ is a total order (α ≼ β or β ≼ α for all α, β ∈ Nn);
(2) ≼ is a well-ordering (every nonempty subset of Nn has a least element);
(3) if α ≼ β, then α + γ ≼ β + γ for all γ ∈ Nn.
There are a variety of monomial orders commonly used in algorithms on multivariable polynomial
rings. In the following examples, we prove that the lexicographic order is a monomial order but leave
proofs for other examples to the exercises. The exercises also discuss other monomial orders.
Example 12.4.2 (Lexicographic, I). The natural numbers (N, ≤) form a totally ordered set. Section 1.4.6 described the lexicographic order on Cartesian products of posets. The lexicographic order
≤lex on Nn is defined by α <lex β if and only if, for the least index j such that αj ≠ βj, we have αj < βj.
For example, in N4 we have (2, 7, 1, 8) ≤lex (3, 1, 4, 1) and (1, 7, 5, 21) ≤lex (1, 7, 7, 2).
We show that the lexicographic order (often abbreviated to the lex order) is a monomial order.
Let α, β ∈ Nn be distinct and let j be the least index for which αj ≠ βj. Since ≤ is a total order on
N, either αj < βj or αj > βj. Therefore, α ≤lex β or β ≤lex α and so ≤lex is a total
order. Also, suppose α <lex β. For all γ ∈ Nn, the first index in which α + γ differs from β + γ is j and since αj < βj,
then αj + γj < βj + γj, so α + γ ≤lex β + γ.
Finally, we show that ≤lex is a well-ordering. Let S be any nonempty subset of Nn. Call
S = S0. Recursively define the following integers and sets:
ci = min{αi | α ∈ Si−1} and Si = {α ∈ Si−1 | αi = ci}.
Assuming that Si−1 is nonempty, the integer ci exists by virtue of the well-ordering of ≤ on N and
therefore Si exists and is nonempty. By induction, ci and Si exist for all 1 ≤ i ≤ n. By construction,
S0 ⊇ S1 ⊇ · · · ⊇ Sn
and Sn consists of the single element (c1, c2, . . . , cn), which is the least element of S with respect to ≤lex.
Note that equivalently, α <lex β if and only if the leftmost nonzero entry of β − α is positive.
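The lex comparison can be sketched directly: Python compares tuples of exponents lexicographically, which matches ≤lex when the exponents are listed with the variables in decreasing order (an illustration of ours, not code from the text):

```python
# Python compares tuples lexicographically, matching <=lex on N^n.
alpha, beta = (2, 7, 1, 8), (3, 1, 4, 1)
assert alpha < beta                    # first entries differ: 2 < 3

assert (1, 7, 5, 21) < (1, 7, 7, 2)    # first difference at the third entry: 5 < 7

# The equivalent test from the text: the leftmost nonzero entry of
# beta - alpha is positive.
diff = [b - a for a, b in zip(alpha, beta)]
leftmost = next(d for d in diff if d != 0)
assert leftmost > 0
```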
Example 12.4.3 (Lexicographic, II). The previous example described the lexicographic order
on Nn, but it is not as general as it could be, because it compares the variables in the fixed order
x1 > x2 > · · · > xn. We can also define a lexicographic order in which the variables are ordered differently.
As a specific example, consider the lexicographic order on monomials in x, y, z such that y > z >
x; the terms of a polynomial are then written in decreasing order with respect to this order.
Since there are n! ways of ordering the variables x1, x2, . . . , xn, there are n! lexicographic monomial
orders on n variables.
Example 12.4.4 (Graded Lexicographic). Lexicographic orders are a natural order to devise
for monomials. However, it is sometimes desirable to group together monomials of the same total
degree: writing a polynomial g as a sum in which monomials of the same total degree are gathered
displays the homogeneous components of g.
With respect to some order on the variables (as described in Example 12.4.3), we define the
graded lexicographic order ≤grlex on Nn by
α <grlex β ⇐⇒ |α| < |β|, or |α| = |β| and α <lex β.
In other words, the graded lexicographic (or more briefly grlex) order first distinguishes monomials
by their total degrees and then, among monomials of the same total degree, distinguishes by a lex
order.
Writing the terms of g(x, y, z) in decreasing grlex order with x > z > y gives
Example 12.4.5 (Graded Reverse Lexicographic). With respect to some order on the variables, the graded reverse lexicographic order ≤grevlex on Nn is defined by α <grevlex β if and only if
α1 + α2 + · · · + αn < β1 + β2 + · · · + βn, or α1 + α2 + · · · + αn = β1 + β2 + · · · + βn and, for the least
variable xi such that αi ≠ βi, we have βi < αi.
Reverse lexicographic order corresponds to reversing both the order on the variables and the
direction of comparison for the powers. For example, assuming that x1 > x2 > x3 > x4, over
N4, we have
α = (4, 1, 2, 3) <grevlex (1, 5, 1, 3) = β
because the total degrees are equal and the rightmost entry in which α and β differ—the third entry—has β3 < α3. So the graded
reverse lexicographic (or more briefly grevlex) order distinguishes between monomials first by total
degree and then by a reverse lexicographic order.
As a specific example, with the order x > z > y on the variables, note that
yz 4 >grevlex xy 2 z 2
because they have the same total degree but, starting from the smallest variable (y), the power on
y of yz 4, namely 1, is less than the power on y of xy 2 z 2.
Writing the terms of the polynomial g(x, y, z) in decreasing grevlex order with x > z > y gives
Similarly, writing the terms of g(x, y, z) in decreasing grevlex order with y > z > x gives
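Both graded orders can be sketched as Python sort keys on exponent tuples (variables listed in decreasing order); for grevlex we use the standard trick of reversing and negating the exponent vector, so the rightmost differing entry decides, with the larger entry losing. This is an illustration of ours, not code from the text:

```python
def grlex_key(a):
    # total degree first, then plain lex on the exponent tuple
    return (sum(a), a)

def grevlex_key(a):
    # total degree first, then reversed-and-negated exponents:
    # the rightmost differing entry decides, larger entry loses
    return (sum(a), tuple(-e for e in reversed(a)))

# grlex: degree dominates, then lex breaks ties
assert grlex_key((0, 0, 4)) > grlex_key((3, 0, 0))   # degree 4 beats degree 3
assert grlex_key((3, 0, 0)) > grlex_key((1, 2, 0))   # same degree, lex decides

# grevlex with variables x > z > y as in the text: yz^4 -> (0, 4, 1) and
# x*y^2*z^2 -> (1, 2, 2) have the same degree; the rightmost (y) entries
# are 1 < 2, so yz^4 is the larger monomial
assert grevlex_key((0, 4, 1)) > grevlex_key((1, 2, 2))
```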
As we saw in the above examples, since a monomial order ≼ is a total order, we can follow
the convention, familiar from polynomials in a single variable, of writing the terms of a multivariable polynomial
in decreasing order with respect to ≼. It is useful to have a notation for the leading term of a
polynomial with respect to a given monomial order.
Definition 12.4.6
Let ≼ be a fixed monomial order on Nn. Let p ∈ F[x1, x2, . . . , xn] be nonzero and suppose that aα xα
is the term of p with the largest (with respect to ≼) multidegree. Then the multidegree of p
is mdeg(p) = α, the leading term of p is LT(p) = aα xα, the leading monomial of p is
LM(p) = xα, and the leading coefficient of p is LC(p) = aα.
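SymPy exposes these notations directly as LT, LM, and LC, each accepting an `order` option; a quick sketch (assuming SymPy is available; the tool choice is ours):

```python
from sympy import symbols, LT, LM, LC

x, y = symbols('x y')
p = 2*x**2*y + 3*x*y**2 + 4*y**2

# Under lex with x > y, the term 2*x^2*y has the largest multidegree:
assert LT(p, x, y, order='lex') == 2*x**2*y   # leading term
assert LM(p, x, y, order='lex') == x**2*y     # leading monomial
assert LC(p, x, y, order='lex') == 2          # leading coefficient

# The leading term depends on the monomial order:
assert LT(x + y**2, x, y, order='lex') == x        # lex: x beats y^2
assert LT(x + y**2, x, y, order='grlex') == y**2   # grlex: degree 2 beats 1
```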
Definition 12.4.1(3) leads to an important result that is useful for many algorithms.
Proposition 12.4.7
Let f, g ∈ F[x1, x2, . . . , xn] be nonzero. With respect to any monomial order,
(1) mdeg(f g) = mdeg(f ) + mdeg(g);
(2) if f + g ≠ 0, then mdeg(f + g) ≼ max(mdeg(f ), mdeg(g)).
Recall the division algorithm in F[x]: given f(x) and a nonzero divisor a(x) in F[x], there exist
unique polynomials q(x) and r(x) such that
f(x) = a(x)q(x) + r(x) where r(x) = 0 or deg r(x) < deg a(x).
Recall that q(x) is called the quotient and r(x) is the remainder.
From the perspective of ideals in F[x], polynomial division corresponds to finding an element
r(x) with r(x) = 0 or deg r(x) < deg a(x) such that f(x) − r(x) ∈ (a(x)). This shows that the ideal
membership problem in F[x] is trivial. Every ideal I in F[x] is principal, so I = (a(x)).
Thus, f(x) ∈ I if and only if the remainder of f(x) when divided by a(x) is 0.
In the context of a multivariable polynomial ring F[x1, x2, . . . , xn], ideals are no longer necessarily principal. However, by Hilbert's Basis Theorem, every ideal I is finitely generated with, say,
I = (a1, a2, . . . , as). So a multivariable polynomial division should allow for multiple divisors. An
algorithm for such a division should take a polynomial f and a list of polynomials a1, a2, . . . , as, and
return a list q1, q2, . . . , qs and a polynomial r such that
f = a1 q1 + a2 q2 + · · · + as qs + r
and no term of r is divisible by LT(ai) for any 1 ≤ i ≤ s. Note that the reference to a leading term
means that this division algorithm must be done in reference to a specific monomial order ≼.
The following algorithm, MultiPolyDivision, implements a multivariable polynomial division with respect to a fixed monomial order.
MultiPolyDivision(f ; a1, a2, . . . , as):
    g ← f
    r ← 0
    for j ← 1 to s do qj ← 0
    while g ≠ 0 do
        i ← 1
        while i ≤ s and g ≠ 0 do
            if LM(ai) | LM(g) then
                qi ← qi + LT(g)/LT(ai)
                g ← g − (LT(g)/LT(ai)) ai
                i ← 1
            else
                i ← i + 1
        if i = s + 1 then
            r ← r + LT(g)
            g ← g − LT(g)
    return (r, (q1, q2, . . . , qs))
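The pseudocode above can be rendered in Python almost line for line. The sketch below (our own representation choices, not code from the text) stores a polynomial as a dict mapping exponent tuples to coefficients and uses the lex order, which is exactly Python's tuple comparison:

```python
from fractions import Fraction

def lt(p):
    """Leading exponent and coefficient of a nonzero polynomial.

    A polynomial maps exponent tuples to coefficients; tuple comparison
    is lexicographic, so max(p) picks the lex-leading exponent."""
    e = max(p)
    return e, p[e]

def divides(ea, eg):
    """x^ea divides x^eg iff the exponents are componentwise <=."""
    return all(a <= b for a, b in zip(ea, eg))

def sub_scaled(p, c, e, q):
    """Return p - c * x^e * q, dropping zero coefficients."""
    out = dict(p)
    for f, d in q.items():
        g = tuple(a + b for a, b in zip(e, f))
        out[g] = out.get(g, 0) - c * d
        if out[g] == 0:
            del out[g]
    return out

def multi_poly_division(f, divisors):
    """Return (r, [q1, ..., qs]) with f = sum(qi * ai) + r and no term
    of r divisible by the leading monomial of any ai (lex order)."""
    g = {e: Fraction(c) for e, c in f.items()}
    r, qs = {}, [dict() for _ in divisors]
    while g:
        eg, cg = lt(g)
        i = 0
        while i < len(divisors):
            ea, ca = lt(divisors[i])
            if divides(ea, eg):
                e = tuple(a - b for a, b in zip(eg, ea))
                c = cg / ca
                qs[i][e] = qs[i].get(e, 0) + c
                g = sub_scaled(g, c, e, divisors[i])
                if not g:
                    break
                eg, cg = lt(g)
                i = 0
            else:
                i += 1
        else:  # no LT(ai) divides LT(g): move LT(g) into the remainder
            r[eg] = r.get(eg, 0) + cg
            del g[eg]
    return r, qs

# Example 12.4.10: f = 2x^2y + 3xy^2 + 4y^2 divided by
# a1 = x^2 - xy + 1 and a2 = y^2 - 1 in lex order with x > y.
f  = {(2, 1): 2, (1, 2): 3, (0, 2): 4}
a1 = {(2, 0): 1, (1, 1): -1, (0, 0): 1}
a2 = {(0, 2): 1, (0, 0): -1}
r, (q1, q2) = multi_poly_division(f, [a1, a2])
assert q1 == {(0, 1): 2}                          # q1 = 2y
assert q2 == {(1, 0): 5, (0, 0): 4}               # q2 = 5x + 4
assert r  == {(1, 0): 5, (0, 1): -2, (0, 0): 4}   # r = 5x - 2y + 4
```

The output matches the worked Example 12.4.10 below; reordering `divisors` or changing the monomial order changes the result, as the text goes on to show.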
Every time g is replaced by g − (LT(g)/LT(ai))ai, the leading term of (LT(g)/LT(ai))ai is LT(g),
so the difference polynomial g − (LT(g)/LT(ai))ai removes the leading term of g. Therefore, each
time g is changed, either g becomes 0 or
LT(g − (LT(g)/LT(ai)) ai) ≺ LT(g).
If instead LT(g) is not divisible by any LT(ai), then in the conditional statement "if i = s + 1"
the leading term of g is passed to the remainder r. Then again, the new leading term of g is strictly
less. Consequently, through each iteration of the outermost while loop, the leading terms of g
form a strictly decreasing sequence of monomials, starting with LT(f). Since a monomial order
is a well-ordering, the sequence of leading monomials terminates by Proposition 1.4.21. Thus, the
algorithm terminates.
It is not hard to check that the identity f = a1 q1 + a2 q2 + · · · + as qs + r + g is preserved through
each while loop. Consequently, f − r = a1 q1 + a2 q2 + · · · + as qs at the end of the algorithm, when
g = 0. These remarks prove the following proposition.
Proposition 12.4.8
The algorithm MultiPolyDivision terminates. None of the terms of r is divisible by a
leading term LT(ai). Furthermore, r = 0 or mdeg(r) ≼ mdeg(f ).
Definition 12.4.9
We call r the remainder of f divided by the s-tuple G = (a1 , a2 , . . . , as ) and we denote r
by rem (f, G).
It is important to notice that the algorithm MultiPolyDivision depends not only on the
monomial order chosen, but also on the order of the polynomials in the list (a1 , a2 , . . . , as ). The
algorithm always tries to divide g by a1 and once the leading term of g is not divisible by the leading
term of a1 , attempts to divide g by a2 , and so forth.
Example 12.4.10. Consider f (x, y) = 2x2 y + 3xy 2 + 4y 2 ∈ R[x, y] and let I = (x2 − xy + 1, y 2 − 1).
We set a1 = x2 −xy+1 and a2 = y 2 −1. Let us use the lexicographic order with x > y. Implementing
the above division algorithm and keeping track of terms in a vein similar to polynomial long division,
we get the following calculation.
q1 = 2y,  q2 = 5x + 4,  r = 5x − 2y + 4

  2x2y + 3xy2 + 4y2
− (2x2y − 2xy2 + 2y)        [2y · (x2 − xy + 1); add 2y to q1]
  5xy2 + 4y2 − 2y
− (5xy2 − 5x)               [5x · (y2 − 1); add 5x to q2]
  5x + 4y2 − 2y             [5x moves to r]
  4y2 − 2y
− (4y2 − 4)                 [4 · (y2 − 1); add 4 to q2]
  −2y + 4                   [−2y, then 4, move to r]
  0
The order in which terms get added to q1, q2, and r is: (1) 2y is added to q1 because LT(a1) = x2
divides 2x2y; (2) 5x is added to q2 because LT(a1) = x2 does not divide 5xy2 but LT(a2) = y2 does;
(3) the term 5x moves over to the remainder column because it is not divisible by either LT(a1) or
LT(a2); (4) 4 is added to q2 because LT(a2) divides 4y2; (5) after that, none of the terms in g is
divisible by the leading terms of a1 or a2, so the rest moves over to r.
The result of this calculation is that
2x2y + 3xy2 + 4y2 = 2y(x2 − xy + 1) + (5x + 4)(y2 − 1) + (5x − 2y + 4). (12.7)
As the following example shows, the monomial order changes the result of the polynomial division
algorithm.
Example 12.4.11. Consider the same polynomial and the same ideal as in the previous example
but use the lexicographic order with y > x. The polynomial division algorithm would be the
following. (Note that we continue to list the terms of polynomials in decreasing order with respect
to the monomial order.)
q1 = −3y − 5x,  q2 = 4,  r = 3y + 5x3 + 5x + 4

  3xy2 + 4y2 + 2x2y
− (3xy2 − 3x2y − 3y)        [−3y · (−xy + x2 + 1); add −3y to q1]
  4y2 + 5x2y + 3y
− (4y2 − 4)                 [4 · (y2 − 1); add 4 to q2]
  5x2y + 3y + 4
− (5x2y − 5x3 − 5x)         [−5x · (−xy + x2 + 1); add −5x to q1]
  3y + 5x3 + 5x + 4         [all remaining terms move to r]
  0
In the last stage, we passed all the terms of the intermediate polynomial g to r because none of them
is divisible by LM(a1) = xy or LM(a2) = y2. Interestingly enough, the term 5x3 is divisible by
x2, but, in the lex order with y > x, the term x2 in a1 is not the leading term.
This division algorithm leads to the polynomial division
3xy2 + 4y2 + 2x2y = (−3y − 5x)(−xy + x2 + 1) + 4(y2 − 1) + (3y + 5x3 + 5x + 4). (12.8)
Keeping the lex order with y > x but listing the divisors in the order (y2 − 1, −xy + x2 + 1) gives a
third calculation.
q1 = 3x + 4,  q2 = −2x,  r = 2x3 + 5x + 4

  3xy2 + 4y2 + 2x2y
− (3xy2 − 3x)               [3x · (y2 − 1); add 3x to q1]
  4y2 + 2x2y + 3x
− (4y2 − 4)                 [4 · (y2 − 1); add 4 to q1]
  2x2y + 3x + 4
− (2x2y − 2x3 − 2x)         [−2x · (−xy + x2 + 1); add −2x to q2]
  2x3 + 5x + 4              [all remaining terms move to r]
  0
It is essential to notice that the remainder r is a different polynomial in each of the above three
cases. In other words, the division algorithm depends both on the monomial order chosen for the
division algorithm and on the order in which the generators of the ideal are listed for the purposes
of the algorithm.
This nonuniqueness of the remainder makes the Ideal Membership Problem more challenging
than in the case of the Euclidean domain F[x]. Taking the difference of (12.7) and (12.8) and dividing
by 5 gives the combination
(x + y)(x2 − xy + 1) + x(y2 − 1) = x3 + y.
In particular, x3 + y is in the ideal (x2 − xy + 1, y2 − 1). However, with respect to the lex order
with y > x, none of the terms of y + x3 is divisible by the leading terms of a1 = −xy + x2 + 1 and
a2 = y2 − 1. Consequently, a polynomial p that lies in an ideal I need not have remainder 0
when divided by a generating set of I.
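This phenomenon can be reproduced with SymPy's `reduced` function, which performs multivariable polynomial division (the tool choice is ours, not the text's):

```python
from sympy import symbols, reduced, expand

x, y = symbols('x y')
a1, a2 = x**2 - x*y + 1, y**2 - 1
p = x**3 + y

# p lies in the ideal (a1, a2):
assert expand((x + y)*a1 + x*a2 - p) == 0

# ...yet in lex order with y > x (generators listed as y, x),
# division leaves p as its own remainder:
Q, r = reduced(p, [a1, a2], y, x, order='lex')
assert r == x**3 + y
```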
Maple Function
with(Groebner); Imports a library of commands for ideals of multivariable polynomial
rings, including polynomial division and Gröbner bases. The following
command is in the Groebner package.
NormalForm Multivariable polynomial division. The command NormalForm(f,G,T),
where f is a polynomial, G is a list of polynomials, and T is a mono-
mial order, implements the multivariable polynomial division algorithm
described in this section.
Maple has commands to define monomial orderings necessary for these
algorithms. Consult Maple help files to see how to define them.
7. Let w ∈ (Q>0)n. Define the weighted lexicographic (briefly wlex) order on Nn, weighted by w, as
α <w β ⇐⇒ w · α < w · β, or w · α = w · β and α <lex β,
where the lexicographic order is with respect to some order on the variables x1 , x2 , . . . , xn .
(a) Prove that for any weighted vector w and any order on the variables, the order ≤wlex is a
monomial order.
(b) Prove that with w = (1, 1, . . . , 1), we recover the graded lexicographic order.
8. Let w = (1, 2, 3) and consider the w-weighted lexicographic order defined in Exercise 12.4.7.
9. Let w ∈ (R>0 )n such that w1 , w2 , . . . , wn are linearly independent over Q. Define the weighted
lexicographic order on Nn weighted by w, written ≤w , as
α <w β ⇐⇒ w · α < w · β.
α <M β ⇐⇒ M α <lex M β,
18. The image of the parametric curve ~r(t) = (t, t2 , t3 ) with t ∈ R is called a twisted cubic.
(a) Prove that the twisted cubic is an affine variety by explicitly showing that it is V(y − x2 , z − x3 ).
(b) Let f ∈ R[x, y, z]. Prove that the result of the division algorithm of f by a1 = y − x2 and
a2 = z − x3 using the lexicographic order with z > y > x gives a polynomial r(x).
19. Consider the circle C in the yz-plane of radius 1 and center (y, z) = (2, 0).
(a) Show that C is an affine variety C = V(x, (y − 2)2 + z 2 − 1) in R3 .
(b) Let f (x, y, z) = (x2 + y 2 + z 2 − 5)2 − 16(1 − z 2 ). Prove that V(f ) is the torus obtained by
rotating around the z-axis the circle in the xz-plane of radius 1 and center (2, 0, 0).
(c) Show that I = (x, (y − 2)2 + z 2 − 1) is a radical ideal.
(d) Show from geometric reasoning that f ∈ I.
(e) Find q1 and q2 such that f = xq1 + ((y − 2)2 + z 2 − 1)q2 .
12.5 Gröbner Bases
The previous section concluded with the observation that x3 + y ∈ (x2 − xy + 1, y2 − 1) but that,
using the lexicographic order with y > x, the polynomial y + x3 is its own remainder when divided by
the pair (−xy + x2 + 1, y2 − 1). Hence, the polynomial division algorithm is generally not sufficient
to solve the Ideal Membership Problem.
Example 12.5.1. As a more striking example, consider the ideal I = (xy 2 + 1, x2 y − 1) in C[x, y].
The polynomial x + y is in I because
x + y = x(xy 2 + 1) − y(x2 y − 1).
However, x + y is its own remainder when divided by the pair (xy 2 + 1, x2 y − 1) with respect to any
monomial order. (See Proposition 12.5.5.) In Exercise 12.5.14, we show that I = (x + y, y3 − 1).
Now x + y, xy2 + 1, and x2y − 1, divided by the pair (x + y, y3 − 1) with the lex order with x > y,
all have a remainder of 0.
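A sketch with SymPy confirms both the membership identity and the better basis (the basis (x + y, y3 − 1) is the one cited from Exercise 12.5.14; the tool choice is ours):

```python
from sympy import symbols, groebner, expand

x, y = symbols('x y')
f1, f2 = x*y**2 + 1, x**2*y - 1

# x + y lies in I = (f1, f2):
assert expand(x*f1 - y*f2 - (x + y)) == 0

# The reduced Groebner basis of I in lex order with x > y:
G = groebner([f1, f2], x, y, order='lex')
assert set(G.exprs) == {x + y, y**3 - 1}

# With this basis, membership is visible as a zero remainder:
assert G.contains(x + y) and G.contains(f1) and G.contains(f2)
```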
The above examples illustrate that the generating set (basis) of I affects the output of the
multivariable polynomial division algorithm. We are led to think that there may be a basis that is
better than others. The problem with some bases is that the leading terms of the generators might
not divide the leading terms of all polynomials in the ideal. This section studies this interplay more
closely and shows that there always exists a "better" generating set for an ideal.
Definition 12.5.2
An ideal I ⊆ F[x1, x2, . . . , xn] is called a monomial ideal if there is a subset A ⊆ Nn such
that I = (xα | α ∈ A).
For example, I = (x3 yz, xy 2 , z 4 ) is a monomial ideal in F [x, y, z]. Since an ideal is closed under
addition, monomial ideals do not consist of just monomials. The definition makes no assumption
that the set A is finite. Hilbert’s Basis Theorem tells us that any ideal I, including monomial
ideals, is generated by a finite number of polynomials but it is not obvious that a monomial ideal is
generated by a finite number of monomials. We need to characterize monomial ideals.
As a point of notation, we will generally denote a finite list of multidegrees by α(1), α(2), . . . , α(s)
to distinguish from the indices of each n-tuple in Nn . Thus, for each i with 1 ≤ i ≤ s, we have
α(i) = (α(i)1 , α(i)2 , . . . , α(i)n ).
[Figure 12.1: the lattice points (α1, α2) of exponents of monomials in I = (x2y3, x5y); the shaded region lies above and to the right of the points (2, 3) and (5, 1).]
Proposition 12.5.3
Let I = (xα | α ∈ A) be a monomial ideal in F [x1 , x2 , . . . , xn ]. Then f ∈ I if and only if
every term of f is divisible by some monomial xα with α ∈ A.
Proof. By Hilbert’s Basis Theorem, and in particular Corollary 12.1.20, I is finitely generated with
I = (f1 , f2 , . . . , fr ) for some polynomial fi ∈ I. Let S = {xα | α ∈ A0 } be the set of all monomials
occurring in the polynomials fi with 1 ≤ i ≤ r. Thus, I ⊆ (S). Furthermore, the set A0 is finite
since each polynomial consists of a finite number of terms. By Proposition 12.5.3, each monomial in
S is in the monomial ideal I. Thus, S ⊆ I and therefore (S) ⊆ I. Hence, I = (S), so I is generated
by a finite number of monomials occurring in S.
Dickson’s Lemma and Proposition 12.5.3 show that every monomial ideal I in F [x1 , x2 , . . . , xn ]
consists of polynomials whose terms are multiples of a certain finite set of monomials. For monomial
ideas in a polynomial ring of 2 variables, this lends itself well to a visual diagram. For example,
I = (x2 y 3 , x5 y) consists of all polynomials whose monomials are in the shaded area in Figure 12.1.
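Proposition 12.5.3 turns membership in a monomial ideal into a term-by-term divisibility test, which is immediate to sketch in code (the representation by exponent tuples is our choice):

```python
def divides(a, e):
    """x^a divides x^e iff the exponents are componentwise <=."""
    return all(i <= j for i, j in zip(a, e))

def in_monomial_ideal(terms, gens):
    """Proposition 12.5.3: a polynomial, given as the set of its
    exponent tuples, lies in the monomial ideal (x^a : a in gens)
    iff every term is divisible by some generator."""
    return all(any(divides(a, e) for a in gens) for e in terms)

# I = (x^2*y^3, x^5*y) as in Figure 12.1
gens = [(2, 3), (5, 1)]

assert in_monomial_ideal([(2, 5), (6, 1)], gens)      # x^2*y^5 + x^6*y is in I
assert not in_monomial_ideal([(2, 5), (4, 2)], gens)  # the term x^4*y^2 escapes
```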
Dickson’s Lemma can be proved without reference to Hilbert’s Basis Theorem. However, because
of the order of our presentation, Dickson’s Lemma follows immediately from it. Surprisingly, this
lemma leads to a simpler characterization of monomial ideals.
Proposition 12.5.5
Let ≼ be a partial order on Nn that satisfies
(1) ≼ is a total order;
(2) if α ≼ β, then α + γ ≼ β + γ for all γ ∈ Nn.
Then ≼ is a well-ordering if and only if 0 ≼ α for all α ∈ Nn.
Proof. Suppose that ≼ is a well-ordering. Let δ be the least element in Nn. Assume that δ is not
the 0 element in Nn. Then δ ≺ 0. However, adding δ to both sides of the inequality gives 2δ ≺ δ. This
contradicts the minimality of δ. We conclude that δ = 0.
Conversely, suppose that 0 ≼ α for all α ∈ Nn. By (2) and 0 ≼ α, we have β ≼ α + β for all
α, β ∈ Nn; in other words, if xβ divides xγ, then β ≼ γ. Let A ⊆ Nn be any nonempty subset. By
Dickson's Lemma, the ideal I = (xα | α ∈ A) is equal to (xα(1), xα(2), . . . , xα(s)) for some finite list
of monomials with α(i) ∈ A. By Proposition 12.5.3, every element of A is ≼-greater than or equal
to some α(i). By (1), ≼ is a total order, so {α(1), α(2), . . . , α(s)} has a least element δ ∈ A. Hence,
all elements α ∈ A satisfy δ ≼ α. Therefore, every nonempty subset A of Nn has a least element,
which proves that ≼ is a well-ordering.
Definition 12.5.6
Fix a monomial order ≼ on Nn. A finite subset G = {g1, g2, . . . , gs} of an ideal I ⊆
F[x1, x2, . . . , xn] is a Gröbner basis of I with respect to ≼ if, as ideals,
(LT(g1), LT(g2), . . . , LT(gs)) = (LT(I)),
where (LT(I)) is the monomial ideal generated by the leading terms of elements in I.
Proposition 12.5.7
Fix a monomial order ≼. Every ideal I in F[x1, x2, . . . , xn] other than {0} has a Gröbner
basis. Furthermore, if G = {g1 , g2 , . . . , gs } is a Gröbner basis of I, then I = (g1 , g2 , . . . , gs ).
Proposition 12.5.8
Fix a monomial order. Let G be a Gröbner basis of an ideal I ⊆ F [x1 , x2 , . . . , xn ] and let
f be a polynomial. There exists a unique r ∈ F [x1 , x2 , . . . , xn ] such that f = g + r with
g ∈ I and such that no term of r is divisible by any monomial LT(gi ).
Proof. Let G = {g1, g2, . . . , gs}. The existence of r with the stated property follows from the
procedure of polynomial division: the algorithm returns r, q1, q2, . . . , qs ∈ F[x1, x2, . . . , xn] such that
f = q1 g1 + q2 g2 + · · · + qs gs + r,
where no term of r is divisible by any LT(gi); take g = q1 g1 + q2 g2 + · · · + qs gs ∈ I.
Definition 12.5.9
The unique polynomial r in Proposition 12.5.8 is called the normal form of f by the Gröbner
basis G.
Because the polynomial r described in Proposition 12.5.8 is unique, the result of the polynomial division
algorithm is independent of the order chosen for the elements of G. From a notational standpoint,
when G is a Gröbner basis, rem (f, G) is well-defined without specifying an order on the elements of
G. Furthermore, because of Proposition 12.5.8, Gröbner bases solve the Ideal Membership Problem
in the following sense.
Corollary 12.5.10
Fix a monomial order ≼. Let I be an ideal in F[x1, x2, . . . , xn]. Then f ∈ I if and only if
the remainder of f is 0 when divided by a Gröbner basis G of I with respect to 4.
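Continuing the running example x3 + y ∈ (x2 − xy + 1, y2 − 1): division by the original generators in lex order with y > x left a nonzero remainder, but division by a Gröbner basis of the same ideal returns remainder 0, as the corollary asserts. A sketch using SymPy (our tool choice):

```python
from sympy import symbols, groebner

x, y = symbols('x y')

# Groebner basis of (x^2 - x*y + 1, y^2 - 1) in lex order with y > x
G = groebner([x**2 - x*y + 1, y**2 - 1], y, x, order='lex')

# x^3 + y is in the ideal, so its remainder modulo G is 0
Q, r = G.reduce(x**3 + y)
assert r == 0
assert G.contains(x**3 + y)
```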
Having proven the existence and some first nice properties of Gröbner bases, we are missing
an essential component to make these results practical: (1) How can we tell if a generating set of
an ideal is a Gröbner basis? (2) Given a set of generators for an ideal I, how can we find a Gröbner
basis of I? We finish the section by addressing the first question but must leave the second question
for the following section.
By usual properties of minimum and maximum, xα xβ = lcm(xα, xβ) gcd(xα, xβ). For any two
polynomials f, g ∈ F[x1, x2, . . . , xn], the greatest common divisor gcd(LM(f ), LM(g)) divides both
LM(f ) and LM(g).
Definition 12.5.11
Fix a monomial order on Nn. The S-polynomial of f, g ∈ F[x1, x2, . . . , xn] is
S(f, g) = (LT(g)/gcd(LM(f ), LM(g))) f − (LT(f )/gcd(LM(f ), LM(g))) g. (12.9)
By construction, S(f, g) is in any ideal that contains both f and g. Also, since both polynomials
in the difference in (12.9) have the same leading term, the difference cancels out these leading terms.
Hence, the terms of S(f, g) come only from the nonleading terms of f and g.
Example 12.5.12. Let f = 2x3yz − 3x2z2 + 7yz and g = xy2 + 7xyz2 − 2 in R[x, y, z]. First
suppose that we use the lexicographic order with x > y > z. We note that gcd(LM(f ), LM(g)) =
gcd(x3yz, xy2) = xy. Then
S(f, g) = y f − 2x2z g = −14x3yz3 − 3x2yz2 + 4x2z + 7y2z.
Suppose now that we use the graded lexicographic order with x > y > z. With this monomial order,
gcd(LM(f ), LM(g)) = gcd(x3yz, xyz2) = xyz. Then
S(f, g) = 7z f − 2x2 g = −2x3y2 − 21x2z3 + 4x2 + 49yz2.
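The S-polynomial of Definition 12.5.11 is straightforward to compute with SymPy's LT/LM helpers; this sketch (our tool choice) checks the first computation of Example 12.5.12:

```python
from sympy import symbols, gcd, LT, LM, expand

x, y, z = symbols('x y z')

def s_poly(f, g, gens, order):
    """S(f, g) per formula (12.9), with d = gcd(LM(f), LM(g))."""
    d = gcd(LM(f, *gens, order=order), LM(g, *gens, order=order))
    return expand(LT(g, *gens, order=order) / d * f
                  - LT(f, *gens, order=order) / d * g)

f = 2*x**3*y*z - 3*x**2*z**2 + 7*y*z
g = x*y**2 + 7*x*y*z**2 - 2

# lex with x > y > z: gcd of leading monomials is x*y, so
# S(f, g) = y*f - 2*x^2*z*g; the leading terms 2*x^3*y^2*z cancel.
s = s_poly(f, g, (x, y, z), 'lex')
assert expand(s - (-14*x**3*y*z**3 - 3*x**2*y*z**2 + 4*x**2*z + 7*y**2*z)) == 0
```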
In the next section, the S-polynomials will play a key role in finding a Gröbner basis of an ideal.
However, they also give us a characterization for when a given set of polynomials {g1 , g2 , . . . , gs } is
a Gröbner basis of the ideal (g1 , g2 , . . . , gs ). We conclude this section with a proof and examples of
this characterization.
Lemma 12.5.13
Fix a monomial order. Let f1 , f2 , . . . , fs ∈ F [x1 , x2 , . . . , xn ] with mdeg fi = δ for all i and
let c1 , c2 , . . . , cs ∈ F . If
mdeg(c1 f1 + c2 f2 + · · · + cs fs ) ≺ δ,
then c1 f1 + c2 f2 + · · · + cs fs is an F -linear combination of the s(s − 1)/2 polynomials S(fi , fj ) with 1 ≤ i < j ≤ s.
Proof. The hypotheses require that the leading monomial of each fi is the same, namely xδ . Since
mdeg(c1 f1 + c2 f2 + · · · + cs fs ) ≺ δ, the leading terms must cancel, so

c1 LC(f1 ) + c2 LC(f2 ) + · · · + cs LC(fs ) = 0.   (12.10)

For each i, set fi′ = fi /LC(fi ), a monic polynomial with leading monomial xδ , and for i < j set
pij = S(fi , fj )/(LC(fi )LC(fj )). Since LM(fi ) = LM(fj ) = xδ , we have gcd(LM(fi ), LM(fj )) = xδ
and S(fi , fj ) = LC(fj )fi − LC(fi )fj . Then we have pij = fi′ − fj′ . Let us also call more simply
pi = pi,i+1 so that fi′ = pi + fi+1′ . With this property, we have
c1 f1 + c2 f2 + · · · + cs fs
= c1 LC(f1 )f1′ + c2 LC(f2 )f2′ + · · · + cs−1 LC(fs−1 )fs−1′ + cs LC(fs )fs′
= c1 LC(f1 )(p1 + f2′ ) + c2 LC(f2 )f2′ + · · · + cs−1 LC(fs−1 )fs−1′ + cs LC(fs )fs′
= c1 LC(f1 )p1 + (c1 LC(f1 ) + c2 LC(f2 ))f2′ + · · · + cs−1 LC(fs−1 )fs−1′ + cs LC(fs )fs′
= c1 LC(f1 )p1 + (c1 LC(f1 ) + c2 LC(f2 ))(p2 + f3′ ) + · · · + cs−1 LC(fs−1 )fs−1′ + cs LC(fs )fs′
...
= c1 LC(f1 )p1 + (c1 LC(f1 ) + c2 LC(f2 ))p2 + · · · + (c1 LC(f1 ) + · · · + cs−1 LC(fs−1 ))ps−1
  + (c1 LC(f1 ) + c2 LC(f2 ) + · · · + cs LC(fs ))fs′ .
By (12.10), the last term is 0. Since pi = S(fi , fi+1 )/(LC(fi )LC(fi+1 )), the linear combination of
polynomials c1 f1 + c2 f2 + · · · + cs fs is an F -linear combination of the S-polynomials S(fi , fj ), and
more precisely of the s − 1 S-polynomials, S(fi , fi+1 ) for 1 ≤ i ≤ s − 1.
Theorem 12.5.14 (Buchberger’s Criterion)
Fix a monomial order. A set of polynomials G = {g1 , g2 , . . . , gs } is a Gröbner basis of the
ideal I = (g1 , g2 , . . . , gs ) if and only if, for all pairs i < j,
rem (S(gi , gj ), G) = 0,
where the elements of G may be listed in any order for the division.

Proof. First suppose that G is a Gröbner basis of I. Since S(gi , gj ) ∈ I, the remainder of
S(gi , gj ) when divided by G is 0.
Conversely, suppose that rem (S(gi , gj ), G) = 0 for all pairs i < j, and let f ∈ I be nonzero.
Among all expressions
f = q1 g1 + q2 g2 + · · · + qs gs ,   (12.11)
choose one for which δ = max1≤i≤s mdeg(qi gi ) is minimal; this is possible because a monomial
order is a well-ordering. Clearly mdeg f 4 δ. Assume for the sake of contradiction that mdeg f ≺ δ.
After reindexing, suppose that mdeg(qi gi ) = δ exactly for 1 ≤ i ≤ t. Then
f = LT(q1 )g1 + · · · + LT(qt )gt + (q1 − LT(q1 ))g1 + · · · + (qt − LT(qt ))gt + qt+1 gt+1 + · · · + qs gs .   (12.12)
Now since mdeg(qi − LT(qi )) ≺ mdeg(qi ), then mdeg((qi − LT(qi ))gi ) ≺ mdeg(qi gi ) = δ. We now
consider only the first sum in (12.12), starting with the two observations that: mdeg(LT(qi )gi ) = δ
for all 1 ≤ i ≤ t; and the assumption that mdeg f ≺ δ implies that the first sum in (12.12) satisfies
the conditions of Lemma 12.5.13. Consequently, there exist constants aij ∈ F such that
Σ_{i=1}^{t} LT(qi )gi = Σ_{1≤i<j≤t} aij S(LT(qi )gi , LT(qj )gj ).   (12.13)
We now relate the S-polynomials in (12.13) to the S(gi , gj ) S-polynomials. Since mdeg LT(qi )gi =
mdeg LT(qj )gj = δ, then
S(LT(qi )gi , LT(qj )gj ) = LC(qj gj )LT(qi )gi − LC(qi gi )LT(qj )gj ,
and mdeg S(LT(qi )gi , LT(qj )gj ) ≺ δ. Note that xδ = LM(gi )LM(qi ). We further have

S(LT(qi )gi , LT(qj )gj ) = LC(qi )LC(qj ) (xδ /lcm(LM(gi ), LM(gj ))) S(gi , gj ).
By the hypothesis that S(gi , gj ) has a remainder of 0 when divided by {g1 , g2 , . . . , gs } listed in
any order, we deduce that there exist polynomials bijk ∈ F [x1 , x2 , . . . , xn ] such that for all i and j
with 1 ≤ i < j ≤ t,
S(gi , gj ) = Σ_{k=1}^{s} bijk gk .
The polynomials bijk arise from the division algorithm, and by that algorithm mdeg(bijk gk ) 4
mdeg S(gi , gj ) for all i, j, k with 1 ≤ i < j ≤ t and 1 ≤ k ≤ s. Consequently,

mdeg ( (xδ /lcm(LM(gi ), LM(gj ))) bijk gk ) 4 mdeg ( (xδ /lcm(LM(gi ), LM(gj ))) S(gi , gj ) ) ≺ δ,   (12.14)
where the last strict inequality holds because lcm(LM(gi ), LM(gj )) is precisely the leading monomial
that is canceled in the difference defining the S-polynomial S(gi , gj ), so that mdeg S(gi , gj ) ≺
mdeg lcm(LM(gi ), LM(gj )). In the summation,
Σ_{i=1}^{t} LT(qi )gi = Σ_{1≤i<j≤t} aij LC(qi )LC(qj ) (xδ /lcm(LM(gi ), LM(gj ))) ( Σ_{k=1}^{s} bijk gk )
= Σ_{k=1}^{s} ( Σ_{1≤i<j≤t} aij LC(qi )LC(qj ) (xδ /lcm(LM(gi ), LM(gj ))) bijk ) gk ,
For each k, denote by qk′ the coefficient polynomial of gk so that this expression becomes q1′ g1 + q2′ g2 + · · · + qs′ gs .
By (12.14), mdeg(qk′ gk ) ≺ δ. Combining this result with the decomposition of f in (12.12)
produces another linear combination f = q1′′ g1 + q2′′ g2 + · · · + qs′′ gs in which mdeg(qi′′ gi ) ≺ δ. This
contradicts the minimality of δ. Thus, the assumption that mdeg(f ) ≺ δ is false, so we
conclude that mdeg f = δ.
Returning to (12.11) but with the knowledge that mdeg f = δ, we have LM(f ) = LM(qi gi ) =
LM(qi )LM(gi ) for some i. Then we deduce that the leading term LT(f ) is divisible by some
LT(gi ). Thus, LT(f ) ∈ (LT(g1 ), LT(g2 ), . . . , LT(gs )). Since f was arbitrary in I, we deduce that
(LT(I)) ⊆ (LT(g1 ), LT(g2 ), . . . , LT(gs )) and hence these monomial ideals are equal. This proves that
{g1 , g2 , . . . , gs } is a Gröbner basis of I.
Example 12.5.15. Consider Example 12.5.1 that motivated the section. We pointed out during
the course of this section that {xy 2 + 1, x2 y − 1} is not a Gröbner basis of I = (xy 2 + 1, x2 y − 1).
In that example, we also contended that I = (x + y, y 3 − 1). Let us use the lexicographic order with
x > y. For this pair, there is only one S-polynomial, namely
S(x + y, y3 − 1) = y3 (x + y) − x(y3 − 1) = x + y4 .
Since x + y4 = 1 · (x + y) + y · (y3 − 1), the remainder of S(x + y, y3 − 1) when divided by
G = {x + y, y3 − 1} is 0. By Buchberger’s Criterion, G is a Gröbner basis of I with respect to the
lex order with x > y.
Suppose instead that we use the lexicographic order with y > x. Then LM(y + x) = y and
gcd(LM(y + x), LM(y3 − 1)) = y, so
S(y + x, y3 − 1) = y2 (y + x) − (y3 − 1) = y2 x + 1.
Dividing y2 x + 1 by G leaves the nonzero remainder x3 + 1.
Since the remainder is not 0, then G is not a Gröbner basis of I with respect to the lex order with
y > x. 4
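This example can be verified with SymPy's multivariate division routine reduced, which returns the quotients and the remainder (a sketch; the tuple of generators passed to reduced fixes the variable order, so (y, x) encodes lex with y > x):

```python
from sympy import symbols, reduced

x, y = symbols('x y')
G = [x + y, y**3 - 1]

# Lex with x > y: S(x + y, y^3 - 1) = x + y^4 reduces to 0.
_, r = reduced(x + y**4, G, x, y, order='lex')
print(r)   # 0

# Lex with y > x: S(y + x, y^3 - 1) = x*y^2 + 1 leaves a nonzero remainder.
_, r2 = reduced(x*y**2 + 1, G, y, x, order='lex')
print(r2)  # x**3 + 1
```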
9. In light of Exercise 12.5.8, using a diagram as in Figure 12.1, find a minimal generating set for
(a) the intersection (x8 y, x3 y 4 , xy 6 ) ∩ (x6 y 2 , y 7 );
(b) the product ideal (x8 y, x3 y 4 , xy 6 )(x6 y 2 , y 7 );
(c) the sum ideal (x8 y, x3 y 4 , xy 6 ) + (x6 y 2 , y 7 ).
10. In the polynomial ring F [x, y, z], let I1 = (x3 y 2 z, xyz 4 ) and I2 = (y 3 z, xy 2 z 3 ). Find a minimal
generating set for: a) I1 + I2 ; b) I1 I2 ; c) I1 ∩ I2 .
11. Let I = (xα | α ∈ A) be a monomial ideal in F [x1 , x2 , . . . , xn ], where A is a subset of Nn . Prove
that √I is the monomial ideal √I = (xs(α) | α ∈ A), where s(α) = (s1 , s2 , . . . , sn ) with si = 1 if
αi ≥ 1 and si = 0 if αi = 0.
12. Calculate the S-polynomials of the following pairs, with respect to the stated monomial order.
(a) a1 = x3 + y 3 − 3xy and a2 = x4 + y 4 − 1 with lexicographic order with x > y.
(b) a1 = 2xy 3 − 7y 3 + x2 and a2 = x3 − 5xy with lexicographic order with x > y.
(c) a1 = 2xy 3 − 7y 3 + x2 and a2 = x3 − 5xy with grlex order with x > y.
(d) a1 = 2xy 3 − 7y 3 + x2 and a2 = x3 − 5xy with grevlex order with x > y.
13. Calculate the S-polynomials of the following pairs, with respect to the stated monomial order.
(a) a1 = 3x2 y + 2xyz 2 and a2 = x3 yz − 4xy 2 with lex order with x > y > z.
(b) a1 = 3x2 y + 2xyz 2 and a2 = x3 yz − 4xy 2 with grlex order with x > y > z.
14. Let F be any field. Show that (xy 2 + 1, x2 y − 1) = (x + y, y 3 − 1) as ideals in F [x, y].
15. In Example 12.5.15 we showed that {x + y, y 3 − 1} is a Gröbner basis of (x + y, y 3 − 1).
(a) Show that y 3 (x + y) − x(y 3 − 1) = 1(x + y) + y(y 3 − 1).
(b) Proposition 12.5.8 shows that when G = {g1 , g2 , . . . , gs } is a Gröbner basis of the ideal (G), then
the remainder of f divided by G is unique (regardless of the order taken for the elements in G).
Use the first part of this exercise to show that the quotients q1 , q2 , . . . , qs produced by the division are
not unique (and hence depend on the order chosen for the elements of G).
16. Let G be a Gröbner basis of an ideal in F [x1 , x2 , . . . , xn ]. Use Proposition 12.5.8 to prove the following
results about remainders.
(a) rem (f + g, G) = rem (f, G) + rem (g, G) for all f, g ∈ F [x1 , x2 , . . . , xn ].
(b) rem (f g, G) = rem (rem (f, G) · rem (g, G) , G) for all f, g ∈ F [x1 , x2 , . . . , xn ].
17. Let f1 = xyz 2 + 3xz − 7, f2 = x3 − 2y 2 z + x, and f3 = 2xy + z 3 in Q[x, y, z]. Consider the ideal
I = (f1 , f2 , f3 ).
(a) Using the lex order with x > y > z, find some f ∈ I such that LT(f ) ∉ (LT(f1 ), LT(f2 ), LT(f3 )).
(b) Using the lex order with z > y > x, find some f ∈ I such that LT(f ) ∉ (LT(f1 ), LT(f2 ), LT(f3 )).
18. Suppose that I is a principal ideal in F [x1 , x2 , . . . , xn ]. Show that a set {g1 , g2 , . . . , gs } ⊆ I such that
one of the gi generates I is a Gröbner basis of I.
19. Let I be an ideal in F [x1 , x2 , . . . , xn ]. Prove that a set {g1 , g2 , . . . , gs } ⊆ I is a Gröbner basis of I if
and only if for all f ∈ I, there exists i ∈ {1, 2, . . . , s} such that LT(gi ) divides LT(f ).
20. Consider the polynomials f1 = x2 − y and f2 = x3 − z, along with the ideal I = (f1 , f2 ) in R[x, y, z].
(a) Prove that {f1 , f2 } is a Gröbner basis of I with respect to the order lex with y > z > x.
(b) Prove that {f1 , f2 } is not a Gröbner basis of I with respect to the order lex with x > y > z.
21. Consider the polynomials f1 = x2 + y 3 − 2y and f2 = y 4 − 2y 2 + 1, along with the ideal I = (f1 , f2 )
in R[x, y].
(a) Prove that {f1 , f2 } is a Gröbner basis of I with respect to the order lex with x > y.
(b) Prove that {f1 , f2 } is not a Gröbner basis of I with respect to the order grlex with x > y.
22. Consider the polynomials f1 = xy − xz and f2 = xz − yz, along with the ideal I = (f1 , f2 ) in R[x, y, z].
Use the lex monomial ordering with x > y > z.
(a) Show that {f1 , f2 } is not a Gröbner basis of I.
(b) Let f3 = rem (S(f1 , f2 ), (f1 , f2 )). Show that {f1 , f2 , f3 } is a Gröbner basis of I.
12.6 Buchberger’s Algorithm
Proposition 12.5.7 affirms that every ideal I in F [x1 , x2 , . . . , xn ] has a Gröbner basis. The proof
(in the order presented in this book) relied on Dickson’s Lemma, which relied on Hilbert’s Basis
Theorem, the proof of which was not constructive. Consequently, the proof of the existence of a
Gröbner basis offered no algorithm to construct a basis. This section introduces such an algorithm
along with a few refinements to the concept of a Gröbner basis.
GroebnerBasis(G)
 go ← true
 while go
  do go ← false
     n ← |G|
     for i ← 1 to n − 1
      do for j ← i + 1 to n
          do if rem (S(gi , gj ), G) ≠ 0
              then G ← G ∪ {rem (S(gi , gj ), G)}
                   go ← true
 return (G)
The algorithm takes a set (or an s-tuple) G = {g1 , g2 , . . . , gs } of polynomials and repeat-
edly adjoins nonzero polynomials of the form rem (S(gi , gj ), G). The algorithm terminates when
rem (S(gi , gj ), G) = 0 for all gi , gj ∈ G. By Buchberger’s Criterion, the output of this algorithm is
a Gröbner basis of the ideal (G). This algorithm does in fact terminate by virtue of the following
proposition.
Proposition 12.6.1
Suppose that G is a generating subset of an ideal I in F [x1 , x2 , . . . , xn ] and suppose that
G = G0 ⊆ G1 ⊆ G2 ⊆ · · ·
is a chain of subsets such that the set difference Gi − Gi−1 = {rem (S(a, b), Gi−1 )} for
some a, b ∈ Gi−1 with rem (S(a, b), Gi−1 ) 6= 0. Then this chain of subsets terminates.
Furthermore, the maximal element of this chain is a Gröbner basis of I.
Proof. Since S(a, b) ∈ (Gi−1 ) for any a, b ∈ Gi−1 , the ideal generated by Gi−1 is equal to the ideal
generated by Gi . Hence, (Gi ) = I for all i.
Since the S-polynomial S(a, b) in Gi − Gi−1 does not have a remainder of 0 when divided by
Gi−1 , then LT(rem (S(a, b), Gi−1 )) ∉ (LT(Gi−1 )). Thus, (LT(Gi−1 )) is a strict subset of the monomial
ideal (LT(Gi )). Since F [x1 , x2 , . . . , xn ] is Noetherian, the chain of monomial ideals

(LT(G0 )) ⊆ (LT(G1 )) ⊆ (LT(G2 )) ⊆ · · ·

terminates, say at (LT(Gt )). Consequently, Gt is such that rem (S(a, b), Gt ) = 0 for all a, b ∈ Gt .
By Buchberger’s Criterion, this means that Gt is a Gröbner basis of I.
The GroebnerBasis algorithm provides a first algorithm for finding a Gröbner basis of an ideal.
We will discuss natural improvements later, but we present a few examples first.
Example 12.6.2. Consider the ring R[x, y] and use the graded lexicographic order with x > y.
Consider the polynomials g1 = xy 2 − 3x2 and g2 = 2y 3 − 5x. Start with G = {g1 , g2 } and follow
the steps of the algorithm. This first time through, the while loop has only one calculation in the
double for loop:
S(g1 , g2 ) = 2yg1 − xg2 = −6x2 y + 5x2 ,
which is its own remainder when divided by (g1 , g2 ). Therefore, we set g3 = −6yx2 + 5x2 and replace
G with {g1 , g2 , g3 }. The go variable was changed to true so we do another iteration of the while
loop.
At the next time through the while loop, we can calculate S(g1 , g2 ), but it will have a remainder
of 0 when divided by {g1 , g2 , g3 } (in any order) since S(g1 , g2 ) = g3 . Next, we calculate

S(g1 , g3 ) = −6xg1 − yg3 = 18x3 − 5x2 y and rem (18x3 − 5x2 y, {g1 , g2 , g3 }) = 18x3 − (25/6)x2 ,

so we set g4 = 18x3 − (25/6)x2 and replace G with {g1 , g2 , g3 , g4 }. Then we calculate
Example 12.6.3. Consider the ring R[x, y, z] and use the lexicographic order with x > y > z.
Consider the polynomials g1 = xyz 2 − 3y 2 z and g2 = 2x2 z + 4xy + 1. Start with G = {g1 , g2 } and
follow the algorithm. This first time through the while loop has only one calculation in the double
for loop,
rem (S(g1 , g2 ), G) = rem −10xy 2 z − yz, G = −10xy 2 z − yz,
so set g3 = −10xy 2 z − yz and replace G with G = {g1 , g2 , g3 }. This finishes the while loop but the
go variable has switched to true so we continue.
This time, the nested for loops run through three calculations. The first is

rem (S(g1 , g2 ), G) = 0,

since G now includes g3 = −10xy2 z − yz. So G does not change. Next, we calculate

rem (S(g1 , g3 ), G) = rem (30y3 z + yz2 , G) = 30y3 z + yz2 .

Since this is nonzero, we set g4 = 30y3 z + yz2 and replace G with G = {g1 , g2 , g3 , g4 }. Then we
calculate

rem (S(g2 , g3 ), G) = rem (−40xy3 + 2xyz − 10y2 , G) = 2(−20xy3 + xyz − 5y2 ).
Since this is nonzero, we set g5 = −20xy 3 + xyz − 5y 2 and replace G with G = {g1 , g2 , g3 , g4 , g5 }.
The go variable had been switched to true so we repeat the while loop again.
This time, as we run through the 10 combinations of the nested for loops, all S-polynomials
S(gi , gj ) with 1 ≤ i < j ≤ 5 have a remainder of 0 when divided by G. Thus, {g1 , g2 , g3 , g4 , g5 } is a
Gröbner basis of the ideal (g1 , g2 ). 4
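The GroebnerBasis pseudocode translates almost line for line into Python, with SymPy handling the polynomial division; buchberger and s_poly below are our own illustrative sketch, not SymPy APIs:

```python
from itertools import combinations
from sympy import symbols, LT, LM, gcd, expand, reduced

x, y = symbols('x y')

def s_poly(f, g, gens, order):
    # S-polynomial as in Definition 12.5.11.
    d = gcd(LM(f, *gens, order=order), LM(g, *gens, order=order))
    return expand(LT(g, *gens, order=order)/d*f - LT(f, *gens, order=order)/d*g)

def buchberger(F, gens, order='lex'):
    # Repeatedly adjoin nonzero remainders of S-polynomials until
    # every pair of basis elements has S-polynomial remainder 0.
    G = list(F)
    go = True
    while go:
        go = False
        for gi, gj in combinations(list(G), 2):
            _, r = reduced(s_poly(gi, gj, gens, order), G, *gens, order=order)
            if r != 0:
                G.append(r)
                go = True
    return G

G = buchberger([x*y**2 - 3*x**2, 2*y**3 - 5*x], (x, y), order='grlex')
print(G)
```

Running this on the generators of Example 12.6.2 reproduces a (non-reduced) Gröbner basis that still contains g1 and g2.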
As presented so far, there is some inefficiency in the pseudocode given above for Buchberger’s
Algorithm. It is inefficient to calculate an S-polynomial S(gi , gj ) for the same pair (i, j) more than
once. First, if rem (S(gi , gj ), G) = 0 at some stage, it will remain 0 at a later stage when G is a
larger set of polynomials. Second, if gk = rem (S(gi , gj ), G) ≠ 0 at some stage, then at a later stage
the set G will contain gk and hence rem (S(gi , gj ), G) will be 0 at the later stage. This inefficiency
can be remedied as follows.
go ← true
jstart ← 1
while go
 do go ← false
    n ← |G|
    for i ← 1 to n − 1
     do for j ← max(i + 1, jstart) to n
         do if rem (S(gi , gj ), G) ≠ 0
             then G ← G ∪ {rem (S(gi , gj ), G)}
                  go ← true
    jstart ← n + 1
return (G)
Proposition 12.6.4
Suppose that I ⊆ F [x1 , x2 , . . . , xn ] is an ideal with I = (a1 , a2 , . . . , as ). Then I =
(ā1 , a2 , . . . , as ), where
ā1 = rem (a1 , {a2 , . . . , as }) .
Furthermore, if {a1 , a2 , . . . , as } is a Gröbner basis, then so is {ā1 , a2 , . . . , as }.
If ā1 = 0, then the set {a2 , . . . , as } is a generating set of I and is a Gröbner basis if {a1 , a2 , . . . , as }
is a Gröbner basis.
Proposition 12.6.4 inspires the following refinement to a Gröbner basis.
Definition 12.6.5
A Gröbner basis G is called reduced if
(1) LC(g) = 1 for all g ∈ G;
(2) g = rem (g, G − {g}) for all g ∈ G.
Part (2) of Definition 12.6.5 can be restated to say that for all g ∈ G, no term of g is divisible
by LM(g 0 ) for any g 0 ∈ G − {g}.
It is not hard to check that, except for needing to divide each element of the Gröbner basis by
the leading coefficient, the Gröbner bases in Examples 12.6.2 and 12.6.3 are reduced.
Proposition 12.6.4 allows for further simplifications during Buchberger’s Algorithm, as depicted
in the following example.
Example 12.6.6. Set a1 = xy 2 + 3y − 2 and a2 = x2 y + x + 1 and consider the ideal I = (a1 , a2 ) in
R[x, y]. We choose the lexicographic monomial order with x > y. We start with the generating set
G = {a1 , a2 }. Before beginning Buchberger’s Algorithm, we observe that rem (a1 , {a2 }) = a1 and
rem (a2 , {a1 }) = a2 . Hence, no term of ai is divisible by LM(aj ) for 1 ≤ i, j ≤ 2 with i 6= j.
Proceeding with Buchberger’s algorithm, the first step is to calculate
rem (S(a1 , a2 ), {a1 , a2 }) = rem (2xy − 2x − y, {a1 , a2 }) = 2xy − 2x − y.
Since this is not 0, we can set a3 = (1/2)(2xy − 2x − y) = xy − x − (1/2)y, where we divided by the leading
coefficient of the S-polynomial. Joining a3 to G gives G = {a1 , a2 , a3 }. This time, we observe that
LM(a3 ) | LT(a1 ) and also LM(a3 ) | LT(a2 ). Consequently, we can replace a1 , a2 , and a3 with some
remainders. In each row below, we replace ai with the multiple of rem (ai , G − {ai }) that is monic.
(Be aware that, in doing these repeated polynomial divisions, G is changing at each row.)
             a1                         | a2                                  | a3
Initially    xy2 + 3y − 2               | x2 y + x + 1                        | xy − x − (1/2)y
Replace a1   x + (1/2)y2 + (7/2)y − 2   | x2 y + x + 1                        | xy − x − (1/2)y
Replace a2   x + (1/2)y2 + (7/2)y − 2   | y5 + 14y4 + 41y3 − 58y2 + 2y + 12   | xy − x − (1/2)y
Replace a3   x + (1/2)y2 + (7/2)y − 2   | y5 + 14y4 + 41y3 − 58y2 + 2y + 12   | y3 + 6y2 − 10y + 4
Replace a2   x + (1/2)y2 + (7/2)y − 2   | 0                                   | y3 + 6y2 − 10y + 4
ReduceSet(A)
 go ← true
 while go
  do go ← false
     for i ← 1 to |A|
      do p ← rem (ai , A − {ai })
         if p = 0
          then A ← A − {ai }
          else A ← (A − {ai }) ∪ {p}
               if LM(p) ≠ LM(ai )
                then go ← true
 return (A)
with the following refinement: If ever p = 0, replace the corresponding aj,i with 0. This keeps each
ℓj as an s-tuple of polynomials even though a polynomial that becomes 0 is removed from the set A.
Note that ℓsk represents A after the kth time through the while loop.
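The ReduceSet procedure can be sketched in Python as follows; reduce_set is our own helper built on SymPy's reduced and LM, mirroring the pseudocode rather than any SymPy API:

```python
from sympy import symbols, reduced, LM

x, y = symbols('x y')

def reduce_set(A, gens, order='lex'):
    # Repeatedly replace each element by its remainder on division by
    # the others; drop an element whenever its remainder is 0.
    A = list(A)
    go = True
    while go:
        go = False
        i = 0
        while i < len(A):
            rest = A[:i] + A[i+1:]
            if not rest:
                break
            _, p = reduced(A[i], rest, *gens, order=order)
            if p == 0:
                A.pop(i)      # element reduced to 0: remove it
                go = True
                continue
            if LM(p, *gens, order=order) != LM(A[i], *gens, order=order):
                go = True     # leading monomial changed: iterate again
            A[i] = p
            i += 1
    return A

print(reduce_set([x*y**2 + 3*y - 2, x**2*y + x + 1, x*y - x - y/2], (x, y)))
```

Applied to the three polynomials of Example 12.6.6, this drops one element and leaves two polynomials, each equal to its own remainder modulo the other.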
Proposition 12.6.7
Fix a monomial order 4. For any finite subset A ⊆ F [x1 , x2 , . . . , xn ], the algorithm
ReduceSet applied to A terminates. Furthermore, at the end of the algorithm, p =
rem (p, A − {p}) for all p ∈ A.
Proof. The algorithm ReduceSet terminates if and only if there exists some k ∈ N such that
LM(a(k+1)s,i ) = LM(aks,i ) for all 1 ≤ i ≤ s. We need to prove that this condition occurs and that
when it does occur, the resulting set A has the property that p = rem (p, A − {p}) for all p ∈ A.
Since a(k+1)s,i is the remainder of aks,i when divided by some ordered (s−1)-tuple of polynomials,
by Proposition 12.4.8, either a(k+1)s,i = 0 or mdeg a(k+1)s,i 4 mdeg aks,i . Therefore, for each i
(corresponding to each polynomial ai ) the sequence of monomials is decreasing:

· · · 4 mdeg(a2s,i ) 4 mdeg(as,i ) 4 mdeg(a0,i ).
Since the monomial order is a well-ordering, then each of these chains terminates. Hence, for each
i, the set
Si = {mdeg(aj,i ) | aj,i 6= 0 and j ≥ 0}
is finite. Consequently, there exists a K such that LM(aks,i ) = LM(aKs,i ) for all k ≥ K. Thus, at
the (K + 1)th pass through the while loop the algorithm will terminate.
Finally, after the last iteration k of the while loop, since LM(aks,i ) = LM(a(k−1)s,i ) for all
1 ≤ i ≤ s, no term of aks,i is divisible by LM(aks,i′ ) for i ≠ i′ . Hence, p = rem (p, A − {p}) for all
p ∈ A.
An algorithm that produces a reduced Gröbner basis of an ideal given a generating set of that
ideal simply needs to apply the ReduceSet procedure to the generating set G at the end or
possibly at other appropriate places in an implementation of Buchberger’s Algorithm. Inserting the
ReduceSet procedure at other places in the Buchberger Algorithm may reduce the size of G in
the middle of the algorithm, allowing for fewer calculations of S-polynomials. Though there are a
number of choices for how to specifically implement the procedure, Example 12.6.6 implemented the
following algorithm.
G ← ReduceSet(G)
while ∃{g1 , g2 } ⊆ G (rem (S(g1 , g2 ), G) ≠ 0)
do G ← ReduceSet(G ∪ {rem (S(g1 , g2 ), G)})
return (G)
Reduced Gröbner bases are not desirable just for their simplicity; they benefit from the following
nice property.
Proposition 12.6.8
Let I be a nontrivial ideal in F [x1 , x2 , . . . , xn ]. For a given monomial order 4, there exists
a unique reduced Gröbner basis of I.
Proof. By Proposition 12.5.7 every ideal has a Gröbner basis G. The ReduceSet algorithm termi-
nates after a finite number of steps. Applied to G and dividing by leading coefficients if necessary,
the result is a reduced Gröbner basis of I. Hence, every ideal has a reduced Gröbner basis.
To prove uniqueness, suppose that G and G0 are two reduced Gröbner bases of I. Suppose that
G = {g1 , g2 , . . . , gs } and G0 = {g10 , g20 , . . . , gt0 }. The sets of leading monomials LM(G) and LM(G0 )
are both generating sets of the monomial ideal (LT(I)). Furthermore, since G is a reduced Gröbner
basis, for any i, the leading monomial LM(gi ) is not divisible by LM(gj ) for any j 6= i. Thus, LM(G)
is a minimal basis of (LT(I)), as defined in Exercise 12.5.3. Similarly, LM(G0 ) is a minimal basis
of (LT(I)). In Exercise 12.5.3, we showed that every monomial ideal has a unique minimal basis.
Thus, as sets of monomials LM(G) = LM(G0 ). In particular, s = t and, possibly after reordering,
the sets G and G0 are such that LT(gi ) = LM(gi ) = LM(gi0 ) = LT(gi0 ) for all 1 ≤ i ≤ s.
For any i, consider gi − gi0 . Since gi − gi0 ∈ I, the remainder rem (gi − gi0 , G) = 0 since G is
a Gröbner basis of I. In the difference, gi − gi0 , the leading terms cancel. However, since G and
G′ are reduced, none of the nonleading terms in gi or in gi′ are divisible by any monomials in
LT(G) = LT(G0 ). Hence,
gi − gi0 = rem (gi − gi0 , G) = 0.
Gröbner bases solved the problem of ideal membership: If I is an ideal and G a Gröbner basis of
I with respect to some monomial order 4, then f ∈ I if and only if rem (f, G) = 0. The existence and
uniqueness of a reduced Gröbner basis for an ideal offers a computational solution to an otherwise
pesky problem: how to tell if I = (f1 , f2 , . . . , fs ) is equal to the ideal I 0 = (f10 , f20 , . . . , ft0 ).
Corollary 12.6.9
Fix a monomial order 4 on Nn . Two ideals I and I ′ in F [x1 , x2 , . . . , xn ] are equal if and
only if they have the same reduced Gröbner basis.
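Corollary 12.6.9 gives a practical equality test: compute both reduced Gröbner bases and compare. A SymPy sketch, using the two descriptions of the same ideal from Example 12.5.15:

```python
from sympy import groebner, symbols

x, y = symbols('x y')
gb1 = groebner([x*y**2 + 1, x**2*y - 1], x, y, order='lex')
gb2 = groebner([x + y, y**3 - 1], x, y, order='lex')

# Equal ideals have identical reduced Groebner bases.
print(set(gb1.exprs) == set(gb2.exprs))   # True
```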
Maple Function
SPolynomial The command SPolynomial(a,b,T), where a and b are polynomials, and T
is a monomial order, calculates the S-polynomial of a and b with respect to
T.
Basis The command Basis(A,T), where A is a list of polynomials and T is a
monomial order, calculates a Gröbner basis of the ideal (A) that is nearly
reduced. In Maple’s implementation, if the polynomials in A have rational
coefficients, then Basis returns a set of polynomials A′ ⊆ Z[x1 , . . . , xn ] such
that {p/LC(p) | p ∈ A′ } is a reduced Gröbner basis of (A).
(c) Conclude also that if h = rem (S(f, g), (f, g)) 6= 0, then (LM(f ), LM(g)) is a strict subset of the
ideal (LM(f ), LM(g), h).
2. Using the algorithm GroebnerBasis, find the Gröbner basis for the ideal in Example 12.6.2 using
the lexicographic order with x > y.
3. By implementing various reductions, say using the ReducedGroebnerBasis algorithm, find the
reduced Gröbner basis for the ideal in Example 12.6.2 using the lexicographic order with x > y.
4. Find the reduced Gröbner basis of the ideal (xz 2 −2y 2 +5, xy −3z −1) with respect to the lexicographic
order with x > y > z.
5. Find the reduced Gröbner basis of the ideal (xz 2 −2y 2 +5, xy −3z −1) with respect to the lexicographic
order with z > x > y.
6. Consider the polynomial ring F5 [x, y]. Find the reduced Gröbner basis of the ideal (2xy + 3xy 2 +
1, 2x2 + xy 3 + 4) with respect to:
(a) the lexicographic order with x > y.
(b) the graded lexicographic order with x > y.
7. A Gröbner basis G of an ideal I is called minimal if (a) LC(g) = 1 for all g ∈ G; and (b) for all g ∈ G,
the monomial LM(g) is not in the monomial ideal (LT(G − {g})). Prove that G is a minimal Gröbner
basis if and only if LC(g) = 1 for all g ∈ G and no proper subset of G is a Gröbner basis of I.
8. Prove that G is a minimal Gröbner basis if and only if LC(g) = 1 for all g ∈ G and LT(G) is a minimal
basis of the monomial ideal (LT(G)). [See Exercises 12.6.7 and 12.5.3.]
9. Prove that the reduced Gröbner basis of an ideal I is a minimal Gröbner basis. [See Exercise 12.6.7.]
10. In this exercise, we consider the Gröbner basis corresponding to a system of linear equations. Let F
be a field and n a positive integer. Consider the system of m linear equations
a11 x1 + a12 x2 + · · · + a1n xn − b1 = 0
a21 x1 + a22 x2 + · · · + a2n xn − b2 = 0
..
.
am1 x1 + am2 x2 + · · · + amn xn − bm = 0.
Call fi = ai1 x1 + ai2 x2 + · · · + ain xn − bi and consider the ideal I = (f1 , f2 , . . . , fm ). Set gi to be the
polynomial corresponding to the ith row after the Gauss-Jordan elimination. (Some of the gi may be
the 0 polynomial.)
12.7 Applications of Gröbner Bases
Previous sections of this chapter developed the theory of rings of multivariable polynomials over a
field. We started from an abstract perspective of Noetherian rings and established Hilbert’s Basis
Theorem, a corollary of which is that F [x1 , x2 , . . . , xn ], where F is a field, is Noetherian. We
found that the division algorithm for polynomials in a multivariable context does not by itself solve
the ideal membership problem. This is resolved when we use a Gröbner basis as a generating set of an
ideal. With Buchberger’s Algorithm and its variants, we can: (1) find a Gröbner basis G of an ideal
I = (S), such that S ⊆ G; (2) find the reduced Gröbner basis G of an ideal with respect to any
monomial order.
With the theory of Gröbner bases at our disposal, along with an algorithm to compute them, we
now turn to applications that become computationally possible.
In the examples that follow, all the computations are performed using a computer algebra system.
(In some computer algebra systems, when a generating set of a polynomial ideal involves polynomials
with only rational coefficients, the implementation of Buchberger’s Algorithm provides a Gröbner
basis that is reduced, except that the polynomials are not necessarily monic but scaled by a factor so
that the coefficients are integers. Consequently, we sometimes call a Gröbner basis G reduced if one
can obtain a reduced Gröbner basis by dividing each polynomial g ∈ G by its leading coefficient.)
Example 12.7.1. Consider the ideal I = (x3 +yz, 2xy+xz) in R[x, y, z] and consider the polynomial
f = 2x3 y − yz2 . We propose to work with the lexicographic monomial order with x > y > z. We
can calculate that
rem (f, (x3 + yz, 2xy + xz)) = −2y2 z − yz2 .
However, the set {x3 + yz, 2xy + xz} is not a Gröbner basis of I with respect to ≤lex . Therefore, we
cannot conclude one way or the other whether f ∈ I. The reduced Gröbner basis of I with respect
to ≤lex is G = {x3 + yz, 2xy + xz, 2y 2 z + yz 2 }. Another calculation shows that rem (f, G) = 0.
Consequently, we deduce that f ∈ I. The polynomial division of f by G gives the quotients of
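In SymPy the same membership test reads as follows; GroebnerBasis.reduce returns the list of quotients together with the remainder (a sketch; note that SymPy prints the basis with monic polynomials, so its output differs from G above by constant factors):

```python
from sympy import groebner, symbols

x, y, z = symbols('x y z')
f = 2*x**3*y - y*z**2

gb = groebner([x**3 + y*z, 2*x*y + x*z], x, y, z, order='lex')
quotients, r = gb.reduce(f)
print(r)                # 0, so f is in the ideal
print(gb.contains(f))   # True
```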
Call a1 = x2 +9y 2 −9 and a2 = 25x2 +4y 2 −100 and let I be the ideal (a1 , a2 ) in the polynomial ring
R[x, y]. From the perspective of varieties, the solution to this system consists of the intersection of
two ellipses in the affine space R2 . (See Figure 12.2.) The reduced Gröbner basis of I with respect
to the lexicographic order with x > y is
G = { x2 − 864/221 , y2 − 125/221 }.
Since V(I) = V(G), the intersection of these two ellipses corresponds to the four points
( √(864/221), √(125/221) ), ( −√(864/221), √(125/221) ), ( √(864/221), −√(125/221) ), ( −√(864/221), −√(125/221) ). 4
[Figure 12.2: the two ellipses x2 + 9y2 = 9 and 25x2 + 4y2 = 100 intersecting in four points; axes from −2 to 2.]
Example 12.7.3. Consider the system of equations

x3 + y3 − 8/3 = 0
x2 + y2 − 20/9 = 0   (12.15)

as a subset of R2 . The solution of this system is a variety that corresponds to the intersection of the
cubic a1 = x3 + y3 − 8/3 = 0 and the circle a2 = x2 + y2 − 20/9 = 0. Figure 12.3 shows these two
curves and we observe that this variety consists of exactly four points. The reduced Gröbner basis
(scaled to clear denominators) of I = (a1 , a2 ) with respect to ≤lex with x > y is
G = {729y 6 − 2430y 4 − 1944y 3 + 5400y 2 − 1408, 352x + 405y 5 + 486y 4 − 450y 3 − 1620y 2 + 352y}.
We know that I = (G). This Gröbner basis has a polynomial that has terms only involving y.
Hence, all solutions (x, y) to the system of equations must satisfy
729y 6 − 2430y 4 − 1944y 3 + 5400y 2 − 1408 = 0
⇐⇒ (3y)6 − 30(3y)4 − 72(3y)3 + 600(3y)2 − 1408 = 0
⇐⇒ (3y − 2)(3y − 4)((3y)4 + 6(3y)3 − 2(3y)2 − 132(3y) − 176) = 0.
We see that two of the solutions to (12.15) have y = 2/3 and y = 4/3. It is not hard to check that the
polynomial f (z) = z 4 + 6z 3 − 2z 2 − 132z − 176 is irreducible over Q and has two real roots and
two complex roots. We can use Newton’s method to find the real roots to f (z) numerically or we
could use the Cardano-Ferrari method to solve a quartic explicitly. Once we find four real roots to
729y 6 − 2430y 4 − 1944y 3 + 5400y 2 − 1408 = 0, the corresponding x of the solutions are obtained
from the second polynomial in our Gröbner basis, namely,
x = −(1/352) (405y5 + 486y4 − 450y3 − 1620y2 + 352y).
352
We find that the rational solutions are (2/3, 4/3) and (4/3, 2/3). There are two other points with
nonrational coordinates that are approximately (1.4071, −0.4922) and (−0.4922, 1.4071). 4
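The Gröbner basis used in this example can be recomputed with SymPy; since SymPy returns monic polynomials over Q, the basis above reappears after clearing denominators (a sketch):

```python
from sympy import groebner, symbols, Rational, expand

x, y = symbols('x y')
gb = groebner([x**3 + y**3 - Rational(8, 3),
               x**2 + y**2 - Rational(20, 9)], x, y, order='lex')

# The basis element involving only y, rescaled to integer coefficients.
g_y = [g for g in gb.exprs if x not in g.free_symbols][0]
print(expand(729*g_y))
```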
Example 12.7.4. The reader might suspect that the coefficients in the system of equations in Ex-
ample 12.7.3 were chosen so that (12.15) had two points in the solution that had rational coordinates.
In this example, we change the problem to make it more general and we ask a different question.
Consider instead the system of equations
x3 + y3 − a = 0
x2 + y2 − b = 0,   (12.16)
where a and b are unspecified parameters. Changing the parameter a modifies the shape of the cubic
curve. Curves of the form x3 + y3 = a all look similar to the cubic curve in Figure 12.3, with an
asymptote of x + y = 0, but having an x-intercept of (∛a, 0) and a y-intercept of (0, ∛a). On the
other hand, changing b simply affects the radius of the circle.
By looking at the graphs of the cubic and the circle, it appears that for a fixed a, there exist b1
and b2 such that
• at b = b1 the system (12.16) has exactly two solutions (and the circle has the same tangent
line as the cubic at the points of intersection);
• at b = b2 the system (12.16) has exactly three solutions (two regular intersections and one
intersection point at which the circle and the cubic curve have the same tangent line);
The system (12.17) corresponds to a variety in the affine space R5 , described by an ideal in
R[x, y, a, b, λ]. We use the lexicographic order with x > y > λ > a > b. This has the effect of
attempting to eliminate x, then y, then λ, and so forth. With respect to this monomial order, the
reduced Gröbner basis of the ideal corresponding to (12.17) is
G = {2a4 − 3a2 b3 + b6 , 2bλ − 3a, 4λa3 − 9a2 b2 + 3b5 , −27a2 b + 8a2 λ2 + 9b4 , −81a2 + 16aλ3 + 27b3 ,
3a2 y + 3ab2 − 3yb3 − 2λa2 , 9ba + 6aλy − 9yb2 − 4aλ2 , 16λ3 y − 54ay − 18aλ + 27b2 ,
3y2 − 2λy, 8λ2 y − 9by + 9xb − 9a, −b2 + ax + ay, −3b + 2λy + 2λx, 2λy + 3x2 − 3b}.
(12.18)
The first polynomial listed in G is 2a4 − 3a2 b3 + b6 = (b3 − a2 )(b3 − 2a2 ). Since this polynomial must
be 0 when the circle and the cubic have common tangents, this occurs when b = ∛(a2 ) or b = ∛(2a2 ).
From our intuition from the graphs of the curves, we conclude that these are precisely the values of
b1 and b2 described above.
Note that the second polynomial in G is 2bλ − 3a. If we assume that b ≠ 0, then 2bλ − 3a = 0
implies that λ = 3a/(2b). Furthermore, it is easy to check that under the assumption that λ = 3a/(2b),
the third, fourth, and fifth polynomials given in G are multiples of 2a4 − 3a2 b3 + b6 . 4
Example 12.7.5. We revisit Example 12.7.3 once more to illustrate yet another strategy. Recall
that our approach in Example 12.7.3 allowed us to easily find two rational solutions to (12.15) but
we “gave up” on searching for an exact solution since it appeared to involve the roots of a quartic.
We now take a different approach that is possible by virtue of the fact that the polynomials in
(12.15) are symmetric. This means that if (x0 , y0 ) is a solution to the system, then (y0 , x0 ) is also
a solution.
In Section 11.5.1, we encountered the elementary symmetric polynomials. For the case of two
variables, there are only two elementary symmetric polynomials, namely s1 = x + y and s2 = xy.
x3 + y3 − 8/3 = 0
x2 + y2 − 20/9 = 0
x + y − s1 = 0   (12.19)
xy − s2 = 0
The reduced Gröbner basis of the ideal corresponding to (12.19), with respect to the lexicographic
order with x > y > s2 > s1 (scaled to clear denominators), is

G = {3s31 − 20s1 + 16, 18s2 − 9s21 + 20, 18y2 − 18ys1 + 9s21 − 20, x + y − s1 }.   (12.20)
Note that 3s31 − 20s1 + 16 = (s1 − 2)(3s21 + 6s1 − 8), so the first equation in (12.20) implies that s1 is

2,  −1 − (1/3)√33,  or  −1 + (1/3)√33.
From here, we will be able to deduce s2 , y and then x. If in fact we only want to find the solutions
(x, y) of intersection, the system (12.20) allows us to avoid calculating s2 .
Case 1: s1 = 2. Then from the third equation in (12.20), y solves 18y2 − 36y + 16 = 0. Solving
this quadratic gives y = 2/3 or 4/3. From the fourth equation in (12.20), we see that x = s1 − y,
which leads to two solutions for (x, y), namely (2/3, 4/3) and (4/3, 2/3).
√ √ √
Case 2: s1 = −1 − 13 33. Then y solves 18y 2 + (18 + 6 33)y + (22 + 6 33) = 0. However, this
quadratic polynomial has a negative discriminant so has no real roots.
Case 3: s1 = −1 + (1/3)√33. Then y solves 18y² + (18 − 6√33)y + (22 − 6√33) = 0. This gives us the explicit roots for y:

y = (1/36)( −(18 − 6√33) ± √( (18 − 6√33)² − 4 · 18(22 − 6√33) ) ) = (1/6)( −3 + √33 ± √( −2 + 6√33 ) ). △
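These eliminations are easy to reproduce with a computer algebra system. The following sketch uses Python's sympy (variable names are ours; sympy is assumed to be available) to recover the polynomial in s1 alone:

```python
from sympy import symbols, groebner, Rational

x, y, s1, s2 = symbols('x y s1 s2')

# System (12.19): the two curves together with the elementary
# symmetric polynomials s1 = x + y and s2 = xy as new variables.
polys = [
    x**3 + y**3 - Rational(8, 3),
    x**2 + y**2 - Rational(20, 9),
    x + y - s1,
    x*y - s2,
]

# Lexicographic order x > y > s2 > s1 eliminates x, y, and s2 first.
G = groebner(polys, x, y, s2, s1, order='lex')

# The basis contains one polynomial in s1 alone, and s1 = 2 is a root,
# leading to the rational intersection points (2/3, 4/3) and (4/3, 2/3).
univariate = [g for g in G.exprs if g.free_symbols == {s1}]
assert len(univariate) == 1
assert univariate[0].subs(s1, 2) == 0
```

Up to scaling by a constant, the univariate polynomial found this way is 3s1³ − 20s1 + 16, matching the basis above.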
Example 12.7.6. Suppose that numbers a, b, and c solve the following system of equations
a + b + c = 10
a² + b² + c² = 13              (12.21)
a³ + b³ + c³ = 20.
Theorem 11.5.3 affirms that every symmetric polynomial in the variables x1, x2, . . . , xn can be expressed as g(s1, s2, . . . , sn), where g is a polynomial in n variables and si is the ith elementary symmetric polynomial.
12.7. APPLICATIONS OF GRÖBNER BASES 661
Determining the value of a⁴ + b⁴ + c⁴ corresponds to finding the value of a new variable t that we set equal to this power sum. Calculating the reduced Gröbner basis of

{a + b + c − 10, a² + b² + c² − 13, a³ + b³ + c³ − 20, a⁴ + b⁴ + c⁴ − t}          (12.22)

with respect to the lexicographic order with a > b > c > t amounts to attempting to successively eliminate the variables a, b, and c. This reduced Gröbner basis is
G = {6t − 4307, −650 + 261c − 60c² + 6c³, 87 − 20c − 20b + 2cb + 2c² + 2b², a + b + c − 10}.
The variety corresponding to (12.22) is equal to the variety V(G). Hence, whenever a, b, c solve
(12.21), then t solves 6t − 4307 = 0. Thus, we conclude that
a⁴ + b⁴ + c⁴ = 4307/6. △
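The same computation can be verified with sympy; the sketch below introduces t exactly as in the text and recovers the value 4307/6:

```python
from sympy import symbols, groebner, Rational, solve

a, b, c, t = symbols('a b c t')

# The power-sum constraints (12.21), plus t standing for a^4 + b^4 + c^4.
polys = [
    a + b + c - 10,
    a**2 + b**2 + c**2 - 13,
    a**3 + b**3 + c**3 - 20,
    a**4 + b**4 + c**4 - t,
]

G = groebner(polys, a, b, c, t, order='lex')

# The basis contains a polynomial in t alone; it is linear, so t is
# forced to a single value.
t_poly = [g for g in G.exprs if g.free_symbols == {t}][0]
assert solve(t_poly, t) == [Rational(4307, 6)]
```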
Example 12.7.7. Recall the concept of eigenvalues and eigenvectors associated to an n × n matrix
A ∈ Mn (F ). An element λ ∈ F , or possibly in a field extension of F , is called an eigenvalue of A
if there exists a nonzero vector ~v such that A~v = λ~v . Such a vector ~v is called an eigenvector of
eigenvalue λ. Eigenvalues are obtained as the roots of the characteristic polynomial of A, namely
det(xI − A).
From the perspective of systems of polynomial equations, if A is a given matrix, then the equation
A~v −λ~v = ~0 consists of a system of n equations in the n+1 variables λ, v1 , v2 , . . . , vn . These equations
are not linear but quadratic, since the ith equation contains the term λvi.
As a specific example, consider the matrix
−5 4 4
A = 12 −8 −6 .
−24 17 15
We use the lexicographic monomial order with v1 > v2 > v3 > λ. Whether by hand or using a
computer algebra system, we find that the reduced Gröbner basis associated to the ideal generated
by these three polynomials in R[v1 , v2 , v3 , λ] is {g1 , g2 , g3 }, where
g1 = λ³v3 − 2λ²v3 − 5λv3 + 6v3
g2 = 5v2 + 2λ²v3 − 3λv3 − 9v3
g3 = 60v1 + 17λ²v3 − 23λv3 − 114v3.
This Gröbner basis gives us a good understanding of the solutions to this system, but from a direction different from the one usually presented in linear algebra. Notice that
g1 = 0  =⇒  v3(λ³ − 2λ² − 5λ + 6) = 0
g2 = 0  =⇒  v2 = (1/5)(−2λ² + 3λ + 9)v3                  (12.23)
g3 = 0  =⇒  v1 = (1/60)(−17λ² + 23λ + 114)v3.
In this system, we see that solutions to the eigenvalue/eigenvector problem come from v3 = 0 or
λ3 − 2λ2 − 5λ + 6 = 0. Note that if v3 = 0, then the other two equations give v2 = v1 = 0. Hence, we
find that ~0 is a solution regardless of λ. The equation in λ is (λ + 2)(λ − 1)(λ − 3) = 0, so the matrix
has three distinct eigenvalues: −2, 1, and 3. However, our approach using Gröbner bases gives us
much more information than just the eigenvalues. The second and third equations in (12.23) give
formulas for coordinates for the eigenvectors, namely
~v = t · ( (1/60)(−17λ² + 23λ + 114), (1/5)(−2λ² + 3λ + 9), 1 )

whenever λ is −2, 1, or 3 and t ∈ R. △
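The eigenvector formulas above can be double-checked mechanically. In the sketch below (sympy assumed), we recompute the Gröbner basis and verify that, for each eigenvalue, the vector produced by (12.23) with v3 = 1 satisfies A~v = λ~v:

```python
from sympy import symbols, groebner, Matrix, Rational

v1, v2, v3, lam = symbols('v1 v2 v3 lam')

A = Matrix([[-5, 4, 4], [12, -8, -6], [-24, 17, 15]])
v = Matrix([v1, v2, v3])

# The three quadratic equations from A*v - lam*v = 0.
eqs = list(A*v - lam*v)
G = groebner(eqs, v1, v2, v3, lam, order='lex')

# Exactly one basis element involves only v3 and lam (this is g1).
assert len([g for g in G.exprs if g.free_symbols <= {v3, lam}]) == 1

# For each eigenvalue, the formulas of (12.23) with v3 = 1 give an
# actual eigenvector of A.
for ev in [-2, 1, 3]:
    w = Matrix([
        Rational(1, 60)*(-17*ev**2 + 23*ev + 114),
        Rational(1, 5)*(-2*ev**2 + 3*ev + 9),
        1,
    ])
    assert A*w == ev*w
```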
Given a subset Z of Fⁿ whose points are parametrized as xi = fi(t1, t2, . . . , tm)/gi(t1, t2, . . . , tm), where the fi and gi are polynomials (so that each coordinate is a rational function of the parameters), Gröbner bases provide a strategy to solve the implicitization problem. Consider the system of polynomial equations
x1 g1(t1, t2, . . . , tm) − f1(t1, t2, . . . , tm) = 0
⋮
xn gn(t1, t2, . . . , tm) − fn(t1, t2, . . . , tm) = 0
in the polynomial ring F [x1 , . . . , xn , t1 , . . . , tm ]. Calculating the reduced Gröbner basis of the set of
these polynomials with the lexicographic order t1 > · · · > tm > x1 > · · · > xn has the effect of finding
a system of polynomials that eliminate the variables in the order t1 > · · · > tm > x1 > · · · > xn .
Hence, if the parametrized set Z is a variety, we suspect that the set S of polynomials that define it
will appear in the Gröbner basis as the polynomials that do not involve the parametrizing variables
t1 , t2 , . . . , tm .
Example 12.7.8. Consider the curve Z in R3 parametrized by (x, y, z) = (t², t³ − t, t³ − 3t) for t ∈ R. Following the strategy above, we consider the system of polynomial equations

x − t² = 0
y − t³ + t = 0
z − t³ + 3t = 0

in R[x, y, z, t] along with the ideal I generated by these three polynomials. The reduced Gröbner
basis of I with respect to the lexicographic order with t > x > y > z consists of the three polynomials
p1 = 2t − y + z
p2 = 4x − y² + 2yz − z²
p3 = y³ − 3y²z + 3yz² − 12y − z³ + 4z.
Hence, the variety in R4 corresponding to (12.24) solves p2 = 0 and p3 = 0. We conclude that the curve Z in R3 is the variety V(p2, p3) and, furthermore, the parameter that gives a particular point (x, y, z) is given by t = (1/2)(y − z). △
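The basis polynomials are consistent with the polynomial parametrization (x, y, z) = (t², t³ − t, t³ − 3t), which can be read off from p1, p2, p3. A sympy sketch checks both this and the elimination:

```python
from sympy import symbols, groebner, expand

t, x, y, z = symbols('t x y z')

# A parametrization consistent with the basis polynomials p1, p2, p3.
param = {x: t**2, y: t**3 - t, z: t**3 - 3*t}

p1 = 2*t - y + z
p2 = 4*x - y**2 + 2*y*z - z**2
p3 = y**3 - 3*y**2*z + 3*y*z**2 - 12*y - z**3 + 4*z

# Each basis polynomial vanishes identically along the curve.
assert all(expand(p.subs(param)) == 0 for p in (p1, p2, p3))

# Eliminating t recovers t-free polynomials cutting out the curve.
G = groebner([x - t**2, y - t**3 + t, z - t**3 + 3*t],
             t, x, y, z, order='lex')
elim = [g for g in G.exprs if t not in g.free_symbols]
assert elim and all(expand(g.subs(param)) == 0 for g in elim)
```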
When parametrizing curves or surfaces, it is often convenient to use the sine or cosine function.
For example, a simple parametrization of the unit circle is ~x : [0, 2π) → R2 with ~x(t) = (cos t, sin t).
However,

~r(u) = ( (1 − u²)/(1 + u²), 2u/(1 + u²) ),  with u ∈ R,          (12.25)
also parametrizes the unit circle, though it misses the point (−1, 0). The vector function ~r(u) gives the intersection of the circle with the line through (−1, 0) and (0, u). The point (−1, 0) arises as the limit limu→∞ ~r(u).
Now consider the ideal I = ((1 + u2 )x − (1 − u2 ), (1 + u2 )y − 2u) in R[x, y, u]. Calculating the
reduced Gröbner basis of I with respect to the lexicographic order with u > x > y has the effect of
attempting to eliminate the variable u. This basis is
{x² + y² − 1, uy − 1 + x, ux − y + u}.
The first polynomial in this basis gives the equation of the unit circle. So we might conclude that
(12.25) corresponds to the variety expressed by the single equation x2 + y 2 − 1 = 0. However, we
need to take care to consider the meaning of the remaining two equations. The last two equations should be used to solve for the parameter u. We get two solutions

u = (1 − x)/y    and    u = y/(1 + x).

It is important to consider what these equations mean when u cannot be recovered. The first, yu = 1 − x, does not allow us to solve for u if y = 0. The second, u(1 + x) = y, does not allow us to solve for u when x = −1. Hence, for any point on the variety V(x² + y² − 1), it is possible to solve for u, except over the subvariety V(x + 1, y). We conclude that the parametrization in (12.25) gives the unit circle except for the point (−1, 0).
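This elimination is small enough to verify directly; a sympy sketch (assuming sympy):

```python
from sympy import symbols, groebner

x, y, u = symbols('x y u')

# Clear denominators in x = (1 - u^2)/(1 + u^2) and y = 2u/(1 + u^2).
gens = [(1 + u**2)*x - (1 - u**2), (1 + u**2)*y - 2*u]

# Lexicographic order with u > x > y attempts to eliminate u.
G = groebner(gens, u, x, y, order='lex')

# The u-free part of the basis is the implicit equation of the circle.
elim = [g for g in G.exprs if u not in g.free_symbols]
assert [(g - (x**2 + y**2 - 1)).expand() for g in elim] == [0]
```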
Example 12.7.9. As a last example, consider the Lissajous curve Z depicted in Figure 12.4 and
parametrized by
~r(t) = (cos 3t, sin 2t) for t ∈ [0, 2π].
In order to use the above strategy to find an equation that gives the Lissajous figure as a variety, we
first must write the functions cos 3t and sin 2t as polynomials in cos t and sin t. It is easy to show that cos 3t = 4 cos³ t − 3 cos t and that sin 2t = 2 sin t cos t. Following the above strategy, we now
replace cos t with (1 − u²)/(1 + u²) and sin t with 2u/(1 + u²), with the assumption that u ∈ R. This gives the parametrization

x = 4((1 − u²)/(1 + u²))³ − 3((1 − u²)/(1 + u²))
y = 2(2u/(1 + u²))((1 − u²)/(1 + u²)).
From the above discussion of this alternative parametrization for the unit circle, we expect this parametrization to miss the point on Z corresponding to t = π. The reduced Gröbner basis of the ideal obtained from this parametrization, with respect to a lexicographic order that eliminates the parameter, is
{9y² − 4x² − 24y⁴ + 4x⁴ + 16y⁶, −12y² + 16y⁵t − 8y²x² + 16y⁴ − 16y³t + 8y²x + 3yt + 2x + 6x² − 8x³,
8y⁴ − 2x³ + 4y²x − 6y² − xyt + 4y³xt + 2x², 3y + 8y⁴t − 4yx² − 4y³ − 8xy²t − 6y²t + 2xt + 2x²t,
10y²t + yt² − 4xt + 4xy − 8y⁴t + 4yx² + 8xy²t − 7y + 8y³, xt² − 1 + 4yt + t² + x}.
The first polynomial in the above list defines a variety V(9y² − 4x² − 24y⁴ + 4x⁴ + 16y⁶) such that Z is a subset of this variety (and in fact Z is equal to this variety). △
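As a plausibility check on the first polynomial in the basis, one can substitute the trigonometric parametrization ~r(t) = (cos 3t, sin 2t) and confirm numerically that it vanishes along the curve; a sympy sketch:

```python
from sympy import symbols, cos, sin

t = symbols('t')
x = cos(3*t)
y = sin(2*t)

# The u-free polynomial from the Gröbner basis; it should vanish
# identically along the parametrized Lissajous curve.
F = 9*y**2 - 4*x**2 - 24*y**4 + 4*x**4 + 16*y**6

# Spot-check at several parameter values.
for val in [0.3, 0.7, 1.1, 2.5]:
    assert abs(F.subs(t, val).evalf()) < 1e-10
```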
Exercises for Section 12.7
In the following exercises, it is expected that the reader will obtain a relevant Gröbner basis using a computer
algebra system.
1. Consider the ideal I = (xy + z², x² − 3xz + 2y) in the ring R[x, y, z]. Decide if the following polynomials are in the ideal.

(a) 9x³y − 3x³z + 2x²y − 4y²
(b) 2xy + 4yz + x³ + 9yx²

2. Consider the ideal I = (x³ + xy² + 2x, 3x² + y³ − 1) in the ring R[x, y]. Decide if the following polynomials are in the ideal.

(a) y³ − 3y² − 7
(b) 3x⁵ + 3x³y² − x³ − 2xy³ − xy²
3. Solve the system of equations
x² + y² + z² = 9
x² − 2y² = 1
x − z² = 2.
4. Recall that a critical point of a function f(x, y) is a point (x, y) such that the gradient ∇f = (fx, fy) is undefined or zero. Find the critical points of f(x, y) = x³ + 3xy² + 2y³ − x.
5. We propose to explicitly solve the system of equations
x³ + xy² + 2x = 0
3x² + y³ − 1 = 0.
(a) Find the reduced Gröbner basis G with respect to the lexicographic order with x > y.
(b) Notice that G contains a polynomial of degree 6 in just the variable y. Show that this sextic
polynomial factors into two cubics over Q.
(c) Use methods developed elsewhere in this textbook to get the explicit real roots.
(d) Use this information to find all the solutions to the above system of equations. [Hint: There are
exactly three real roots.]
6. Find the intersection of the sphere x² + y² + z² = 4 with the ellipsoid x² + y² + z²/9 = 1. Use a reduced Gröbner basis calculation to find parametrizations of the two components of this intersection. Explain your choice of monomial order.
7. Find the intersection of the sphere x² + y² + z² = 4 with the ellipsoid x² + y²/4 + z²/9 = 1. Use a reduced Gröbner basis calculation to find parametrizations of the two components of this intersection. Explain your choice of monomial order.
8. Continue working with Example 12.7.4.
(a) Explicitly find the points (x, y) of intersection of the cubic curve and the circle that have the
same tangent lines. (These points should be given in terms of a and b.)
(b) Explain from the polynomials in the reduced Gröbner basis (12.18) why b1 = ∛(a²) has two solution points and b2 = ∛(2a²) has only one solution point.
(c) Explain geometrically why these results make sense.
9. Consider the following strategy using systems of polynomial equations to find the tangent line to a
curve at a point. Suppose that a curve Z in R2 is defined by a polynomial equation f (x, y) = 0. The
tangent line is the zero set of a polynomial p = ax + by + c such that the gradient (with respect to x and y) satisfies ∇f = λ∇p. Hence, a point (x0, y0) on Z has the tangent line ax + by + c = 0 if

f(x0, y0) = 0
ax0 + by0 + c = 0
fx(x0, y0) = λpx(x0, y0)
fy(x0, y0) = λpy(x0, y0).
Explain why it is useful to calculate the Gröbner basis with the lexicographic order of λ > a > b >
c > x0 > y0 to find the coefficients a, b, and c implicitly in terms of x0 , y0 . Apply this strategy to the
parabola y = x2 + 1 and show that the resulting tangent line is what we expect from calculus.
10. Use Gröbner bases to find the formula for the tangent line to the curve y² = x³ − x at any given point
(x0 , y0 ). [See Exercise 12.7.9.]
11. Explain a method using Gröbner bases to find the equation of the tangent plane to a surface f (x, y, z) =
0 at a given point (x0 , y0 , z0 ). Demonstrate this method on the surface x3 + y 3 + z 3 − 4 = 0.
12. Find the distance from the point (5, 3) to the parabola y = x² using polynomial equations. [Hint: Use the polynomial equations y − x² and (x − 5)² + (y − 3)² − d², as well as two equations using gradients
that express the condition that at a point (x, y) where two curves intersect, those curves have the
same tangent line. This becomes a system of equations in 4 variables, x, y, λ, d. Find a Gröbner basis
that includes an equation only in d. Solve this equation to find the desired distance.]
13. Use Gröbner basis techniques to find the lines that are bitangent to the parabolas y = x² and 2y = x² − 4x + 8.
14. Use Gröbner basis techniques to find the inverse of a 2 × 2 matrix. In other words, recover the condition under which a matrix is invertible and obtain formulas for the entries of the inverse matrix.
15. Write x⁵ + y⁵ + z⁵ as a polynomial in the elementary symmetric polynomials s1, s2, s3 and explain your method.
16. A torus of ring radius R = 2 and cross-section radius r = 1 can be parametrized by

~X(u, v) = ((2 + cos u) cos v, (2 + cos u) sin v, sin u),
for (u, v) ∈ [0, 2π]2 . Show that this torus is a variety in R3 and give an explicit equation for the torus
using the techniques described in Section 12.7.3.
17. A space cardioid is the curve parametrized by
~r(t) = ((1 + cos t) cos t, (1 + cos t) sin t, sin t) for t ∈ [0, 2π].
Express the space cardioid as a variety in R3 and fully justify your result.
12.8
A Brief Introduction to Algebraic Geometry
12.8.1 – What Is Algebraic Geometry?
Geometry comes in a number of flavors.
Euclidean geometry deals with points, lines, triangles, circles, polygons, conics, rays, planes,
spheres, and various other subsets of points in Rn and concerns itself with results that can be proven
from Euclid’s five famous postulates. Though Euclidean geometry stands as a great achievement
of antiquity, the postulates and common notions in Euclid’s Elements involved some assumptions
that required clarification (e.g., axioms of betweenness). The work of those who proposed systems
of axioms that removed the original logical gaps still fell under the label of Euclidean geometry.
When Lobachevsky and Bolyai discovered non-Euclidean geometry, they discarded Euclid's controversial Fifth Postulate (the Parallel Postulate) and proved that there exist consistent geometries that
have an alternative postulate. Nonetheless, classical non-Euclidean geometry continued to see a sim-
ilar proof style as Euclidean geometry. Along the way and subsequently, other types of geometries
emerged: projective geometry, finite geometries, inversive geometry, transformational geometry, etc.
Cartesian coordinates introduced an alternative method for studying the objects of interest in Euclidean geometry. Using Cartesian coordinates, along with algebra or analysis, to establish geometric results
became known as analytic geometry. In contrast, geometry that did not employ coordinates became
first known as pure geometry, and later as synthetic geometry.
Analytic geometry opened a door to differential geometry. In broad strokes, differential geometry
studies that which can be known about sets of points (often considered as subsets in Rn but not
necessarily) using calculus and analysis. It is not possible to “do calculus” on an arbitrary set of
points, so differential geometry concerns itself with differentiable manifolds, sets that can be locally
12.8. A BRIEF INTRODUCTION TO ALGEBRAIC GEOMETRY 667
parametrized by functions that are in some sense differentiable. In particular, curves and surfaces are
one- and two-dimensional manifolds. Riemannian geometry is a subbranch of differential geometry
in which one studies contexts where it is possible to “do calculus” and have a concept of metric (or
distance). The objects of study in Riemannian geometry are called Riemannian manifolds.
Algebraic geometry also descends from analytic geometry but in a different way. Instead of
starting from parametrized surfaces, algebraic geometry starts with algebraic varieties, which are by
nature defined in relation to some set of variables. From its beginning, the purpose of this flavor of geometry has been to study properties of algebraic varieties (and the more general objects called schemes) using
techniques and theory from algebra, and in particular ring theory.
The problems that motivate investigations in algebraic geometry are not unlike those in differ-
ential geometry. On the one side, one studies local properties of varieties, namely properties of the
variety that have meaning in arbitrarily small neighborhoods of the variety around a given point.
On the other side lie global properties, properties that are true of the variety as a whole.
Commutative algebra is a branch of algebra that narrows its focus to the study of commutative
rings and modules thereof. Since algebraic varieties are zero sets of ideals in the commutative
ring F [x1 , x2 , . . . , xn ], commutative algebra is an essential support to algebraic geometry. Concepts
developed in commutative algebra—such as localizations, valuations, graded rings, regular rings,
flatness, derivations, and many others—have consequences for geometric information. Conversely,
geometric concepts such as tangent spaces, singularities, coordinate functions, incidence, geometric
classification problems, and so on motivated research in commutative algebra.
The field of algebraic number theory is not unlike algebraic geometry in the perspective of using
modern algebra to study number theory. However, since rings of numbers are generally commuta-
tive, algebraic number theory also borrows heavily from commutative algebra. Consequently, these
three branches—commutative algebra, algebraic geometry, and algebraic number theory—developed
together. Hence, many concepts that arise naturally in one field have an interpretation for objects of study in one of the other two.
Algebraic geometry is a vast field of mathematics. Consequently, just as with some of the other
“brief introduction” sections in this book, this section only intends to whet the reader’s appetite
with a few introductory concepts. The literature offers a number of excellent books on algebraic
geometry: [6, 22, 37, 38, 44].
Example 12.8.1. A subset U ⊆ Rn is called open if for all p ∈ U, there exists r > 0 such that Br(p) ⊆ U. Similarly, a subset F ⊆ Rn is called closed if its complement Rn − F is an open subset.
Notice that Rn and ∅ are both open and closed. Furthermore, open sets satisfy the following two
properties. (1) If U1 and U2 are open, then U1 ∩ U2 is open. (2) If {Ui}i∈I is a collection of open sets, then the union ⋃_{i∈I} Ui is again open.
Let D be an open subset of Rn. Recall that a function f : D → Rm is called continuous at ~c if for all ε > 0, there exists δ > 0, such that ∥~x − ~c∥ < δ implies ∥f(~x) − f(~c)∥ < ε. We also call f continuous if it is continuous at all points ~c ∈ D. Using open balls, we can restate the definition
of continuity to say that for all ε > 0, there exists δ > 0 such that f (Bδ (~c)) ⊆ Bε (f (~c)). This
restatement also implies that Bδ(~c) ⊆ f⁻¹(Bε(f(~c))). Finally, this latter statement is also equivalent to saying that for all open sets U′ ⊆ Rm, the set f⁻¹(U′) is open. △

The properties of open sets in Rn and the last equivalent formulation of continuity turn out to
lead to a wealth of concepts. As with many structures in algebra, it is natural to label this collection
of useful properties and to study them independently from a specific instance.
Definition 12.8.2
A topological space is a pair (X, τ ), where X is a set and τ is a subset of the power set
P(X) such that
(1) X ∈ τ and ∅ ∈ τ ;
(2) the intersection of any two subsets U1 , U2 ∈ τ is again in τ ;
(3) the union of any collection {Ui }i∈I of subsets in τ is again in τ .
The set of subsets τ is called the topology of this topological space. A set U ∈ τ is called open in the τ topology, and a set F ⊆ X is called closed if its complement X − F is open.
These definitions justify calling the motivating Example 12.8.1 the Euclidean topology. Further-
more, in the Euclidean topology on R, an interval is open (resp. closed) in the common sense precisely when it is open (resp. closed) in the Euclidean topology.
By induction, the axiom (2) of intersections implies that the intersection of a finite number of
open sets is again open. However, it is not true that the intersection of an arbitrary collection of
open sets is open. For example,
⋂_{n=1}^{∞} (−1/n, 1 + 1/n) = [0, 1],

which is not an open set in R, even though (−1/n, 1 + 1/n) is an open interval for all n ∈ N*.
Definition 12.8.3
Let (X, τ ) be a topological space and let x ∈ X. A neighborhood of x is any set V ⊆ X
such that there exists an open set U ∈ τ such that x ∈ U ⊆ V .
Using the Euclidean topology as inspiration, a neighborhood of x ∈ Rn is any set that contains
an open ball Bδ (x) for some δ > 0. Consequently, if we use the intuitive notion of “near x” to mean
less than some positive distance δ away from x, then a neighborhood of x is any set that contains
the set of points that are near x. The concept of neighborhood of a point makes precise this intuitive
notion of nearness.
Just as with algebraic structures, one does not typically care about arbitrary functions from one
topological space to another. Instead, we consider functions that “preserve the structure.”
Definition 12.8.4
Let (X, τ ) and (Y, τ 0 ) be two topological spaces. A function f : X → Y is called continuous
if f −1 (U ) ∈ τ for all U ∈ τ 0 .
The term continuous is inspired by, and directly generalizes, the same term in analysis. This
definition can also be restated to say that f is continuous if and only if for all x ∈ X and for all
neighborhoods V of f (x), the set f −1 (V ) is a neighborhood of x. In this intuitive sense, continuous
functions preserve nearness.
Example 12.8.5 (Finite Complement). Let X be any set and consider the subset τ ⊆ P(X)
defined as
τ = {A ∈ P(X) | X − A is finite} ∪ {X, ∅}.
By definition, X and ∅ are in τ. For A, B ∈ τ, De Morgan's laws give X − (A ∩ B) = (X − A) ∪ (X − B). Since X − A and X − B are both finite, their union is finite, so A ∩ B ∈ τ. Let {Ai}i∈I be a collection of sets in τ. Then

X − ⋃_{i∈I} Ai = ⋂_{i∈I} (X − Ai).

If i0 ∈ I, then X − Ai0 is finite and so the above intersection must be finite, with a cardinality less than or equal to |X − Ai0|. Thus, the union of {Ai}i∈I is again in τ.
We have shown that τ is a topology. This is called the finite complement topology on any set X.
The open subsets in this topology are ∅ and any set whose complement is finite. 4
Example 12.8.6 (Restricted Topology). Let (X, τ) be a topological space and let Y ⊆ X be any
subset. Define
τ |Y = {V ∈ P(Y ) | ∃U ∈ τ, V = Y ∩ U }.
It is obvious that both ∅ and Y are in τ |Y . Let V1 , V2 ∈ τ |Y . Then V1 = Y ∩ U1 and V2 = Y ∩ U2
for some U1 , U2 ∈ τ . By associativity and idempotence, V1 ∩ V2 = Y ∩ (U1 ∩ U2 ), so V1 ∩ V2 ∈ τ |Y .
Furthermore, if {Vi}i∈I is a collection of sets in τ|Y with Vi = Y ∩ Ui for i ∈ I, then

⋃_{i∈I} Vi = ⋃_{i∈I} (Y ∩ Ui) = Y ∩ ( ⋃_{i∈I} Ui ),

which is in τ|Y because the union ⋃_{i∈I} Ui ∈ τ.
This establishes that τ |Y is a topology on Y , called the restriction of τ to Y or the subset topology.
A set that is "closed in Y" is a subset F ⊆ Y whose complement in Y is open in Y, namely Y − F = Y ∩ U where U ∈ τ. Since F ⊆ Y,

F = Y − (Y ∩ U) = Y ∩ (X − U).

Hence, a subset F of Y is closed in the restricted topology if and only if F = Y ∩ F′, where F′ is a subset of X that is closed in τ. △
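For finite examples, the axioms of Definition 12.8.2 and the claims of this example can be checked by brute force. The sketch below (the helper `is_topology` is ours, plain Python) builds a small topology, restricts it as in Example 12.8.6, and confirms the description of the closed sets of τ|Y:

```python
from itertools import chain, combinations

def is_topology(X, tau):
    """Brute-force check of the axioms of Definition 12.8.2 (finite case)."""
    if X not in tau or frozenset() not in tau:
        return False
    # (2) pairwise intersections stay in tau.
    if any(a & b not in tau for a in tau for b in tau):
        return False
    # (3) unions of every subcollection stay in tau.
    subcollections = chain.from_iterable(
        combinations(list(tau), r) for r in range(len(tau) + 1))
    return all(frozenset().union(*sub) in tau if sub else True
               for sub in subcollections)

X = frozenset({0, 1, 2, 3})
tau = {frozenset(), frozenset({0}), frozenset({0, 1}), X}
assert is_topology(X, tau)

# Restriction to Y = {1, 2, 3}.
Y = frozenset({1, 2, 3})
tau_Y = {Y & U for U in tau}
assert is_topology(Y, tau_Y)

# Closed sets of tau|Y are exactly the sets Y ∩ F' with F' closed in X.
closed_Y = {Y - V for V in tau_Y}
assert closed_Y == {Y & (X - U) for U in tau}
```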
Proposition 12.8.7
Let (X, τ ) be a topological space.
(1) Both X and ∅ are closed.

(2) The union of any two closed sets is closed.

(3) The intersection of an arbitrary collection of closed sets is closed.
Proof. All three parts follow from the definition of open sets and De Morgan's laws.
Proposition 12.8.7 makes it possible to define a topology on a set by specifying closed subsets
instead of the open sets.
Let K be a field and consider the affine space AnK . In Exercise 12.3.2 we proved that the union
of two varieties is again a variety and that the intersection of an arbitrary collection of varieties is
again a variety. Furthermore, K n = V(0) and ∅ = V(1). Thus, affine varieties satisfy the three
conditions in Proposition 12.8.7.
Definition 12.8.8
The Zariski topology on AnK is the topology in which the closed sets are precisely the affine varieties in AnK.
In fact, it is the Zariski topology that motivates the alternate notation AnK in contrast to the
vector space notation K n .
Example 12.8.9. Let K be a field and consider the Zariski topology on A1K . An affine variety in
A1K consists of the solution set of a polynomial in K[x]. However, a nonzero polynomial can have only a finite number of roots. Conversely, for any finite subset of K, there exists a polynomial that has precisely that finite set as roots. Consequently, the closed sets in the Zariski topology on A1K are the finite sets of points, along with A1K itself. Thus, this topology is the finite complement topology. △
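The "conversely" direction is concrete: the product of the linear factors x − s over any finite set S has exactly S as its set of roots. A small sympy check with illustrative values:

```python
from sympy import symbols, roots, prod

x = symbols('x')

# Any finite subset of K is the zero set of one polynomial:
S = [-1, 0, 3]
p = prod([x - s for s in S])  # (x + 1) * x * (x - 3)

# The roots of p recover exactly the chosen finite set.
assert set(roots(p, x)) == set(S)
```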
Definition 12.8.10
Let (X, τ ) be a topological space. A nonempty subset Y ⊆ X is irreducible if it cannot be
expressed as the union Y = Y1 ∪ Y2 of two proper subsets Y1 and Y2 that are closed in τ |Y .
In every topology (X, τ ), the singleton sets {x} are irreducible sets. Without much intuition,
we might suspect (erroneously) that the singleton sets are the only irreducible sets in a topology.
Indeed, this is the case for the Euclidean topology on R. Suppose that Y contains two distinct points
a, b with a < b. Set Y1 = Y ∩ (−∞, (a + b)/2] and Y2 = Y ∩ [(a + b)/2, +∞). By Example 12.8.6, Y1
and Y2 are closed in Y. Furthermore, both subsets are proper because b ∉ Y1 and a ∉ Y2.
However, in the finite complement topology τF C on R, the whole set R is itself irreducible. The
closed subsets of R in τFC are the finite subsets (along with R itself). Since R is not the union of two finite subsets, R is
irreducible in τF C . In fact, a set Y is irreducible in the finite complement topology if and only if Y
is a singleton set or infinite.
In the Zariski topology, the concept of irreducible has a ring theoretic interpretation.
Proposition 12.8.11
Let V be an affine variety in AnK (assuming the Zariski topology). Then V is irreducible if
and only if I(V ) is a prime ideal.
Proof. By Exercise 12.3.2, V(IJ) = V(I) ∪ V(J) for two ideals I, J ⊆ K[x1 , x2 , . . . , xn ] and I(V1 ∪
V2 ) = I(V1 ) ∩ I(V2 ) for any two affine varieties V1 , V2 ⊆ AnK .
Suppose that V is not irreducible. Then V = V1 ∪ V2, where V1 and V2 are affine varieties with V1 ⊊ V and V2 ⊊ V. Setting I1 = I(V1) and I2 = I(V2), we have I = I(V) = I1 ∩ I2 and also I ⊊ I1 and I ⊊ I2. However, since I1I2 ⊆ I while I1 ⊈ I and I2 ⊈ I, the ideal I(V) is not prime.
Conversely, suppose that I(V) is not a prime ideal. Then there exist ideals I1 and I2 such that I1I2 ⊆ I(V) but I1 ⊈ I(V) and I2 ⊈ I(V). Then V = V(I(V)) ⊆ V(I1I2) = V(I1) ∪ V(I2), so in particular,

V = (V ∩ V(I1)) ∪ (V ∩ V(I2)),

where V ∩ V(Ii) is closed in V. But V ⊈ V(Ii), so V ∩ V(Ii) ≠ V, which implies that V ∩ V(I1) and V ∩ V(I2) are proper subsets. Thus, V is not irreducible.
It is often possible to study properties of a geometric object via functions on that object. Polynomials in the ring K[x1, x2, . . . , xn], when evaluated on an affine variety V, give a ring of polynomial functions ψ : V → K. However, we consider two such polynomial functions ψ and φ equivalent if φ(c) = ψ(c) for all c ∈ V. This condition is tantamount to φ(x) − ψ(x) ∈ I(V), which in turn is the same as saying that φ(x) and ψ(x) define the same element in the quotient ring K[x1, x2, . . . , xn]/I(V).
Definition 12.8.12
Let V be an affine variety in AnK. The coordinate ring of V, denoted K[V], is the quotient
ring K[x1 , x2 , . . . , xn ]/I(V ).
Some properties of affine varieties are readily apparent in the coordinate ring. If V is a vari-
ety consisting of a single point, then K[V ] is a field. By the strong form of the Nullstellensatz,
when K is algebraically closed, a variety has K[V ] = K if and only if V consists of a point. By
Proposition 12.8.11, V is an irreducible affine variety if and only if K[V ] is an integral domain.
For example, consider the intersection of the variety V(y − x2 ) and V(y − 2x + 1). Note that
y = 2x − 1 is the tangent line to y − x2 at (1, 1). From a set-theoretic perspective, the intersection
is the point {(1, 1)}. Consider the ideals I = (y − x2 ) and J = (y − 2x + 1) in R[x, y]. Under the
ideal-variety correspondence, the intersection of the varieties satisfies

V(I) ∩ V(J) = V(I + J) = {(1, 1)}.
Consider the ideal I + J instead of the set-theoretic intersection of the varieties. This sum ideal
is I + J = (y − 2x + 1, (x − 1)2 ). In fact, {y − 2x + 1, (x − 1)2 } is a Gröbner basis of I + J with
respect to the lexicographic order with y > x. Furthermore, using this Gröbner basis, we can show
that neither x − 1 nor y − 1 is in I + J. By the strong form of the Nullstellensatz, we know that I(V(I + J)) = √(I + J) = (x − 1, y − 1). The ideal I + J is a proper ideal of √(I + J), and this distinction carries more information.
We interpret the generators {y − 2x + 1, (x − 1)2 } by saying that the intersection of the parabola
y − x² = 0 and the line y − 2x + 1 = 0 is a "point with a tangent vector." The quotient ring K[x, y]/(I + J) is isomorphic to K[x]/((x − 1)²). In Example 5.6.6, we observed that writing elements
in R = K[x]/((x − 1)2 ) as a + b(x − 1) shows that elements in R add and multiply as the tangent
lines of representative sum and product functions. Consequently, we can interpret the information
in K[x]/((x − 1)2 ) as containing not just the possible values of a function at a point but something
akin to the value and the derivative of a function at a point.
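This interpretation can be tested computationally: reducing a product modulo (x − 1)² reproduces the value and the derivative at 1, i.e., the product rule. A sympy sketch (the helper `reduce_mod` is ours):

```python
from sympy import symbols, rem, diff, expand

x = symbols('x')
m = (x - 1)**2  # the modulus defining K[x]/((x - 1)^2)

def reduce_mod(f):
    """Remainder of f on division by (x - 1)^2."""
    return rem(expand(f), m, x)

# In K[x]/((x - 1)^2), a class a + b*(x - 1) records the value a = f(1)
# and the derivative b = f'(1) of any representative f.
f = x**3 + 2*x
g = x**2 - 5

prod_class = reduce_mod(f*g)
# The product class records (f*g)(1) and (f*g)'(1) = f(1)g'(1) + f'(1)g(1).
expected = (f*g).subs(x, 1) + diff(f*g, x).subs(x, 1)*(x - 1)
assert expand(prod_class - expected) == 0
```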
This observation (along with considerably more algebraic machinery) leads to a developed concept
of the Zariski tangent space to a variety at a point.
Definition 12.8.13
Let R be a commutative ring. The prime spectrum of R, denoted Spec R, is the set of prime ideals of R.
Definition 12.8.14
For any subset S ⊆ R, define the variety in Spec R associated to S as V (S) = {P ∈
Spec R | S ⊆ P }.
Proposition 12.8.15
Let R be a commutative ring. Then varieties in Spec R satisfy the following properties.
(1) If I is the ideal generated by S, then V(S) = V(I) = V(√I).
(2) Both the empty set and all of Spec R are varieties: V (0) = Spec R and V (1) = ∅.
(3) If {Si}i∈I is a collection of subsets in R, then V( ⋃_{i∈I} Si ) = ⋂_{i∈I} V(Si).
(4) If I1 and I2 are ideals of R, then V(I1) ∪ V(I2) = V(I1 ∩ I2) = V(I1I2).
This proposition shows that the collection of varieties V(S) forms the closed sets of a topology on Spec R.
Definition 12.8.16
The topology defined on Spec R by taking the varieties V (S) as the closed subsets is called
the Zariski topology on Spec R.
The Zariski topology on the affine space AnK and the Zariski topology on Spec R share the same name because they are defined in such a similar manner. Suppose that K is algebraically closed and let R = K[x1, x2, . . . , xn].
Then the points in K n correspond to maximal ideals in R. These maximal ideals are contained in
Spec R, but Spec R includes many more elements, namely all prime ideals of R. The prime ideals
of K[x1 , x2 , . . . , xn ] correspond to irreducible varieties in AnK . Consequently, there is a bijective
correspondence between the elements in Spec R and the irreducible affine varieties in AnK .
12.9
Projects
Project I. Jordan Canonical Form and Gröbner Bases. Repeat the calculations in Ex-
ample 12.7.7 but with different matrices with different Jordan canonical forms. Do some
calculations with matrices that have nontrivial Jordan canonical blocks and matrices with
eigenspaces that are more than one-dimensional. From these calculations, deduce some pat-
terns that connect the Jordan canonical form and the generalized eigenspaces with the result
of the Gröbner basis calculations. Prove as much as you can pertaining to this connection.
Project II. Euclidean Geometry and Gröbner Bases. Use Gröbner bases methods to ap-
proach standard concepts in Euclidean geometry. For example, given two points, can you find the equation of the perpendicular bisector of the segment joining them? Given the coordinates of the vertices of a triangle, can you give a method to find the center and the radius of the circumscribed circle or the inscribed circle of the triangle? Can you prove certain results simply by doing a Gröbner basis calculation? For example, can you prove that if a circle C1 of center A intersects a circle C2 of center B in two points D and E, then the line AB is perpendicular to DE? What other calculations and proofs can
you obtain?
Project III. Tangency and Gröbner Bases. Consider curves in the plane, curves in space,
or surfaces in space. Section 12.7 hinted at methods for finding tangent lines to curves in
the plane using Gröbner bases. Clearly explain this strategy and see if you can extend it with
examples or with theory to more general situations such as curves in space or surfaces in space.
In some examples, match the results obtained using Gröbner bases to methods that employ
calculus. If you focus on curves in the plane, can you connect the results from Gröbner bases
techniques to the formula of slope arising from implicit differentiation?
Project IV. Distance between Lines. Use Gröbner bases techniques to obtain a formula for
the distance between two nonintersecting lines in R3 . If there are three lines in R3 , no two of
which intersect, there exists a sphere of least radius that is tangent to all three lines. Can you
find such a sphere for some interesting examples? Can you generalize your examples? Is this
sphere always unique?
Project V. Solving Nonlinear Recursive Functions. Consider a recurrence relation on F = Fp
defined by
a_{n+1} = h(a_n , . . . , a_2 , a_1 )
for some h ∈ Fp [x1 , x2 , . . . , xn ]. A sequence satisfying the recurrence relation is completely
defined once we specify values for ai with 1 ≤ i ≤ n. Suppose that for a sequence (an )n≥1 ,
we know the terms an+1 , an+2 , . . . but not the initial values ai with 1 ≤ i ≤ n. Show how
Gröbner bases techniques allow us to solve for a1 , a2 , . . . , an , knowing subsequent terms in the
sequence. Give some interesting explicit examples.
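Before setting up the Gröbner bases computation, it can help to generate and check data by brute force. The following Python sketch (not the Gröbner approach the project asks for; the polynomial h and the prime p = 7 are illustrative choices) recovers hidden initial values of a nonlinear recurrence over F_7 by exhaustive search.

```python
# Brute-force sanity check: recover hidden seeds a1, a2 of a nonlinear
# recurrence over F_7 from the observed later terms a3, a4.
p = 7

def step(prev1, prev2):
    # An illustrative choice of h: a_k = a_{k-1}^2 + a_{k-2} (mod p).
    return (prev1 * prev1 + prev2) % p

a1, a2 = 3, 5                 # hidden initial values
a3 = step(a2, a1)             # observed term
a4 = step(a3, a2)             # observed term

candidates = []
for x1 in range(p):
    for x2 in range(p):
        x3 = step(x2, x1)
        x4 = step(x3, x2)
        if (x3, x4) == (a3, a4):
            candidates.append((x1, x2))

assert candidates == [(3, 5)]  # the seeds are recovered uniquely here
```

The Gröbner basis method replaces this exhaustive search with an algebraic solution of the same polynomial system.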
Project VI. Envelopes of Families of Curves. Suppose that f (λ, x, y) ∈ R[λ, x, y] represents
a family of curves Cλ in R2 defined by f (λ, x, y) = 0. An envelope of the family is a
curve Γ such that for each P ∈ Γ, there exists λ such that P ∈ Cλ and the tangent line to Γ at P
is the same as the tangent line to Cλ at P .
In [56], the author shows that the family of curves in R2 parametrized by (x, y) = (t cos α, (1 −
t) sin α), with parameter α and t ∈ R, has an envelope parametrized by (cos^3 u, sin^3 u). Note
that this particular example of a family of curves can be rewritten as the lines
2λ(1 + λ²)x + (1 − λ⁴)y − (1 − λ²)² = 0.
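The cited envelope can be spot-checked numerically. The following Python sketch (not from the text) verifies that each astroid point (cos^3 u, sin^3 u) lies on the family member with α = u at parameter t = cos^2 u.

```python
import math

# Check that (cos^3 u, sin^3 u) lies on the curve of the family
# (x, y) = (t cos a, (1 - t) sin a) with a = u and t = cos^2 u,
# since (1 - cos^2 u) sin u = sin^3 u.
for u in [0.3, 1.0, 2.2, 4.5]:
    alpha, t = u, math.cos(u) ** 2
    x, y = t * math.cos(alpha), (1 - t) * math.sin(alpha)
    assert abs(x - math.cos(u) ** 3) < 1e-12
    assert abs(y - math.sin(u) ** 3) < 1e-12
```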
Chapter 13
Categories
Throughout this book, we studied algebra with an emphasis on the concept of an algebraic structure.
As mentioned in the preface, the term “algebraic structure,” though not uncommon, does not have
a mathematically precise definition that would make it possible to say whether something is an
algebraic structure or not. Categories formalize and generalize what we previously called (for lack
of a better term) an algebraic structure.
Categories take one step further in abstraction. The objects of interest are no longer a single
group or a single ring, but the class of all groups or the class of all commutative rings. Consequently,
the theory of categories underscores the unity between different structures of interest in mathematics.
For the purposes of this textbook, this chapter serves as a culmination point. However, the theory
of categories is a rich theory with applications in every branch of mathematics. Consequently, this
chapter only offers an introduction that draws from and generalizes many of the constructions in
this textbook. Section 13.1 introduces the concept of a category and presents many examples. Then
Section 13.2 defines functors and shows examples again taken from earlier parts of this book.
For further reading, we suggest the classic text [43].
Categories do not stand as the be-all and end-all of mathematical reasoning. Though categories
provide a consistent framework to develop theory for each new mathematical context, each category
presents its own intrinsically interesting theorems and areas of investigation (e.g., the Jordan-Hölder
program, the classification of finite simple groups, the study of divisibility in rings, the Jordan
canonical form, the Fundamental Theorem of Algebra, and countless other results).
13.1
Introduction to Categories
13.1.1 – Axioms for Categories
676 CHAPTER 13. CATEGORIES
Many authors denote a specific category by a boldfaced code or acronym that evokes the English
terminology that designates that category. This textbook follows this habit of notation. However,
these codes or acronyms are not universally standard.
Example 13.1.3 (Sets). In the category Set of sets, the objects consist of sets and the arrows
consist of functions between sets. In fact, the data of a category appears modeled after sets and
functions between sets, including the notion of composition of functions and an identity function
on each set. The fact that function composition is associative in the sense required by the category
axioms requires a small proof given in Proposition 1.1.15.
We point out one minor technicality with functions to or from the empty set. Recall that a
function f : A → B is defined as a relation from A to B such that for all a ∈ A, there exists a
unique b ∈ B with f (a) = b. For all sets X, this definition allows for one and only one function
f : ∅ → X, namely the empty function or the empty relation. On the other hand, because of this
definition, if X is nonempty, there exist no functions f : X → ∅, not even the empty function.
Hence, Hom(X, ∅) = ∅. △
Example 13.1.4 (Groups). In the category Grp of groups, the objects consist of all groups (a
set G along with a binary operation ∗ on G satisfying the group axioms) and the arrows consist of
group homomorphisms. It is easy to check that these collections of objects and of arrows constitute
a category. △
Example 13.1.5 (Rings). In the category Ring of rings, the objects consist of all rings and the
arrows consist of ring homomorphisms. △
Taking a cue from set theory and group theory, we label the following properties of arrows.
Definition 13.1.6
In a category, an arrow/morphism f : X → Y is called
(1) an isomorphism or invertible if there exists a morphism g : Y → X such that f ◦ g =
idY and g ◦ f = idX ; the arrow g is called an inverse of f ;
(2) an endomorphism if X = Y , and the class of endomorphisms on X is denoted by
End(X);
(3) an automorphism if it is both an isomorphism and an endomorphism, and the class
of automorphisms on X is denoted by Aut(X).
As examples, recall that in the category of sets, we called an isomorphism between sets a bijection
and, in set theory terms, an automorphism is a permutation.
These initial examples begin to illustrate how the formalism of the various algebraic structures
that formed the beginning of this textbook fall under the consistent framework of categories. In
the terminology for categories, the reader should recognize the terms morphism (shortened from
homomorphism), isomorphism, endomorphism, and automorphism.
Proposition 13.1.7
If f : X → Y is an invertible arrow, then it has a unique inverse arrow.
Proof. Suppose that g1 and g2 are two inverses of the arrow f . Then
g1 = g1 ◦ idY = g1 ◦ (f ◦ g2 ) = (g1 ◦ f ) ◦ g2 = idX ◦ g2 = g2 .
Hence the inverse of f is unique.
[Diagrams: arrows f : X → Y and g : Y → Z with the composite g ◦ f : X → Z, and arrows
f : X → Y , g : Y → Z, and h : Z → W with the composite h ◦ g : Y → W .]
Arrow diagrams can effectively depict many situations in category theory and, at times, even
offer visual proofs of certain relationships.
Definition 13.1.8
A category B is called a subcategory of a category C if every object in Ob(B) is in Ob(C)
and if for any two objects X and Y in Ob(B) every arrow f : X → Y in the category B is
also an arrow in the category C. A subcategory is called a full subcategory if HomB (X, Y ) =
HomC (X, Y ) for all objects X, Y in Ob(B).
The category AbGrp of abelian groups is a full subcategory of Grp. It is important to note
that though an abelian group has more axioms (requirements) than an arbitrary group, no extra
data is necessary to describe an abelian group. Furthermore, if G and H are abelian groups and if
ϕ : G → H is a group homomorphism, then ϕ is a homomorphism of abelian groups.
Consider the category RingId in which the objects are rings with an identity 1 ≠ 0 but in
which the arrows are ring homomorphisms ϕ : R → S such that ϕ(1R ) = 1S . The class of objects
Ob(RingId) is a subclass of Ob(Ring) but the class of arrows in RingId is a strict subclass of
Arr(Ring). (The ring homomorphism ϕ : Z → Z/6Z defined by ϕ(n) = 3̄n̄ does not map 1 to 1.)
Hence, RingId is a subcategory of Ring but not a full subcategory.
In contrast, consider the category Set∗ of pointed sets. The objects of Set∗ consist of a pair
(X, x0 ) where X is a set and x0 is a selected element in X. A morphism in Set∗ from (X, x0 )
to (Y, y0 ) is defined as a function f : X → Y between sets such that f (x0 ) = y0 . The category
Set∗ is not a subcategory of Set for two reasons. First, the data of a pointed set consists of more
information than just a set; the object (X, x0 ) such that x0 ∈ X is not a set. Second, though part
of the data of an arrow from a pointed set (X, x0 ) to (Y, y0 ) consists of a function f : X → Y , not
every arrow in HomSet (X, Y ) is in HomSet∗ (X, Y ).
Definition 13.1.9
A category C is called a small category if Ob(C) and Arr(C) are sets and is called large
otherwise. A large category C is called locally small if for any two objects X and Y , the
class of arrows HomC (X, Y ) is a set.
The category Set is a large category but is locally small; in fact, for any two sets X and Y , each
function f : X → Y corresponds to a subset of X × Y , so Hom(X, Y ) is a subset of P(X × Y ).
Questions about whether a category is large or small or locally small often involve challenging
questions in the foundations of set theory, which are beyond the scope of this text.
Example 13.1.10 (Vector Spaces). Let F be a field. Vector spaces over F form a category VecF
where the objects are vector spaces and the arrows of VecF are linear transformations between
vector spaces. The terms isomorphisms, endomorphisms, and automorphisms in general categories
are consistent with the terminology from linear algebra. △
Example 13.1.11 (Posets). Posets form a category Poset. Objects consist of posets, i.e., a pair
(S, ≼) where S is a set and ≼ is a partial order on S. (See Definition 1.4.1.) The arrows Arr(Poset)
consist of monotonic functions between posets. (See Definition 1.4.17.) △
Example 13.1.12 (Left R-Modules). Let R be a ring. Left R-modules form a category denoted
by LModR . The arrows are left R-module homomorphisms. Note that if R and S are nonisomorphic
rings, then the categories of left R-modules and of left S-modules are distinct. Not only might
the objects be different but the homomorphisms satisfy different rules in that they are linear with
respect to different scalars. △
Example 13.1.13 (Group Actions). Let G be a group. Group actions can be viewed as a cate-
gory SetG in which an object of Ob(SetG ) is a set S along with a pairing G × S → S satisfying the
axioms of group actions (Definition 8.1.1). Morphisms of group actions are G-equivariant functions or
homomorphisms of G-sets, defined in Definition 8.1.16. △
The following three examples discuss categories occurring regularly in calculus and analysis.
Though it is simple to define these categories, the study of their properties constitutes a fundamental
theme for branches of analysis and topology.
Example 13.1.14. Consider the category whose objects consist of open subsets of R and whose
arrows consist of continuous functions between open subsets of R. Much of the study of continuous
functions occurs in this category. In order for this data to form a category, the composition of two
continuous functions must be continuous, which is a nontrivial theorem. When we study differen-
tiable functions, we work in the subcategory consisting again of open subsets of R but in which the
morphisms consist of continuously differentiable functions between open subsets of R. △
Example 13.1.15 (Metric Spaces). At an abstract level, much of geometry deals with a set
equipped with a notion of distance. Recall the notion of a metric space from Section 3.9.1. The
category MetSp of metric spaces consists of objects that are metric spaces and the arrows are
isometries between metric spaces. △
Example 13.1.16 (Topological Spaces). Section 12.8.2 introduced the concept of a topological
space as background behind the Zariski topology. Topological spaces along with continuous functions
between them as the arrows form a category, often labeled Top. As a point of terminology, in
topology, an isomorphism between two topological spaces is called a homeomorphism. △
The subsequent examples of categories illustrate the flexibility in what a category can be.
Example 13.1.17 (Empty). The empty category, sometimes denoted 0, is the category consisting
of no objects and no arrows. △
Example 13.1.18 (Category of a Poset). Every poset is itself a category in the following sense.
Let S = (S, ≼) be a poset. The objects Ob(S) consist of the elements of the set S and there exists a
single arrow x → y between two elements x, y ∈ S if and only if x ≼ y. In other words, HomS (x, y)
is the empty set if x ⋠ y but HomS (x, y) contains a single arrow whenever x ≼ y. △
Example 13.1.19 (Ordinal Numbers). Suppose that we denote n = {1, 2, . . . , n}. Under the
category of a poset as described in the previous example, the sets 1, 2, . . ., and N are categories
when equipped with the usual inequality ≤ partial order. For example, 5 = {1, 2, 3, 4, 5} is a category
with 5 objects and a unique arrow f : a → b in Hom(a, b) if a ≤ b. For each a ∈ n, the unique arrow
in Hom(a, a) is the identity arrow. △
Definition 13.1.20
A category C is called discrete if for any two objects X and Y in Ob(C),
HomC (X, Y ) = {idX } if X = Y , and HomC (X, Y ) = ∅ if X ≠ Y .
Example 13.1.21 (Directed Graphs). Every directed graph defines a category in the following
sense. Recall Definition 10.11.1 for a directed graph, also called a quiver. In the category Cat(Q)
associated to a directed graph Q = (V, E, h, t) the objects are the vertices (elements of V ) and the
arrows consist of all the paths in Q. Recall that the paths consist of stationary paths ev , one for
each vertex v ∈ V (these are the identity morphisms) and all sequences of arrows strung together
head to tail. For example, the category associated to the following directed graph
[Diagram: objects X, Y , and Z with arrows f : X → Y , g : Y → Z, and h : X → Z.]
has three objects, namely X, Y , and Z, and exactly 7 arrows, namely eX , eY , eZ , f , g, h, and gf . The
existence of the composition gf is implied by the axioms for categories, so from the given diagram
we assume that h and gf are distinct arrows, with gf not explicitly pictured except as a path of two
directed edges. If an arrow has an inverse, it is not uncommon to depict an arrow f and its inverse
f −1 with a double-arrow edge. △
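The arrow count in this example can be verified mechanically. The following Python sketch (not from the text) enumerates the arrows of Cat(Q) for the triangle quiver by stringing edges together head to tail.

```python
# Enumerate the arrows of Cat(Q) for the quiver with edges
# f : X -> Y, g : Y -> Z, h : X -> Z.
edges = {'f': ('X', 'Y'), 'g': ('Y', 'Z'), 'h': ('X', 'Z')}

# Each arrow is recorded as (name, tail, head); start with the
# stationary paths, which are the identity morphisms.
arrows = [('e_' + v, v, v) for v in ('X', 'Y', 'Z')]
frontier = [(name, t, h) for name, (t, h) in edges.items()]
while frontier:
    arrows.extend(frontier)
    longer = []
    for name1, tail1, head1 in frontier:
        for name2, (tail2, head2) in edges.items():
            if head1 == tail2:                 # compose head to tail
                longer.append((name2 + name1, tail1, head2))
    frontier = longer

assert len(arrows) == 7                        # e_X, e_Y, e_Z, f, g, h, gf
assert ('gf', 'X', 'Z') in arrows
```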
Example 13.1.22. Let F be a field. Consider the category C of vector spaces over F
equipped with a bilinear form. The objects of C are clear from this description: pairs (V, ⟨ , ⟩),
where V is a vector space over the field F and ⟨ , ⟩ : V × V → F is a bilinear form. However,
the arrows of this category are not determined by the description. Though the definition of a category
does not impose how to define the arrows of C, it is natural to define the morphisms as follows. A
morphism from (V, ⟨ , ⟩V ) to (W, ⟨ , ⟩W ) is a linear transformation T : V → W such that
⟨v1 , v2 ⟩V = ⟨T (v1 ), T (v2 )⟩W for all v1 , v2 ∈ V.
As a particular example, if · is the dot product on R^n , the automorphisms T of (R^n , ·) are precisely
the orthogonal linear transformations on R^n . If the matrix of T with respect to the standard basis
is A, then
v1 · v2 = T (v1 ) · T (v2 ) ⟺ [v1 ]^T [v2 ] = [v1 ]^T A^T A [v2 ]
for all v1 , v2 ∈ R^n . This implies that A^T A = I, which is the definition of an orthogonal matrix. △
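A quick numerical check of this characterization (a Python sketch, not from the text; the rotation matrix is an illustrative choice):

```python
import math

# A rotation matrix A satisfies A^T A = I, hence preserves dot products.
c, s = math.cos(0.7), math.sin(0.7)
A = [[c, -s], [s, c]]

def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(2)) for i in range(2)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# (A^T A)_{ij} is the dot product of columns i and j of A.
At = [[A[j][i] for j in range(2)] for i in range(2)]
AtA = [[dot(At[i], [A[0][j], A[1][j]]) for j in range(2)] for i in range(2)]
assert all(abs(AtA[i][j] - (1.0 if i == j else 0.0)) < 1e-12
           for i in range(2) for j in range(2))

# The dot product is preserved: T(v1) . T(v2) = v1 . v2.
v1, v2 = [1.0, 2.0], [3.0, -1.0]
assert abs(dot(mat_vec(A, v1), mat_vec(A, v2)) - dot(v1, v2)) < 1e-12
```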
Example 13.1.23 (Morphisms of a Category). Let C be any category. Define the category
Mor(C) of morphisms of C as follows. The objects in Mor(C) are the arrows of C, i.e., Ob(Mor(C)) =
Arr(C). An arrow ϕ in Mor(C) from f : A → B to g : C → D consists of a pair (ϕdom : A →
C, ϕcod : B → D) that makes the following diagram commutative.
[Commutative square: ϕdom : A → C on top, f : A → B and g : C → D vertical, ϕcod : B → D on
bottom.]
In other words, ϕcod ◦ f = g ◦ ϕdom . We leave it as an exercise to prove that Mor(C) is a category,
namely, that this definition of arrows on Mor(C) implies the existence of an identity arrow and a
composition operation that is associative. (See Exercise 13.1.23.) △
Definition 13.1.24
An arrow a : X → Y in a category C is called
(1) monic if whenever two arrows f, f ′ : U → X satisfy a ◦ f = a ◦ f ′ , then f = f ′ ;
(2) epic if whenever two arrows g, g ′ : Y → Z satisfy g ◦ a = g ′ ◦ a, then g = g ′ .
In other words, an arrow is monic when it is left cancellable and epic when it is right
cancellable. The etymology of these adjectives comes from the Greek prefixes mono-,
which means “alone” or “single,” and epi-, which means “over” or “onto.” The reason for the
use of these adjectives comes from the following characterization of monic and epic arrows in the
category of sets.
Proposition 13.1.25
In Set, the monic morphisms are precisely the injective functions and the epic morphisms
are precisely the surjective functions.
Proof. First suppose that m : X → Y is a monic arrow in Set. Let x, x′ ∈ X with m(x) = m(x′ ).
Consider the set U = {1} with a single element and the two functions f, f ′ : U → X such that
f (1) = x and f ′ (1) = x′ . Clearly m ◦ f = m ◦ f ′ since the two composites agree on every input. Since m
is monic, f = f ′ and hence x = f (1) = f ′ (1) = x′ . Thus, m is injective. Conversely, suppose
that m : X → Y is an injective function and consider two functions f, f ′ : U → X with m ◦ f = m ◦ f ′ .
Then m(f (u)) = m(f ′ (u)) for all u ∈ U . Since m is injective, we deduce that f (u) = f ′ (u) for all
u ∈ U , which means that f = f ′ . Thus, m is monic.
For epic arrows, first suppose that e : X → Y is an epic arrow in Set. Consider the two functions
g, g ′ : Y → {1, 2} defined by g(y) = 1 for all y ∈ Y , and
g ′ (y) = 1 if y ∈ Im e, while g ′ (y) = 2 if y ∉ Im e.
Then g ◦ e = g ′ ◦ e, so, since e is epic, g = g ′ . This forces Im e = Y , so e is surjective. Conversely,
suppose that e : X → Y is surjective and that g ◦ e = g ′ ◦ e for two functions g, g ′ : Y → Z. For each
y ∈ Y , there exists x ∈ X with e(x) = y, so g(y) = g(e(x)) = g ′ (e(x)) = g ′ (y). Hence g = g ′ and e
is epic.
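On small finite sets, the monic half of this proposition can be confirmed by exhaustive search. A Python sketch (not from the text; the sets X, Y , and U are illustrative choices):

```python
from itertools import product

# Check, by brute force over small finite sets, that a function
# m : X -> Y is monic iff it is injective.
X, Y, U = [0, 1], ['a', 'b', 'c'], [0, 1]

def is_monic(m):
    funcs = list(product(X, repeat=len(U)))    # every f : U -> X as a tuple
    for f in funcs:
        for f2 in funcs:
            if f != f2 and all(m[f[u]] == m[f2[u]] for u in range(len(U))):
                return False                   # m o f = m o f' but f != f'
    return True

for vals in product(Y, repeat=len(X)):         # every m : X -> Y
    m = dict(zip(X, vals))
    assert is_monic(m) == (len(set(m.values())) == len(X))
```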
Definition 13.1.26
Let C be a category.
(1) An object I in Ob(C) is called initial if for each object X in Ob(C), there exists
exactly one morphism f : I → X.
(2) An object T in Ob(C) is called terminal if for each object X in Ob(C), there exists
exactly one morphism g : X → T .
Example 13.1.27. As mentioned in Example 13.1.3, there exists exactly one function ∅ → X for
all sets X. Hence, ∅ is an initial object in Set. If a set A contains at least one element a, then the
functions g, g ′ : A → {1, 2} that satisfy g(a) = 1 and g ′ (a) = 2 are two distinct functions with domain A.
Hence, ∅ is the only initial object in Set.
For terminal objects, note that ∅ cannot be terminal since Hom(X, ∅) = ∅ if X is not empty.
However, every singleton set {a} is terminal since Hom(X, {a}) consists of the single constant function
f (x) = a whenever X ≠ ∅ and Hom(∅, {a}) contains only the empty function. Conversely, if A contains
more than one element, then Hom(X, A) contains at least two constant functions for any nonempty X.
Thus, the terminal objects in Set are precisely the singleton sets. △
Example 13.1.28. As another example, consider the category ID of integral domains, which is a
full subcategory of Ring. In ID, the ring (Z, +, ×) is an initial object. Let R be an integral domain
and suppose that ϕ : Z → R is a ring homomorphism. By properties of ring homomorphisms,
ϕ(0) = 0. By Exercise 5.4.19, ϕ(1) = 1R . But then, for all positive n, ϕ(n) = n · 1R , and for all
negative n, ϕ(n) = −(|n| · 1R ). Thus, there exists a unique ring homomorphism from Z to R. △
13.2
Functors
There are many constructions in mathematics in which we associate one object to another object
for the purposes of studying properties of the first object. These objects will almost always exist in
the context of certain categories. This general principle of mapping an object in one category to an
object in another is codified with the concept of functors.
Definition 13.2.1
Let A and B be two categories. A covariant functor from A to B is a rule F : A → B that
to each object X in Ob(A) associates a unique object F (X) in Ob(B), and to each arrow
f : X → Y in Arr(A) associates a unique arrow F (f ) : F (X) → F (Y ) such that
(1) (identity) F (idX ) = idF (X) for each object X in Ob(A);
(2) (composition) if f : X → Y and g : Y → Z are two arrows in Arr(A), then
F (g ◦ f ) = F (g) ◦ F (f ).
Example 13.2.2. Consider the category Set of sets. The power set rule is a covariant functor
P : Set → Set that to each set X associates the power set P(X) and to each function f : X → Y ,
associates the function P(f ) : P(X) → P(Y ) defined by
P(f )(A) = {f (a) ∈ Y | a ∈ A}
for all A ∈ P(X). In this specific instance, P(f )(A) is often written more simply just as f (A). It
is easy to see that P(idX ) = idP(X) . Furthermore, if f : X → Y and g : Y → Z are functions and
A ⊆ X, then
P(g ◦ f )(A) = {g(f (a)) ∈ Z | a ∈ A} = P(g)({f (a) ∈ Y | a ∈ A}) = (P(g) ◦ P(f ))(A). △
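The composition law for the power set functor can be spot-checked on a small example. A Python sketch (not from the text; the functions f and g are illustrative choices):

```python
from itertools import combinations

# The power-set functor on finite sets: P sends a set to its power set
# and a function f to the direct-image map on subsets.
def power_set(X):
    return [frozenset(c) for r in range(len(X) + 1)
            for c in combinations(sorted(X), r)]

def P(f):
    return lambda A: frozenset(f[a] for a in A)

f = {1: 'x', 2: 'y', 3: 'x'}    # f : {1, 2, 3} -> {'x', 'y'}
g = {'x': 10, 'y': 20}          # g : {'x', 'y'} -> {10, 20}
g_after_f = {a: g[f[a]] for a in f}

# Composition law P(g o f) = P(g) o P(f) on every subset of {1, 2, 3}.
for A in power_set({1, 2, 3}):
    assert P(g_after_f)(A) == P(g)(P(f)(A))
```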
Example 13.2.3. Consider the category Grp of groups. Suppose we simply ignored the group
structure and only considered the set theoretic structure of groups and homomorphisms. This
mental process is called the forgetful functor from Grp to Set, which forgets the group structure
and only remembers the set structure.
The forgetful functor exists from many algebraic structures to Set. 4
The label of covariant means that the functor preserves the direction of arrows when mapping
from one category to another. Plenty of constructions are similar but reverse the order. These are
called contravariant functors.
Definition 13.2.4
Let A and B be two categories. A contravariant functor from A to B is a rule F : A → B
that to each object X in Ob(A) associates a unique object F (X) in Ob(B), and to each
arrow f : X → Y in Arr(A) associates a unique arrow F (f ) : F (Y ) → F (X) such that
F (g ◦ f ) = F (f ) ◦ F (g).
Example 13.2.5. Let V and W be vector spaces over a field F and let T : V → W be a linear
transformation. Recall that the dual of a vector space V is the vector space V ∗ of linear transfor-
mations HomF (V, F ). The dual of T is the linear transformation T ∗ : W ∗ → V ∗ that for all µ ∈ W ∗
returns the element T ∗ (µ) ∈ V ∗ , which is defined by
T ∗ (µ)(v) = µ(T (v)) for all v ∈ V.
It is an easy proof to show that T ∗ is indeed a linear transformation. Consider the identity function
idV : V → V . For any functional λ ∈ V ∗ ,
id∗V (λ)(v) = λ(idV (v)) = λ(v)
so id∗V (λ) = λ and hence id∗V = idV ∗ . If S : U → V and T : V → W are linear transformations, then
for all u ∈ U and all µ ∈ W ∗ ,
(T ◦ S)∗ (µ)(u) = µ((T ◦ S)(u)) = (µ ◦ T )(S(u)) = T ∗ (µ)(S(u))
= S ∗ (T ∗ (µ))(u) = (S ∗ ◦ T ∗ )(µ)(u).
Thus, (T ◦ S)∗ = S ∗ ◦ T ∗ . This shows that taking the dual of a vector space is a contravariant
functor VecF → VecF . △
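In coordinates, the dual of a linear map is represented by the transpose of its matrix, so the contravariance law (T ◦ S)∗ = S∗ ◦ T∗ becomes the familiar identity (TS)^T = S^T T^T. A Python sketch (not from the text; the matrices are illustrative choices):

```python
# Verify (TS)^T = S^T T^T for small integer matrices.
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [[A[i][j] for i in range(len(A))] for j in range(len(A[0]))]

S = [[1, 2], [0, 1], [3, 1]]    # S : R^2 -> R^3
T = [[2, 0, 1], [1, 1, 0]]      # T : R^3 -> R^2

assert transpose(mat_mul(T, S)) == mat_mul(transpose(S), transpose(T))
```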
Proposition 13.2.6
A functor, whether covariant or contravariant, from a category A to a category B transforms
an isomorphism in A to an isomorphism in B.
Proof. We prove the case for covariant functors since the proof for contravariant functors is similar.
Let F : A → B be a covariant functor and let f : X → Y be an isomorphism in A with inverse
g : Y → X. By definition of the inverse, g ◦ f = idX and f ◦ g = idY . Using both axioms for
functors, we deduce that
F (g) ◦ F (f ) = F (g ◦ f ) = F (idX ) = idF (X) ,
F (f ) ◦ F (g) = F (f ◦ g) = F (idY ) = idF (Y ) .
Thus, F (f ) : F (X) → F (Y ) is an isomorphism.
With the definition of functors between categories, the class of all categories is itself a category.
More precisely, CatCo, in which the objects are categories and the arrows are covariant functors,
is a category, as is CatCon, in which the objects are categories and the arrows are contravariant functors.
Applying categorical terms to CatCo or CatCon, a covariant (resp. contravariant) functor F :
A → B between categories is called an isomorphism if there exists a covariant (resp. contravariant)
functor G : B → A such that G(F (X)) = X for all X ∈ Ob(A), G(F (f )) = f for all f ∈ Arr(A),
F (G(Y )) = Y for all Y ∈ Ob(B), F (G(h)) = h for all h ∈ Arr(B). If there exists an isomorphism
between two categories, we say that the categories are isomorphic.
[Diagram: a linear transformation T : V → W , an invertible arrow g : V → V , and an unknown
arrow ? : W → W .]
In the above diagram, given a linear transformation T : V → W and an invertible linear transfor-
mation g : V → V , there is not a natural way to define an invertible linear transformation W → W
based on T and g. In the diagram, there is no natural function from W to V , especially if T is not
an isomorphism.
With group homomorphisms, it is always possible to define a trivial homomorphism from one
group to another. In the same vein, we could try to construct a functor (that would not be too
interesting) by defining GL(T ) : GL(V ) → GL(W ) as the group homomorphism with GL(T )(g) = 1
in GL(W ) for all linear transformations T and all g ∈ GL(V ). However, this violates the first axiom
of functors that requires that GL(idV ) = idGL(V ) , so that GL(idV )(g) = g instead of 1 in GL(V ).
Hence, the process of constructing the general linear group is not a functor. △
Example 13.2.9. Fix a positive integer n. In contrast to the previous example, consider the rule
of taking a ring R and returning the matrix ring Mn (R) as described in Section 5.3. For any ring
homomorphism ϕ : R → S define Mn (ϕ) : Mn (R) → Mn (S) by
Mn (ϕ)(A) = (ϕ(aij )), i.e., the n × n matrix whose (i, j)th entry is ϕ(aij ),
for all matrices A = (aij ). We need to prove first that Mn (ϕ) is a ring homomorphism. It is easy to
see that
Mn (ϕ)(A + B) = Mn (ϕ)(A) + Mn (ϕ)(B).
Suppose that C = AB, where A, B ∈ Mn (R). Then the (i, j)th entry of C is
c_{ij} = Σ_{k=1}^{n} a_{ik} b_{kj} ,
and applying ϕ entrywise gives
ϕ(c_{ij}) = Σ_{k=1}^{n} ϕ(a_{ik}) ϕ(b_{kj}),
which is the (i, j)th entry of Mn (ϕ)(A)Mn (ϕ)(B). Thus, Mn (ϕ) is a ring homomorphism. This
rule also satisfies the two axioms of a functor, so the process Mn is a covariant functor from
Ring to Ring. For n ≥ 2, this functor is not an isomorphism of categories since Mn (R) is not
commutative for any ring R with |R| ≥ 2. △
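A quick check of the multiplicativity of Mn(ϕ) for the reduction homomorphism ϕ : Z → Z/6Z (a Python sketch, not from the text; the matrices are illustrative choices):

```python
# M_2(phi) for phi : Z -> Z/6Z, applied entrywise; it respects matrix
# multiplication because phi does.
n, mod = 2, 6

def phi(a):
    return a % mod

def Mn_phi(A):
    return [[phi(a) for a in row] for row in A]

def mat_mul(A, B, reduce=lambda x: x):
    return [[reduce(sum(A[i][k] * B[k][j] for k in range(n)))
             for j in range(n)] for i in range(n)]

A = [[7, 2], [3, 11]]
B = [[1, 5], [4, 6]]
assert Mn_phi(mat_mul(A, B)) == mat_mul(Mn_phi(A), Mn_phi(B), reduce=phi)
```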
Example 13.2.10. A group G can be viewed as a category with one object O in which every arrow
is invertible. The elements of G are the arrows of the category, and the composition of the arrows
gives the group operation. Let K be a field and consider a covariant functor F : G → VectK . The
functor F maps the object O to a vector space V = F (O) over K, and every group element, which we view
as an arrow g : O → O, is mapped to a linear transformation F (g) : V → V . Since functors map
isomorphisms to isomorphisms, F (g) is an invertible linear transformation. Thus, a functor
from G to VectK is precisely a representation of G, as discussed in Section 8.6. △
Example 13.2.11. Let E be a field and consider the category SubField(E) of subfields of E,
which is a category by virtue of the poset structure given by containment. For a subfield K of E, consider
the rule of constructing the group Aut(E/K) of automorphisms of E that fix K. An arrow L → K
in SubField(E) corresponds to the containment L ⊆ K. For each arrow L ⊆ K, we can define an
injective homomorphism ϕL⊆K : Aut(E/K) → Aut(E/L) because every automorphism of E that fixes
K also fixes L.
For each field K in SubField(E), the group homomorphism ϕK⊆K :
Aut(E/K) → Aut(E/K) is the identity. This is the first axiom of functors. The second axiom
of functors also holds since the composition of injective group homomorphisms is another injective
group homomorphism. Hence, the rule that constructs the automorphism group F(K) = Aut(E/K)
is a contravariant functor from SubField(E) to Grp. △
Definition 13.2.12
Let A, B, and C be categories. Suppose that for any two objects A in A and B in B, a rule
F returns an object F (A, B) in C such that F (A, ) is a functor from B to C and F ( , B) is
a functor from A to C. Such a rule is called a bifunctor from A × B to C.
Example 13.2.13. Let R be a ring. By Proposition 10.4.11, for any left R-modules M and N , the
set HomR (M, N ) of left R-module homomorphisms from M to N is another left R-module. We will
show that the rule HomR ( , ) is a bifunctor that is covariant in the second entry and contravariant
in the first.
For a fixed left R-module M , consider the rule FM (X) = HomR (M, X) for any left R-module
X. If ϕ : X → X ′ is a module homomorphism, then we define
FM (ϕ) : HomR (M, X) → HomR (M, X ′ ) by FM (ϕ)(f ) = ϕ ◦ f .
It is easy to verify the identity and composition axioms for a covariant functor.
On the other hand, for a fixed left R-module N , consider the rule F N (Y ) = HomR (Y, N ) for
any left R-module Y . If ψ : Y → Y ′ is a module homomorphism, then we define
F N (ψ) : HomR (Y ′ , N ) → HomR (Y, N ) by F N (ψ)(f ) = f ◦ ψ.
Again, it is easy to verify the identity and composition axioms, this time for a contravariant functor. △
[Commutative square for a morphism H of functors: HX : F (X) → G(X) on top, F (f ) and G(f )
vertical, HY : F (Y ) → G(Y ) on bottom.]
Example 13.2.14. Let Q = (V, E, h, t) be a directed graph and consider the associated category
Cat(Q) as described in Example 13.1.21. Consider the category C of covariant functors from Cat(Q)
to the category of vector spaces VectK , where K is a field. We will show that the category C is
isomorphic to LModK[Q] , the category of modules over the path algebra K[Q].
The data for a functor F : Cat(Q) → VectK gives a K-vector space F (v) for every v ∈ V
and a linear transformation F (a) : F (t(a)) → F (h(a)) for each directed edge a ∈ E. By
Proposition 10.11.4, this is precisely the data for a left K[Q]-module. Now let F and G be two
functors Cat(Q) → VectK . In the category of functors, a morphism f from F to G is a rule that
to each vertex v ∈ V (the objects of Cat(Q)) associates a linear transformation fv : F (v) → G(v) such
that for each directed edge a ∈ E, the following diagram is commutative:
[Commutative square: f_{t(a)} : F (t(a)) → G(t(a)) on top, F (a) and G(a) vertical,
f_{h(a)} : F (h(a)) → G(h(a)) on bottom.]
By Proposition 10.11.6, this is precisely the data of a homomorphism between K[Q]-modules. Con-
sequently, there is a category isomorphism between the category of functors from Cat(Q) to VectK
and LModK[Q] . △
A.1
The Algebra of Complex Numbers
A.1.1 – Complex Numbers
When studying solutions to polynomials, one quickly encounters equations that do not have real
roots. One of the simplest examples is the equation x2 + 1 = 0. Since the square of every real
number is nonnegative, there exists no real number such that x2 = −1. The complex numbers begin
with “imagining” that there exists a number i such that i2 = −1. Historically, the strangeness of the
mental leap led to calling i the imaginary unit. Algebraists then assumed that for all other algebraic
properties, i interacted with the real numbers just like any other real number. The powers of the
imaginary unit i are
i^1 = i, i^2 = −1, i^3 = −i, i^4 = 1, i^5 = i, . . .
and so forth.
An expression of the form bi, where b is a real number, is called an imaginary number and a
complex number is an expression of the form a + bi, where a, b ∈ R. The set of complex numbers is
denoted by C. It is not uncommon to denote a complex variable by a letter z. If z = a + bi, we call
a the real part of z, denoted a = ℜ(z), and we call b the imaginary part of z, denoted by b = ℑ(z).
With this definition, every quadratic equation has a root. For example, applied to 2x2 +5x+4 = 0,
the quadratic formula gives the following solutions:
x = (−b ± √(b^2 − 4ac))/(2a) = (−5 ± √(25 − 32))/4 = −5/4 ± (√7/4) i.
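These roots can be verified numerically with Python's cmath module (a sketch, not from the text):

```python
import cmath
import math

# Roots of 2x^2 + 5x + 4 = 0 via the quadratic formula.
a, b, c = 2, 5, 4
d = cmath.sqrt(b * b - 4 * a * c)      # square root of the negative discriminant
r1, r2 = (-b + d) / (2 * a), (-b - d) / (2 * a)

# Both values satisfy the original equation.
for r in (r1, r2):
    assert abs(a * r * r + b * r + c) < 1e-12

# They match -5/4 +- (sqrt(7)/4) i.
assert abs(r1 - complex(-5 / 4, math.sqrt(7) / 4)) < 1e-12
```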
The quadratic formula shows that whenever a + bi is a root of an equation with real coefficients,
then a − bi is also a root. So in some sense, these two complex numbers are closely related. If
z = a + bi, we call the number a − bi the conjugate of z, and denote it by z̄.
Since a complex number involves two independent real numbers, C is usually depicted by a
Cartesian plane with ℜ(z) on the x-axis and ℑ(z) on the y-axis. This is called the complex plane.
Figure A.1 shows a few complex numbers.
Just as polar coordinates are useful for analytic geometry, so they are also useful in the study of
complex numbers. The absolute value |z|, also called the modulus, of a complex number z = a + bi
is the distance r from the origin to z, namely |z| = r = √(a^2 + b^2 ). The argument of z is the angle θ
of the polar coordinates of the point (a, b). Using polar coordinates of a complex number, we write
z = r(cos θ + i sin θ)
with r ≥ 0. As with polar coordinates, though we typically consider θ ∈ [0, 2π), the argument θ can
be any real number.
For example, the absolute value and the argument of 3 + 2i are
|3 + 2i| = √(3^2 + 2^2 ) = √13 and θ = tan^{−1} (2/3).
The absolute value and the argument of −12 + 5i are
|−12 + 5i| = √(12^2 + 5^2 ) = 13 and θ = tan^{−1} (−5/12) + π.
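Both computations can be verified with Python's cmath module (a sketch, not from the text):

```python
import cmath
import math

# abs() and cmath.phase() reproduce the modulus and argument above.
z1, z2 = 3 + 2j, -12 + 5j

assert abs(abs(z1) - math.sqrt(13)) < 1e-12
assert abs(cmath.phase(z1) - math.atan(2 / 3)) < 1e-12

assert abs(abs(z2) - 13) < 1e-12
# For a point in the second quadrant the argument is tan^{-1}(-5/12) + pi.
assert abs(cmath.phase(z2) - (math.atan(-5 / 12) + math.pi)) < 1e-12
```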
[Figure A.1: a few complex numbers, including 1 + 3i and −π + 1.4i, plotted in the complex plane,
with ℜ(z) on the horizontal axis and ℑ(z) on the vertical axis.]
Without discussing the issue of convergence of series, using the power series of known functions,
observe that
\[
\begin{aligned}
e^{i\theta} = \sum_{k=0}^{\infty} \frac{(i\theta)^k}{k!}
&= \sum_{k=0}^{\infty} i^k \frac{\theta^k}{k!} \\
&= \sum_{\substack{k \ge 0 \\ k \text{ even}}} (-1)^{k/2} \frac{\theta^k}{k!}
 + i \sum_{\substack{k \ge 0 \\ k \text{ odd}}} (-1)^{(k-1)/2} \frac{\theta^k}{k!} \\
&= \left(\sum_{n=0}^{\infty} (-1)^n \frac{\theta^{2n}}{(2n)!}\right)
 + i \left(\sum_{n=0}^{\infty} (-1)^n \frac{\theta^{2n+1}}{(2n+1)!}\right) \\
&= \cos\theta + i\sin\theta.
\end{aligned}
\]
This justifies the polar form
\[
z = re^{i\theta},
\]
where r = |z| and θ is the argument of z. A few examples of complex numbers in polar form are
−1 = e^{−iπ} and i = e^{iπ/2}.
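Euler's formula and these polar forms are easy to confirm numerically (an illustrative Python check, not part of the text):

```python
import cmath
import math

# e^{i*theta} = cos(theta) + i*sin(theta) for a few sample angles.
for theta in (0.0, 1.0, math.pi / 2, -math.pi, 2.5):
    assert abs(cmath.exp(1j * theta)
               - complex(math.cos(theta), math.sin(theta))) < 1e-12

# The two examples from the text: -1 = e^{-i*pi} and i = e^{i*pi/2}.
assert abs(cmath.exp(-1j * math.pi) - (-1)) < 1e-12
assert abs(cmath.exp(1j * math.pi / 2) - 1j) < 1e-12
```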
A.1.2 Operations in C
The addition of two complex numbers is defined as
\[
(a + bi) + (c + di) \overset{\text{def}}{=} (a + c) + (b + d)i.
\]
Since the real part and the imaginary part act as x-coordinates and y-coordinates for vectors in
R2 and since the addition of complex numbers is done component-wise, the addition of complex
numbers is identical to the addition of vectors in R2 . The subtraction of two complex numbers is
(a + bi) − (c + di) = (a − c) + (b − d)i.
The product of two complex numbers is expressed in Cartesian coordinates as
\[
(a + bi)(c + di) \overset{\text{def}}{=} ac + adi + bci + bd(-1) = (ac - bd) + (ad + bc)i.
\]
In Cartesian form, the multiplication formula does not readily appear to have an interesting
geometric interpretation. However, the polar coordinate expression of complex numbers leads
immediately to an interpretation of the product. We have
\[
(r_1 e^{i\theta_1})(r_2 e^{i\theta_2}) = r_1 r_2 e^{i(\theta_1 + \theta_2)},
\]
so the product of two complex numbers z1 and z2 has absolute value |z1 z2 | = |z1 ||z2 | and has an
argument that is the sum of the arguments of z1 and z2 .
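This multiplicative behavior of modulus and argument can be checked on concrete numbers (illustrative Python; the particular moduli and angles are arbitrary choices):

```python
import cmath
import math

# z1 = 2 e^{0.4 i}, z2 = 3 e^{1.1 i}; their product should have
# modulus 2*3 = 6 and argument 0.4 + 1.1 = 1.5.
z1 = 2 * cmath.exp(0.4j)
z2 = 3 * cmath.exp(1.1j)
prod = z1 * z2
assert math.isclose(abs(prod), 6.0)
assert math.isclose(cmath.phase(prod), 1.5)
```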
For division of two complex numbers, let z_1 = a + bi = r_1 e^{θ_1 i} and z_2 = c + di = r_2 e^{θ_2 i} ≠ 0 be
two complex numbers. Then division can be expressed in the two forms
\[
\frac{z_1}{z_2} = \frac{a + bi}{c + di} = \frac{(a + bi)(c - di)}{c^2 + d^2}
= \frac{r_1}{r_2}\, e^{(\theta_1 - \theta_2)i}.
\]
Using the complex conjugate, we can express the inverse of a complex number as
\[
z^{-1} = \frac{1}{z} = \frac{\bar z}{z \bar z} = \frac{\bar z}{|z|^2}.
\]
From the multiplication operation, if z = re^{iθ}, where r ≥ 0 and θ is an angle, then for all integers
n ∈ Z, the powers of z are z^n = r^n e^{inθ}. The argument θ is equivalent to any angle θ + 2πk for any
k ∈ Z. Consequently, all of the following complex numbers
\[
\sqrt[n]{r}\, e^{i(\theta + 2\pi k)/n} \quad\text{for } k = 0, 1, 2, \ldots, n - 1, \tag{A.2}
\]
have an nth power equal to z = re^{iθ}. Hence, the n numbers in (A.2) are the nth roots of z.
As one example, consider the cube roots of i. We write i = e^{iπ/2}, so the cube roots of i are
\[
e^{i\pi/6} = \frac{\sqrt{3}}{2} + \frac{1}{2} i, \qquad
e^{i(\pi/6 + 2\pi/3)} = e^{5i\pi/6} = -\frac{\sqrt{3}}{2} + \frac{1}{2} i, \qquad
e^{i(\pi/6 + 4\pi/3)} = e^{3i\pi/2} = -i.
\]
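The root formula (A.2) translates directly into a short function; applied to i with n = 3, it reproduces the three cube roots above (illustrative Python; the function name is our own):

```python
import cmath
import math

def nth_roots(z, n):
    """All n-th roots of z, from r^(1/n) * e^{i(theta + 2*pi*k)/n}."""
    r, theta = abs(z), cmath.phase(z)
    return [r ** (1 / n) * cmath.exp(1j * (theta + 2 * math.pi * k) / n)
            for k in range(n)]

roots = nth_roots(1j, 3)
expected = [complex(math.sqrt(3) / 2, 0.5),
            complex(-math.sqrt(3) / 2, 0.5),
            -1j]
for w, e in zip(roots, expected):
    assert abs(w ** 3 - 1j) < 1e-12   # each is a cube root of i
    assert abs(w - e) < 1e-12         # and matches the text's value
```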
As another example, we calculate the square roots of z = 1 + 3i from the polar form
\[
z = \sqrt{10}\, e^{i \tan^{-1}(3)}.
\]
Hence, the square roots of z are
\[
\sqrt[4]{10}\, e^{i \tan^{-1}(3)/2} \quad\text{and}\quad
\sqrt[4]{10}\, e^{i \tan^{-1}(3)/2 + i\pi} = -\sqrt[4]{10}\, e^{i \tan^{-1}(3)/2}.
\]
Combining various trigonometric identities, we get
\[
\cos\left(\tfrac{1}{2}\tan^{-1} r\right) = \sqrt{\frac{1}{2}\left(1 + \frac{1}{\sqrt{1 + r^2}}\right)}
\quad\text{and}\quad
\sin\left(\tfrac{1}{2}\tan^{-1} r\right) = \sqrt{\frac{1}{2}\left(1 - \frac{1}{\sqrt{1 + r^2}}\right)}.
\]
Then the two square roots of 1 + 3i are
\[
\pm \sqrt[4]{10}\left(\sqrt{\frac{1}{2}\left(1 + \frac{1}{\sqrt{1 + 3^2}}\right)}
+ i \sqrt{\frac{1}{2}\left(1 - \frac{1}{\sqrt{1 + 3^2}}\right)}\right)
= \pm \frac{1}{\sqrt{2}}\left(\sqrt{\sqrt{10} + 1} + i \sqrt{\sqrt{10} - 1}\right).
\]
A.2 Lists of Groups
This section provides various lists of groups according to a classifying property.