
Mathematics

LOVETT
ABSTRACT ALGEBRA
STRUCTURES AND APPLICATIONS

Abstract Algebra: Structures and Applications helps you understand the abstraction of modern algebra. It emphasizes the more general concept of an algebraic structure while simultaneously covering applications.

The book presents the core topics of structures in a consistent order:

• Definition of structure
• Motivation
• Examples
• General properties
• Important objects
• Description
• Subobjects
• Morphisms
• Subclasses
• Quotient objects
• Action structures
• Applications

The text uses the general concept of an algebraic structure as a unifying principle and introduces other
algebraic structures besides the three standard ones (groups, rings, and fields). Examples, exercises,
investigative projects, and entire sections illustrate how abstract algebra is applied to areas of science and
other branches of mathematics.

Features
• Emphasizes the general concept of an algebraic structure as a unifying principle instead of just focusing
on groups, rings, and fields
• Describes the application of algebra in numerous fields, such as cryptography and geometry
• Includes brief introductions to other branches of algebra that encourage you to investigate further
• Provides standard exercises as well as project ideas that challenge you to write investigative or
expository mathematical papers
• Contains many examples that illustrate useful strategies for solving the exercises

STEPHEN LOVETT
K23698

www.crcpress.com
ABSTRACT ALGEBRA
STRUCTURES AND APPLICATIONS

STEPHEN LOVETT
Wheaton College
Wheaton, Illinois, USA
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2016 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works


Version Date: 20150227

International Standard Book Number-13: 978-1-4822-4891-3 (eBook - PDF)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and
information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and
publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission
to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any
future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic,
mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or
retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact
the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides
licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment
has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation
without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
Contents

Preface ix

1 Set Theory 1
1.1 Sets and Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 The Cartesian Product; Operations; Relations . . . . . . . . . . . . . . . . . . . . . 14
1.3 Equivalence Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.4 Partial Orders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.5 Projects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

2 Number Theory 43
2.1 Basic Properties of Integers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.2 Modular Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
2.3 Mathematical Induction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
2.4 Projects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

3 Groups 69
3.1 Symmetries of the Regular n-gon . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
3.2 Introduction to Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
3.3 Properties of Group Elements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
3.4 Symmetric Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
3.5 Subgroups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
3.6 Lattice of Subgroups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
3.7 Group Homomorphisms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
3.8 Group Presentations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
3.9 Groups in Geometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
3.10 Diffie-Hellman Public Key . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
3.11 Semigroups and Monoids . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
3.12 Projects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159

4 Quotient Groups 161


4.1 Cosets and Lagrange’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
4.2 Conjugacy and Normal Subgroups . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
4.3 Quotient Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
4.4 Isomorphism Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
4.5 Fundamental Theorem of Finitely Generated Abelian Groups . . . . . . . . . . . . 193
4.6 Projects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204

5 Rings 207
5.1 Introduction to Rings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
5.2 Rings Generated by Elements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
5.3 Matrix Rings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
5.4 Ring Homomorphisms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
5.5 Ideals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
5.6 Quotient Rings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
5.7 Maximal Ideals and Prime Ideals . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
5.8 Projects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264


6 Divisibility in Commutative Rings 267


6.1 Divisibility in Commutative Rings . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
6.2 Rings of Fractions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
6.3 Euclidean Domains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
6.4 Unique Factorization Domains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
6.5 Factorization of Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
6.6 RSA Cryptography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304
6.7 Algebraic Integers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
6.8 Projects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318

7 Field Extensions 321


7.1 Introduction to Field Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
7.2 Algebraic Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
7.3 Solving Cubic and Quartic Equations . . . . . . . . . . . . . . . . . . . . . . . . . 340
7.4 Constructible Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 346
7.5 Cyclotomic Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355
7.6 Splitting Fields and Algebraic Closure . . . . . . . . . . . . . . . . . . . . . . . . . 362
7.7 Finite Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 370
7.8 Projects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376

8 Group Actions 379


8.1 Introduction to Group Actions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379
8.2 Orbits and Stabilizers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 387
8.3 Transitive Group Actions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 396
8.4 Groups Acting on Themselves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403
8.5 Sylow’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 408
8.6 A Brief Introduction to Representations of Groups . . . . . . . . . . . . . . . . . . 415
8.7 Projects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 426

9 Classification of Groups 429


9.1 Composition Series and Solvable Groups . . . . . . . . . . . . . . . . . . . . . . . . 429
9.2 Finite Simple Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 436
9.3 Semidirect Product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 444
9.4 Classification Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 453
9.5 Nilpotent Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458
9.6 Projects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 463

10 Modules and Algebras 465


10.1 Boolean Algebras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 465
10.2 Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 474
10.3 Introduction to Modules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 486
10.4 Homomorphisms and Quotient Modules . . . . . . . . . . . . . . . . . . . . . . . . 497
10.5 Free Modules and Module Decomposition . . . . . . . . . . . . . . . . . . . . . . . 504
10.6 Finitely Generated Modules over PIDs, I . . . . . . . . . . . . . . . . . . . . . . . . 513
10.7 Finitely Generated Modules over PIDs, II . . . . . . . . . . . . . . . . . . . . . . . 519
10.8 Applications to Linear Transformations . . . . . . . . . . . . . . . . . . . . . . . . 524
10.9 Jordan Canonical Form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 532
10.10 Applications of the Jordan Canonical Form . . . . . . . . . . . . . . . . . . . . . . 539
10.11 A Brief Introduction to Path Algebras . . . . . . . . . . . . . . . . . . . . . . . . . 546
10.12 Projects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 555

11 Galois Theory 557


11.1 Automorphisms of Field Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . 557
11.2 Fundamental Theorem of Galois Theory . . . . . . . . . . . . . . . . . . . . . . . . 564
11.3 First Applications of Galois Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . 571
11.4 Galois Groups of Cyclotomic Extensions . . . . . . . . . . . . . . . . . . . . . . . . 577
11.5 Symmetries among Roots; The Discriminant . . . . . . . . . . . . . . . . . . . . . . 583
11.6 Computing Galois Groups of Polynomials . . . . . . . . . . . . . . . . . . . . . . . 593
11.7 Fields of Finite Characteristic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 602
11.8 Solvability by Radicals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 605
11.9 Projects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 611

12 Multivariable Polynomial Rings 613


12.1 Introduction to Noetherian Rings . . . . . . . . . . . . . . . . . . . . . . . . . . . . 613
12.2 Multivariable Polynomials and Affine Space . . . . . . . . . . . . . . . . . . . . . . 619
12.3 The Nullstellensatz . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 626
12.4 Polynomial Division; Monomial Orders . . . . . . . . . . . . . . . . . . . . . . . . . 631
12.5 Gröbner Bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 640
12.6 Buchberger’s Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 649
12.7 Applications of Gröbner Bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 656
12.8 A Brief Introduction to Algebraic Geometry . . . . . . . . . . . . . . . . . . . . . . 666
12.9 Projects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 672

13 Categories 675
13.1 Introduction to Categories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 675
13.2 Functors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 682

A Appendices 689
A.1 The Algebra of Complex Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . 689
A.2 Lists of Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 691

List of Notations 693

Bibliography 699

Index 703
Preface

What Is Abstract Algebra?


When a student of mathematics studies abstract algebra, he or she inevitably faces questions in the
vein of, “What makes the algebra abstract?” or “What is it good for?” or, more teasingly, “I finished
algebra in high school; why are you still studying it as a math student?” Since undergraduate
mathematics curricula nearly always include an algebra requirement, these questions illustrate the
general public's lack of awareness of advanced mathematics. Consequently, we try to answer these
questions up front: “What is abstract algebra?”
Abstract algebra in its broadest sense describes a way of thinking about classes of mathematical
objects. In contrast to high school algebra, in which one studies properties of the operations (+,
−, ×, and ÷) on real numbers, abstract algebra studies the consequences of properties of operations
without specifying what types of numbers or objects we work with. Hence, any theorem established
in the abstract context holds not only for real numbers but for every possible algebraic structure
that has operations with the stated properties. Furthermore, some profound theorems in algebra,
called classification theorems, enumerate all possible objects of a structure with a given property.
Such theorems often lead to profound results when algebra is applied to other areas of mathematics.
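This point can be made concrete in code. The following sketch is an illustration of ours, not the book's: it checks the group axioms for an operation given only as a function on a finite set. Nothing in the checker refers to what the elements are, so any conclusion drawn from the axioms applies equally to every set-with-operation that passes it.

```python
def is_group(elements, op):
    """Check closure, associativity, an identity, and inverses for `op` on `elements`."""
    elems = list(elements)
    # Closure: the operation must land back inside the set.
    if any(op(a, b) not in elements for a in elems for b in elems):
        return False
    # Associativity: (a*b)*c == a*(b*c) for all triples.
    if any(op(op(a, b), c) != op(a, op(b, c))
           for a in elems for b in elems for c in elems):
        return False
    # Identity: some e with e*a == a*e == a for every a.
    identity = next((e for e in elems
                     if all(op(e, a) == a and op(a, e) == a for a in elems)), None)
    if identity is None:
        return False
    # Inverses: every a has some b with a*b == identity.
    return all(any(op(a, b) == identity for b in elems) for a in elems)


Z6 = set(range(6))
print(is_group(Z6, lambda a, b: (a + b) % 6))   # True: (Z/6Z, +) is a group
print(is_group(Z6, lambda a, b: (a * b) % 6))   # False: 0 has no multiplicative inverse
```

Any structural fact deduced from these four checks alone, such as the uniqueness of the identity, holds for residues, permutations, matrices, or any other elements whatsoever.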
Classical (high school) algebra, including vectors and algorithms to solve equations or systems of
equations, finds applications in every area of natural and social sciences. Algebra has many applica-
tions in number theory, topology, geometry, analysis, and nearly every branch of mathematics. For
nearly a hundred years, scientists have noted applications of abstract algebra to advanced physics,
inorganic chemistry, and certain types of art. More recent applications include Boolean algebras in
digital electronics, the mathematics of information security, and coding theory in telecommunica-
tions.
The general modern mindset of studying an algebraic structure has found applications in many
areas: linguistics, machines, social networks, etc. Even in music, there exist some natural connections
to algebra. A connection between music theory, both classical and atonal, and group theory has been
studied recently. (See [34], [18], [13], or [24].) Recent attempts to codify atonal music borrowed from
group theory more generally. Though there is not necessarily a direct connection between modern
programming languages and abstract algebra, defining a class in object-oriented programming is
reminiscent of how an algebraist defines an algebraic structure, and instantiating an object is not
unlike an algebraist considering a specific object with that structure.
The fundamental importance of the notion of a structure in algebra has not escaped philosophers
of mathematics. Structuralism, a recent position in the philosophy of mathematics, holds to a
modified Platonist position that mathematical objects exist independent of human activity but that
they always exist in reference to a structure. (See [58].)

Organizing Principles
Algebraic Structure. Many abstract algebra textbooks focus on three specific algebraic struc-
tures: groups, rings, and fields. These particular structures have indeed played important roles
throughout mathematics and arguably deserve considerable attention. However, this book empha-
sizes the general concept of an algebraic structure as a unifying principle. Therefore, we present the
core topics of structures following a consistent order and deliberately introduce the reader to other
algebraic structures besides these standard three.
When studying a given algebraic structure, we follow this outline of topics:

• Definition of Structure—What are the axioms?

• Motivation—What value is there in minding this structure?


• Examples—What are some examples that demonstrate the scope and restrictions of a struc-
ture’s definition?

• General Properties—What can we prove about all objects with a given structure?

• Important Objects—Are there some objects in this structure that are singularly important?

• Description—How do we conveniently describe an object with the given structure or elements
in this structure?

• Subobjects—What can be said generally about the internal structure of a given object?

• Morphisms—What are the properties of functions that preserve the structure?

• Subclasses—What are some interesting subclasses of the structure that we can obtain by
imposing additional conditions?

• Quotient Objects—Under what conditions do equivalence classes behave nicely with respect
to the structure?

• Action Structures—Can we create some interesting/useful structures by considering how one
structure might act on another?

• Applications—What are some other places this structure is used effectively?

For convenience in the rest of the text, we will often refer to this list simply as “the Outline.” With
a given structure, some of these topics may be brief and others lead to considerable investigation.
Consequently, we do not give equal relative weight to these topics when studying various algebraic
structures.
Algebraists may dislike the expression “algebraic structure” as it is not a well-defined mathe-
matical term. Nonetheless, we use this term loosely until, in Chapter 13, we finally make the idea
of algebraic structure rigorous by introducing categories.
Applications. The second guiding principle of this book is the application of algebra. Examples,
exercises, investigative projects, and whole sections illustrate how abstract algebra is applied to
other branches of mathematics or to areas of science. In addition, this textbook offers a few sections
whose titles begin with “A Brief Introduction to...” These sections are just the trailhead for a whole
branch of algebra and are intended to whet the student’s appetite for further investigations and
study.

A Note to Instructors
Though covering groups, rings, and fields in detail, this textbook emphasizes the more general
concept of an algebraic structure while simultaneously keeping an eye on applications. The style
deliberately acknowledges the process of discovery, making this book suited for self-study.
This book is designed so that full coverage of all the sections will fill a two-semester sequence, with
the semester split occurring between Chapters 7 and 8. However, it can be used for a one-semester
introductory course in abstract algebra with many possible variations.
There are a variety of pathways to work through this textbook. Some colleges require a robust
discrete mathematics background or transition course before abstract algebra. In this case, Chap-
ters 1 and 2, which cover some basic set theory and a few motivating number theory concepts, might
serve as a review or could be skipped entirely. Some application sections or topic sections are not
essential for the development of later theory.
Each section was written with the intent to fill a one-hour lecture. Occasionally, some subsections
carry the label (Optional). These optional subsections are not needed for further theory outside
that section but offer additional perspective. In the dependency chart below, sections in rectangles
represent core theory and build on each other within the boxes. Sections in ellipses are application
or “brief introduction” sections and can generally be done in any order within the ellipse.

[Dependency chart: the figure's arrows are not recoverable from this text extraction. It groups the sections as: 1.1–1.4; 2.1–2.3; 3.1–3.8; 4.1–4.5; 3.9–3.11; 8.1–8.5; 5.1–5.7; 8.6; 9.1–9.5; 6.1–6.5; 6.6–6.7; 7.1, 7.2, 7.5–7.7; 10.1–10.7; 7.3, 7.4; 10.8–10.11; 12.1–12.5; 11.1–11.8; 12.6; 13.1–13.2.]

A Note to Students
From a student’s perspective, one of the biggest challenges to modern algebra is its abstraction.
A student taking a first course in modern algebra quickly discovers that most of the exercises are
proofs. Calculus, linear algebra, and differential equations can be taught from many perspectives,
but often a preponderance of the exercises simply requires the student to carefully follow a prescribed
algorithm. In algebra, a student does not typically learn many algorithms to solve a specific range
of problems. Instead, he or she is expected to prove new results using the theorems presented in
the text. By doing exercises, the student becomes an active participant in the development of the
field. This common aspect of algebraic textbooks is very valuable because it trains the student in
the methods of mathematical investigation. In this textbook, however, for many exercises (though
not all) the student will find a similar example in the section that will illustrate a useful strategy.
The text includes many properties of the objects we study. However, this does not mean that
everything that is interesting or even useful for some further result is proved or even mentioned in the
text. If every interesting fact were proved in the text, this book would swell to unwieldy proportions
and regularly digress from a coherent presentation. Consequently, to get a full experience of the
material, the reader is encouraged to peruse the exercises in order to absorb many consequences of
the theory.

Computer Algebra Systems (CAS)


There exist a number of general computer algebra systems (CAS), such as Maple and Mathematica,
whose packages provide commands implementing calculations that are useful in algebra. There also
exist CAS designed specifically for computations in algebra (e.g., Magma and Macaulay2). It is
impossible in a textbook such as this to offer a tutorial on each one or to describe the full
functionality of the commands. However, a section occasionally ends with a subsection that lists a
few commands or libraries of commands that are relevant to that section. Unless otherwise indicated,
the computations in the exercises are generally expected to be done by hand and do not require the
use of a computer algebra system. Whether

a specific command is listed or whether a library of commands is given, the reader should visit the
CAS’s help page for specific examples on how to use that given command or to see what functions
are in the library.
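As one concrete illustration of the kind of commands such systems provide, here is a short sketch using SymPy, a free Python library that is not among the systems named above; it is chosen here only for illustration, and the specific session is ours, not the book's.

```python
from sympy.combinatorics.named_groups import SymmetricGroup, DihedralGroup

# Built-in constructions for two standard families of finite groups.
S4 = SymmetricGroup(4)   # all permutations of 4 symbols
D4 = DihedralGroup(4)    # symmetries of a square, as permutations of its vertices

print(S4.order())          # 24
print(D4.order())          # 8
print(D4.is_subgroup(S4))  # True: D_4 sits inside S_4
```

As the paragraph above advises, the library's help pages give the full range of such commands and their options.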

Investigative Projects
Another feature of this book is the collection of investigative projects. In addition to the regular
exercises, at the end of most chapters there is a list of ideas for investigative projects. The idea of
assigning projects stems from a pedagogical experiment to challenge all students to write investigative
or expository mathematical papers in undergraduate classes. As a paper, a project should be (1) Clear:
use proper prose, follow the structure of a paper, and provide proper references; (2) Correct: proofs
and calculations must be accurate; (3) Complete: address all the questions posed, as well as those one
would naturally ask in the course of the investigation; (4) Creative: show evidence of creative
problem-solving or question-asking skills.
These project ideas stand as guidelines. A reader who tackles one is encouraged to add his or her
own investigations. While some questions associated with a project idea are precise and lead to
well-defined answers, others are left vague on purpose, may not have clear-cut solutions, or lead
to open-ended problems. Some questions require proofs while others may benefit from technology:
calculator work, a computer program, or a computer algebra system.
The ideas in some projects are known and have been developed in articles, books, or online
resources. Consequently, if the investigative project is an assignment, then the student should
generally not consult online resources besides the ones allowed by the project description. Otherwise,
the project ideas may offer topics for further reading.

Habits of Notation
This book uses without explanation the logic quantifiers ∀, to be read as “for all,” ∃, to be read as
“there exists,” and ∃!, to be read as “there exists a unique.” We also regularly use =⇒ for logical
implication and ⇐⇒ for logical equivalence. More precisely, if P (x, y, . . .) is a predicate with some
variables and Q(x, y, . . .) is another predicate using the same variables, then

P (x, y, . . .) =⇒ Q(x, y, . . .)   means   ∀x∀y . . . (P (x, y, . . .) −→ Q(x, y, . . .))

and

P (x, y, . . .) ⇐⇒ Q(x, y, . . .)   means   ∀x∀y . . . (P (x, y, . . .) ←→ Q(x, y, . . .)).
As another habit of language, this textbook is careful to always and only use the expression
“Assume [hypothesis]” at the beginning of a proof by contradiction. In this way, the reader knows
ahead of time that whenever she sees this expression, the hypothesis will eventually lead to a contradiction.

Acknowledgments
I must first thank the mathematics majors at Wheaton College (IL) who served for many years as the
test environment for many topics, exercises, and projects. I am indebted to Wheaton College (IL)
for the funding provided through the Aldeen Grant that contributed to portions of this textbook.
I especially thank the students who offered specific feedback on the draft versions of this book,
in particular Kelly McBride, Roland Hesse, and David Garringer. Joel Stapleton, Caleb DeMoss,
Daniel Bradley, and Jeffrey Burge deserve special gratitude for working on the solutions manual
to the textbook. I also must thank Justin Brown for test running the book and offering valuable
feedback. I also thank Africa Nazarene University for hosting my sabbatical, during which I wrote
a major portion of this textbook. Finally, I must absolutely thank the publishing and editing team
at Taylor & Francis for their work in making this project become a reality.
1. Set Theory

Set theory sits at the foundation of all of modern mathematics.


Just as Boolean logic provides a rigorous framework for how we think, set theory provides a
similarly precise framework for how we mentally gather instances of objects into classes. Notions
from set theory such as relations, equivalences, operations, and functions give a logically rigorous
way to ascribe properties to objects, to think of how classes of objects relate to one
another, or to consider how two objects might be combined to make another object. Consequently,
the terminology and notation of set theory provide a concise way to say many different things,
mathematical or otherwise, with exacting precision.
Nearly every algebraic structure begins with a set as its first piece of data. Hence, familiarity
with the fundamentals of set theory is essential for modern algebra. More importantly, from the
perspective of this textbook, set theory also provides us a basic example of an algebraic structure.
The properties and concepts we choose to highlight in set theory focus on topics needed later, but
also serve as a template for our study of other algebraic structures.
Since set theory serves as a preliminary topic to algebra, this chapter only offers a quick intro-
duction to sets. For many readers this will be a review. Section 1.1 presents the notion of sets,
subsets, operations on subsets, and functions between sets. Section 1.2 begins by introducing the
Cartesian product of two sets and proceeds to discuss standard concepts related to sets that be-
come available with the notion of the Cartesian product: binary operations on sets and relations.
Section 1.3 discusses equivalence relations, equivalence classes, partitions, and quotient sets. This
chapter concludes with an introduction to partial orders in Section 1.4, where we present posets as
a first example of a nontrivial algebraic structure.

1.1 Sets and Functions
In mathematics, the concept of a set makes precise the notion of a collection of things. As broad as
this concept appears, it is foundational for modern mathematics.

1.1.1 – Sets

Definition 1.1.1
(1) A set is a collection of objects for which there is a clear rule to determine whether an
object is included or excluded.

(2) An object in a set is called an element of that set. We write x ∈ A to mean “the
element x is an element of the set A.” We write x ∉ A if x is not an element of A.

Alternate expressions for x ∈ A include “x is in A” or “A contains x.”
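The membership relation has a direct analogue in most programming languages. Here is a small sketch in Python mirroring x ∈ A and x ∉ A; the particular set and names are invented for the example, not taken from the book.

```python
# A finite set with a clear membership rule: the contacts listed on
# someone's social networking profile (names invented for illustration).
A = {"Alice", "Bob", "Carmen"}

print("Alice" in A)       # x ∈ A:  True
print("Dmitri" not in A)  # x ∉ A:  True
```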


Some examples of sets include the registered voters in Illinois or the man-made structures above
800 feet tall. Both of these examples have clear rules that allow someone with enough information to
determine whether a given object is included in the collection or not. In natural languages,
we regularly use terms or expressions that we treat as sets but that in fact do not have a clear rule.
For example, I cannot legitimately talk about the set of “my friends.” There are some people for
whom, at a given point in time, I am hard-pressed to say whether I consider them a friend or not.

CHAPTER 1. SET THEORY

In contrast, the people listed as “Friends” or “Contacts” on someone’s preferred social networking
site does form a set. As another nonexample of a set, consider the collection of all chairs. Whether
this is a set is debatable. Indeed, by some artistic or functional failure, a piece of furniture may not
be comfortable enough to sit on. Furthermore, should we consider a rock beside a hiking trail as a
chair if we happen to sit on it?
Some discussion in logic is appropriate here. Set theory based on this idea of a “clear rule” is
called naive set theory [33]. The idea of a clear rule in set theory is as precise as Boolean logic,
which calls a proposition any statement for which there is a clear rule to decide whether it is true or
false. However, like Russell’s Paradox in logic (e.g., consider the truth value of the statement “This
sentence is false.”), naive set theory ultimately can lead to contradictions. For example, if S is the
set of all sets that do not contain themselves, does S contain itself? The Zermelo-Fraenkel axioms
of set theory, denoted ZF, offer more technical foundations and avoid these contradictions. (See [47]
for a presentation of set theory with ZF. See [25] for a philosophical discussion of ZF axioms.)
The most widely utilized form of set theory adds one axiom to the standard ZF, the so-called
Axiom of Choice, and the resulting set of axioms is denoted by ZFC. Occasionally, certain theorems
emphasize when their proofs directly utilize the Axiom of Choice. The reason for this is primarily
historical. In the context of ZF, the Axiom of Choice implies many statements that seem down-
right obvious and others that feel counterintuitive. Consequently, there is a habit in mathematical
literature to make clear when a certain result (and all results that use it as a hypothesis) rely on
the Axiom of Choice.
A thorough treatment of axiomatic set theory would detract from an introduction to abstract
algebra. Naive set theory will suffice for our purposes. Whenever we need a technical aspect of set
theory, we provide appropriate references. The interested reader is encouraged to consult [21, 39, 62]
for a deeper treatment of set theory.
Some sets occur so frequently in mathematics that they have their own standard notation. Here
are a few:

• Standard sets of numbers:

– N is the set of natural numbers (includes 0).


– Z is the set of integers.
– Q is the set of rational numbers.
– R is the set of real numbers.
– C is the set of complex numbers.

• Sometimes we use modifiers to the above sets. For example, R+ denotes the set of nonnegative
reals and R<0 denotes the set of strictly negative reals.

• A modifier we use consistently in this book is N∗ , Z∗ , etc. to stand for the given number set
excluding 0. In particular, N∗ denotes the set of positive integers.

• ∅, called the empty set, is the set that contains no elements.

• Intervals of real numbers:

– [a, b] denotes the closed interval of real numbers between a and b inclusive.
– [a, b) is the interval of reals between a and b, including a but not b.
– [a, ∞) is the interval of all real numbers greater than or equal to a.
– Other self-explanatory combinations are possible such as (a, b); (a, b]; (a, ∞); (−∞, b];
and (−∞, b).

There are two common notations for defining sets. Both of them explicitly provide the clear rule
as to whether an object is in or out. However, in either case, the brace { marks the beginning
of the defining rule and } marks the end.

(1) List the elements. For example, writing S = {1, 3, 7} means that the set S consists of the
three integers 1, 3, and 7. It is important to note that in this notation, order does not matter
and we do not list numbers more than once. We only care about whether a certain object is in
or not; we don’t care about order or repetitions. (It is important to note that what we write in
the list is merely a signifier that points to the actual object. Hence, the symbol 1 is pointing
to the mathematical object of “one.” Similarly, I may write F = {AL, CL, SL} as a set of
three elements that describes my family where the symbols AL, CL, and SL are pointers to
the actual objects in the set, namely my daughter, my wife, and myself.)

(2) Explicitly state a defining property. For example,

{x | x is a rational number with x² < 2}

means the set of all x such that x is a rational number with x² < 2. Since we already
have a set label for the rational numbers, we will usually rewrite this more concisely as

{x ∈ Q | x² < 2}

and read it as, “the set of rational numbers x such that x² < 2.” An alternate notation for
this construction is {x ∈ Q : x² < 2}.

Two sets A and B are considered equal when x ∈ A ⇐⇒ x ∈ B, or in other words, they have
exactly the same elements. We write A = B to denote set equality.

1.1.2 – Subsets and Operations


When working with sets, it is common to work within a context set and consider sets within this
context. For example, if a given problem or discussion only involves the set of living people, then
we will only be interested in considering sets that exist within this context set.

Definition 1.1.2
A set A is called a subset of a set S if x ∈ A =⇒ x ∈ S. In other words, every element of
A is an element of S. We write A ⊆ S.

The symbol ⊆ should remind the reader of the symbol ≤ on the real numbers. This similarity
of notation might inspire us to assume that A ⊂ B would, like the strict inequality symbol <, mean
A ⊆ B and A ≠ B. Unfortunately, by a fluke of historical inconsistency in notation, some authors
do use the ⊂ symbol to mean a strict subset, while others use it synonymously with ⊆. To remove
confusion, we use the symbol A ⊊ B to mean A ⊆ B and A ≠ B. The symbol A ⊈ B means that A
is not a subset of B.

Example 1.1.3. Let C⁰([2, 5]) denote the set of continuous real-valued functions on the interval
[2, 5] and let C¹([2, 5]) denote the set of differentiable functions whose derivative is continuous on
the interval [2, 5]. The statement that

C¹([2, 5]) ⊆ C⁰([2, 5])

follows from the nontrivial result in analysis that if a function is differentiable over a closed interval,
it is continuous over that interval. △

There are a few basic operations on subsets of a given set S. In the following list, we define
operations on subsets A and B of S.

• The union of A and B is A ∪ B = {x ∈ S | x ∈ A or x ∈ B}.

• The intersection of A and B is A ∩ B = {x ∈ S | x ∈ A and x ∈ B}.

• The complement of A is Ā = {x ∈ S | x ∉ A}.

• The set difference of B from A is A − B = {x ∈ S | x ∈ A and x ∉ B}.

• The symmetric difference of A and B is A △ B = {x ∈ S | x ∈ A or x ∈ B, but x ∉ A ∩ B}.

Example 1.1.4. Let U = {1, 2, . . . , 10} and consider the subsets A = {1, 3, 6, 7}, B = {1, 5, 6, 8, 9},
and C = {2, 3, 4, 5, 6}. We calculate the following examples of operations.
(1) A ∩ B = {1, 6}.
(2) B − (A ∪ C) = B − {1, 2, 3, 4, 5, 6, 7} = {8, 9}.
(3) The complement of B ∪ C: since B ∪ C = {1, 2, 3, 4, 5, 6, 8, 9}, its complement in U is {7, 10}.
(4) (A ∩ B) ∩ C = {1, 6} ∩ C = {6}.
(5) A ∩ (B ∩ C) = A ∩ {5, 6} = {6}. △
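These computations are easy to spot-check with Python's built-in set type. The sketch below mirrors the example; the complement in item (3) is taken relative to U, and the variable names are ours.

```python
# A sketch of Example 1.1.4 using Python's built-in set type.
U = set(range(1, 11))            # U = {1, 2, ..., 10}
A = {1, 3, 6, 7}
B = {1, 5, 6, 8, 9}
C = {2, 3, 4, 5, 6}

inter_AB = A & B                 # (1) A ∩ B
diff = B - (A | C)               # (2) B − (A ∪ C)
comp_BC = U - (B | C)            # (3) complement of B ∪ C, taken in U
assoc_left = (A & B) & C         # (4) (A ∩ B) ∩ C
assoc_right = A & (B & C)        # (5) A ∩ (B ∩ C)

print(inter_AB, diff, comp_BC, assoc_left, assoc_right)
# {1, 6} {8, 9} {10, 7} {6} {6}
```

Note that `&`, `|`, and `-` denote intersection, union, and set difference on Python sets, matching the operations defined above.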

Example 1.1.5. Let A = {x ∈ Z | 4 divides x} and let B = {x ∈ Z | 6 divides x}. The set A ∩ B
consists of all integers that are divisible by 4 and by 6. As we will be reminded in Section 2.1.4, an
integer is divisible by 4 and 6 if and only if it is divisible by the least common multiple of 4 and 6,
written lcm(4, 6) = 12. Hence,

A ∩ B = {x ∈ Z | 12 divides x}.

On the other hand, A ∪ B consists of all integers that are divisible by 4 or divisible by 6. △

Set operations offer concise ways to describe many common properties of sets. The following
definition illustrates this.

Definition 1.1.6
Let A and B be subsets of a set S. We say that A and B are disjoint if A ∩ B = ∅.

The set difference provides a standard notation for all elements of a set except for a few specified
elements. For example, if S is the set of all real numbers except 1 and 5, we write this succinctly as
S = R − {1, 5}.

Proposition 1.1.7
For sets A and B, A = B if and only if A ⊆ B and B ⊆ A.

Proof. If A = B then A is a subset of B and B is a subset of A. Conversely, if A ⊆ B and B ⊆ A,
then x ∈ A ⟹ x ∈ B and x ∈ B ⟹ x ∈ A. Hence, x ∈ A ⟺ x ∈ B, which means that A and B
have exactly the same elements. □

There are many properties of how set operations relate to each other. We will study these
properties in detail in Section 10.1, but we will not need them all for our purposes prior to then.
However, we present two comments about properties of set operations.
The associativity law for union (respectively intersection) states that for subsets A, B, C ⊆ S,
the following holds:

(A ∪ B) ∪ C = A ∪ (B ∪ C) and, respectively, (A ∩ B) ∩ C = A ∩ (B ∩ C).

Therefore, we may commonly write A ∪ B ∪ C, instead of either (A ∪ B) ∪ C or A ∪ (B ∪ C), and
similarly for intersection. More generally, consider a set I and subsets Aᵢ ⊆ S for each i ∈ I. We
say that {Aᵢ}_{i∈I} is a collection of subsets of a set S, indexed by I. Then we define the generalized
unions and intersections as

⋃_{i∈I} Aᵢ ≝ {s ∈ S | ∃i ∈ I, s ∈ Aᵢ}   (1.1)
⋂_{i∈I} Aᵢ ≝ {s ∈ S | ∀i ∈ I, s ∈ Aᵢ}.   (1.2)

An interesting example from analysis of a generalized union is the fact that

⋃_{i∈N∗} [1/i, 1] = (0, 1].

Since all fractions 1/n where n is a positive integer are positive, 0 is not an element of any interval
[1/n, 1]. We conclude that the left-hand side is a subset of the set on the right-hand side. However,
for any positive real number ε with 0 < ε ≤ 1, setting n = ⌈1/ε⌉ (i.e., n is the least integer greater
than or equal to 1/ε), we have n ≥ 1/ε, so

ε ∈ [1/n, 1].

Thus, the set on the right-hand side is a subset of the set on the left-hand side, so the two sets are
equal.
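The ε-argument above can be spot-checked numerically. In the sketch below, the helper name `witness_interval` is ours; it computes the index n = ⌈1/ε⌉ and verifies that ε really lies in [1/n, 1].

```python
import math

# Numerical spot-check: for each sampled ε in (0, 1], the index
# n = ceil(1/ε) satisfies 1/n ≤ ε ≤ 1, so ε lies in the interval [1/n, 1].
def witness_interval(eps):
    """Return an index n with eps in [1/n, 1]."""
    n = math.ceil(1 / eps)
    assert 1 / n <= eps <= 1
    return n

samples = [1.0, 0.5, 0.3, 0.01, 1e-6]
witnesses = [witness_interval(e) for e in samples]
print(witnesses)   # [1, 2, 4, 100, 1000000]
```

This is of course only a finite sample, not a proof; the proof is the inequality argument above.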
As a second comment, we illustrate a basic abstract proof concerning relations between set
operations with the following proposition.

Proposition 1.1.8
Let A and B be subsets of a context set S. Then A − B = A ∩ B̄.

Proof. We remark that

x ∈ A − B ⟺ x ∈ A and x ∉ B   (by definition of set difference)
⟺ x ∈ A and x ∈ B̄   (by definition of set complement)
⟺ x ∈ A ∩ B̄   (by definition of intersection).

We conclude that A − B = A ∩ B̄. □
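Proposition 1.1.8 can also be verified exhaustively on a small context set. The sketch below checks every pair of subsets of S = {1, 2, 3, 4}, computing each complement relative to S.

```python
from itertools import combinations

# Exhaustive check of Proposition 1.1.8 on a small context set S:
# for every pair of subsets A, B of S, the difference A − B equals
# the intersection of A with the complement of B.
S = {1, 2, 3, 4}
subsets = [set(c) for r in range(len(S) + 1) for c in combinations(S, r)]

for A in subsets:
    for B in subsets:
        complement_B = S - B          # complement of B relative to S
        assert A - B == A & complement_B

print("verified for all", len(subsets) ** 2, "pairs")   # 256 pairs
```

An exhaustive check over one finite set is not a proof, but it is a useful sanity test of the identity.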

Definition 1.1.9
If S is a set, the power set of S, denoted by P(S), is the set of all subsets of S.

Note that A ∈ P(S) is equivalent to writing A ⊆ S. Furthermore, taking the power set of a set
is one way to obtain a new set from a previous one.
Example 1.1.10. Let S = {1, 2, 3}. Then the power set of S is
P(S) = {∅, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}} .
Note that P(S) has eight elements. △
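A power set can be enumerated by taking combinations of every size from 0 to |S|, as in this Python sketch (the helper name `power_set` is ours).

```python
from itertools import combinations

# Enumerate the power set of S, as in Example 1.1.10: all combinations
# of every size r = 0, 1, ..., |S|.
def power_set(S):
    S = list(S)
    return [set(c) for r in range(len(S) + 1) for c in combinations(S, r)]

P = power_set({1, 2, 3})
print(P)
print(len(P))     # 8 = 2**3, matching Proposition 1.1.11
```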
The terminology of “power” set comes from the following proposition.

Proposition 1.1.11
Let n be a nonnegative integer. If S is a set with n elements, then P(S) has 2ⁿ elements.

Proof. First consider the case n = 0. This means that S = ∅. Then P(S) = {∅} and hence contains
1 = 2⁰ element. Thus, the proposition holds for n = 0.
Now suppose n ≥ 1 and write S as S = {s₁, s₂, . . . , sₙ}. For every subset A ⊆ S, we can ask n
independent questions: whether s₁ ∈ A, whether s₂ ∈ A, and so forth for all elements of S. Each
question has two possible answers, each of which is independent of the others. Thus, there are

2 × 2 × · · · × 2 = 2ⁿ   (n factors)

subsets of S. □

Proposition 1.1.12
Let n be a nonnegative integer and let S be a set with n elements. The number of subsets
with k elements is the binomial coefficient

C(n, k) = n! / (k! (n − k)!).

Proof. Call C(n, k) the number of subsets of size k in a set of size n.
To count the number of subsets of size k, we must count how many ways we can select (unordered)
k elements from S. This unordered selection corresponds uniquely to a subset. If we select k elements
in order without repetition, we have

n × (n − 1) × · · · × (n − k + 1) = n! / (n − k)!

choices of how to do this. On the other hand, with a given set of k elements, there are k! ways of
ordering them. Hence,

k! · C(n, k) = n! / (n − k)!

and the result follows. □
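The counting argument can be cross-checked by enumeration. The sketch below compares a direct count of k-element subsets with the factorial formula and with Python's `math.comb`.

```python
from itertools import combinations
from math import comb, factorial

# Cross-check of Proposition 1.1.12: enumerating the k-element subsets
# of a 6-element set agrees with n!/(k!(n-k)!).
n, S = 6, range(6)
for k in range(n + 1):
    by_enumeration = sum(1 for _ in combinations(S, k))
    by_formula = factorial(n) // (factorial(k) * factorial(n - k))
    assert by_enumeration == by_formula == comb(n, k)

print([comb(n, k) for k in range(n + 1)])   # [1, 6, 15, 20, 15, 6, 1]
```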

1.1.3 – Functions

Definition 1.1.13
Let A and B be two sets. A function f from A to B, written f : A → B, is an association
that to each element a ∈ A, associates exactly one element b ∈ B. We write b = f (a) if b
is the associate of a via f . The set A is called the domain of f and the set B is called the
codomain.

Functions are ubiquitous in and outside of mathematics. Functions model the mental habit of
uniquely associating various quantities or options to objects in a set. For example, if V is the set
of motor vehicles registered in the United States, then the concept of mileage (at a given date) is
a function f : V → N. At a given point in time a car has a unique number associated to it that
describes mileage.
As another example, if P is the set of people (living or who have passed away), the concept of
biological mother can be represented as a function m : P → P such that m(a) = b means that the
person b is a’s biological mother. The concept of brother is not a function because some people may
have more than one brother.
For a function f : A → B, it is not uncommon to say that f maps the element a to b. Also, the
function is sometimes called a mapping, or more briefly, map from A to B.
Starting in precalculus, we study functions of the form f : I → R, where I is some interval of
real numbers. Sequences of real numbers are also functions f : N → R, though by a historical habit,
we often write terms of a sequence as fn instead of f (n).

Definition 1.1.14
Let A, B, and C be sets, and let f : A → B and g : B → C be two functions. The
composition of g with f is the function g ◦ f : A → C defined by for all x ∈ A,

(g ◦ f )(x) = g(f (x)).

Function composition arises in a significant manner in linear algebra when considering the com-
position of two linear transformations. Let S : Rm → Rn and T : Rn → Rp be linear transformations
with A and B their respective associated matrices with respect to the standard bases. It is not hard
to show that the composition T ◦ S : Rm → Rp is a linear transformation. Furthermore, the matrix
multiplication is defined as it is precisely so that the matrix product BA is the associated matrix to
T ◦ S with respect to the standard basis.
The following proposition establishes an identity about iterated composition of functions. Though
simple, it undergirds desired algebraic properties for many later situations.

Proposition 1.1.15
Let f : A → B, g : B → C, and h : C → D be functions. Then

h ◦ (g ◦ f ) = (h ◦ g) ◦ f.

Proof. Let x be an arbitrary element in A. Then

(h ◦ (g ◦ f))(x) = h((g ◦ f)(x))
= h(g(f(x)))
= (h ◦ g)(f(x))
= ((h ◦ g) ◦ f)(x).

Since the functions are equal on all elements of A, the functions are equal. 
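Associativity of composition is easy to spot-check with concrete functions. In the sketch below, the particular choices of f, g, and h are ours; any three composable functions would do.

```python
# Spot-check of Proposition 1.1.15: composition gives the same function
# no matter how we parenthesize.
compose = lambda g, f: (lambda x: g(f(x)))

f = lambda x: x + 1        # f : A -> B
g = lambda x: 2 * x        # g : B -> C
h = lambda x: x ** 2       # h : C -> D

left = compose(h, compose(g, f))    # h ∘ (g ∘ f)
right = compose(compose(h, g), f)   # (h ∘ g) ∘ f

assert all(left(x) == right(x) for x in range(-50, 50))
print(left(3), right(3))   # both compute (2*(3+1))**2 = 64
```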

Definition 1.1.16
We say that a function f : A → B is
(1) injective (one-to-one) if f (a1 ) = f (a2 ) =⇒ a1 = a2 .

(2) surjective (onto) if for all b ∈ B, there exists a ∈ A such that f (a) = b.
(3) bijective (one-to-one and onto), if it is both.

The contrapositive of the definition offers an alternative way to understand injectivity, namely
that a₁ ≠ a₂ ⟹ f(a₁) ≠ f(a₂).
Example 1.1.17. As an example, we prove whether the following functions are injective or surjec-
tive.

(1) Consider f : R − {0} → R with f(x) = 1 + 1/x. For injectivity, we solve

f(x₁) = f(x₂) ⟹ 1 + 1/x₁ = 1 + 1/x₂ ⟹ 1/x₁ = 1/x₂ ⟹ x₂ = x₁.

To prove surjectivity, given any real y, we need to solve f(x) = y for x. We have

f(x) = y ⟹ 1 + 1/x = y ⟹ 1/x = y − 1 ⟹ x(y − 1) = 1.

This last equation has no solutions in x if y = 1. Hence, f is not surjective.

(2) Consider g : R → R with g(x) = x² + 2x. For injectivity, we solve

g(x₁) = g(x₂) ⟹ x₁² + 2x₁ = x₂² + 2x₂ ⟹ x₁² − x₂² + 2x₁ − 2x₂ = 0
⟹ (x₁ − x₂)(x₁ + x₂) + 2(x₁ − x₂) = 0 ⟹ (x₁ − x₂)(x₁ + x₂ + 2) = 0
⟹ x₁ = x₂ or x₁ = −x₂ − 2.

Since g(x₁) = g(x₂) does not imply x₁ = x₂, the function is not injective. Of course, to
disprove a universal statement, it suffices to find a counterexample. Remarking that g(1) =
3 = g(−3) shows that g is not injective. The function g is surjective if for all y there exists x
with g(x) = y. However,

x² + 2x = y ⟹ (x + 1)² − 1 = y ⟹ (x + 1)² = y + 1.

This has no solutions in x if y < −1. Hence, g is not surjective.

(3) Consider h : R → R with h(x) = x³. For injectivity, we solve

h(x₁) = h(x₂) ⟹ x₁³ − x₂³ = 0 ⟹ (x₁ − x₂)(x₁² + x₁x₂ + x₂²) = 0.

It is not hard to see that

x₁² + x₁x₂ + x₂² = ½(x₁ + x₂)² + ½x₁² + ½x₂².

Hence, for all reals, x₁² + x₁x₂ + x₂² ≥ 0 with equality holding if and only if x₁ = x₂ = 0. Thus,
we have proven that h(x₁) = h(x₂) implies that x₁ = x₂. This shows injectivity. To prove
surjectivity, we use a theorem from calculus to prove that for all y₀, there exists an x₀ with
h(x₀) = y₀. Note that h is continuous over R and

lim_{x→∞} h(x) = ∞ and lim_{x→−∞} h(x) = −∞.

By the definition of infinite limits, there exist x₁ and x₂ such that h(x₁) < y₀ and h(x₂) > y₀.
By the Intermediate Value Theorem, there exists a real number x₀ with x₁ < x₀ < x₂ and
h(x₀) = y₀. This proves surjectivity. Since h is both injective and surjective, it is bijective. △
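For functions between finite sets, injectivity and surjectivity can be tested directly from Definition 1.1.16. The sketch below (helper names are ours) restricts g(x) = x² + 2x from part (2) to a finite slice of the integers.

```python
# Testing the definitions of injective and surjective on finite domains.
def is_injective(f, domain):
    images = [f(x) for x in domain]
    return len(images) == len(set(images))

def is_surjective(f, domain, codomain):
    return set(codomain) <= {f(x) for x in domain}

domain = range(-10, 11)
g = lambda x: x * x + 2 * x

print(is_injective(g, domain))              # False: g(1) = g(-3) = 3
print(is_surjective(g, domain, range(5)))   # False: no x has g(x) = 1 or 2
```

This kind of brute-force test only works when the domain is finite; for functions on R one argues as in the example above.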

The last case above gives an example of a bijection. When a function f : A → B is a bijection,
then for every b ∈ B, there is an a ∈ A that is uniquely associated to it via the relationship b = f (a).
Hence, the association from B to A is also a function.

Definition 1.1.18
Let f : A → B be a bijective function between two sets. The inverse function, denoted
f⁻¹, is the function f⁻¹ : B → A such that f⁻¹(b) = a if and only if f(a) = b.

The bijection f(x) = x³ in Example 1.1.17(3) has the inverse function f⁻¹ : R → R with
f⁻¹(x) = ∛x. It is important to contrast this with the function g(x) = x². The function g is a
function but not a bijection R → R. When we restrict the domain and codomain to h : R≥0 → R≥0
with h(x) = x², then h is a bijection with inverse function h⁻¹ : R≥0 → R≥0 given by h⁻¹(x) = √x.
It is not uncommon to consider how a function behaves on a subset of elements in the domain.
If f : A → B is a function and S ⊆ A, we regularly use the following shorthand notation:
f(S) ≝ {f(s) | s ∈ S} = {b ∈ B | ∃s ∈ S, f(s) = b}.   (1.3)

This is the image set of S under f . We also define the restriction of f to S, and denote it by f |S ,
the function f |S : S → B such that f |S (x) = f (x) for all x ∈ S.
In the special case when S = A, the subset f (A) of B is called the range of f . Note that a
function is surjective if and only if its range is equal to its codomain.
10 CHAPTER 1. SET THEORY

If T ⊆ B, we also use the shorthand notation

f⁻¹(T) ≝ {a ∈ A | f(a) ∈ T}.   (1.4)

This set is called the pre-image of T by f. It is essential to note that using this latter notation does
not presume that f is bijective; the definition in (1.4) is a matter of notation. If T = {b}, consisting
of a single element b ∈ B, then f⁻¹({b}) is called the fiber of b.
For example, if f : R → R is defined by f(x) = sin x, then the fiber of 2 is f⁻¹({2}) = ∅, and
the fiber of 1/2 is

f⁻¹({1/2}) = {π/6 + 2πk | k ∈ Z} ∪ {5π/6 + 2πk | k ∈ Z}.

Remark 1.1.19. There are two common notations, Fun(A, B) and B^A, for the set of all functions
from A to B. Exercise 1.1.31 provides a justification for the latter notation. △

1.1.4 – Cardinality
We conclude this section with a brief discussion on a concept of size for sets. Interestingly enough,
properties of functions between sets are the key.

Definition 1.1.20
We say that two sets A and B have the same cardinality if there exists a bijection f : A → B.
We write |A| = |B|. If there does not exist a bijection between A and B, then we write
|A| ≠ |B|. If there exists an injection f : A → B, then we write |A| ≤ |B|. If |A| ≤ |B| and
|A| ≠ |B|, then we write |A| < |B|.

Definition 1.1.21
A set A is called finite if there exists a bijection from A to {1, 2, . . . , n}, where n is a positive
integer. In this case, we write in shorthand |A| = n. If a set A is not finite, it is called
infinite.

Definition 1.1.22
An infinite set A is called countable if there exists a bijection f : A → N∗. In this case,
we denote its cardinality by ℵ₀ and write |A| = ℵ₀. If a set is not countable, it is called
uncountable.

The definition of the term “countable” models the mental process of listing all the elements
in the set A and labeling them as “first” (1), “second” (2), “third” (3), and so on. The function
f : N → N∗ defined by f (n) = n + 1 sets up a bijection between N and N∗ so N is countable. Often,
to show that some other sets are countable requires a little more creativity.

Proposition 1.1.23
The set of integers Z is countable.

Proof. Consider the function f : Z → N defined by f(n) = ⌊|2n + 1/2|⌋. It is not hard to show that

⌊|2n + 1/2|⌋ = m ⟺ n = m/2 if m is even, and n = −(m + 1)/2 if m is odd.

This establishes that f is a bijection. □
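The pairing in the proof can be spot-checked in Python. The sketch below writes f piecewise, which matches the displayed inverse: nonnegative integers go to even naturals, negative integers to odd naturals.

```python
# A piecewise form of the pairing in Proposition 1.1.23: f is a
# bijection Z -> N, checked here on a finite window of integers.
def f(n):
    return 2 * n if n >= 0 else -2 * n - 1

window = range(-100, 100)
values = [f(n) for n in window]
assert len(set(values)) == len(values)    # injective on the window
assert set(values) == set(range(200))     # hits 0, 1, ..., 199 exactly
print(f(0), f(-1), f(1), f(-2), f(2))     # 0 1 2 3 4
```

A finite window cannot verify bijectivity on all of Z, but it illustrates how the even/odd cases interleave.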



The symbols =, ≤, and < are obviously reminiscent of inequalities over the integers. However,
we need to be careful not to use the analogy of these symbols and assume that something is true
simply by analogy. For example, the fact that |A| ≤ |B| and |B| ≤ |A| implies that |A| = |B| is the
Schröder-Bernstein Theorem [21, 13.10]. Consider also the Trichotomy Law, which states that for
any two sets A and B, then exactly one of the following statements is true:

|A| < |B|, |A| = |B|, |A| > |B|.

The Trichotomy Law is equivalent to the Axiom of Choice. (See [55, p. 9].)
We mention without proof a few interesting results about cardinalities of sets. Proposition 1.1.23
shows that |N| = |Z|. It is not difficult to show that |N>0 | = |Q>0 | and from there conclude that
Q is countable. On the other hand, |N| < |R|. This proof follows by showing that R is in bijection
with P(N) and from Cantor’s Theorem.

Theorem 1.1.24 (Cantor)


Let X be a set. Then |X| < |P(X)|.

Proof. See [21, 13.7]. 

As stated in the preface, this book emphasizes the notion of an algebraic structure. We point
out that sets along with functions between sets provide a first example of an algebraic structure.
We might consider the structure of sets as the simplest possible algebraic structure. Later alge-
braic structures involve sets with additional properties, operations, and relations on them. These
structures will come with their own interesting applications and properties.
The reader may notice that we have already covered some of the topics of interest outlined in the
paragraph on Organizing Principles in the preface. The remaining sections of this chapter illustrate
some interesting and essential topics in the context of the algebraic structure of sets.

Exercises for Section 1.1


1. Let U = {n ∈ N | n ≤ 10} and consider the subsets A = {1, 3, 5, 7, 9}, B = {1, 2, 3, 4, 5}, and C =
{1, 2, 5, 7, 8}. Calculate the following operations.
(a) A ∩ B
(b) (B ∪ C) − A
(c) (A ∩ B) ∩ (A ∩ C) ∩ (B ∩ C)
(d) ((A − B) − C) ∩ (A − (B − C))
2. Let U = {a, b, c, d, e, f, g} and consider the subsets A = {a, b, d}, B = {b, c, e}, and C = {c, d, f }.
Calculate the following operations.
(a) C ∩ (A ∪ B)
(b) (A ∪ C) − B
(c) (A ∪ B ∪ C) − (A ∩ B ∩ C)
(d) (A − B) ∪ (B − C)
3. As subsets of the reals, describe the differences between the sets {3, 5}, [3, 5], and (3, 5).
4. Prove that the following are true for all sets A and B.
(a) A ∩ B ⊆ A
(b) A ⊆ A ∪ B
5. Let A and B be subsets of a set S.
(a) Prove that A ⊆ B if and only if P(A) ⊆ P(B).
(b) Prove that P(A ∩ B) = P(A) ∩ P(B).

(c) Show that P(A ∪ B) = P(A) ∪ P(B) if and only if A ⊆ B or B ⊆ A.


6. Give the list description of P({1, 2, 3, 4}).
7. Give the list description of {{a₁, a₂, . . . , aₖ} ∈ P({1, 2, 3, 4, 5}) | a₁ + a₂ + · · · + aₖ = 8}.
8. Let A, B, and C be subsets of a set S.
(a) Prove that (A − B) − C = (A − C) − (B − C).
(b) Find and prove a similar formula for A − (B − C).
9. Let A, B, and C be subsets of a set S.
(a) Prove that A4B = ∅ if and only if A = B.
(b) Prove that A ∩ (B4C) = (A ∩ B)4(A ∩ C).
10. Let S be a set and let {Aᵢ}_{i∈I} be a collection of subsets of S. Prove the following.
(a) The complement of ⋃_{i∈I} Aᵢ equals ⋂_{i∈I} Āᵢ.
(b) The complement of ⋂_{i∈I} Aᵢ equals ⋃_{i∈I} Āᵢ.

11. Let P be the parabola in the plane whose equation is y = x². Let {A_q}_{q∈P} be the collection of subsets
of R² where A_q is the tangent line to P at q. Determine with proof ⋃_{q∈P} A_q. (The notation R² stands
for pairs of real numbers and, via Cartesian coordinates, represents the Euclidean plane.)
12. Let {Aₖ}_{k∈N} be the collection of subsets of R³ such that Aₖ = {(x, y, z) ∈ R³ | z ≥ ky}. Determine
both ⋃_{k∈N} Aₖ and ⋂_{k∈N} Aₖ. (The notation R³ stands for triples of real numbers and, via Cartesian
coordinates, represents Euclidean 3-space.)
13. In geometry of the plane, a subset S of the plane is called convex if for all p, q ∈ S the line segment
pq connecting p and q is a subset of S. Prove that the intersection of two convex sets is a convex set.
14. Inclusion-Exclusion Principle. Let A, B, and C be finite subsets of a set S.
(a) Prove that |A ∪ B| = |A| + |B| − |A ∩ B|.
(b) (*) Prove that |A ∪ B ∪ C| = |A| + |B| + |C| − |A ∩ B| − |A ∩ C| − |B ∩ C| + |A ∩ B ∩ C|.
15. Let U be a set and A, B ⊆ U .
(a) Show by any means that the complement of A ∩ B equals Ā ∪ B̄.
(b) Show by any means that the complement of A ∪ B equals Ā ∩ B̄.
16. Let n be a positive integer. Describe an algorithm (a finite list of well-defined instructions to accomplish
a task) to list all the subsets of {1, 2, 3, . . . , n}.
17. For each of these real-valued functions determine the largest possible domain D as a subset of R and
then prove whether f : D → R is an injection, surjection, both, or neither.
(a) f(x) = −3x + 4
(b) f(x) = −3x² + 7
(c) f(x) = (x + 1)/(x + 2)
(d) f(x) = x⁵ + 1
18. Give an explicit example of a function f : Z → Z that is
(a) bijective;
(b) surjective but not injective;
(c) injective but not surjective;
(d) neither injective nor surjective.
19. Give an explicit example of a function f : N → N that is
(a) bijective;
(b) surjective but not injective;

(c) injective but not surjective;


(d) neither injective nor surjective.
20. Let f : A → B and g : B → C be functions. Prove that if f and g are bijective, then g ◦ f is bijective
and
(g ◦ f)⁻¹ = f⁻¹ ◦ g⁻¹.
21. Suppose that f and g are functions and that f ◦ g is injective.
(a) Prove that g is injective.
(b) Does it also follow that f is injective? Justify your answer (with a proof or counterexample).
22. Suppose that f and g are functions and that f ◦ g is surjective.
(a) Prove that f is surjective.
(b) Does it also follow that g is surjective? Justify your answer (with a proof or counterexample).
23. Restate the definition of (a) injective and (b) surjective as applied to a function f : A → B in terms
of properties of the sets f −1 ({b}).
24. For the following functions f , find the pre-image (or fiber) f −1 (T ) of the given set T in the codomain.

(a) f : R → R with f(x) = sin x and T = {√3/2}.
(b) f : R → R with f(x) = x² − 2 and T = [1, 2].
(c) f : R → R with f(x) = x³ − 2x and T = [−1, 0].
25. Let f : A −→ B be a function from the set A to the set B. Let S and T be subsets of the domain A.
Recall the notation in (1.3).
(a) Show that f (S ∪ T ) = f (S) ∪ f (T ).
(b) Show that f (S ∩ T ) ⊆ f (S) ∩ f (T ).
(c) Find an example of a function f : A → B and subsets S and T in A such that f(S ∩ T) ≠
f(S) ∩ f(T).
26. Let f : A −→ B be a function from the set A to the set B. Let V and W be subsets of the codomain
B. Recall the notation in (1.4). Show the following.
(a) f −1 (V ∪ W ) = f −1 (V ) ∪ f −1 (W )
(b) f −1 (V ∩ W ) = f −1 (V ) ∩ f −1 (W )
27. Let S and T be finite sets with |S| = |T |. Prove that a function f : S → T is injective if and only if
it is surjective.
28. Let S be a set. For each subset A ⊆ S we define the characteristic function of A as the function
χA : S → {0, 1} such that

χA(s) = 1 if s ∈ A, and χA(s) = 0 if s ∉ A.

Prove the following.
(a) The association A ↦ χA is a bijection between P(S) and the set of functions from S to {0, 1}.
(b) χA∩B(s) = χA(s) · χB(s) for all s ∈ S.
(c) χA∪B(s) = χA(s) + χB(s) − χA(s) · χB(s) for all s ∈ S.
(d) χĀ(s) = 1 − χA(s) for all s ∈ S.
(e) χA−B(s) = χA(s)(1 − χB(s)) for all s ∈ S.
29. Provide all the details in the proof of Proposition 1.1.23.
30. Let A, B, and C be sets. Prove that if |A| ≤ |B| and |B| ≤ |C|, then |A| ≤ |C|.
31. Let A and B be finite sets. Prove that the number of distinct functions A → B is |B|^|A|.

1.2
The Cartesian Product; Operations; Relations
In Section 1.1, when we discussed the power set of a set, we mentioned that this was one way to
create a new set from a previous one. The Cartesian product is another way to create new sets from
old ones. More importantly, the Cartesian product of two sets gives a rigorous model to the mental
notion of pairing or ordering elements from sets.

1.2.1 – Cartesian Product

Definition 1.2.1
Let A and B be sets. The Cartesian product of A and B, denoted A × B, is the set that
consists of ordered pairs (a, b), where a ∈ A and b ∈ B. Hence,

A × B ≝ {(a, b) | a ∈ A and b ∈ B}.

More generally, if A₁, A₂, . . . , Aₙ are n sets, then we define the Cartesian product A₁ ×
A₂ × · · · × Aₙ as the set of ordered n-tuples (a₁, a₂, . . . , aₙ) with aᵢ ∈ Aᵢ for i = 1, 2, . . . , n.
More specifically,

A₁ × A₂ × · · · × Aₙ ≝ {(a₁, a₂, . . . , aₙ) | aᵢ ∈ Aᵢ, for i = 1, 2, . . . , n}.

If we take the Cartesian product of the same set A, we write

Aⁿ ≝ A × A × · · · × A   (n times).

The Cartesian coordinate system motivates the concept of the Cartesian product. The notation R²
stands for ordered pairs of real numbers, which we regularly use to locate points in the Euclidean
plane (in reference to a set of axes). Similarly, R³ is the set of triples of real numbers and represents
Euclidean 3-space.
Example 1.2.2. Let W be the set of students registered at Wheaton College right now. Let C be
the set of classes offered (at Wheaton College, right now). The set W × C represents all possible
pairings of registered students with classes offered. △

Example 1.2.3. Let A = {1, 2, 3} and let B = {e, f }. We write out the sets A × B, B × B, and
B × A explicitly:

A × B = {(1, e), (1, f ), (2, e), (2, f ), (3, e), (3, f )} ,


B × B = {(e, e), (e, f ), (f, e), (f, f )} ,
B × A = {(e, 1), (e, 2), (e, 3), (f, 1), (f, 2), (f, 3)} .

Note that since A ≠ B, it is not true that A × B and B × A are equal. △

The terminology of “product” in the name “Cartesian product” comes from the following fact.

Proposition 1.2.4
Let A and B be finite sets. Then A × B is also finite and

|A × B| = |A| · |B|.

Proof. For any element a₀ ∈ A, there exist |B| pairs in A × B of the form (a₀, b) with b ∈ B. There
are |A| distinct elements in A. Furthermore, (a₁, b₁) = (a₂, b₂) if and only if a₁ = a₂ and b₁ = b₂.
Hence, all |A| · |B| pairs are distinct. □
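Proposition 1.2.4 can be checked by enumeration with `itertools.product`, which builds exactly the set of ordered pairs; the sketch below mirrors Example 1.2.3.

```python
from itertools import product

# |A × B| = |A| * |B|, checked by enumeration.
A = {1, 2, 3}
B = {'e', 'f'}

AxB = set(product(A, B))
print(len(AxB), len(A) * len(B))   # 6 6
assert len(AxB) == len(A) * len(B)

# Order matters, as in Example 1.2.3: A × B and B × A differ.
assert set(product(A, B)) != set(product(B, A))
```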

The Cartesian product of two sets or a finite collection of sets is of considerable importance
throughout set theory. Below, we introduce a few concepts arising from the Cartesian product
that are essential for the rest of this book.

1.2.2 – Binary Operations

Definition 1.2.5
A binary operation on a set S is a function ⋆ : S × S → S. We typically write a ⋆ b instead
of ⋆(a, b) for the output of the binary operation on the pair (a, b) with a, b ∈ S.

The concept of a binary operation models a process by which any two objects in a set may be
combined to produce a specific other element in the set. This concept is ubiquitous in mathematics.
Some standard examples include +, −, and × on Z or R. Note that the division process ÷ is not a
binary operator on R because for example 2 ÷ 0 is not well-defined, while it is a binary operator on
R − {0}. Also ÷ is not a binary operator on Z − {0} because for example, 5 ÷ 2 is a rational number
but not again an element of Z − {0}.
As yet another nonexample, consider the dot product · of two vectors in R3 . This is not a binary
operation because it is a function R3 × R3 → R and the codomain is not again R3 .
As we will see throughout the book, many algebraic structures are defined as sets equipped with
some binary operations that satisfy certain properties. We list a few of the typical properties that
we consider later.

Definition 1.2.6
Let S be a set equipped with a binary operation ?. We say that the binary operation is

(1) called associative if ∀a, b, c ∈ S, (a ? b) ? c = a ? (b ? c);


(2) called commutative if ∀a, b ∈ S, a ? b = b ? a;
(3) said to have an identity if ∃e ∈ S ∀a ∈ S, a ? e = e ? a = a (e is called an identity
element);

(4) called idempotent if ∀a ∈ S, a ? a = a.
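When S is finite, each of these properties can be verified by exhaustive search. Below is a hypothetical Python helper (all function names are our own) that tests an operation given as a two-argument function, illustrated with addition modulo 4:

```python
from itertools import product

def is_associative(S, op):
    return all(op(op(a, b), c) == op(a, op(b, c)) for a, b, c in product(S, repeat=3))

def is_commutative(S, op):
    return all(op(a, b) == op(b, a) for a, b in product(S, repeat=2))

def identity_element(S, op):
    # Returns the identity if one exists, else None (it is unique by Proposition 1.2.7)
    for e in S:
        if all(op(a, e) == a and op(e, a) == a for a in S):
            return e
    return None

def is_idempotent(S, op):
    return all(op(a, a) == a for a in S)

# Example: addition modulo 4 on S = {0, 1, 2, 3}
S = range(4)
mod_add = lambda a, b: (a + b) % 4

assert is_associative(S, mod_add) and is_commutative(S, mod_add)
assert identity_element(S, mod_add) == 0
assert not is_idempotent(S, mod_add)
```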

It is important to note the order of the quantifiers in definition (3) for the identity element.
For example, let ? be the operation of geometric average on R>0 , i.e., a ? b = √(ab). Note that this
operation is commutative and idempotent. The operation of geometric average does not have an
identity because if we attempted to solve for b in a ? b = a, we would obtain b = a. However, in
order for the geometric average to have an identity, this element b could not depend on a.

Proposition 1.2.7
Let S be a set equipped with a binary operation ?. If ? has an identity element then ? has
a unique identity element.

Proof. Suppose that there exist two identity elements e1 and e2 in S. Since e1 is an identity element,
then e1 ? e2 = e2 . Since e2 is an identity element, then e1 ? e2 = e1 . Thus, e1 = e2 . There cannot
exist two distinct identity elements, so S has a unique identity element. 

Because of this proposition, we no longer say “an identity element” but “the identity element.”
16 CHAPTER 1. SET THEORY

Definition 1.2.8
Let S be a set equipped with a binary operation ? that has an identity e. The operation is
said to have inverses if ∀a ∈ S ∃b ∈ S, a ? b = b ? a = e.

For example, in R the operation + has inverses because for all a ∈ R, we have a + (−a) =
(−a) + a = 0. So (−a) is the (additive) inverse of a. In R∗ , the operation × also has inverses: for
all a ∈ R∗ , we have a × (1/a) = (1/a) × a = 1. So 1/a is the (multiplicative) inverse of a.

Definition 1.2.9
Let S be a set equipped with two binary operations ? and ∗. We say that

(1) ? is left-distributive over ∗ if ∀a, b, c ∈ S, a ? (b ∗ c) = (a ? b) ∗ (a ? c);


(2) ? is right-distributive over ∗ if ∀a, b, c ∈ S, (b ∗ c) ? a = (b ? a) ∗ (c ? a);
(3) ? is distributive over ∗ if ? is both left-distributive and right-distributive over ∗.

The quintessential example of distributivity is that, as binary operations on R, × is distributive
over +. However, many other pairs of operations share this property.
Example 1.2.10. Consider a set S and the binary operation of intersection ∩ on the set P(S).

Associativity. For any three subsets A, B, C ⊆ S, an element x ∈ (A ∩ B) ∩ C if and only if x ∈ A,
x ∈ B, and x ∈ C, if and only if x ∈ A ∩ (B ∩ C). Thus,

(A ∩ B) ∩ C = A ∩ (B ∩ C)

and we conclude that ∩ is associative.


Commutativity. For any two subsets A, B ⊆ S,

A ∩ B = {x ∈ S | x ∈ A and x ∈ B} = {x ∈ S | x ∈ B and x ∈ A} = B ∩ A

and we conclude that ∩ is commutative.


Identity. Since A ∩ S = S ∩ A = A for every A ∈ P(S), the set S is the identity element for ∩ in P(S).
Idempotence. It is easy to see that A ∩ A = A, so ∩ also satisfies idempotence.
Inverses. If A ⊊ S, then for any subset B ⊆ S, since A ∩ B ⊆ A, we see that A ∩ B ⊊ S and in
particular that A ∩ B ≠ S. Hence, ∩ does not have inverses in P(S).
Distributivity. The operation ∩ is distributive over ∪ in P(S), i.e., for all A, B, C ∈ P(S),

A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).

(We leave the proof of this result as an exercise for the reader. See Exercise 1.2.17.) △
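For a small set S, the claims of this example, including the distributivity left as Exercise 1.2.17, can be checked exhaustively. A Python sketch (the helper name is our own):

```python
from itertools import chain, combinations

S = {1, 2, 3}

def powerset(s):
    # All subsets of s, as frozensets so they can live inside other sets
    s = list(s)
    return [frozenset(c) for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

P = powerset(S)

# Distributivity of intersection over union, checked exhaustively on P(S)
assert all(A & (B | C) == (A & B) | (A & C) for A in P for B in P for C in P)

# S itself is the identity element for intersection; idempotence also holds
assert all(A & S == A and A & A == A for A in P)
```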

We wish to point out that in Definition 1.2.6(3), the requirement a ? e = e ? a = a is not
redundant when the operation is not commutative. A similar comment holds for the definition
of the inverse element in Definition 1.2.8. The following example illustrates the necessity of the
seemingly redundant statements.
Example 1.2.11. Let ? be the operation on R≥0 defined by x ? y = √|x² + xy − y²|. We note
that ? is not commutative and observe that 0 is the identity element for this binary operation. Now
suppose we are given a and wish to solve for x in a ? x = 0. We have

a ? x = 0 =⇒ a² + ax − x² = 0 =⇒ x = (a ± a√5)/2 =⇒ x = a(1 + √5)/2,

where we choose the + in order for x to remain a positive number. On the other hand, if we are
given a and solve for x in x ? a = 0, we have

x ? a = 0 =⇒ x² + ax − a² = 0 =⇒ x = (−a ± a√5)/2 =⇒ x = a(−1 + √5)/2,

where we choose the + in order for x to remain a positive number. Consequently, the binary
operation ? does not have inverses.
We may sometimes call a left inverse of a an element x such that x ? a is the identity and a
right inverse of a an element x such that a ? x is the identity. In this example, the operation has a
left inverse and a right inverse for every element of R≥0 . However, since the left inverse and the
right inverse are not equal, ? does not have inverses for any element. △
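The computation in Example 1.2.11 can be spot-checked numerically. In the sketch below (the function name is our own), the left and right inverses of a = 3 both produce the identity 0 and yet differ from each other:

```python
from math import sqrt

def star(x, y):
    # x ? y = sqrt(|x^2 + xy - y^2|) on the nonnegative reals
    return sqrt(abs(x**2 + x*y - y**2))

a = 3.0
right_inv = a * (1 + sqrt(5)) / 2     # solves a ? x = 0
left_inv = a * (-1 + sqrt(5)) / 2     # solves x ? a = 0

assert star(a, right_inv) < 1e-6      # a ? (right inverse) is the identity 0
assert star(left_inv, a) < 1e-6       # (left inverse) ? a is the identity 0
assert abs(right_inv - left_inv) > 1  # but the two inverses differ
```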

1.2.3 – Relations
The everyday notion of a relationship between classes of objects is very general and somewhat
amorphous. Mathematics requires a concept as broad as that of a relation but with rigor. Cartesian
products offer a simple solution.

Definition 1.2.12
A relation from a set A to a set B is a subset R of A × B. A relation on a set A is a subset
of A2 . If (a, b) ∈ R, we often write a R b and say that a is related to b via R.

At first sight, this definition may appear strange. We typically think of a relation as some
statement about pairs of objects that is true or false. By gathering together all the true statements
about a relation into a subset of the Cartesian product, this definition gives the notion of a relation
(in mathematics) the same rigor as sets and as Boolean logic.

Example 1.2.13. Let W be the set of Wheaton College students registered now and C the set of
classes offered now. Let T be the relation of “taking classes.” T is a relation from W to C and we
write w T c if student w is taking class c. (A major function of the registrar is to keep track of T at
any given point in time.) △

Example 1.2.14. Consider the relation ≤ on S = {1, 2, 3, 4, 5}. According to Definition 1.2.12, the
relation ≤ is the subset of S × S, given by

{(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 2), (2, 3), (2, 4), (2, 5), (3, 3), (3, 4), (3, 5), (4, 4), (4, 5), (5, 5)}. △

When we consider relations on reasonably small sets, we may depict them in a variety of ways.
We illustrate the following four descriptions with the same relation R from A = {1, 2, 3, 4, 5} to
B = {a, b, c}.

List. Define the relation R as a subset of A × B:

R = {(1, b), (1, c), (2, a), (2, b), (4, a), (4, b), (4, c), (5, c)}.

Chart. In a chart with the columns labeled with the elements of A and the rows labeled with
the elements of B, mark a check in the box of column x and row y if x R y. The chart for our
running example is the following.

      1  2  3  4  5
  a      x     x
  b   x  x     x
  c   x        x  x

Matrix. Label the elements of A and B by A = {a1 , a2 , . . . , am } and B = {b1 , b2 , . . . , bn }. Define
the n × m matrix MR associated to R by the matrix with entries mij defined by

mij = 1 if aj R bi , and mij = 0 otherwise.

For our example, we have

         [ 0 1 0 1 0 ]
    MR = [ 1 1 0 1 0 ].
         [ 1 0 0 1 1 ]
Note that the chart and matrix descriptions are very similar.
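The matrix description translates directly into code. The sketch below (the variable names are our own) rebuilds the matrix for the running example, with one row per element of B and one column per element of A:

```python
A = [1, 2, 3, 4, 5]
B = ["a", "b", "c"]
R = {(1, "b"), (1, "c"), (2, "a"), (2, "b"), (4, "a"), (4, "b"), (4, "c"), (5, "c")}

# Entry in row b, column a is 1 exactly when a R b
M = [[1 if (a, b) in R else 0 for a in A] for b in B]

assert M == [[0, 1, 0, 1, 0],
             [1, 1, 0, 1, 0],
             [1, 0, 0, 1, 1]]
```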
Arrow Diagram. Use a Venn diagram with a bubble for A and a bubble for B, illustrating the
elements in each as points. Draw an arrow from a point a ∈ A to a point b ∈ B if a R b. The
arrow diagram for our running example is given below.

[Arrow diagram: the elements 1, 2, 3, 4, 5 of A in one bubble and the elements a, b, c of B in another, with an arrow from x to y whenever x R y.]

Example 1.2.15. Consider the relation t on R defined by x t y whenever

y = 10 − x²   and   x² + y² = 16.

The relation t consists of the solution set of this system of equations. Setting x² = 10 − y and
plugging into the second equation, we have y² − y − 6 = 0, which has the two roots −2 and 3.
Referring to the first equation, we find that

t = {(√7, 3), (−√7, 3), (√12, −2), (−√12, −2)}.

In this relation, no other real numbers are in relation to each other. △

In this book, we introduced functions between sets before relations. Many presentations of set
theory reverse the order. An alternate definition of a function f from A to B is a relation satisfying
∀a ∈ A ∃!b ∈ B, a f b. Function notation writes f (a) = b instead of a f b. We now see relations as a
generalization of functions. Furthermore, recasting our former definition of a function in this way
provides a rigorous definition as opposed to using the unclear term “association.”
Example 1.2.16. Let P be the set of people who are living and let E be the set of working email
accounts. Let R be the relation from P to E so that p R e stands for “person p owns the email
account e.” Some people own multiple email accounts, so R could not be a function. R also fails
to be a function because some people do not own any email accounts. Conversely, note that some
email accounts are used by more than one person, so it would not be possible to create a function
from E to P that accurately describes email ownership. △

Definition 1.2.17
Let A, B, and C be sets. Let R1 be a relation from A to B and let R2 be a relation from
B to C. The composite relation of R2 with R1 is the relation R = R2 ◦ R1 from A to C
such that
a R c ⇐⇒ ∃b ∈ B, a R1 b and b R2 c.

Definition 1.2.18
If R is a relation from A to B, then the inverse relation R−1 is the relation from B to A
such that
b R−1 a ⇐⇒ a R b.
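Representing relations as sets of ordered pairs, composition and inversion become one-line set comprehensions. A Python sketch with hypothetical helper names:

```python
# Relations as sets of ordered pairs
def compose(R2, R1):
    # R2 o R1 = {(a, c) : a R1 b and b R2 c for some b}
    return {(a, c) for (a, b1) in R1 for (b2, c) in R2 if b1 == b2}

def inverse(R):
    # b R^{-1} a if and only if a R b
    return {(b, a) for (a, b) in R}

R1 = {(1, "x"), (2, "y")}          # a relation from A to B
R2 = {("x", "p"), ("y", "q")}      # a relation from B to C

assert compose(R2, R1) == {(1, "p"), (2, "q")}
assert inverse(R1) == {("x", 1), ("y", 2)}
```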

Certain classes of relations from a set A to itself play important roles due to a combination of
specific properties. A relation from a set A to itself is called a relation on A. In the next section, we
introduce equivalence relations and partial orders, both of which are essential in abstract algebra.
However, we list below some of the properties of relations on a set A that are often of particular
interest.

Definition 1.2.19
Let R be a relation on a set A. The relation R is called
(1) reflexive if ∀a ∈ A, a R a (Reflexivity);
(2) symmetric if a R b =⇒ b R a (Symmetry);
(3) antisymmetric if a R b and b R a =⇒ a = b (Antisymmetry);

(4) transitive if a R b and b R c =⇒ a R c (Transitivity).

Example 1.2.20. Let L be the set of lines in R2 . The notion of perpendicularity ⊥ is a relation
on L. It is a symmetric relation but it does not satisfy any of the other properties described in
Definition 1.2.19. △

Example 1.2.21. Let B be the set of blood types. Encode the blood types by B = {o, a, b, ab}
and consider the donor relation →, such that t1 → t2 means (disregarding all other factors) someone
with blood type t1 can donate to someone with blood type t2 . As a subset of B 2 , the donor relation
is
{(o, o), (o, a), (o, b), (o, ab), (a, a), (a, ab), (b, b), (b, ab), (ab, ab)}.
It is not hard to check exhaustively or logically that → is reflexive, antisymmetric, and transitive
but not symmetric. △
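The exhaustive check suggested in Example 1.2.21 takes only a few lines in Python (the variable names are our own):

```python
donor = {("o", "o"), ("o", "a"), ("o", "b"), ("o", "ab"),
         ("a", "a"), ("a", "ab"), ("b", "b"), ("b", "ab"), ("ab", "ab")}
types = {"o", "a", "b", "ab"}

reflexive     = all((t, t) in donor for t in types)
symmetric     = all((b, a) in donor for (a, b) in donor)
antisymmetric = all(not ((b, a) in donor and a != b) for (a, b) in donor)
transitive    = all((a, c) in donor
                    for (a, b) in donor for (b2, c) in donor if b == b2)

# The donor relation is reflexive, antisymmetric, and transitive, but not symmetric
assert reflexive and antisymmetric and transitive and not symmetric
```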

As an aside, consider the many nonalphabetical symbols that are used in mathematics. Many
of them represent either a relation or a binary operation. Some common relation symbols on real
numbers are =, ≤, ≥, <, >, and ≠. If S is any set, some common relation symbols on P(S) are ⊆,
⊊, and ⊈. Even the symbols ∈ and ∉ represent relations from S to P(S).
When defining relations, it is common to create a symbol to signify a relation. When selecting a
symbol to represent a symmetric relation, we usually use one that is left-to-right or centrally
symmetric (e.g., ∼, ≍, ↔). Another standard for symbols is that if a symbol represents a relation R,
then the symbol with an angled slash through it represents the complement relation. For example, ≠
is the symbol for not equals, a ≰ b is the symbol for the relation in which it is not true that a ≤ b,
and ⋬ is the complement of whatever the relation ⊴ represents.

Exercises for Section 1.2


1. Let A, B, C be sets. Explain why A × B × C is not the same set as A × (B × C).
2. Let A, B, C, D be sets. Explain why A × (B × C) × D is not the same set as (A × B) × (C × D).
3. Let A = {1, 2, 3, 4} and B = {a, b}. Write out as a list (a) A × B; (b) A × A; (c) B × B × A.
4. Write in list form {1, 3} × {2, 4} × {3, 5}.
5. Write in list form {1} × {1, 2} × {1, 2, 3}.
6. Justify the statement that A × ∅ = ∅ for all sets A.
7. (*) This exercise offers a proof that if A and B are countable sets, then A × B is countable.
(a) Find a bijection between N∗ and N∗ × N∗ . [Hint: Count through the pairs (x, y) ∈ N∗ × N∗ by
successively going through the lines of the form x + y = n for n = 2, 3, 4, . . ..]
(b) Use the bijection in the previous part to prove that if A and B are countable sets, then A × B
is countable.

For Exercises 1.2.8 through 1.2.16, determine (with proof or counterexample) if the binary operation is
associative, is commutative, has an identity, has inverses, and/or is idempotent.
8. The operation ∗ on vectors of Rn defined by ~u ∗ ~v = proj~u ~v , i.e., the projection of ~v onto ~u.
9. The operation ? on the half-open interval [0, 1) described by a ? b = a + b − ⌊a + b⌋, where ⌊x⌋ is the
greatest integer less than or equal to x.
10. The operation △ on the nonnegative integers N defined by n △ m = |m − n|.
11. The operation ~ on points in the plane R2 where A ~ B is the midpoint of A and B.
12. The operation ⊕ on C defined by a ⊕ b = a + b + ab.
13. The symmetric difference operation △ on P(S), where S is any set.
14. The cross product on R3 .
15. The power operator a ∧ b = a^b on the set N∗ of positive integers.
16. The composition operator ◦ on the set F(A, A) of functions from a set A to A (where A is any set).
17. Prove that for all A, B, C ∈ P(S),

A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).

18. Let S be a set with a binary operation ∗. Assume that (a ∗ b) ∗ a = b for all a, b ∈ S. Prove that
a∗(b∗a) = b for all a, b ∈ S. [This exercise occurred as Problem A-1 in the 2001 Putnam Mathematical
Competition.]
19. Consider the operations a ∧ b = a^b and a × b on N∗ . Prove that ∧ is right-distributive over ×, but
not left-distributive over ×.
20. Let S be a finite set with |S| = n. How many binary operations exist on S?
21. Let S = {1, 2}. How many binary operations on S are associative?
22. Let A and B be finite sets. Find the number of distinct relations from A to B.
23. Let A be a finite set with n elements. Prove that the number of reflexive relations on A is 2^(n²−n) and
that the number of symmetric relations on A is 2^(n(n+1)/2).

For Exercises 1.2.24 through 1.2.30, determine (with proof ) which of the properties reflexivity, symmetry,
antisymmetry, and transitivity hold for each of the following relations.
24. For any set S, consider the relation G on P(S) defined by A G B to mean that A ∩ B ≠ ∅.
25. The relation % on S the set of people defined by p1 % p2 if p1 is taller than or the same height as p2 .
26. The relation R on Z defined by nRm if n ≥ m2 .
27. The relation ⪯ on S = R2 defined by (x1 , y1 ) ⪯ (x2 , y2 ) to mean x1² + y1² ≤ x2² + y2².
28. The relation $ on R defined by a $ b to mean ab = 0.
29. For any set S, consider the relation R on P(S) defined by A R B to mean that A ∪ B = S.
1.3. EQUIVALENCE RELATIONS 21

30. The relation R on the set of pairs of points in the plane S = R2 × R2 defined by (P1 , Q1 ) R (P2 , Q2 )
if the segment [P1 , P2 ] intersects the segment [Q1 , Q2 ].
31. Let S be a set and let R be a relation on S. Prove that if a relation is reflexive, symmetric, and
antisymmetric, then it is the = relation on S.
32. Let P be the set of people who are living now. Let R be the relation on P defined by aRb if a and b
are in the same nuclear family, i.e., if a is a self, child, parent, sibling, or spouse of b.
(a) Decide whether R is reflexive, symmetric, antisymmetric, or transitive.
(b) List all the family relations included in R(2) = R ◦ R.
(c) Give four commonly used family terms for relations in R(3) = R ◦ R ◦ R though not in R(2) .
33. We can define the graph of a relation R from R to itself as the subset of R2

{(x, y) ∈ R2 | x R y}.

(a) Sketch the graph of the relation ≤.


(b) Sketch the graph of the relation l defined by x l y if |x − y| = 1.
(c) Provide defining geometric characteristics of a subset of R2 for relations on R that are (i)
reflexive; (ii) symmetric; (iii) transitive; (iv) antisymmetric.
34. Let S = {a, b, c, d, e} and consider the relation R on S described by

R = {(a, a), (a, c), (a, d), (b, c), (b, e), (c, b), (c, d), (e, a), (e, b)}.

Determine, as a list in S × S, the composite relation R ◦ R.


35. Let R be a relation on a set A. Denote by R(n) the n-fold composite relation of R with itself:

R(n) = R ◦ R ◦ · · · ◦ R   (n times).

Prove that the relation R is transitive if and only if R(n) ⊆ R for all n = 1, 2, 3, . . ..
36. Let R be a relation that is reflexive and transitive. Prove that R(n) = R for all n ∈ N∗ .

1.3 Equivalence Relations
An equivalence relation is a generalization of the concept of equality. Intuitively speaking, an
equivalence relation mentally models a notion of sameness or similarity, that is to say that two
elements are in relation to each other if they are “the same from a certain perspective.” This
concept is ubiquitous throughout mathematics and occurs frequently in algebra.

1.3.1 – Equivalence Relations

Definition 1.3.1
An equivalence relation on a set S is a relation ∼ that is reflexive, symmetric, and transitive.

Example 1.3.2. Let S be any set. The equal relation is reflexive, symmetric, antisymmetric, and
transitive. In particular, = is an equivalence relation. Two elements are in relation via = if and
only if they are the same object. △

Example 1.3.3. Let S be the set of lines in R3 . Consider the relation of parallelism, denoted k,
on S. This is reflexive, symmetric, and transitive and so is an equivalence relation on S. From an
intuitive perspective, all lines that are parallel have the same direction. △

Example 1.3.4. Define C as intersections in Chicago and define the relation R on C to be “within
walking distance.” As stated, this is not well-defined so let us say that two intersections in Chicago
are within walking distance if and only if they are two miles or less apart. This relation is reflexive
and symmetric but not transitive. If three intersections a, b, and c lie successively in a straight line
with a and b two miles apart and b and c also two miles apart, then a and c are four miles apart.
This relation is not an equivalence relation. △

Example 1.3.5. Let X = {1, 2, 3, . . . , 10} and consider S = P(X). For the following two relations,
we discuss whether they are equivalence relations.

(1) A ∼1 B if |A| = |B|. This is an equivalence relation. Two sets are equivalent to each other if
they have the same cardinality.

(2) A ∼2 B if 1 ∈ A ∩ B. The relation ∼2 is symmetric and transitive but is not reflexive. A set
that does not contain 1 is not in relation to itself. Sets that are in relation to each other in ∼2
have the similar property of all containing the element 1 but it feels natural to impose that in
any notion of sameness, an element should be “the same” as itself. △

1.3.2 – Equivalence Classes


Since an equivalence relation furnishes some notion of sameness on elements of a set, it is natural to
gather similar elements into classes. Such classes formalize the sameness property.

Definition 1.3.6
Let ∼ be an equivalence relation on a set S. For a ∈ S, the equivalence class of a is
def
[a] = {s ∈ S | s ∼ a}.

We sometimes write [a]∼ to clarify if a certain context considers more than one equivalence
relation at a time. A subset U ⊆ S is called an equivalence class if U = [a] for some a ∈ S.
An element a of an equivalence class U is called a representative of U .

We call a subset T of S a complete set of distinct representatives of ∼ if any equivalence class U


has U = [a] for some a ∈ T and for any two a1 , a2 ∈ T , [a1 ] = [a2 ] implies that a1 = a2 .

Definition 1.3.7
Let S be a set and ∼ an equivalence relation on S. The set of ∼-equivalence classes on S
is called the quotient set of S by ∼ and is denoted by S/ ∼.

The quotient set S/ ∼ is a first example of what we called, in the preface, a quotient object. As
we will see later at various points in the book, in a given algebraic structure, if we begin with one
object O that also has an equivalence relation ∼ such that O/ ∼ is again an object with the same
algebraic structure, we call O/ ∼ a quotient object. With a set S, any equivalence relation on S
leads to a quotient set. However, with other algebraic structures, not all equivalence relations are
such that the set of equivalence classes naturally produces another object with the same algebraic
structure. Therefore, the study of quotient objects will require some care.
We remark that there is a bijection between S/ ∼ and a complete set of representatives T of ∼
via the function

ψ : T → S/∼,   a ↦ [a].

However, we do not consider these sets as equal since their objects are different.

Example 1.3.8. The notion of cardinality of subsets of a given set can now be described in the
following way. Fix a set S, not necessarily finite or even countable, and consider the equivalence
relation ∼ on P(S) by A ∼ B if and only if there exists a bijection between A and B. It is not hard
to check that this is an equivalence relation on P(S). We saw earlier that we always write |A| instead
of [A]∼ , but now the notion of cardinality |A| is well-defined as an element of the set P(S)/ ∼, even
if A is not a finite set. △

Example 1.3.9 (Projective Space). Let L(R3 ) be the set of lines in R3 and consider the equiva-
lence relation of parallelism k on L(R3 ). If L is a line in L(R3 ), then [L] is the set of all lines parallel
to L. The set L(R3 )/ k is called the projective plane and is denoted as RP2 . It consists of all the
directions lines can possess.
There are other ways to understand the projective plane. Every line in R3 is parallel to a unique
line through the origin. Possible direction vectors for lines through the origin consist simply of
nonzero vectors, so we consider the set R3 − {(0, 0, 0)}. Lines through the origin are the same if and
only if their given direction vector differs by a nonzero multiple. Hence, we define the equivalence
relation ∼ on R3 − {(0, 0, 0)} by

~a ∼ ~b ⇐⇒ ~a = λ~b, for some λ ∈ R − {0}.

Our comments on direction vectors show that as sets (R3 − {(0, 0, 0)})/∼ = RP2 . △

Example 1.3.10 (Rational Numbers). Consider the set of pairs S = Z × Z∗ and the relation ∼
defined by
(a, b) ∼ (c, d) ⇐⇒ ad = bc.

We leave it to the reader to show that this is an equivalence relation. (See Exercise 1.3.20.) The
quotient set S/ ∼ is a rigorous definition for Q. The equivalence relation is precisely the condition
that is given when two fractions are considered equal. Hence, the fraction notation a/b for rational
numbers represents the equivalence class [(a, b)]∼ . △
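This quotient construction is exactly how exact rational arithmetic is implemented in practice. The sketch below (the helper names are our own) tests the relation ad = bc and computes a canonical representative by dividing out the greatest common divisor; Python's built-in Fraction type performs the same reduction:

```python
from math import gcd
from fractions import Fraction

# (a, b) ~ (c, d) if and only if ad = bc, on Z x Z*
def related(p, q):
    (a, b), (c, d) = p, q
    return a * d == b * c

# A canonical representative of [(a, b)]: reduced form with positive denominator
def reduced(a, b):
    g = gcd(a, b) * (1 if b > 0 else -1)
    return (a // g, b // g)

assert related((1, 2), (3, 6))
assert reduced(3, 6) == reduced(1, 2) == (1, 2)
# Python's Fraction implements this quotient construction
assert Fraction(3, 6) == Fraction(1, 2)
```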

Remark 1.3.11. When working with functions whose domains are quotient sets, it is natural to
wish to define a function of an equivalence class based on a representative of the class. More precisely,
if S and T are sets and ∼ an equivalence relation on S, we may wish to define a function

F : S/∼ −→ T,   [a] ↦ f (a),   (1.5)

where f : S → T is some function. This construction does not always produce a function. We say
that a function defined according to (1.5) is well-defined if whenever a ∼ b then f (a) = f (b). Thus,
using any representative of the equivalence class [a] will return the same value for F ([a]). △

As an example to illustrate the above remark, consider the equivalence relation ∼ on Z × Z∗
described in Example 1.3.10. We also use the typical notation a/b for the equivalence classes.
Suppose that we attempted to define a function f : Q → Z by f (a/b) = a + b. This is not a well-
defined function (and hence is not a function at all) because, for example, 1/2 = 3/6 whereas 1 + 2 = 3
and 3 + 6 = 9. Depending on the representative chosen for the fraction a/b, the value of a + b may
differ.
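The failure of well-definedness is easy to exhibit concretely; a brief Python sketch of the example just given:

```python
# Attempted definition f([a/b]) = a + b depends on the representative chosen
f = lambda a, b: a + b

# (1, 2) and (3, 6) represent the same rational number 1/2 ...
assert 1 * 6 == 2 * 3
# ... yet f disagrees on them, so f is not well-defined on Q
assert f(1, 2) == 3 and f(3, 6) == 9
assert f(1, 2) != f(3, 6)
```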
We could circumvent this problem of properly defining functions from a quotient set by choosing
a specific representative from each equivalence class. For example, still working with functions f
from Q as defined in Example 1.3.10, we could define f (q) = a + b where the fraction q = a/b is
expressed in reduced form with b > 0. This approach is sometimes neither tractable nor desirable.
Otherwise, we can define a function f in terms of a representative of an equivalence class and prove
that it is well-defined by showing that its image under f is the same for all elements in any given
equivalence class. For example, consider the function g : Q → Q defined by g(a/b) = (a² + b²)/(ab).
We show

that this g is well-defined as follows. Suppose that a/b = c/d. Then (a, b) ∼ (c, d), so ad = bc. Then

(c² + d²)/(cd) = b²(c² + d²)/(b²cd)    since these two fractions satisfy ∼
              = (a²d² + b²d²)/(bad²)   because bc = ad
              = (a² + b²)/(ba)         since the last two fractions satisfy ∼.

Hence, the formula for g is independent of the choice of representative for the fraction a/b.
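By contrast with the earlier f, the function g can be checked on several representatives of the same class using exact rational arithmetic; a quick Python sketch (the function name is our own):

```python
from fractions import Fraction

# g(a/b) = (a^2 + b^2) / (ab), shown above to be well-defined
def g(a, b):
    return Fraction(a * a + b * b, a * b)

# Equivalent representatives of 1/2 all give the same value
assert g(1, 2) == g(3, 6) == g(-2, -4) == Fraction(5, 2)
```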
The above discussion leads to another nice property of quotient sets. Let ∼ be an equivalence
relation on a set S and let p : S → S/∼ be the “projection” defined by p(a) = [a]. For any set X
and any function f : S → X such that a ∼ b implies f (a) = f (b), there exists a unique function
f ′ : S/∼ → X such that f = f ′ ◦ p. This function is simply defined by f ′ ([a]) = f (a), and we saw
that since f (a) = f (b) whenever a ∼ b, this f ′ is well-defined. We often express the relationship
f = f ′ ◦ p by calling the following diagram of sets and functions commutative.

[Commutative diagram: p : S → S/∼ across the top, with f : S → X and f ′ : S/∼ → X mapping down to X, so that f = f ′ ◦ p.]

1.3.3 – Partitions
Let ∼ be an equivalence relation on a set S. For any two elements a, b ∈ S, by definition a ∈ [b] if
and only if a ∼ b. However, since ∼ is symmetric, this implies that b ∼ a and hence that b ∈ [a]. By
transitivity, if a ∈ [b], then s ∼ a implies that s ∼ b, so a ∈ [b] implies that [a] ⊆ [b]. Consequently,
we have proven that the following statements are logically equivalent:

a ∈ [b] ⇐⇒ [a] ⊆ [b] ⇐⇒ b ∈ [a] ⇐⇒ [b] ⊆ [a] ⇐⇒ [a] = [b].

This observation leads to the following important proposition.

Proposition 1.3.12
Let S be a set equipped with an equivalence relation ∼. Then
(1) distinct equivalence classes are disjoint;

(2) the union of distinct equivalence classes is all of S.

Proof. Suppose that [a] ∩ [b] ≠ ∅. Then there exists c ∈ [a] ∩ [b], so c ∈ [a] and c ∈ [b]. Hence,
[a] = [c] = [b]. Thus, if two equivalence classes overlap, then they are equal.
Let T be a complete set of representatives of ∼ in S. Obviously, since a ∈ [a], every element of
S is in some equivalence class. Thus, we have

S = ⋃_{a∈S} [a] = ⋃_{a∈T} [a]

and the result follows. 
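Proposition 1.3.12 suggests a procedure: grouping the elements of a set by an equivalence relation always yields a family of disjoint classes covering the set. A Python sketch (the function name is our own), illustrated with congruence modulo 3:

```python
def equivalence_classes(S, related):
    # Group S into classes [a] = {s : s ~ a}; assumes `related` is an equivalence relation
    classes = []
    for a in S:
        for cls in classes:
            if related(a, next(iter(cls))):
                cls.add(a)
                break
        else:
            classes.append({a})
    return classes

# Example: congruence modulo 3 on {0, ..., 8}
S = range(9)
classes = equivalence_classes(S, lambda a, b: a % 3 == b % 3)

# The classes are disjoint and their union is all of S
assert sorted(map(sorted, classes)) == [[0, 3, 6], [1, 4, 7], [2, 5, 8]]
```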

The property of equivalence classes described in Proposition 1.3.12 has a particular name in set
theory.

[Figure: a set S of ten points s1 , . . . , s10 subdivided into four blocks.]

Figure 1.1: A partition of a set

Definition 1.3.13
Let S be a set. A collection A = {Ai }i∈I of subsets of S is called a partition of S if

(1) Ai ∩ Aj ≠ ∅ =⇒ i = j, and

(2) ⋃_{i∈I} Ai = S.

Partitions of sets may be visualized by a diagram akin to Figure 1.1. In this figure, S is a set
with ten elements and the sets of the partition are {s1 , s2 , s3 }, {s4 }, {s5 , s6 }, and {s7 , s8 , s9 , s10 }. A
partition of S is a particular subset of P(S). A general subset of P(S) would consist of overlapping
subsets of S and possibly not cover all of S. Hence, a diagram like Figure 1.1 would not suffice to
visualize a general subset of P(S).
The concept of a partition simply models the mental construction of subdividing a set into parts
without losing any elements of the set and without any parts overlapping. Partitions and equivalence
relations are closely connected. Proposition 1.3.12 establishes that the set of distinct equivalence
classes of an equivalence relation on S forms a partition of S. The following proposition establishes
the converse.

Proposition 1.3.14
Let A = {Ai }i∈I be a partition of a set S. Define the relation ∼A on S by

a ∼A b ⇐⇒ ∃i ∈ I with a ∈ Ai and b ∈ Ai .

Then ∼A is an equivalence relation. Furthermore, the sets in A are the distinct equivalence
classes of ∼A .

Proof. Let a ∈ S be arbitrary. Since A is a partition, then a ∈ Ai for some i ∈ I. Hence, a ∼A a


and ∼A is reflexive.
Suppose that a ∼A b. Then for some i ∈ I, we have a ∈ Ai and b ∈ Ai . Obviously, this implies
that b ∼A a, showing that ∼A is symmetric.
Suppose that a ∼A b and b ∼A c. Then for some i ∈ I, we have a ∈ Ai and b ∈ Ai and for some
j ∈ I, we have b ∈ Aj and c ∈ Aj . However, since b ∈ Ai ∩ Aj , then Ai ∩ Aj ≠ ∅. By definition of a
partition, i = j and so a ∈ Ai and c ∈ Ai and thus a ∼A c. This shows transitivity and establishes
that ∼A is an equivalence relation.
Let Ai be a set in A and let s be any element in Ai . By construction, [s] = Ai and so the
elements of A are precisely the equivalence classes of ∼A . 
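Conversely, Proposition 1.3.14 can be illustrated by starting from a partition and checking the three defining properties of the induced relation; a Python sketch (the names are our own):

```python
from itertools import product

partition = [{1, 4, 5}, {2, 6, 7}, {3, 8}]
S = {1, 2, 3, 4, 5, 6, 7, 8}

# a ~_A b if and only if a and b lie in a common block of the partition
def related(a, b):
    return any(a in block and b in block for block in partition)

# The induced relation is reflexive, symmetric, and transitive
assert all(related(a, a) for a in S)
assert all(related(b, a) for a, b in product(S, repeat=2) if related(a, b))
assert all(related(a, c) for a, b, c in product(S, repeat=3)
           if related(a, b) and related(b, c))

# Its equivalence classes recover exactly the blocks of the partition
assert all({b for b in S if related(a, b)} in partition for a in S)
```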

Example 1.3.15. Consider the set S = {1, 2, 3, 4, 5, 6, 7, 8}. The following are examples of parti-
tions on S:

{{1, 4, 5}, {2, 6, 7}, {3, 8}} , {{1, 3, 5, 7}, {2, 4, 6, 8}} , {{1, 2, 3, 4, 5, 6, 7, 8}} .

However, {{1, 3, 5}, {2, 8}, {4, 7}} is not a partition because the union of the subsets does not contain
6. On the other hand, {{1, 2, 3, 5}, {3, 6, 8}, {4, 6, 7}} is not a partition because some of the subsets
have nonempty intersections, namely {1, 2, 3, 5} ∩ {3, 6, 8} = {3} and {3, 6, 8} ∩ {4, 6, 7} = {6}. △

Example 1.3.16. Consider the unit sphere in R3 , denoted by S2 . Consider the partition on S2
given by
A = {{p, −p} | p ∈ S2 }.
The partition A consists of pairs of points that are diametrically opposite each other. According
to Proposition 1.3.14, there exists a unique equivalence relation ∼1 on S2 that has A as a set of
distinct equivalence classes. Note that any line through the origin intersects S2 in two diametrically
opposite points. Hence, the quotient set S2 / ∼1 is equal to the projective space RP2 constructed in
Example 1.3.9. △

Exercises for Section 1.3


1. Let S = Z × Z and let R be the relation on S defined by (a, b)R(c, d) means that a + d = b + c. Show
that R is an equivalence relation. Concisely describe the equivalence classes of R.
2. Let C be the set of people in your abstract algebra class. Describe a “natural” relation satisfying each
of the combination of properties listed below.
(a) Reflexive and symmetric, but not transitive.
(b) Reflexive and transitive, but not symmetric.
(c) Symmetric and transitive, but not reflexive.
(d) An equivalence relation.

For Exercises 1.3.3 through 1.3.15, prove or disprove whether the described relation is an equivalence relation.
If the relation is not an equivalence relation, determine which properties it lacks.
3. Let P be the set of living people. For all a, b ∈ P , define the relation a R b if a and b have met.
4. Let P be the set of living people. For all a, b ∈ P , define the relation a R b if a and b live in a common
town.
5. Let C be the set of circles in R2 and let R be the relation of concentric on C.
6. Let S = Z × Z and define the relation R on S by (m1 , m2 ) R (n1 , n2 ) if m1 m2 = n1 n2 .
7. Let S = Z × Z and define the relation R on S by (m1 , m2 ) R (n1 , n2 ) if m1 n1 = m2 n2 .
8. Let S = Z × Z and define the relation R on S by (m1 , m2 ) R (n1 , n2 ) if m1 n2 = m2 n1 .
9. Let P3 be the set of polynomials with real coefficients and of degree 3 or less. Define the relation R
on P3 by p(x) R q(x) to mean that q(x) − p(x) has 5 as a root.
10. Consider the set C 0 (R) of continuous functions over R. Define the relation R on C 0 (R) by f R g if
there exist some a, b ∈ R such that

g(x) = f (x + a) + b for all x ∈ R.

11. Let Pfin (R) be the set of finite subsets of R and define the relation ∼ on Pfin (R) by A ∼ B if the sum
of elements in A is equal to the sum of elements in B. Prove that ∼ is an equivalence relation.
12. Let `∞ (R) be the set of sequences of real numbers. Define the relation R on `∞ (R) by (an ) R (bn ) if

lim_{n→∞} (bn − an ) = 0.

13. Let `∞ (R) be the set of sequences of real numbers. Define the relation R on `∞ (R) by (an ) R (bn ) if
the sequence (an + bn )_{n=1}^∞ converges.
1.3. EQUIVALENCE RELATIONS 27

14. Let S be the set of lines in R2 and let R be the relation of perpendicular.
15. Let W be the words in the English language (i.e., have an entry in the Oxford English Dictionary).
Define the relation R on W by w1 R w2 if w1 comes before w2 in alphabetical order.
16. Let C 0 ([0, 1]) be the set of continuous real-valued functions on [0, 1]. Define the relation ∼ on C 0 ([0, 1])
by
f ∼ g ⇐⇒ ∫_0^1 f (x) dx = ∫_0^1 g(x) dx.
Show that ∼ is an equivalence relation and describe (with a precise rule) a complete set of distinct
representatives of ∼.
17. Let C ∞ (R) be the set of all real-valued functions on R whose derivatives of every order exist and are
continuous. Define the relation R on C ∞ (R) by f R g if f (n) (0) = g (n) (0) for all positive, even integers
n.
(a) Prove that R is an equivalence relation.
(b) Describe concisely all the elements in the equivalence class [sin x].
18. Let S = {1, 2, 3, 4}. The relation ∼ on P(S) defined by A ∼ B if and only if the sum of elements
in A is equal to the sum of elements in B is an equivalence relation. List the equivalence classes of ∼.
19. Let T be the set of (nondegenerate) triangles in the plane.
(a) Prove that the relation ∼ of similarity on triangles in T is an equivalence relation.
(b) Concisely describe a complete set of distinct representatives of ∼.
20. Prove that the relation defined in Example 1.3.10 is an equivalence relation.
21. Let S = {1, 2, 3, 4, 5, 6}. For the partitions of S given below, write out the equivalence relation as a
subset of S × S.
(a) {{1, 2}, {3, 4}, {5, 6}}
(b) {{1}, {2}, {3, 4, 5, 6}}
(c) {{1, 2}, {3}, {4, 5}, {6}}
22. Let S = {a, b, c, d, e}. For the partitions of S given below, write out the equivalence relation as a
subset of S × S.
(a) {{a, d, e}, {b, c}}
(b) {{a}, {b}, {c}, {d}, {e}}
(c) {{a, b, d, e}, {c}}
23. Let C 1 ([a, b]) be the set of continuously differentiable functions on the interval [a, b]. Define the relation
∼ on C 1 ([a, b]) as f ∼ g if and only if f 0 (x) = g 0 (x) for all x ∈ (a, b). Prove that ∼ is an equivalence
relation on C 1 ([a, b]). Describe the elements in the equivalence class for a given f ∈ C 1 ([a, b]).
24. Let Mn×n (R) be the set of n × n matrices with real coefficients. For two matrices A, B ∈ Mn×n (R),
we say that B is similar to A if there exists an invertible n × n matrix S such that B = SAS −1 .
(a) Prove that similarity ∼ is an equivalence relation on Mn×n (R).
(b) Prove that the function f : Mn×n (R)/ ∼ → R defined by f ([A]) = det A is a well-defined function
on the quotient set Mn×n (R)/ ∼.
(c) Determine with a proof or counterexample whether the function g : Mn×n (R)/ ∼ → R defined
by g([A]) = Tr A, the trace of A, is a well-defined function.
25. Define the relation ∼ on R by a ∼ b if and only if b − a ∈ Q.
(a) Prove that for all x ∈ R, there exists y ∈ [x]∼ arbitrarily close to x. (In other words, for
all ε > 0, there exists y with y ∼ x and |x − y| < ε.)
(b) (*) Prove that ∼ has an uncountable number of equivalence classes.
26. Let R1 and R2 be equivalence relations on a set S. Determine (with a proof or counterexample) which
of the following relations are also equivalence relations on S. (a) R1 ∩ R2 ; (b) R1 ∪ R2 ; (c) R1 △ R2 .
[Note that R1 ∪ R2 , and similarly for the others, is a relation as a subset of S × S.]

27. Which of the following collections of subsets of the integers form partitions? If it is not a partition,
explain which properties fail.
(a) {pZ | p is prime}, where kZ means all the multiples of k.
(b) {{3n, 3n + 1, 3n + 2} | n ∈ Z}.
(c) {{k | n2 ≤ k ≤ (n + 1)2 } |n ∈ N}.
(d) {{n, −n} | n ∈ N}.
28. Let S be a set. Prove that there is a bijection between the set of partitions of S and the set of
equivalence relations on S.
29. Call p(n) the number of equivalence relations (equivalently, by Exercise 1.3.28, partitions) on a set of
cardinality n. (The numbers p(n) are called the Bell numbers after the Scottish-born mathematician
E. T. Bell.)
(a) (*) Prove that p(0) = 1 and that for all n ≥ 1, p(n) satisfies the recurrence

p(n) = ∑_{j=0}^{n−1} C(n − 1, j) p(n − j − 1),

where C(n − 1, j) denotes the binomial coefficient.

(b) Use the previous part to calculate p(n) for n = 1, 2, 3, 4, 5, 6, 7.


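The recurrence in part (a) lends itself to a direct computation for part (b); the following Python sketch (the function name `bell` is ours) tabulates p(n) from it.

```python
from math import comb

def bell(n):
    # p(0) = 1 and p(n) = sum_{j=0}^{n-1} C(n-1, j) p(n - j - 1):
    # choose the j other elements sharing a block with one fixed element,
    # then partition the remaining n - j - 1 elements.
    p = [1]
    for m in range(1, n + 1):
        p.append(sum(comb(m - 1, j) * p[m - j - 1] for j in range(m)))
    return p[n]

print([bell(n) for n in range(8)])  # [1, 1, 2, 5, 15, 52, 203, 877]
```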
30. Consider the relation ∼ on R defined by x ∼ y if y − x ∈ Z.
(a) Prove that ∼ is an equivalence relation.
(b) Prove that if a ∼ b and c ∼ d, then (a + c) ∼ (b + d).
(c) Decide with a proof or counterexample whether ac ∼ bd, whenever a ∼ b and c ∼ d.
31. Let S be a set and let A = {Ai }i∈I be a partition of S. Another partition B = {Bj }j∈J is called a
refinement of A if
∀j ∈ J , ∃i ∈ I, Bj ⊆ Ai .
Let A and B be two partitions of a set S and let ∼A (resp. ∼B ) be the equivalence relation
corresponding to A (resp. B). Prove that B is a refinement of A if and only if s1 ∼B s2 =⇒ s1 ∼A s2 .
32. Let S be a set and let A = {Ai }i∈I and B = {Bj }j∈J be two partitions of S. Prove that the collection
of sets
{Ai ∩ Bj | i ∈ I and j ∈ J } − {∅}
is a partition of S.
33. Let S be a set and let R be any relation on S. Design an algorithm that determines the smallest
equivalence relation on S that contains the relation R.

1.4 Partial Orders
1.4.1 – Partial Orders
Section 1.3.1 introduced equivalence relations as a generalization of the notion of equality. Equiva-
lence relations provide a mental model for calling certain objects in a set equivalent. Similarly, the
concept of a partial order generalizes the inequality ≤ on R to a mental model of ordering objects
in a set.

Definition 1.4.1
A partial order on a set S is a relation ≼ that is reflexive, antisymmetric, and transitive. A
pair (S, ≼), where S is a set and ≼ is a partial order on S, is often succinctly called a poset.

The name poset, which abbreviates “partially ordered set,” emphasizes the perspective that
posets form an algebraic structure. As a first example of a nontrivial structure, we point out that a
poset consists of a set equipped with a relation with certain specified properties. The data for many
other algebraic structures will resemble Definition 1.4.1.
Motivated by the notations for inequalities over R, we use the symbol ≺ to mean

x ≺ y ⇐⇒ x ≼ y and x ≠ y

and the symbol x ⋠ y to mean that it is not true that x ≼ y.

Example 1.4.2. Consider the relation ≤ on R. For all x ∈ R, x ≤ x so ≤ is reflexive. For all
x, y ∈ R, if x ≤ y and y ≤ x, then x = y and hence ≤ is antisymmetric. It is also true that x ≤ y
and y ≤ z implies that x ≤ z and hence ≤ is transitive. Thus, the inequality ≤ on R is a partial
order. △

Note that ≥ is also a partial order on R but that the strict inequalities < and > are not. The
inequality < is not reflexive though it is both antisymmetric and transitive. (< is antisymmetric
because there do not exist any x, y ∈ R such that x < y and y < x so the conditional statement
“x < y and y < x implies x = y” is trivially satisfied.)
Equivalence relations generalize = and hence loosen up some properties of =. In a similar way,
though modeled after the relation of ≤ on R, a partial order on a set S has a number of additional
possibilities. For example, in a general poset (S, ≼), given two arbitrary elements a, b ∈ S, it is
possible that neither a ≼ b nor b ≼ a.

Definition 1.4.3
Let (S, ≼) be a poset. If for some pair {a, b} of distinct elements, either a ≼ b or b ≼ a, then
we say that a and b are comparable; otherwise a and b are called incomparable. A partial
order in which every pair of elements is comparable is called a total order.

The posets (N, ≤) and (R, ≤) are total orders. Many posets are not total orders as the following
examples illustrate.

Example 1.4.4. Consider the donor relation → defined on the set of blood types B = {o, a, b, ab}
as discussed in Example 1.2.21. We saw that → is reflexive, antisymmetric, and transitive. This
shows that (B, →) is a poset. Note that a and b are not comparable, meaning that neither can
donate to the other. △

Example 1.4.5. Let S be any set. The subset relation ⊆ on P(S) is a partial order. In the partial
order, many pairs of subsets in S are incomparable. In fact, two subsets A and B are incomparable
if and only if A − B and B − A are both nonempty. △

Example 1.4.6. Consider the relation ≼ on R2 defined by

(x1 , y1 ) ≼ (x2 , y2 ) ⇐⇒ 2x1 − y1 < 2x2 − y2 or (x1 , y1 ) = (x2 , y2 ).

That (x1 , y1 ) ≼ (x1 , y1 ) is built into the definition, so ≼ is reflexive. It is impossible for 2x1 − y1 <
2x2 − y2 and 2x2 − y2 ≤ 2x1 − y1 , so the only way (x1 , y1 ) ≼ (x2 , y2 ) and (x2 , y2 ) ≼ (x1 , y1 ) can occur
is if (x1 , y1 ) = (x2 , y2 ). Finally, the relation is also transitive, so ≼ is a partial order on R2 .
In this poset on R2 , two elements (x1 , y1 ) and (x2 , y2 ) are incomparable if and only if 2x2 − y2 =
2x1 − y1 and (x1 , y1 ) ≠ (x2 , y2 ), namely they are distinct points on the same line of slope 2. △

Besides the dichotomy between totally ordered posets and partial orders with incomparable
elements, there is another dichotomy that already appears when comparing properties of the posets
(N, ≤) and (R, ≤). In (R, ≤), given any x ≤ y with x 6= y, there always exists an element z such
that z 6= x and z 6= y with x ≤ z ≤ y. In contrast, in (N, ≤), for example 2 ≤ 3 but for all z ∈ N, if
2 ≤ z ≤ 3, then z = 2 or z = 3.

Definition 1.4.7
Let (S, ≼) be a poset and let x ∈ S. We call y ∈ S an immediate successor (resp. immediate
predecessor ) to x if y ≠ x with x ≼ y (resp. y ≼ x) and for all z ∈ S such that x ≼ z ≼ y
(resp. y ≼ z ≼ x), either z = x or z = y.

In (N, ≤) all elements have both immediate successors and immediate predecessors, except for 0
that does not have an immediate predecessor. In (Z, ≤) all elements have both immediate successors
and immediate predecessors. In contrast, as commented above, in (R, ≤) no element has either an
immediate successor or an immediate predecessor.
A partial order does not have to be a total order to have immediate successors or predecessors.
In the blood donor relation (B, →) in Example 1.4.4, o has two immediate successors, namely a and
b.

Example 1.4.8 (Another Order on Q). The usual partial order ≤ on Q>0 is a total order,
but as for (R, ≤), no element has either an immediate successor or an immediate predecessor. We
define an alternate partial order ≼ on Q>0 in which every element has an immediate successor and
an immediate predecessor, except for 1, which only has an immediate successor.
For fractions given in reduced form, we define

a/b ≼ c/d ⇐⇒ a + b ≤ c + d   if a + b ≠ c + d;
              a ≤ c           if a + b = c + d.

It is not hard to check that: (1) ≼ is a partial order on Q>0 ; (2) ≼ is a total order; (3) every
element besides 1 has an immediate successor and an immediate predecessor. (See Exercise 1.4.6.)
We can visualize this total order in the following way. Organize all fractions in Q>0 as in the
chart below. For all positive integers n, define the subsets An by

An = { x/y ∈ Q>0 | x + y = n + 1, gcd(x, y) = 1, 1 ≤ x, y ≤ n }.

So, for example, A7 = {1/7, 3/5, 5/3, 7/1}.

1/6  2/6  3/6  4/6  5/6  6/6
1/5  2/5  3/5  4/5  5/5  6/5
1/4  2/4  3/4  4/4  5/4  6/4
1/3  2/3  3/3  4/3  5/3  6/3
1/2  2/2  3/2  4/2  5/2  6/2
1/1  2/1  3/1  4/1  5/1  6/1

(The subsets A1 , A2 , . . . mark the diagonals of this chart: An consists of the reduced fractions x/y with x + y = n + 1.)

We read the fractions in Q>0 successively according to ≼ by first going through the An subsets
in increasing order of n and, within each An (each of which is finite), listing the fractions in increasing
order. △

1.4.2 – Subposets

Definition 1.4.9
Let S be a nonempty set equipped with a partial order ≼ and let T be a subset of S. The
restriction of ≼ to T is the relation on T defined by

≼T = ≼ ∩ (T × T ),

where T × T is viewed as a subset of S × S.

It is not difficult to see that ≼T is reflexive, antisymmetric, and transitive on T , making (T, ≼T )
into a poset in its own right. We call (T, ≼T ) a subposet of (S, ≼).
Though a generic poset (S, ≼) need not be a total order, many of the terms associated with
inequalities on subsets of R have corresponding definitions in any poset.

Definition 1.4.10
Let (S, ≼) be a poset, and let A be a subset of S.

(1) A maximal element of A is an M ∈ A such that if t ∈ A with M ≼ t, then t = M .

(2) A minimal element of A is an m ∈ A such that if t ∈ A with t ≼ m, then t = m.

As an example, consider the blood donor relation (B, →) described in Example 1.4.4 and consider
the subset A = {o, a, b}. Then A has one minimal element o and two maximal elements a and b.

Definition 1.4.11
Let (S, ≼) be a poset, and let A be a subset of S.
(1) An upper bound of A is an element u ∈ S such that ∀t ∈ A, t ≼ u.
(2) A lower bound of A is an element ℓ ∈ S such that ∀t ∈ A, ℓ ≼ t.

(3) A least upper bound of A is an upper bound u of A such that for all upper bounds u′
of A, we have u ≼ u′ .
(4) A greatest lower bound of A is a lower bound ℓ of A such that for all lower bounds ℓ′
of A, we have ℓ′ ≼ ℓ.

We say that a subset A ⊆ S is bounded above if A has an upper bound, is bounded below if A
has a lower bound, and is bounded if A is bounded above and bounded below.
If u1 and u2 are two least upper bounds of A, then by definition u1 ≼ u2 and u2 ≼ u1 . Thus,
u1 = u2 and we conclude that least upper bounds are unique. It is similar for greatest lower bounds.
Therefore, if a subset A has a least upper bound, we talk about the least upper bound of A and
denote this element by lub(A). Similarly, if a subset A has a greatest lower bound, we talk about
the greatest lower bound of A and denote this element by glb(A). If A is a subset of S given in
list form as A = {a1 , a2 , . . . , an }, we often write lub(a1 , a2 , . . . , an ) for lub(A) and similarly for the
greatest lower bound.
From the perspective of analysis, one of the most important differences between the posets (R, ≤)
and its subposet (Q, ≤) is that any bounded subset of R has a least upper bound whereas this does
not hold in Q. Consider, for example, the subset

A = { p/q ∈ Q | p2 < 2q2 }.
In R, lub(A) = √2, whereas in Q, for any upper bound u = r/s of A we have

√2 < (1/2)(r/s + 2s/r) < r/s.

(We leave the proof to the reader.) Hence, A has no least upper bound in (Q, ≤).

Example 1.4.12. Let S be any set and consider the power set P(S) equipped with the ⊆ partial
order. Let X be a subset of P(S). Then a maximal element of X is a set M in X such that no other
set in X contains M . Note that there may be more than one of these. An upper bound of X is any
subset of S that contains every element in every set in X. The least upper bound of X is the union

lub(X) = ⋃_{A∈X} A. △

Definition 1.4.13
In a poset (S, ≼), any subposet (T, ≼T ) that is a total order is called a chain.

The concept of a chain allows us to introduce a theorem that is essential in a variety of contexts
in algebra.

Theorem 1.4.14 (Zorn’s Lemma)
Let (S, ≼) be a poset. Suppose that every chain in S has an upper bound. Then S contains
a maximal element.

In the context of ZF-set theory, Zorn’s Lemma is equivalent to the Axiom of Choice. (See [62,
Theorem 5.13.1] for a proof.)

Definition 1.4.15
A poset (S, ≼) is called a lattice if for all pairs (a, b) ∈ S × S, both lub(a, b) and glb(a, b)
exist.

Lattices are a particularly nice class of partially ordered sets. They occur frequently in various
areas of mathematics. Given any set S, the power set (P(S), ⊆) is a lattice with lub(A, B) = A ∪ B
and glb(A, B) = A ∩ B. In Section 3.6 we show how to utilize the lattice structure on the set of
subgroups of a group effectively to quickly answer questions about the internal structure of a group.
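As an illustration of the lattice (P(S), ⊆), the operations lub and glb can be modeled directly with Python sets; the helper names in this sketch are ours.

```python
from functools import reduce

# In the lattice (P(S), ⊆), lub is union and glb is intersection.
def lub(X):
    return reduce(lambda A, B: A | B, X)

def glb(X):
    return reduce(lambda A, B: A & B, X)

X = [{1, 2}, {2, 3}, {2, 4}]
assert lub(X) == {1, 2, 3, 4}
assert glb(X) == {2}
```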

1.4.3 – Hasse Diagrams


For partially ordered sets with a relatively small number of elements it is possible to easily visualize
the relation via a Hasse diagram.
Let (S, ≼) be a poset in which S is finite. In a Hasse diagram, each element of S corresponds
to a point in the plane, with the points placed on the page so that if a ≼ b, then b appears higher
on the page. The points of the Hasse diagram are also called nodes or vertices. Finally, we draw an
edge between two points (corresponding to) a and b with b above a if b is an immediate successor of
a. A chain of the partial order appears in the Hasse diagram as a rising path.
Figure 1.2 gives the Hasse diagram for the donor relation described in Example 1.4.4.
The Hasse diagram of a poset (S, ≼) does not have an edge between every pair (p, q) where
p ≼ q. The diagram need not indicate that p ≼ p for all elements in S since reflexivity is part of
the definition. Furthermore, the diagram does not show any edges that would exist by virtue of
transitivity. For example, in Figure 1.2, the diagram does not show an explicit edge between o and
ab, but we read that o → ab because there is a chain from o to ab.

ab

a     b

o

Figure 1.2: The Hasse diagram for the donor relation on blood types

Example 1.4.16. Consider the partial order on S = {a, b, c, d, e, f, g, h, i} described by the Hasse
diagram shown in Figure 1.3. The diagram makes it clear what relations hold between elements. For
example, notice that all elements in {a, b, c, d, e, f, g} are incomparable with the elements in {h, i}.
The maximal elements in S are d and i. The minimal elements are a, e, g, and h. As a least upper
bound calculation, lub(a, f ) = c because c is the first element in a chain above a that is also in
a chain above f . We also see that lub(e, h) does not exist because there is no chain above e that
intersects with a chain above h. △

[Figure 1.3 (Hasse diagram omitted): a poset on {a, b, c, d, e, f, g, h, i} in which the elements h and i form a chain separate from the rest.]

Figure 1.3: An example of a poset defined by a Hasse diagram

Hasse diagrams allow for easy visualization of properties of the poset. For example, a poset will
be a lattice if and only if, for any two points p1 and p2 in the diagram, there exists a chain rising
from p1 that intersects with a chain rising from p2 (existence of the least upper bound) and a chain
descending from p1 that intersects with a chain descending from p2 (existence of the greatest
lower bound).
Figure 1.4 illustrates three different lattices. The reader is encouraged to notice that the third
Hasse diagram corresponds to the partial order ⊆ on P({a, b, c}). (See Figure 1.5.)

Figure 1.4: Lattices

1.4.4 – Monotonic Functions


When studying functions between two objects with a given algebraic structure, we often do not care
about all possible functions between the underlying sets. In order for some properties to carry over
from one object to another, we usually need to assume that the function preserves the structure
of the objects. The notion of a structure-preserving function will reoccur regularly but we show
already what this means in the context of posets.

{a, b, c}

{a, b}   {a, c}   {b, c}

{a}   {b}   {c}

∅

Figure 1.5: Lattice of (P({a, b, c}), ⊆)

Definition 1.4.17
Let (S, ≼1 ) and (T, ≼2 ) be two partially ordered sets. A function f : S → T is called
monotonic (or order-preserving) if

x ≼1 y =⇒ f (x) ≼2 f (y).

Example 1.4.18. Consider the function f : R → R defined by f (x) = e^x . It is not hard to show
that since e > 1, x1 ≤ x2 implies that e^{x1} ≤ e^{x2} . Thus, f is a monotonic function from (R, ≤)
to itself. △

For functions over intervals of R, we typically say that f is increasing or decreasing to mean

(increasing): x ≤ y =⇒ f (x) ≤ f (y) and
(decreasing): x ≤ y =⇒ f (x) ≥ f (y).

We do not distinguish between increasing and decreasing with posets generally because (R, ≥) is
also a poset. Hence, a decreasing function is simply a monotonic function from (R, ≤) to (R, ≥).

Definition 1.4.19
Let (S, ≼1 ) and (T, ≼2 ) be two partially ordered sets. A function f : S → T is called a
poset isomorphism if it is bijective and x ≼1 y if and only if f (x) ≼2 f (y).

The etymology of the term “isomorphism” comes from the Greek, meaning “same shape.” If
there exists an isomorphism between two posets they are the same object from the perspective of
the poset algebraic structure; only labels of elements change under f .

1.4.5 – Well-Orderings
A few algorithms presented in this textbook involve a strictly decreasing sequence of elements in a
poset. In many instances, it is crucial to know that these algorithms terminate.

Definition 1.4.20
A total order (S, ≼) is called a well-ordering (or well-order, or well-ordered set) if every
nonempty subset A ⊆ S contains a least element.

The poset (N, ≤) is a well-ordering. (See Axiom 2.1.1 and the surrounding discussion.) In
contrast, (Z, ≤) is not a well-ordering because, for example, Z itself has no least element.

Proposition 1.4.21
A total order ≼ on S is a well-ordering if and only if every strictly decreasing sequence in
S eventually terminates.

Proof. We prove the contrapositive statement. Suppose that ≼ is not a well-ordering. Then there
exists a nonempty subset A ⊆ S that has no least element. Pick a1 ∈ A. Since a1 is not a least element of
A, there exists a2 ∈ A such that a1 ≻ a2 (i.e., a2 ≼ a1 and a2 ≠ a1 ). Then a2 is also not a least
element, so there exists a3 ∈ A such that a2 ≻ a3 . Continuing similarly, we find that there exists an
infinite strictly decreasing sequence

a1 ≻ a2 ≻ a3 ≻ · · · .

Conversely, suppose that (S, ≼) is a total order and that there exists an infinite strictly decreasing
sequence as described above. Then the subset {a1 , a2 , a3 , . . .} ⊆ S contains no least element and
therefore (S, ≼) is not a well-ordered set. □

The concept of well-ordering is valuable for a number of profound properties, not the least of
which is the principle of mathematical induction. However, Proposition 1.4.21 is important for some
algorithms we encounter in algebra: If an algorithm involves steps that produce a sequence that is
strictly decreasing with respect to some well-ordering, then by this proposition, the algorithm must
terminate.
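As a concrete illustration (not from the text), the Euclidean algorithm produces a strictly decreasing sequence of remainders in the well-ordered poset (N, ≤), so Proposition 1.4.21 guarantees that it terminates. A minimal Python sketch:

```python
def euclid_gcd(a, b):
    # The successive remainders form a strictly decreasing sequence in
    # (N, ≤); since (N, ≤) is a well-ordering, Proposition 1.4.21
    # guarantees that the loop terminates.
    remainders = []
    while b != 0:
        remainders.append(b)
        a, b = b, a % b
    # Sanity check: the recorded sequence strictly decreases.
    assert all(x > y for x, y in zip(remainders, remainders[1:]))
    return a

print(euclid_gcd(252, 198))  # 18
```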

1.4.6 – Lexicographic Orders


Suppose that (A1 , ≼1 ) and (A2 , ≼2 ) are two posets. The lexicographic order on the Cartesian product
A1 × A2 is the partial order ≼ defined by (a1 , a2 ) ≼ (b1 , b2 ) if (a1 ≠ b1 and a1 ≼1 b1 ) or (a1 = b1
and a2 ≼2 b2 ).
More generally, if (Ai , ≼i ) are posets for i = 1, 2, . . . , n (not necessarily distinct), then the
associated lexicographic order on A1 × A2 × · · · × An is the order ≼ such that

(a1 , a2 , . . . , an ) ≼ (b1 , b2 , . . . , bn )

if and only if the tuples are equal or, at the first index i such that ai ≠ bi , we have ai ≼i bi .


The lexicographic order on the Cartesian product derives its name from the fact that it mimics
the total order used on words expressed in alphabetical languages for organization in a dictionary.
For example, in English, we can think of the alphabet A as consisting of the 26 letters “A” through
“Z” and a space “ ”. There is a total order on A with the space being the least element of A and
all the other letters in their usual order. Words are elements of An for some large enough n, where
we can think of using spaces at the end if the word is shorter than length n. So “cantaloupe” comes
before “cantilever” in the lexicographic order because at the first position in which the letters differ
(the fifth letter) “a” comes before “i” in the alphabet.
Example 1.4.22. Consider the poset (Z, ≤) and let ≼ be the lexicographic ordering on Z × Z × Z.
Then, for example,

(2, 1700, −5) ≼ (4, −300, 2) because at the first entry where they differ, 2 ≤ 4,
(−5, 4, −10) ≼ (−5, 4, 0) because at the first entry where they differ, −10 ≤ 0. △

Example 1.4.23. Consider the posets (Z, ≤) and (P({1, 2, 3, 4}), ⊆) and let ≼ be the lexicographic
ordering on Z × P({1, 2, 3, 4}). Then, for example,

(3, {1, 3}) ≼ (5, {2, 3, 4}) because at the first entry where they differ, 3 ≤ 5,
(−2, {4}) ≼ (−2, {2, 3, 4}) because at the first entry where they differ, {4} ⊆ {2, 3, 4}.

On the other hand, (−2, {1, 4}) and (−2, {2, 3, 4}) are incomparable. △

1.4.7 – Quotient Posets (Optional)


In the preface, we referred to the concept of quotient objects in an algebraic structure. In Sec-
tion 1.3.2, we presented the notion of a quotient set S/ ∼ as the set of equivalence classes arising
from an equivalence relation ∼ on the set S. Though quotient structures figure more prominently in
later algebraic structures, we briefly illustrate the idea of a quotient object in the context of posets.
Let (S, ≼) be a poset. The intuitive idea behind constructing a quotient poset is that S/∼ should
be a poset with a partial order ≼′ inherited from ≼. In other words, if p : S → S/∼ is the projection
p(a) = [a], then we want p to be monotonic between (S, ≼) and (S/∼, ≼′ ).
However, not every equivalence relation ∼ on S will allow for this. For example, suppose that
x ≼ y ≼ z with x ∼ z but x ≁ y. Then if p is monotonic we would have [x] ≼′ [y] because
x ≼ y but also [y] ≼′ [z] because y ≼ z. Since [z] = [x], by antisymmetry we deduce that [y] = [x].
This contradicts the assumption that y ∉ [x].
Suppose that the projection p is a monotonic function onto (S/∼, ≼′ ) for some partial order ≼′ .
Let x, y ∈ S satisfy x ≼ y and let x′ , y′ ∈ S be such that x′ ∼ x and y′ ∼ y. Since p is monotonic,
x ≼ y =⇒ [x] ≼′ [y]. Now

y′ ≼ x′ and x′ ≁ y′ =⇒ [y′ ] ≼′ [x′ ] and [x′ ] ≠ [y′ ]
                    =⇒ [x′ ] ⋠′ [y′ ] since ≼′ is a partial order
                    =⇒ [x] ⋠′ [y]
                    =⇒ x ⋠ y.

Taking the contrapositive of the above implication, we deduce that x ≼ y implies that y′ ⋠ x′ or
x′ ∼ y′ . Note that if x′ ∼ y′ , then x ∼ y in the first place and thus [x] = [y]. Also, the statement
that y′ ⋠ x′ is equivalent to x′ ≺ y′ or x′ and y′ are incomparable.
Conversely, suppose that ∼ satisfies the condition that

x ≁ y and x ≼ y and x′ ∼ x and y′ ∼ y =⇒ y′ ⋠ x′ .        (1.6)

Then defining ≼inh on S/∼ by

[x] ≼inh [y] if and only if ∃x′ ∈ [x], ∃y′ ∈ [y] such that x′ ≼ y′        (1.7)

makes the projection p monotonic onto S/∼. We call ≼inh the partial order inherited from ≼. This
proves the following proposition.

Proposition 1.4.24
Let (S, ≼) be a poset and let ∼ be an equivalence relation on S satisfying Condition
(1.6). Then the partial order ≼inh on S/∼ inherited from ≼ on S, defined by (1.7), makes
(S/∼, ≼inh ) into a poset.

Definition 1.4.25
We call Condition (1.6) the poset quotient condition, and we call the poset (S/∼, ≼inh )
established in Proposition 1.4.24 the quotient poset of (S, ≼) by ∼.

Example 1.4.26. Consider the Hasse diagram in Figure 1.6. Consider the poset depicted on the
left and suppose that the gray bubbles indicate the equivalence classes of an equivalence relation ∼
on S. Then ∼ satisfies the condition in (1.6). The Hasse diagram on the right shows the resulting
quotient poset S/∼. △

Example 1.4.27. Consider the poset (R, ≤) and the equivalence relation ∼ on R defined by x ∼ y
if and only if ⌊x⌋ = ⌊y⌋, where ⌊x⌋ is the greatest integer less than or equal to x. Equivalence classes
of this partition consist of intervals [n, n + 1) with n ∈ Z, where [x] = [n, n + 1) if and only if ⌊x⌋ = n.

[Figure 1.6 (Hasse diagrams omitted): a poset S on the left, with gray bubbles marking the equivalence classes of ∼, and the resulting quotient poset S/∼ on the right.]

Figure 1.6: A quotient poset

Suppose x ≁ y, x ≤ y, x ∼ x′ , and y ∼ y′ . Set m = ⌊x⌋ and n = ⌊y⌋. Then m < n, ⌊x′ ⌋ = m,
and ⌊y′ ⌋ = n. Consequently, y′ ≥ n ≥ m + 1 > x′ and in particular y′ ≰ x′ . Hence, Condition (1.6)
holds. The quotient poset of (R, ≤) by ∼ is (Z, ≤). △
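For a finite poset, both the poset quotient condition (1.6) and the inherited order (1.7) can be checked mechanically. A Python sketch (the names and the toy example are ours):

```python
def quotient_poset(elements, leq, classes):
    # classes: disjoint lists covering `elements` (the equivalence classes).
    cls = {x: frozenset(c) for c in classes for x in c}
    # Condition (1.6): x ≼ y with [x] != [y] forbids y' ≼ x' for any
    # x' ~ x and y' ~ y.
    for x in elements:
        for y in elements:
            if leq(x, y) and cls[x] != cls[y]:
                if any(leq(yp, xp) for xp in cls[x] for yp in cls[y]):
                    raise ValueError("poset quotient condition (1.6) fails")
    # Inherited order (1.7): [x] ≼inh [y] iff some x' in [x] and y' in [y]
    # satisfy x' ≼ y'.
    def leq_inh(C, D):
        return any(leq(xp, yp) for xp in C for yp in D)
    return leq_inh

# Toy example: the chain 1 ≤ 2 ≤ 3 ≤ 4 with classes {1, 2} and {3, 4}
# collapses to a two-element chain.
leq_inh = quotient_poset([1, 2, 3, 4], lambda a, b: a <= b, [[1, 2], [3, 4]])
A, B = frozenset({1, 2}), frozenset({3, 4})
assert leq_inh(A, B) and not leq_inh(B, A)
```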

Exercises for Section 1.4


1. Let S = {a, b, c, d, e} (where we consider all the labels distinct elements). In the following relations on
S, determine with explanation whether or not the relation is a partial order. If it fails antisymmetry,
remove the least number of pairs, and if it fails transitivity, add some pairs, to make the relation
a partial order.
(a) R = {(a, a), (b, b), (c, c), (d, d), (e, e), (a, c)}
(b) R = {(a, a), (b, b), (c, c), (d, d), (e, e), (a, c), (a, d)}
(c) R = {(a, a), (b, b), (c, c), (d, d), (e, e), (a, c), (d, a)}
(d) R = {(a, a), (b, b), (c, c), (d, d), (e, e), (b, c), (c, d), (d, e), (a, e)}
2. In microeconomics (the study of consumer behavior), one considers a consumer’s utility (preference)
with regard to pairs of commodities. Let (q1 , q2 ) ∈ N2 be a pair of nonnegative integers representing
quantities of two commodities. Explain why, given two specific commodities and a given consumer,
the relation of preferable (or equal) is a partial order.
3. Let S = R>0 × R>0 be the positive first quadrant in the Cartesian plane. Consider the relation R on
S defined by
(x1 , y1 ) R (x2 , y2 ) ⇐⇒ x1 y1 ≥ x2 y2 .
Prove or disprove that R is a partial order.
4. Prove that for any real x > √2, the inequality √2 < (1/2)(x + 2/x) < x holds.
5. Let (S, ≼) be a partial order in which every element has an immediate successor. Prove that it is not
necessarily true that for any two elements a ≼ b, every chain between a and b has finite length.
6. Prove the three claims about properties of ≼ in Example 1.4.8. Prove that Q≥0 is countable. Conclude
that Q is countable.
7. Let S be a set. Show that the relation of refinement is a partial order on the set of partitions of S.
(See Exercise 1.3.31.)
8. Draw the Hasse diagram of the partial order ⊆ on P({1, 2, 3, 4}).
9. Draw the Hasse diagram for the poset ({1, 2, 3, 4, 5, 6}, ≤).
10. Let A = {a, b, c, d, e, f, g}. Draw the Hasse diagram for the partial order ≼ given as a subset of A × A
as
≼ = {(a, a), (b, b), (c, c), (d, d), (e, e), (f, f ), (g, g), (a, c),
(b, c), (d, g), (a, e), (b, e), (c, e), (d, h), (g, h)}.
11. A person’s blood type is usually listed as one of the eight elements in the set
B′ = {o+, o−, a+, a−, b+, b−, ab+, ab−}.
We define the donor relation → on B′ as follows. The relation t1 → t2 holds if the letter portion of
the blood type donates according to Examples 1.2.21 and 1.4.4 and if someone with a + designation
can only give to someone else with +, while someone with − can give to anybody.

(a) Draw the Hasse diagram for (B′ , →).
(b) Show that the partial order of the poset (B′ , →) is not the lexicographic order on B × {+, −}.
12. Consider the set of triples of integers Z3 . Define the relation ≼ on Z3 by

(a1 , a2 , a3 ) ≼ (b1 , b2 , b3 ) ⇐⇒ a1 + a2 + a3 < b1 + b2 + b3      if a1 + a2 + a3 ≠ b1 + b2 + b3 ;
                                  (a1 , a2 , a3 ) ≼lex (b1 , b2 , b3 )   if a1 + a2 + a3 = b1 + b2 + b3 ,

where ≼lex is the lexicographic order on Z3 (with each copy of Z equipped with the partial order ≤).
Prove that ≼ is a partial order on Z3 . Prove also that ≼ is a total order.
13. Let (Ai , ≼i ) be posets for i = 1, 2, . . . , n and define ≼lex as the lexicographic order on A1 × A2 × · · · × An .
Prove that ≼lex is a total order if and only if ≼i is a total order on Ai for all i.
14. Let ≼ be the lexicographic order on R3 , where each R is equipped with the usual ≤. Prove or disprove
the following statement: For all vectors ~a, ~b, ~c, ~d, if ~a ≼ ~b and ~c ≼ ~d, then ~a + ~c ≼ ~b + ~d.
15. Answer the following questions pertaining to the poset described by the Hasse diagram below.

j
h i
f g
e
c
d
a b

(a) List all the minimal elements.


(b) List all the maximal elements.
(c) List all the maximal elements in the subposet on {a, b, c, d, e, f, g}.
(d) Determine the length of the longest chain and find all chains of that length.
(e) Find the least upper bound of {a, b}, if it exists.
(f) Find the greatest lower bound of {b, c}, if it exists.
(g) List all the upper bounds of {f, d}.

16. Consider the partial order on R2 given in Example 1.4.6. Let A be the unit disk

A = {(x, y) ∈ R2 | x2 + y 2 ≤ 1}.

(a) Show that A has both a maximal and minimal element. Find all of them.
(b) Find all the upper bounds and all the lower bounds of A.
17. Consider the lexicographic order on R² coming from the standard (R, ≤). Let A be the closed disk of
center (1, 2) and radius 5.
(a) Show that A has both a maximal and minimal element. Find all of them.
(b) Find all the upper bounds and all the lower bounds of A.
(c) Show that A has both a least upper bound and a greatest lower bound.
18. Prove that in a finite lattice, there exists exactly one maximal element and one minimal element.
19. Let (B, →) be the poset of blood types equipped with the donor relation. (See Example 1.4.4.)
(a) Consider the poset ({1, 2, 3}, ≤). Show that the function f : B −→ {1, 2, 3} defined by f (o) = 1,
f (a) = 2, f (b) = 2 and f (ab) = 3 is a monotonic function.
(b) Show that there exists no isomorphism between (B, →) and ({1, 2, 3, 4}, ≤).

20. Let (S, ≼), (T, ≼′), and (U, ≼″) be three posets. Let f : S → T and g : T → U be monotonic functions.
Prove that the composition g ◦ f : S → U is monotonic.
21. Prove that the poset (R, ≤) is not isomorphic to (R − {0}, ≤).
22. Prove that the poset of integers greater than a fixed number k (with partial order ≤) is isomorphic to
(N, ≤).
23. Let (S, ≼1) and (T, ≼2) be two partially ordered sets and let f : S → T be a monotonic function.
(a) Prove that if A is a subset of S with an upper bound u, then f (u) is an upper bound of f (A).
(b) Show with a counterexample that f (lub(A)) is not necessarily equal to lub(f (A)).
(c) Prove that if f is an isomorphism, then f (lub(A)) = lub(f (A)).
24. Prove or disprove that (Z, ≤) and (Q, ≤) are isomorphic as posets.
25. Determine whether the posets corresponding to the following Hasse diagrams are lattices. If they are
not, explain why.
[Four Hasse diagrams (a)–(d), each on elements among a through h; diagrams not reproduced here.]
26. Explain under what conditions a flow chart may be viewed as a partial order.
27. Let (S, ≼) and (T, ≼2) be two partial orders. Suppose that ∼ is an equivalence relation on S that
satisfies Condition (1.6). Prove that for any monotonic function f : S → T such that f (a) = f (b)
whenever a ∼ b, there exists a unique monotonic function f′ : S/∼ → T such that f = f′ ◦ p, where
p : S → S/∼ is the projection p(a) = [a] and the partial order on S/∼ is defined in (1.7). In the
terminology of diagrams, prove that the diagram below is commutative.

[Commutative triangle: p : (S, ≼) → (S/∼, ≼′), with f : (S, ≼) → (T, ≼2) and f′ : (S/∼, ≼′) → (T, ≼2) satisfying f = f′ ◦ p.]

28. Let R1 and R2 be partial orders on a set S.


(a) Prove that R1 ∩ R2 is a partial order.
(b) Show by a counterexample that R1 ∪ R2 is not necessarily a partial order.
29. Consider the poset (N, ≤) and consider the equivalence relation ∼ on N defined by
n ∼ m ⟺ ⌊n/10⌋ = ⌊m/10⌋.
(a) Describe the equivalence classes of ∼ and find a complete set of distinct representatives of ∼.
(b) Show that ∼ satisfies the poset quotient condition.
(c) Describe the poset (N/ ∼, ≤inh ).
30. Let S = P({a, b, c, d, e}) and consider the poset (S, ⊆). Consider the equivalence relation ∼ on S that
has as its partition
{ {{a} ∪ C, {b} ∪ C, {a, b} ∪ C} | C ⊆ {c, d, e} } ∪ { {C} | C ⊆ {c, d, e} }.

(The equivalence relation has the effect of considering a and b as the same element.) Equivalence
classes have either one element or three elements.
(a) Show that ∼ satisfies the poset quotient condition.
(b) Show that (S/ ∼, ⊆inh ) is isomorphic to the lattice of subsets of a set of four elements.

1.5 Projects
Project I. Discriminants of Polynomials. The reader should be familiar with the discriminant
of a quadratic polynomial, which determines if a quadratic has 0, 1, or 2 real roots. We
investigate the discriminants for cubic polynomials and polynomials of higher degree.
(1) Thinking about the shape of the parabola corresponding to f (x) = ax² + bx + c and
where it can intersect the x-axis, determine a condition, using calculus, that tells you when
the equation ax² + bx + c = 0 has zero, one, or two distinct roots.
(2) Show that by substituting x = su + t for appropriate constants s and
t, any cubic equation ax³ + bx² + cx + d = 0 can be rewritten as u³ + pu + q = 0.
(3) Using calculus and thinking about the possible maxima and minima of the graph of
a cubic, find conditions depending on p and q that determine when a cubic of the form
f (x) = x³ + px + q has one, two, or three distinct real roots.
(4) Repeat the above question with f (x) = x³ + ax² + bx + c.
(5) Can anything be said for a quartic (degree 4) polynomial? Assume that the discriminant
is an expression in the coefficients of the polynomial that is 0 if and only if the polynomial
has a multiple root.
Project II. Fuzzy Set Theory. Let S be any set. Exercise 1.1.28 establishes a bijection between
P(S) and functions from S to {0, 1}. The one defining characteristic of a set is that it has a
clear, unequivocal rule for whether any object is in it or not.
Fuzzy set theory offers a model for situations where, instead of knowing unequivocally whether an object
is in or not in the set, we only know with a certain likelihood whether an object is in it. Given
a set S, a fuzzy subset of S is a function p : S → [0, 1]. Hence, for all s ∈ S, the value p(s)
is a real number between 0 and 1, inclusive. We can make the connection with ordinary sets
via the interpretation that p(a) = 0 means that a is not in the set and p(b) = 1 means that b is
in the set. A fuzzy subset of S is tantamount to assigning a probability to each element of S.
In this project, you are encouraged to develop a theory of fuzzy subsets. Define and make sense
of the usual set operations: intersection, union, complement, set difference, and symmetric
difference. Does the concept of a function between fuzzy sets make sense? Does the concept
of a relation make sense? Can you make sense of the usual relation A ⊆ B for fuzzy sets?
Describe some natural situations where fuzzy set theory may legitimately find more use than
usual set theory.
Project III. Associahedron. Let S be a set and ⋆ a binary operation on S. If ⋆ is not associative,
then the expressions with three terms a1 ⋆ (a2 ⋆ a3) and (a1 ⋆ a2) ⋆ a3 are not necessarily equal.
The nth associahedron Kn is a convex polytope (arbitrary dimensional generalization of a
polyhedron) in which each vertex corresponds to a way of properly placing parentheses in an
operation expression with n terms and each edge corresponds to a single application of the
associativity rule. With three terms, K3 consists of two vertices and one edge so is a line
segment.

a1 ⋆ (a2 ⋆ a3) ——— (a1 ⋆ a2) ⋆ a3

Address the following questions.


(1) Show that K4 is a pentagon.
(2) Attempt to determine the structure of K5 .

(3) Can you determine the number of vertices of Kn ?


(4) Provide your own investigations of Kn for higher n.
Project IV. Ternary Equivalence Relations. In this chapter, we studied equivalence relations.
The concept of collinearity on points applies to three points at a time and hence is not a (binary)
relation. Collinearity is an example of a ternary relation. A ternary (as opposed to binary)
relation R on a set S is any subset of S × S × S. For the example of collinearity, we have
S = R², and for three points A, B, C ∈ R², we have (A, B, C) ∈ R if and only if there exists a
line L with A, B, C ∈ L.
Define (just for this project) a ternary-equivalence relation by a ternary relation R such that
• (ternary-reflexive) (a, a, a) ∈ R for all a ∈ S;
• (ternary-symmetric) if (a, b, c) ∈ R then any other triple consisting of the same elements
but in possibly a different order is also in R;
• (ternary-transitive) if (a, b, c) ∈ R and (b, c, d) ∈ R, then (a, b, d) ∈ R and (a, c, d) ∈ R.
Address at least the following questions. Is collinearity a ternary-equivalence relation? Can
you think of other natural ternary-equivalence relations? What elements of the theory of
equivalence relations carry over to ternary-equivalence relations and with what modifications?
Is the definition given for ternary-equivalence a good one? If so, why? If not, how could it be
improved?
Project V. My First Functor. After Definition 1.1.9, we mentioned that the process of creating
the power set P(S) of a set S is a way to get a new set from an old one. This project develops
the power procedure further. Let f : S → T be a function between sets S and T . Define the
function P(f ) : P(S) → P(T ) by P(f )(A) = f (A), as in (1.3), for all subsets A of S.
(1) Show that for any set S, if idS is the identity function on S, then P(idS ) is the identity
function on P(S). Prove also that if f : S → T and g : T → U are functions between
sets, then the composition g ◦ f satisfies P(g ◦ f ) = P(g) ◦ P(f ). [Note: In the language
of category theory, this result establishes the power procedure P as a functor from sets
to sets.]
(2) How does the procedure P behave with respect to the injective, surjective, or bijective
property of functions? (That is, if f is injective, is P(f ) injective?, etc.)
(3) Define now the procedure Pinv by Pinv (S) = P(S) for all sets S and for all functions
f : S → T define Pinv (f ) : P(T ) → P(S) with Pinv (f )(B) = f −1 (B) as in (1.4) for all
subsets B ∈ P(T ). What kinds of questions similar to those posed above can you answer
about this procedure?
Project VI. Social Ordering. In many social contexts, there exists a form of social ordering. In
a company, in the army, even in packs or herds of animals, there exists a concept of dominance,
reporting relations, or obedience structures. Use the discussion in Section 1.4 to give a precise
analysis of at least two different types of social orderings. Clearly describe the partial order
relation and support your discussion with concepts such as monotonic functions, quotient
posets, chains, subposets, and so forth.
2. Number Theory

Some mathematicians say tongue-in-cheek that abstract algebra is a service branch of mathematics.
By “service” they mean that applications in other branches of mathematics shape the development
and topics of interest of abstract algebra. Algebraists chafe at such comments because, as a branch of
mathematics, algebra has a life of its own. However, there is some historical basis for this sentiment.
Besides approaches to solving the quadratic equation dating back to antiquity, geometry and
number theory (arithmetic) formed the heart of mathematics until the 16th century. As algebra
became a branch of its own, work centered on studying properties of polynomials with an emphasis
on finding solutions to polynomial equations. However, certain conjectures in geometry and number
theory resisted proofs using classical methods. Some of these difficult problems galvanized math-
ematical investigations, leading to countless discoveries and to the development of whole fields of
inquiry.
Consider Fermat’s Last Theorem, which states that for any integer n ≥ 3, there do not exist
nonzero integers a, b, and c such that
aⁿ + bⁿ = cⁿ.
In 1637, Fermat wrote in the margin of his notes that he had found a proof for this fact but that he
did not have room in the margin to write it down. For hundreds of years, mathematicians attempted
to prove this conjecture. Though simple to state, a rigorous proof of Fermat’s Last Theorem eluded
mathematicians until 1994 when, building on the work of scores of others, Andrew Wiles proved
the Taniyama-Shimura Conjecture, which implies Fermat’s Last Theorem. During their quest for
this Holy Grail of classical number theory, mathematicians contributed to areas now labeled as ring
theory and algebraic geometry.
Throughout this book, we will occasionally mention when classical conjectures motivated certain
directions of investigation and when algebra offered solutions to some of these great problems.
Consequently, many motivating examples for various types of algebraic structures come from
geometry and number theory. Though this text does not offer a review in geometry, it does offer three
sections of elementary number theory that cover only the topics absolutely necessary to introduce the
algebraic structures we will study. Section 2.1 reviews terms and theorems concerning divisibility of
integers. Section 2.2 introduces modular arithmetic, a useful technique to answer many divisibility
problems in number theory. Finally, Section 2.3 gives an optional review of mathematical induction.

2.1 Basic Properties of Integers
The following results about integers arise in a basic course on number theory. Indeed, some of
these topics are taught as early as late elementary school and developed to different degrees in high
school. However, in primary and secondary school, many basic theorems in number theory are
taught without giving proofs and are introduced in an order that is not appropriate for providing
proofs. In this section, we discuss elementary divisibility properties of integers, first because we
need these results, and second as a reference for how to develop similar topics in general algebraic
structures.

2.1.1 – Well-Ordering of the Integers


Leopold Kronecker, the 19th century mathematician, famously said, “God gave us the integers; all
else is the work of man.” For the purposes of number theory, Kronecker’s statement is misleading


because it presupposes that we are clear about what the integers are. Since the set of integers is
infinite, we cannot list them out or perceive them at a glance. Hence, we need a structural definition,
a set of axioms, that define the set of integers.
There exist a few different, though equivalent formulations of the set of axioms for the integers.
Though we do not provide a list of axioms for the integers here, [53, Appendix A] and [59, Section 2.1]
give two slightly different formulations. Almost all of the axioms are well-known, even by elementary
school children. The only axiom that does not feel immediately obvious is the well-ordering property.

Axiom 2.1.1 (Well-Ordering Property on N)


Let A be any nonempty subset of nonnegative integers. There exists m ∈ A such that
m ≤ a for all a ∈ A. In other words, A contains a minimal element.

Example 2.1.2. Let A = {n ∈ N | n² > 167}. A is nonempty because, for example, 20² = 400 > 167,
so 20 ∈ A. Axiom 2.1.1 allows us to conclude that A has a minimal element. By trial and error
or using some basic algebra, we can find that the minimal element of A is 13. △
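The search in this example can also be carried out mechanically. The following sketch (the helper name `min_satisfying` is ours, not from the text) scans upward until the defining condition first holds:

```python
from itertools import count

def min_satisfying(predicate):
    """Scan 0, 1, 2, ... and return the least n satisfying predicate.

    This terminates only when some n satisfies the predicate; the
    well-ordering property guarantees a least element exists once the
    set is known to be nonempty.
    """
    for n in count(0):
        if predicate(n):
            return n

print(min_satisfying(lambda n: n * n > 167))  # 13
```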

Example 2.1.3. The rational numbers, equipped with the usual order ≤, do not satisfy
the well-ordering principle. Let S = Q≥0 and let A be the set of positive rational numbers. The
set A does not contain a minimal element: no matter what positive rational number p/q we take, the
rational p/(2q) is less than p/q. △

Example 2.1.4. Let A be the set of integers that can be written as the sum of two positive cubes in
three different ways. Since A consists of positive numbers, Axiom 2.1.1 allows us to conclude that
A is either empty or has a minimal element. In this example, unless there exists a number-theoretic
method to find the minimal element of A (besides running a computer algorithm that performs
an exhaustive search), Axiom 2.1.1 offers no way to determine this minimal element. △
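The three-way question above is computationally heavy, but the analogous two-way question is easy to search by brute force. The sketch below (function name ours, not from the text) finds the least integer up to a limit expressible as a sum of two positive cubes in at least a given number of ways; for two ways the well-known answer is the Hardy–Ramanujan number 1729.

```python
from collections import defaultdict
from itertools import combinations_with_replacement

def least_sum_of_two_cubes(ways, limit):
    """Least n <= limit writable as a^3 + b^3 (1 <= a <= b) in at least
    `ways` distinct ways; None if no such n exists up to limit."""
    cubes = []
    k = 1
    while k ** 3 < limit:
        cubes.append(k ** 3)
        k += 1
    counts = defaultdict(int)  # n -> number of representations found
    for c1, c2 in combinations_with_replacement(cubes, 2):
        if c1 + c2 <= limit:
            counts[c1 + c2] += 1
    candidates = [n for n, c in counts.items() if c >= ways]
    return min(candidates) if candidates else None

print(least_sum_of_two_cubes(2, 2000))  # 1729 = 1^3 + 12^3 = 9^3 + 10^3
```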

Example 2.1.4 shows that Axiom 2.1.1 is a nonconstructive result, as it affirms the existence of
an element with a certain property without offering a method to “construct” or to find this element.

2.1.2 – Divisibility
The notion of divisibility of integers is often introduced in elementary school. However, in order to
prove theorems about divisibility, we need a rigorous definition.

Definition 2.1.5
If a, b ∈ Z with a ≠ 0, we say that a divides b if ∃k ∈ Z such that b = ak. We write a | b if
a divides b and a ∤ b otherwise. We also say that a is a divisor (or a factor ) of b and that
b is a multiple of a.

We write aZ for the set of all multiples of the integer a.


We can prove many basic properties about divisibility directly from the definition. We mention
only a few in the following proposition.

Proposition 2.1.6
Let a, b, and c be integers.
(1) Any nonzero integer divides 0.
(2) Suppose a ≠ 0. If a|b and a|c, then a|(b + c).

(3) Suppose a ≠ 0 and b ≠ 0. If a|b and b|c, then a|c.

(4) Suppose a ≠ 0 and b ≠ 0. If a|b and b|a, then a = b or a = −b.

Proof. For (1), setting k = 0, any nonzero integer a satisfies ak = 0. Thus, a divides 0.
For (2), by a|b and a|c, there exist k, ℓ ∈ Z such that ak = b and aℓ = c. Then

b + c = ak + aℓ = a(k + ℓ).

Since k + ℓ is an integer, a|(b + c).


For (3), by a|b and b|c, there exist k, ℓ ∈ Z such that ak = b and bℓ = c. Then c = bℓ = a(kℓ).
Since kℓ ∈ Z, then a|c.
For (4), a|b and b|a means that there exist k, ℓ ∈ Z such that ak = b and bℓ = a. Thus, a = akℓ.
Since a ≠ 0, we can divide by a and get kℓ = 1. There are only two ways to multiply two integers
to get 1: either k = ℓ = 1 or k = ℓ = −1. The result follows. □

Suppose that we restrict the relation of divisibility to positive integers N∗ . Since any positive
integer divides itself, divisibility on N∗ is reflexive, antisymmetric (by Proposition 2.1.6(4)), and
transitive (by Proposition 2.1.6(3)). Thus, (N∗ , |) is a partially ordered set. This poset has a
minimal element of 1, because 1|n for all positive integers n, but has no maximal element.
To discuss how divisibility interacts with the sign of integers, note that since dk = a implies
d(−k) = −a, then d is a divisor of a if and only if d is a divisor of −a. Therefore, when discussing
the set of divisors of a number a 6= 0, we can assume without loss of generality that a > 0.
Let d be a nonzero integer. To get a positive multiple dk of d, we need k ≠ 0 of the same sign
as d. Then
dk = |d||k| ≥ |d|
because |k| ≥ 1. Thus, any nonzero multiple a of d satisfies |a| ≥ |d|. By the same token, any divisor
d of a satisfies |d| ≤ |a|.

Theorem 2.1.7 (Integer Division)


For all a, b ∈ Z with a ≠ 0, there exist unique integers q and r such that

b = qa + r where 0 ≤ r < |a|.

The integer q is called the quotient and r is called the remainder .

Proof. Let a, b ∈ Z with a ≠ 0. Define the set

S = {b − ka | k ∈ Z} ∩ N.

The set S is nonempty: taking k = −|b| when a > 0 and k = |b| when a < 0 gives b − ka = b + |b||a| ≥ 0.

Let b − k1 a and b − k2 a be two elements of S. We have

(b − k1 a) − (b − k2 a) = (k2 − k1 )a,

so the difference of any two elements in S is a multiple of a. Furthermore, to each element b − ka in


S there corresponds only one integer k.
By the Well-Ordering Principle, S has a least element r = b − qa. Now if r ≥ |a|, then

r − |a| = b − (q ± 1)a ≥ 0 (with the sign chosen so that (q ± 1)a = qa + |a|),

so r − |a| is an element of S, which contradicts the minimality of r as the least element of S. Since
any two elements of S differ by a multiple of a, then r is the unique element n of S with 0 ≤ n < |a|.
Hence, the integers q and r satisfy the conclusion of the theorem. □

Techniques for integer division are taught in elementary school. It is obvious that d|n if and only
if the remainder of the integer division of n by d is 0.
Borrowing notation from some programming languages, it is common to write b mod a to stand
for the remainder of the division of b by a.
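Built-in division operators do not all follow the convention of Theorem 2.1.7 on negative inputs, so a small adjustment is needed. A minimal sketch (function name ours, not from the text) that always returns a remainder with 0 ≤ r < |a|:

```python
def integer_division(b, a):
    """Return (q, r) with b = q*a + r and 0 <= r < |a| (Theorem 2.1.7)."""
    if a == 0:
        raise ValueError("divisor a must be nonzero")
    q, r = divmod(b, a)
    # Python's remainder takes the sign of the divisor; when a < 0 and
    # r < 0, shift r into [0, |a|) and adjust q to preserve b = q*a + r.
    if r < 0:
        q += 1
        r -= a
    return q, r

print(integer_division(-7, 3))  # (-3, 2): -7 = (-3)*3 + 2
print(integer_division(7, -3))  # (-2, 1): 7 = (-2)*(-3) + 1
```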

2.1.3 – The Greatest Common Divisor


The concepts of the greatest common divisor (or factor) and the least common multiple arise early in
math education. However, we introduce these concepts in a manner that may appear novel but is
equivalent to the elementary school formulation and has the benefit of allowing us to generalize
to other algebraic contexts.

Definition 2.1.8
If a, b ∈ Z with (a, b) ≠ (0, 0), a greatest common divisor of a and b is an element d ∈ Z
such that:

(1) d|a and d|b (d is a common divisor);

(2) if d′|a and d′|b (d′ is another common divisor), then d′|d.

Because of condition (2) in this definition, it is not obvious that two integers not both 0 have
a greatest common divisor. (If we had said d′ ≤ d in the definition, the proof that two integers
not both 0 have a greatest common divisor would be a simple application of the Well-Ordering Principle
on Z.) The key to showing that integers possess a greatest common divisor relies on the Euclidean
Algorithm, which we describe below.
Let a and b be two positive integers with a ≥ b. The Euclidean Algorithm starts by setting
r0 = a and r1 = b and then repeatedly performs the following integer divisions:

r0 = r1 q1 + r2 (where 0 ≤ r2 < r1)
r1 = r2 q2 + r3 (where 0 ≤ r3 < r2)
..
.
rn−2 = rn−1 qn−1 + rn (where 0 ≤ rn < rn−1)
rn−1 = rn qn + 0 (and rn > 0).

This process terminates because the sequence r1 , r2 , r3 , . . . is a strictly decreasing sequence of positive
integers and hence has at most r1 = b terms in it. Note that if b | a, then n = 1 and the Euclidean
Algorithm has one line.
To see what the Euclidean Algorithm tells us, consider the positive integer rn . By the last line
of the Euclidean Algorithm, we see that rn |rn−1 . From the second-to-last row of the Euclidean
Algorithm, rn | rn−1 qn−1 , and so by Proposition 2.1.6(2), rn | rn−2 . Repeatedly applying this process
(n − 1 times, and hence a finite number of times), we see that rn | r1 and rn | r0 , so rn is a common divisor
of a and b.
Also, suppose that d′ is a common divisor of a and b. Then d′k0 = a = r0 and d′k1 = b = r1 .
We have
r2 = r0 − r1 q1 = d′k0 − d′k1 q1 = d′(k0 − k1 q1 ).
Hence, d′ divides r2 with d′k2 = r2 . Repeating this process (n − 1 times), we deduce that d′|rn .
Thus, rn is a positive greatest common divisor of a and b. Consequently, the Euclidean Algorithm
leads to the following theorem.

Proposition 2.1.9
There exists a unique positive greatest common divisor for all pairs of integers (a, b) ∈
Z × Z − {(0, 0)}.

Proof. First suppose that either a or b is 0. Without loss of generality, assume that b = 0 and
a 6= 0. Since any integer divides 0, common divisors of a and b consist of divisors of a. Then greatest
common divisors of a and 0 consist of a and −a and |a| is the unique positive greatest common
divisor of a and 0.

Now suppose neither a nor b is 0. Since the set of divisors of an integer c is the same set as
the divisors of −c, then we can assume without loss of generality that a and b are both positive.
Applying the Euclidean Algorithm to a and b shows that the pair (a, b) has a greatest common
divisor. Now suppose d1 and d2 are two positive greatest common divisors of a and b. Then d1 |d2
and d2 |d1 and according to Proposition 2.1.6(4), d1 = d2 since they are both positive. Hence, a and
b possess a unique positive greatest common divisor. 
Because of Proposition 2.1.9, we regularly refer to “the” greatest common divisor of two integers
as this unique positive one and we use the notation gcd(a, b). The proof of Proposition 2.1.9 tells us
how to calculate the greatest common divisor: (1) if a ≠ 0, then gcd(a, 0) = |a|; (2) if a, b ≠ 0, then
gcd(a, b) is the result of the Euclidean Algorithm applied to |a| and |b|.
Example 2.1.10. We perform the Euclidean Algorithm to find gcd(522, 408).
522 = 408 × 1 + 114
408 = 114 × 3 + 66
114 = 66 × 1 + 48
66 = 48 × 1 + 18
48 = 18 × 2 + 12
18 = 12 × 1 + 6
12 = 6 × 2 + 0
According to Proposition 2.1.9 and the Euclidean Algorithm, gcd(522, 408) = 6. △
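The divisions in Example 2.1.10 are exactly the iterations a Euclidean Algorithm routine performs; a minimal sketch (function name ours, not from the text):

```python
def euclid_gcd(a, b):
    """Greatest common divisor of a and b, not both zero, via the
    Euclidean Algorithm."""
    a, b = abs(a), abs(b)
    while b != 0:
        a, b = b, a % b  # replace (r_{i-1}, r_i) with (r_i, r_{i+1})
    return a

print(euclid_gcd(522, 408))  # 6
```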
Two integers always have 1 and −1 as common divisors. When gcd(a, b) = 1, we say
that a and b are relatively prime.

Lemma 2.1.11
Let a and b be positive integers. If k and l are integers such that a = k gcd(a, b) and
b = l gcd(a, b), then k and l are relatively prime.

Proof. Consider c = gcd(k, l), and write k = ck′ and l = cl′ for some integers k′ and l′. Then
a = k′c gcd(a, b) and b = l′c gcd(a, b).
Therefore, c gcd(a, b) is a common divisor of a and b, so it divides gcd(a, b): for some integer h, we have
c gcd(a, b)h = gcd(a, b). Hence, ch = 1. Since c is a positive integer, this is only possible if c = 1. □
There is an alternative characterization of the greatest common divisor. Let a, b ∈ Z∗ and define
Sa,b as the set of integer linear combinations of a and b, i.e.,

Sa,b = {sa + tb | s, t ∈ Z}.

Proposition 2.1.12
The set Sa,b is the set of all integer multiples of gcd(a, b). Consequently, gcd(a, b) is the
least positive integer linear combination of a and b.

Proof. By Proposition 2.1.6, any common divisor of a and b divides sa, tb, and sa + tb. This shows
that Sa,b ⊆ gcd(a, b)Z. We need to show the reverse inclusion.
By the Well-Ordering Principle, the set Sa,b has a least positive element. Call this element d0
and write d0 = s0 a + t0 b for some s0 , t0 ∈ Z. We show by contradiction that d0 is a common divisor
of a and b. Suppose that d0 does not divide a. Then by integer division,
a = qd0 + r where 0 < r < d0 .

Then a − r = qd0 = qs0 a + qt0 b. Then after rearranging, we get

r = (1 − qs0 )a + (−qt0 )b.

This writes r, which is positive and less than d0 , as a linear combination of a and b. This contradicts
the assumption that d0 is the minimal positive element in Sa,b . Hence, the assumption that d0
does not divide a is false, so d0 divides a. By a symmetric argument, d0 divides b as well. Since
kd0 = (ks0 )a + (kt0 )b, every multiple of d0 is in Sa,b ; in particular, every multiple of gcd(a, b) is
too. Hence, gcd(a, b)Z ⊆ Sa,b . We conclude that Sa,b = gcd(a, b)Z and the proposition follows. □

Proposition 2.1.12 does not offer a way to find the integers s and t such that gcd(a, b) = sa + tb.
If a and b are small then one can find s and t by inspection. For example, by inspecting the divisors
of 22, it is easy to see that gcd(22, 14) = 2. A linear combination that illustrates Proposition 2.1.12
for 22 and 14 is
2 × 22 − 3 × 14 = 44 − 42 = 2.

However, it is possible to backtrack the steps of the Euclidean Algorithm and find s and t such that
sa + tb = gcd(a, b). The following example illustrates this.

Example 2.1.13 (Extended Euclidean Algorithm). In Example 2.1.10, the Euclidean Algo-
rithm gives gcd(522, 408) = 6. We start from the penultimate line in the algorithm and work back-
ward, in such a way that each line gives 6 as a linear combination of the intermediate remainders ri
and ri+1 .

6 = 18 − 12 × 1
  = 18 − (48 − 18 × 2) × 1 = 18 × 3 − 48 × 1
  = (66 − 48 × 1) × 3 − 48 × 1 = 66 × 3 − 48 × 4
  = 66 × 3 − (114 − 66 × 1) × 4 = 66 × 7 − 114 × 4
  = (408 − 114 × 3) × 7 − 114 × 4 = 408 × 7 − 114 × 25
  = 408 × 7 − (522 − 408 × 1) × 25 = 408 × 32 − 522 × 25

Hence, this last line gives 6 = (−25) × 522 + 32 × 408. △
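The backtracking in Example 2.1.13 can be organized into the standard iterative form of the Extended Euclidean Algorithm; a minimal sketch (function name ours, not from the text):

```python
def extended_gcd(a, b):
    """Return (g, s, t) with g = gcd(a, b) and s*a + t*b = g."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r  # quotient of the current division step
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t

print(extended_gcd(522, 408))  # (6, -25, 32): 6 = (-25)*522 + 32*408
```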

The characterization of the greatest common divisor as given in Proposition 2.1.12 leads to many
consequences about the greatest common divisor. The following proposition gives one such example.

Proposition 2.1.14
Let a and b be nonzero integers that are relatively prime. For any integer c, if a|bc, then
a|c.

Proof. Since a and b are relatively prime, then gcd(a, b) = 1. By Proposition 2.1.12, there exist
integers s, t ∈ Z such that sa + tb = 1. Since a | bc, there exists k ∈ Z such that ak = bc. Then

atk = tbc = c(1 − as) = c − cas,

which implies that


c = atk + acs = a(tk + cs).

From this we conclude that a | c. 



2.1.4 – The Least Common Multiple

Definition 2.1.15
If a, b ∈ Z∗ , a least common multiple of a and b is an element m ∈ Z such that:
• a|m and b|m (m is a common multiple);
• if a|m0 and b|m0 , then m|m0 .

Similar to our presentation of the greatest common divisor, we should note that from this definition,
it is not obvious that a least common multiple always exists. Again, we must show that it
exists.

Proposition 2.1.16
There exists a unique positive least common multiple m for every pair of nonzero integers
(a, b) ∈ Z∗ × Z∗ .

Proof. If m1 and m2 are least common multiples of a and b, then m1 |m2 and m2 |m1 . Therefore, by
Proposition 2.1.6(4), if a and b have a least common multiple m, the integer −m is the only other
least common multiple.
Without loss of generality, assume that a and b are positive in the rest of the proof.
Since gcd(a, b) divides a and divides b, then gcd(a, b)|ab. Also, we can write a = k gcd(a, b) and
b = l gcd(a, b). Let M be the positive integer such that M gcd(a, b) = ab. From M gcd(a, b) =
gcd(a, b)kb, we get M = bk and similarly M = al and hence M is a common multiple of a and b.
Let m0 be another common multiple of a and b with m0 = pa and m0 = qb. Since pa = qb then

pk gcd(a, b) = ql gcd(a, b)

and hence pk = ql. Since gcd(k, l) = 1, by Proposition 2.1.14 we conclude that k|q with q = kc for
some integer c. Then m0 = (kc)b = c(bk) = cM and we deduce that m0 | M .
This shows that M = ab/ gcd(a, b) satisfies the criteria of a least common multiple and the
proposition follows. 

We regularly call this unique positive least common multiple of a and b “the” least common
multiple. We denote this positive integer as lcm(a, b). The proof of Proposition 2.1.16 establishes
the following important result:

gcd(a, b) lcm(a, b) = ab for all a, b ≥ 1. (2.1)

When restricted to the positive integers N∗ , the existence theorems given in Propositions 2.1.9
and 2.1.16 show that the poset (N∗ , |) is a lattice. The greatest common divisor gcd(a, b) of two
positive integers a and b is the greatest lower bound of {a, b} in the terminology of posets and the
least common multiple is the least upper bound of {a, b}. Remark that (N∗ , |) is an infinite lattice
while for n ≥ 3, the subposet ({1, 2, . . . , n}, |) is not a lattice.
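Identity (2.1) gives a practical way to compute least common multiples from the gcd; a minimal sketch (function name ours; the `abs` is our addition so the sketch also accepts negative inputs):

```python
from math import gcd

def lcm(a, b):
    """Least common multiple of nonzero a and b, computed via the
    identity gcd(a, b) * lcm(a, b) = |ab|."""
    return abs(a * b) // gcd(a, b)

print(lcm(522, 408))  # 35496, since 522 * 408 / 6 = 35496
```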

2.1.5 – Prime Numbers

Definition 2.1.17
An element p ∈ Z is called a prime number if p > 1 and the only divisors of p are 1 and
itself. If an integer n > 1 is not prime, then n is called composite.

For short, we often say “p is prime” instead of “p is a prime number.” As simple as the concept
of primality is, properties about prime numbers have intrigued mathematicians since Euclid and

before. The distribution of prime numbers in N or in sequences of integers, additive properties of


prime numbers, fast algorithms for checking if a number is prime, and many other questions still
offer active areas of research in number theory.
By definition, every positive integer is either 1, a prime number, or a composite number. Let
S be the set of composite integers that are not divisible by a prime number. Suppose that S is
nonempty. By the well-ordering of the integers, S has a least element m. Since m is composite, we
can write m = ab, where neither a nor b is 1. Then either a is prime or a is composite. Now a
cannot be prime because m is not divisible by a prime number. Hence, a is composite. Since a < m,
then a ∉ S, so a is divisible by a prime number. By Proposition 2.1.6(3), m must also be divisible
by a prime number. This contradicts the assumption that m ∈ S, so S must be empty. This reasoning
establishes the fundamental result that every positive integer greater than 1 is divisible by a prime number.
One of the earliest results about prime numbers dates back to Euclid, who cleverly applied an
argument by contradiction to prove that there exists an infinite number of prime numbers.

Theorem 2.1.18 (Euclid’s Prime Number Theorem)


The set of prime numbers is infinite.

Proof. Assume that the set of prime numbers is finite. Write the set as {p1 , p2 , . . . , pn }. Consider
the integer
Q = (p1 p2 · · · pn ) + 1.
The integer Q is obviously larger than 1 so it is divisible by a prime number, say pk . Then since

1 = Q − (p1 p2 · · · pn )

we conclude by Proposition 2.1.6(2) that 1 is divisible by pk . This is a contradiction since 1 is only


divisible by 1 and −1. Hence, the set of prime numbers is not finite. 

The problem of finding a fast algorithm for determining whether a given integer n is prime is a
difficult one. This problem is not just one of simple curiosity but has applications to industrial
information security. It is possible to simply take in sequence all integers 1 < d ≤ n and perform
an integer division of n by d. The least integer d that divides n is a prime number, and this d satisfies
d = n if and only if n is prime. This method offers an exhaustive algorithm to determine whether n
is prime, but many improvements can be made. We can shorten the exhaustive algorithm with the
following result.

Proposition 2.1.19

If n is composite, then it has a divisor d such that 1 < d ≤ √n.


Proof. Suppose that all the divisors of n greater than 1 are greater than √n. Since n is composite,
there exist positive integers a and b greater than 1 with n = ab. The supposition that a > √n and b > √n
implies that ab > √n √n = n, a contradiction. The proposition follows. 

Corollary 2.1.20

An integer n is prime if and only if none of the integers 1 < d ≤ √n divides n.
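The trial-division test described above, with the square-root cutoff of Corollary 2.1.20, can be sketched in a few lines of Python (an illustrative sketch, not from the text; the function name is ours):

```python
import math

def is_prime(n: int) -> bool:
    """Trial division: n > 1 is prime iff no d with 1 < d <= sqrt(n) divides n."""
    if n < 2:
        return False
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return False  # found a nontrivial divisor, so n is composite
    return True
```

For example, `is_prime(97)` returns `True` while `is_prime(91)` returns `False` since 91 = 7 × 13.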

Techniques for finding prime numbers have grown increasingly technical. See [36] for a recent
text on advanced prime detecting techniques.
We mention two other key properties about prime numbers. We omit here a proof for the second
theorem since we will discuss these topics in greater generality in Section 6.4.
The following proposition gives an alternative characterization for primality.
2.1. BASIC PROPERTIES OF INTEGERS 51

Proposition 2.1.21 (Euclid’s Lemma)


If p > 1, then p is prime if and only if for all a, b ∈ Z, p|ab implies p|a or p|b.

Proof. Suppose that p is prime and p|ab. If p|a then the proof is done. Suppose instead that p - a. Then, since the
only positive divisors of p are 1 and itself, gcd(p, a) = 1. By Proposition 2.1.14, p|b. Conversely, if
p > 1 is not prime, write p = ab with 1 < a, b < p; then p | ab but p divides neither a nor b. 

Theorem 2.1.22 (Fundamental Theorem of Arithmetic)


If n ∈ Z and n ≥ 2, then there is a unique factorization (up to rearrangement) of n into a
product of prime numbers. More precisely, if n can be written as the product of primes in
two different ways as
n = p1 p2 · · · pr = q1 q2 · · · qs (2.2)
with pi and qj primes, then r = s and there is a bijective function f : {1, 2, . . . , r} →
{1, 2, . . . , r} such that pi = qf (i) .

In the factorizations in (2.2), we do not assume that the prime numbers pi are all distinct. It is
common to write the generic factorization of integers as
n = p1^α1 p2^α2 · · · pr^αr (2.3)
with the primes pi all distinct and the αi positive integers. It is also common to list the primes in
increasing order. Using these latter two habits, we call the expression in (2.3) the prime factorization
of n. The prime factorization inspires the so-called prime order function ordp : N∗ → N defined by
ordp (n) = k ⇐⇒ p^k | n and p^(k+1) - n. (2.4)
Example 2.1.23. Let n = 2016. By dividing repeatedly by appropriate primes, we find that the prime
factorization is 2016 = 2^5 × 3^2 × 7. Thus,
ord2 (2016) = 5, ord3 (2016) = 2, ord7 (2016) = 1,
and ordp (2016) = 0 for all p ∉ {2, 3, 7}. 4
The order function extends to a function ordp : Q>0 → Z in the following way. Let m/n be a
fraction written in reduced form. Then

ordp (m/n) = ordp (m) − ordp (n).
Example 2.1.24. Let m/n = 48/55. We have

ord2 (48/55) = 4, ord3 (48/55) = 1, ord5 (48/55) = −1, ord11 (48/55) = −1,

and ordp (48/55) = 0 for all p ∉ {2, 3, 5, 11}. 4
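The prime order function of (2.4), together with its extension to positive rationals, translates directly into code. Here is a minimal Python sketch (the function names are ours, not the book's):

```python
from fractions import Fraction

def ord_p(p: int, n: int) -> int:
    """Largest k with p**k dividing the positive integer n."""
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return k

def ord_p_rational(p: int, q: Fraction) -> int:
    """Extend ord_p to positive rationals via ord_p(m/n) = ord_p(m) - ord_p(n)."""
    return ord_p(p, q.numerator) - ord_p(p, q.denominator)
```

For instance, `ord_p(2, 2016)` gives 5, matching Example 2.1.23, and `ord_p_rational(5, Fraction(48, 55))` gives −1, matching Example 2.1.24.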

2.1.6 – The Euler ϕ-Function


As we will soon see, given a positive integer n, counting the number of integers less than n that are
relatively prime to n appears in numerous contexts.

Definition 2.1.25
Euler’s totient function (or Euler’s φ-function) is the function φ : N∗ → N∗ such that φ(n)
is the number of positive integers a with 1 ≤ a ≤ n that are relatively prime to n. In other words,
def
φ(n) = |{a ∈ N∗ | 1 ≤ a ≤ n and gcd(a, n) = 1}| . (2.5)
52 CHAPTER 2. NUMBER THEORY

Example 2.1.26. A few sample calculations of Euler’s totient function:

(1) φ(8) = 4, because in {1, 2, 3, 4, 5, 6, 7, 8} only 1, 3, 5, and 7 are relatively prime to 8.

(2) φ(20) = 8, because for 1 ≤ a ≤ 20, the integers relatively prime to 20 are those that are not
divisible by 2 or by 5. Thus,

{a ∈ Z | 1 ≤ a ≤ 20 and gcd(a, 20) = 1} = {1, 3, 7, 9, 11, 13, 17, 19}.

(3) φ(243) = φ(3^5 ) = 3^5 − 3^4 = 243 − 81 = 162, because the integers relatively prime to 243 are
precisely those that are not divisible by 3. 4
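Definition (2.5) can also be checked by brute force: count the integers a with gcd(a, n) = 1. A short Python sketch (the function name `phi` is ours):

```python
from math import gcd

def phi(n: int) -> int:
    """Euler's totient by direct count of 1 <= a <= n with gcd(a, n) == 1."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)
```

This reproduces the sample calculations: `phi(8)` is 4, `phi(20)` is 8, and `phi(243)` is 162.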

Exercises 2.1.29 through 2.1.31 guide the reader to develop the following formula for Euler’s
totient function.

Proposition 2.1.27

If a positive integer n has the prime decomposition n = p1^α1 p2^α2 · · · pℓ^αℓ , then

φ(n) = (p1^α1 − p1^(α1−1))(p2^α2 − p2^(α2−1)) · · · (pℓ^αℓ − pℓ^(αℓ−1)). (2.6)

Proof. (Left as an exercise for the reader. See Exercise 2.1.31.) 

Exercises for Section 2.1


1. Find the prime factorization of the following integers: (a) 56; (b) 97; (c) 126; (d) 399; (e) 255; (f)
1728.
2. Find the prime factorization of the following integers: (a) 111; (b) 470; (c) 289; (d) 743; (e) 2345; (f)
101010.
3. Draw the Hasse diagram of ({1, 2, 3, . . . , 12}, |).
4. Let n be a positive integer. Show that the number of edges in the Hasse diagram of ({1, 2, 3, . . . , n}, |)
is
∑_{p prime, p ≤ n} ⌊n/p⌋.

5. Use the Euclidean Algorithm to find the greatest common divisor of the following pairs of integers.
(a) a = 234, and b = 84
(b) a = 5241, and b = 872
(c) a = 1010101, and b = 1221
6. Use the Euclidean Algorithm to find the greatest common divisor of the following pairs of integers.
(a) a = 55, and b = 34
(b) a = 4321, and b = 1234
(c) a = 54321, and b = 1728
7. Define the Fibonacci sequence {fn }n≥0 by f0 = 0, f1 = 1 and fn = fn−1 + fn−2 for all n ≥ 2. Let fn
and fn+1 be two consecutive terms in the Fibonacci sequence. Prove that gcd(fn+1 , fn ) = 1 and show
that for all n ≥ 2, the Euclidean algorithm requires exactly n − 1 integer divisions (including the last
one that has a remainder of 0).
8. Let a, b, c ∈ Z. Prove that a|b implies that a|bc.
9. Perform the Extended Euclidean Algorithm on the three pairs of integers in Exercise 2.1.5.
10. Perform the Extended Euclidean Algorithm on the three pairs of integers in Exercise 2.1.6.
11. Suppose that a, b ∈ Z∗ and that s, t ∈ Z∗ such that sa+tb = gcd(a, b). Show that s and t are relatively
prime.

12. Consider the relation of “relatively prime” on Z∗ . Determine whether it is reflexive, symmetric,
antisymmetric, or transitive.
13. Let a, b, c be positive integers. Prove that gcd(ab, ac) = a gcd(b, c).
14. Let a and b be positive integers. Show that the set of common multiples of a and b is lcm(a, b)Z, i.e.,
the set of multiples of lcm(a, b).
15. Prove that any integer greater than 3 that is 1 less than a square cannot be prime.
16. Prove that if 2^n − 1 is prime then n is prime. [Hint: Recall that for all real numbers,

a^n − b^n = (a − b)(a^(n−1) + a^(n−2) b + a^(n−3) b^2 + · · · + b^(n−1)).

Prime numbers of the form 2^p − 1, where p is prime, are called Mersenne primes and have been
historically of great research interest. The converse implication is not true however. For example,
2^11 − 1 = 2047 = 23 × 89.]
17. Prove or disprove that p1 p2 · · · pn + 1 is a prime number where p1 , p2 , . . . , pn are the n smallest
consecutive prime numbers.
18. Prove that the product of two consecutive positive integers is even.
19. Prove that the product of four consecutive positive integers is divisible by 24.
20. Suppose that the prime factorizations of a and b are

a = p1^α1 p2^α2 · · · pn^αn and b = p1^β1 p2^β2 · · · pn^βn ,

with pi distinct primes and αi , βi ≥ 0.

(a) Prove that gcd(a, b) = p1^min(α1,β1) p2^min(α2,β2) · · · pn^min(αn,βn) .
(b) Prove that lcm(a, b) = p1^max(α1,β1) p2^max(α2,β2) · · · pn^max(αn,βn) .
21. Use the result of Exercise 2.1.20 to prove (2.1) for positive integers a and b.
22. Prove that √2 is not a rational number. [Hint: Assume √2 = a/b as a reduced fraction and argue by
contradiction.]

23. Prove that for all primes p and all integers k ≥ 2, the kth root of p is irrational. [Hint: See the previous
exercise.]
24. Determine all the nonzero ordp (n), defined in (2.4), for all primes p, where n is one of the following

(a) 450; (b) 392; (c) 2310; (d) 121212.

25. Find ord5 (200!). Use this to determine the number of trailing zeros in the decimal expansion of 200!.
26. Let p be a prime number. Prove that the function ordp : Q → Z defined in (2.4) satisfies the following
logarithmic-type properties.
(a) ordp (mn) = ordp (m) + ordp (n) for all m, n ∈ Z;
(b) ordp (mk ) = k ordp (m) for all m ∈ Z and k ∈ N∗ .
27. For the following integers, calculate ϕ(n) by directly listing the set in (2.5).

(a) φ(30); (b) φ(33); (c) φ(12).

28. Prove that for all positive integers n,

n = ∑_{d|n} φ(d),

where this summation notation means we sum over all positive divisors d of n. [Hint: Consider the
set of fractions {1/n, 2/n, 3/n, . . . , n/n} written in reduced form.]
29. Prove that for any prime p, the following identities hold.
(a) φ(p) = p − 1
(b) φ(pk ) = pk − pk−1
30. Prove that if a and b are relatively prime, then Euler’s totient function satisfies φ(ab) = φ(a)φ(b).
31. Using Exercises 2.1.29 and 2.1.30, prove Proposition 2.1.27.

2.2
Modular Arithmetic
In this section we assume that n represents an integer greater than or equal to 2.
One of the theorems often presented as a highlight in an introduction to modular arithmetic,
Fermat’s Little Theorem, dates as far back as to 1640. However, many properties of congruences, a
fundamental notion in modular arithmetic, appear in Leonhard Euler’s early work on number theory,
circa 1736 (see [9, p.131] and [57, p.45]). The modern formulation of congruences first appeared in
Gauss’ Disquisitiones Arithmeticae in 1801 [28]. He applied the theory of congruences to the study
of Diophantine equations, algebraic equations in which we look for only integer solutions.
We introduce modular arithmetic here for its fundamental value in number theory and because
modular arithmetic will provide relatively easy examples of groups.

2.2.1 – Congruence

Definition 2.2.1
Let a and b be integers. We say that a is congruent to b modulo n if n | (b − a) and we
write
a ≡ b (mod n).
If n is understood from context, we simply write a ≡ b. The integer n is called the modulus.

Proposition 2.2.2
The congruence modulo n relation on Z is an equivalence relation.

Proof. We assume n is fixed. For any integer a ∈ Z, we have n | (a − a) = 0 so the congruence


relation is reflexive.
Suppose that a ≡ b. Then n|(b − a), so there exists k ∈ Z with nk = b − a. Then n(−k) = a − b
and so n|(a − b) and hence b ≡ a. This shows that the congruence relation is symmetric.
Suppose that a ≡ b and b ≡ c. Then n | (b − a) and n | (c − b). By Proposition 2.1.6(2),

n | ((b − a) + (c − b)) so n | (c − a).

Hence, a ≡ c and we deduce that congruence is transitive. The result follows. 

Section 1.3.1 introduced notation that is standard for equivalence classes and quotient sets in the
context of generic equivalence relations. However, the congruence relation has such a long history,
that it carries its own notations.
When the modulus n is clear from context, we denote by a the equivalence class of a mod n and
call it the congruence class of a. If we consider the integer division of a by n with a = nq + r, we see
that n | a − r. Hence, r ∈ a. In fact, we can characterize the equivalence class of a in a few different
ways:

a = {b ∈ Z | b ≡ a (mod n)}
= {b ∈ Z | a and b have the same remainder when divided by n}
= {a + kn | k ∈ Z} = a + nZ.

It is important for applications of congruences to note that

a ≡ 0 (mod n) ⇐⇒ n | a.
2.2. MODULAR ARITHMETIC 55

Instead of writing Z/ ≡ for the quotient set for the congruence relation, we always write
def
Z/nZ = {0̄, 1̄, . . . , $\overline{n-1}$}

for the set of equivalence classes modulo n. We pronounce this set as “Z mod n Z.”
Example 2.2.3. Suppose n = 15; then we have the equalities $\overline{2} = \overline{17} = \overline{-13}$ and many others
because these numbers are all congruent to each other. We will also say that 2, 17, −13 are repre-
sentatives of the congruence class $\overline{2}$. 4

The set {0, 1, 2, . . . , n − 1} is not the only useful complete set of distinct representatives for
congruence modulo n. If n is odd, the set

{−(n − 1)/2, . . . , −2, −1, 0, 1, 2, . . . , (n − 1)/2}

is another complete set of distinct representatives.


It is useful to recall Remark 1.3.11 in the context of functions f : Z/nZ → X, where X is any
set. If we define the value of f (ā) as equal to F (a), for some function F : Z → X, we must check
that this is indeed a well-defined function. For example, suppose that we said “let f : Z/5Z → Z be
defined by f (ā) = a².” This is not well-defined because, for example, 1̄ = 6̄ but 1² = 1 ≠ 6² = 36.
On the other hand, if we said f (ā) is k², where k is the remainder of a when divided by 5, then this
function is well-defined.

2.2.2 – Modular Arithmetic

Proposition 2.2.4
Fix a modulus n. Let a, b, c, d ∈ Z such that a ≡ c and b ≡ d. Then

a+b≡c+d and ab ≡ cd.

Proof. By definition, n | (c − a) and n | (d − b). Thus, there exist k, ` such that

c − a = nk (2.7)
d − b = n`. (2.8)

Adding these two expressions, we get

(d + c) − (b + a) = nk + n` = n(k + `).

This shows that n | (d + c) − (b + a) so a + b ≡ c + d.


To show the multiplication, multiply Equation (2.8) by c and subtract from it Equation (2.7)
multiplied by b. We obtain

c(d − b) − b(c − a) = cn` − bnk ⇐⇒ cd − ab = n(c` − bk).

This illustrates that n | (cd − ab), which means that ab ≡ cd (mod n). 

Corollary 2.2.5
Let n be a modulus and let a, b ∈ Z. If we define $\overline{a} + \overline{b}$ (resp. $\overline{a} \cdot \overline{b}$) as the set consisting of
all sums (resp. products) of an element from $\overline{a}$ with an element from $\overline{b}$, then

$\overline{a} + \overline{b} = \overline{a+b}$ and $\overline{a} \cdot \overline{b} = \overline{a \cdot b}$.

Proof. That $\overline{a} + \overline{b} \subseteq \overline{a+b}$ and $\overline{a} \cdot \overline{b} \subseteq \overline{a \cdot b}$ as subsets of Z is obvious from how we temporarily define


the addition and multiplication of integer subsets. Proposition 2.2.4 provides the reverse inclusion.
Intuitively speaking, Proposition 2.2.4 shows that the congruence relation behaves nicely with
respect to congruence classes modulo n. In other words, the addition function
A : Z/nZ × Z/nZ → Z/nZ
(ā, b̄) ↦ $\overline{a + b}$
is well-defined as is the multiplication function.
Modular arithmetic modulo n is the arithmetic arising from the addition and multiplication
operations as defined in Corollary 2.2.5 on the set Z/nZ.
Example 2.2.6. To illustrate a few examples of modular arithmetic, we show the addition and
multiplication tables corresponding to Z/5Z and Z/6Z.
In Z/5Z = {0, 1, 2, 3, 4}, the tables of operations are:
+ 0 1 2 3 4 × 0 1 2 3 4
0 0 1 2 3 4 0 0 0 0 0 0
1 1 2 3 4 0 1 0 1 2 3 4 (2.9)
2 2 3 4 0 1 2 0 2 4 1 3
3 3 4 0 1 2 3 0 3 1 4 2
4 4 0 1 2 3 4 0 4 3 2 1
In Z/6Z = {0, 1, 2, 3, 4, 5}, the tables of operations are:
+ 0 1 2 3 4 5 × 0 1 2 3 4 5
0 0 1 2 3 4 5 0 0 0 0 0 0 0
1 1 2 3 4 5 0 1 0 1 2 3 4 5
2 2 3 4 5 0 1 2 0 2 4 0 2 4 (2.10)
3 3 4 5 0 1 2 3 0 3 0 3 0 3
4 4 5 0 1 2 3 4 0 4 2 0 4 2
5 5 0 1 2 3 4 5 0 5 4 3 2 1
The addition and multiplication tables for Z/5Z and Z/6Z display some similarities but also
some differences. The patterns in the addition tables are similar. In Z/5Z every nonzero element a
has a multiplicative inverse, i.e., some b such that ab = 1. However, in Z/6Z the nonzero elements
2, 3, and 4 do not have inverses. Furthermore, in Z/6Z, there exist nonzero elements a and b such
that ab = 0. 4
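The contrast just observed between Z/5Z and Z/6Z — every nonzero element invertible in the former, zero divisors in the latter — can be explored computationally. The following is an illustrative Python sketch (the function names are ours):

```python
def units(n: int) -> list[int]:
    """Elements a of Z/nZ that have a multiplicative inverse b with ab = 1."""
    return [a for a in range(n) if any(a * b % n == 1 for b in range(n))]

def zero_divisors(n: int) -> list[int]:
    """Nonzero a for which some nonzero b gives ab = 0 in Z/nZ."""
    return [a for a in range(1, n) if any(a * b % n == 0 for b in range(1, n))]
```

Running these confirms the tables: `units(5)` is `[1, 2, 3, 4]` while `units(6)` is only `[1, 5]`, and `zero_divisors(6)` is `[2, 3, 4]` while `zero_divisors(5)` is empty.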
We introduce modular arithmetic here because it will offer examples of groups and rings when
we introduce these algebraic structures. More importantly for number theory, however, modular
arithmetic offers techniques to prove divisibility properties that would be more difficult to prove
otherwise.
Example 2.2.7. In this example, we consider divisibility of integers by 3. Hence, we work modulo
3.
Let n be a positive integer written in base 10 as n = bk bk−1 · · · b1 b0 , by which we mean that
n = bk 10k + bk−1 10k−1 + · · · b1 10 + b0 where 0 ≤ bi ≤ 9 for all bi .
We note that 10 ≡ 1 (mod 3). Then
102 ≡ 12 = 1, 103 ≡ 13 , ... 10k ≡ 1 for all k ∈ N.
Then
n ≡ bk 10k + bk−1 10k−1 + · · · b1 10 + b0 ≡ bk + bk−1 + · · · + b1 + b0 (mod 3).
Hence, an integer n has the same remainder when divided by 3 as the remainder of the sum of its
digits when divided by 3. In particular, an integer n is divisible by 3 if and only if the sum of its
digits is divisible by 3. 4
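The digit-sum rule of Example 2.2.7 is easy to test numerically. A brief sketch (the helper name is ours):

```python
def digit_sum(n: int) -> int:
    """Sum of the base-10 digits of a nonnegative integer n."""
    return sum(int(d) for d in str(n))

# n and its digit sum leave the same remainder when divided by 3,
# so n is divisible by 3 exactly when digit_sum(n) is.
```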

To use the term “arithmetic” connotes the ability to do addition, multiplication, subtraction, and
division; to solve equations; and to study various properties among these operations. Subtraction of
two elements is defined by
def
ā − b̄ = ā + (−b̄),
where −b̄ is the additive inverse of b̄. The additive inverse of b̄ is an element c̄ such that b̄ + c̄ =
$\overline{b + c} = \overline{0}$. We can take −b̄ = $\overline{-b}$. If we use {0, 1, 2, . . . , n − 1} as the complete set of distinct
representatives, then we would write −b̄ = $\overline{n − b}$.
However, as the multiplication table for Z/6Z in Example 2.2.6 illustrates, there exist nonzero el-
ements that do not have multiplicative inverses. This is just one of the differences between arithmetic
in Z and Q and modular arithmetic.

2.2.3 – Units and Powers


In this section, we work in Z/nZ.

Definition 2.2.8
If ā has a multiplicative inverse, it is called a unit. We denote the set of units in Z/nZ as

U (n) = {ā ∈ Z/nZ | ∃c̄ ∈ Z/nZ, āc̄ = 1̄}.

We denote the inverse of ā by ā⁻¹.

Proposition 2.2.9
As sets, U (n) = {a ∈ Z/nZ | gcd(a, n) = 1}.

Proof. Suppose that ac = 1. Then ac ≡ 1 (mod n) and so there exists k ∈ Z such that ac = 1 + kn.
Thus, ac − kn = 1. Hence, there is a linear combination of a and n that is 1. The number 1 is the
least positive integer so by Proposition 2.1.12, gcd(a, n) = 1. So far, this shows that if a is a unit
modulo n, then a is relatively prime to n.
To show the converse, suppose now that gcd(a, n) = 1. Then again by Proposition 2.1.12, there
exists s, t ∈ Z such that sa + tn = 1. Then sa = 1 − tn and so sa ≡ 1 (mod n). The proposition
follows. 

Corollary 2.2.10
The number of units in Z/nZ is |U (n)| = ϕ(n) (Euler’s totient function).

Proof. We use {0, 1, 2, . . . , n − 1} as a complete set of distinct representatives for congruence modulo
n. Euler’s totient function ϕ(n) gives the number of integers a ∈ {0, 1, 2, . . . , n − 1} such that
gcd(a, n) = 1. Hence, by Proposition 2.2.9, we have |U (n)| = ϕ(n). 

Note that it does not make sense to say, for example, that the inverse of 2 modulo 5 is 1/2. The
fraction 1/2 is a specific element in Q. The following sentences are proper. In Q, 2⁻¹ = 1/2. In Z/5Z,
$\overline{2}^{-1} = \overline{3}$. In Z/6Z, $\overline{2}^{-1}$ does not exist.
Finding the inverse of a in Z/nZ is not easy, especially for large values of n. If n is small, then
we can find an inverse by inspection. The proof of Proposition 2.2.9 shows that s = a−1 in the linear
combination
sa + tn = 1,
which must hold for some integers s and t if a has an inverse modulo n. The Extended Euclidean
Algorithm described in Example 2.1.13 provides a method to find such s and t.

Example 2.2.11. We look for the inverse of 79 in Z/123Z. We write the Euclidean Algorithm and,
to the right, the Extended Euclidean Algorithm applied to 123 and 79. (The following should be
read top to bottom down the left half and then bottom to top on the right half.)

123 = 79 × 1 + 44 1 = (123 − 79) × 9 − 79 × 5 = 123 × 9 − 79 × 14


79 = 44 × 1 + 35 1 = 44 × 4 − (79 − 44) × 5 = 44 × 9 − 79 × 5
44 = 35 × 1 + 9 1 = (44 − 35) × 4 − 35 × 1 = 44 × 4 − 35 × 5
35 =9×3+8 1 = 9 − (35 − 9 × 3) × 1 = 9 × 4 − 35 × 1
9 =8×1+1 1 =9−8×1
8 =1×8+0

According to Proposition 2.2.9, that the Euclidean Algorithm leads to gcd(123, 79) = 1 establishes
that 79 is a unit in Z/123Z. The identity 1 = 123 × 9 − 79 × 14 gives that 1 ≡ −14 × 79 ≡ 109 × 79
(mod 123). Thus, in Z/123Z, we have $\overline{79}^{-1} = \overline{109}$. 4
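The Extended Euclidean Algorithm of Example 2.2.11 is commonly implemented recursively. The following is a Python sketch, not the book's presentation (function names are ours):

```python
def extended_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (g, s, t) with g = gcd(a, b) and g = s*a + t*b."""
    if b == 0:
        return (a, 1, 0)
    g, s, t = extended_gcd(b, a % b)
    # back-substitute: g = s*b + t*(a - (a//b)*b) = t*a + (s - (a//b)*t)*b
    return (g, t, s - (a // b) * t)

def inverse_mod(a: int, n: int) -> int:
    """Inverse of a in Z/nZ; only exists when gcd(a, n) = 1."""
    g, s, _ = extended_gcd(a, n)
    if g != 1:
        raise ValueError(f"{a} is not a unit modulo {n}")
    return s % n
```

For the example above, `inverse_mod(79, 123)` returns 109, matching the hand computation.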
Example 2.2.12. Let n = 13. We calculate $\overline{3}^{-1}(\overline{6} − \overline{11})$. First note that $\overline{3}^{-1} = \overline{9}$ because 3 × 9 =
27 ≡ 1 (mod 13). Thus,

$\overline{3}^{-1}(\overline{6} − \overline{11}) = \overline{9}(\overline{6} − \overline{11}) = \overline{9} \cdot \overline{-5} = \overline{-45} = \overline{-6} = \overline{7}$. 4

Example 2.2.13. Consider the equation 3x + 7 ≡ 5 (mod 11). We can solve it by searching ex-
haustively through {0, 1, . . . , 10} to find which values of x solve it. Otherwise, we can solve it using
standard methods of algebra as follows.

3x + 7 ≡ 5 ⇐⇒ 3x ≡ 5 − 7 ≡ 9
⇐⇒ 4 × 3x ≡ 4 × 9 4 is the inverse of 3 modulo 11
⇐⇒ x ≡ 3

All integers that solve the congruence equation are x ≡ 3 (mod 11). 4

Example 2.2.14. Suppose we are in Z/15Z. We show how to solve the equation 7x + 10 = y.
Note first that 2 · 7 = 14 = −1. So −2 = 13 is the multiplicative inverse of 7 modulo 15. Now
we have
7x + 10 = y =⇒ 7x = y − 10 =⇒ −2 · 7x = −2(y − 10) =⇒ x = −2y + 20 = 13y + 5. 4
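The method of Examples 2.2.13 and 2.2.14 — isolate x and multiply by the inverse of the coefficient — can be automated when that coefficient is a unit. A sketch (names ours; it uses Python's built-in three-argument `pow`, where `pow(a, -1, n)` computes a modular inverse):

```python
def solve_linear(a: int, b: int, c: int, n: int) -> int:
    """Solve a*x + b ≡ c (mod n), assuming a is a unit modulo n."""
    a_inv = pow(a, -1, n)          # raises ValueError if gcd(a, n) != 1
    return a_inv * (c - b) % n
```

For instance, `solve_linear(3, 7, 5, 11)` returns 3, agreeing with Example 2.2.13, and for modulus 15 it reproduces x = 13y + 5 from Example 2.2.14 for every y.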

If an integer a is at least 2 in absolute value, then in Z the absolute values of the powers a^k
increase without bound. However, since Z/nZ is a finite set, powers of elements in Z/nZ demonstrate
interesting patterns.
Example 2.2.15. We calculate the powers of 2̄ and 3̄ in Z/7Z.

k    0 1 2 3 4 5 6 7 8
2^k  1 2 4 1 2 4 1 2 4
3^k  1 3 2 6 4 5 1 3 2

We notice that the powers follow a repeating pattern. This is because if ā ∈ Z/nZ and ā^k = ā^(k+l),
then
ā^(k+2l) = ā^(k+l) ā^l = ā^k ā^l = ā^(k+l) = ā^k
and, by induction, we can prove that ā^(k+ml) = ā^k for all m ∈ N. Observing the pattern for 3̄, we see
for example that, in congruences,

3^3201 ≡ 3^(6×533+3) ≡ (3^6)^533 · 3^3 ≡ 3^3 ≡ 6 (mod 7).

Therefore, using congruences, we have easily calculated the remainder of 3^3201 when divided by 7,
without ever calculating 3^3201 , which has ⌊log10 (3^3201 )⌋ + 1 = ⌊3201 log10 3⌋ + 1 = 1528 digits. 4
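The same idea of reducing modulo n at every step underlies the standard square-and-multiply algorithm, which keeps all intermediate numbers smaller than n². A sketch (equivalent in effect to Python's built-in `pow(a, k, n)`; the function name is ours):

```python
def power_mod(a: int, k: int, n: int) -> int:
    """Compute a**k mod n by repeated squaring, reducing mod n at each step."""
    result = 1
    base = a % n
    while k > 0:
        if k & 1:                      # current binary digit of k is 1
            result = result * base % n
        base = base * base % n         # square for the next binary digit
        k >>= 1
    return result
```

In particular, `power_mod(3, 3201, 7)` returns 6, agreeing with the congruence computation above.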

Some of the patterns in powers of a number in a given modulus are not always easy to detect.
The following theorem gives a general result.

Theorem 2.2.16 (Fermat’s Little Theorem)


Let p be a prime number and a an integer with p - a. Then

ap−1 ≡ 1 (mod p).

Proof. By Proposition 2.2.9 and since a prime p is relatively prime to any positive integer less than
itself, then p - a if and only if a is a unit modulo p. Note that |U (p)| = p − 1.
Consider the sequence of congruence classes ā, ā^2 , ā^3 , . . . This infinite sequence stays in U (p)
and so the sequence must repeat some terms. Thus, there exist i, j ∈ N with i < j such that ā^j = ā^i . Then
multiplying both sides by (ā^(−1))^i , we get

ā^j (ā^(−1))^i = 1̄ ⇐⇒ ā^(j−i) = 1̄.

Let k be the smallest positive integer such that ā^k = 1̄. Note that the elements {1̄, ā, ā^2 , . . . , ā^(k−1) }
are all distinct.
Define the equivalence relation on U (p) by b̄ ∼ c̄ to mean b̄ = c̄ā^j for some j ∈ Z. Each equivalence
class is of the form {c̄, c̄ā, c̄ā^2 , . . . , c̄ā^(k−1) } for some c̄ and in particular has k elements. Since
equivalence classes partition U (p), the fraction (p − 1)/k counts the number of equivalence classes
and hence is an integer. Thus, k | (p − 1) and hence,

ā^(p−1) = 1̄. 

As a point of notation, the notation Z/nZ comes from quotient sets (discussed in Section 1.3.1)
and is consistent with quotient groups (discussed in Section 4.3). However, if p is a prime number,
then Z/pZ has the special structure of a field. (See Definition 5.1.22.) Because of the particular
importance of modular arithmetic over a prime, we also denote Z/pZ by Fp .
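Fermat's Little Theorem can be verified exhaustively for small moduli. The following Python sketch (the function name is ours) checks a^(p−1) ≡ 1 (mod p) for every a in {1, . . . , p − 1}; when the modulus is composite the check fails for at least one a in these small cases:

```python
def fermat_holds(n: int) -> bool:
    """Check whether a**(n-1) ≡ 1 (mod n) for every a with 1 <= a < n."""
    return all(pow(a, n - 1, n) == 1 for a in range(1, n))
```

For example, `fermat_holds(13)` is `True`, while `fermat_holds(6)` is `False` (take a = 2: 2^5 ≡ 2 (mod 6)).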

Exercises for Section 2.2


1. List ten elements in the congruence class 3̄ modulo 7.
2. List all the elements in Z/13Z.
3. List all the elements in Z/24Z and in U (24).
4. Perform the following calculations in the modular arithmetic of the given modulus n.
(a) 3̄ + 5̄ · 7̄ with n = 9
(b) (5̄ · 4̄ − 7̄^2 · 3̄)^2 with n = 11
(c) 13 · 42 · 103 with n = 15
5. Write out the elements in the set U (30).
6. In Z/17Z, solve for x in terms of y in y = 2̄x + 3̄.
7. In Z/29Z, solve for x in terms of y in y = 17x + 20.
8. Show that for all integers a, we have a^2 ≡ 0 or 1 (mod 4). Show how this implies that for all integers
a, b ∈ Z, the sum of squares a^2 + b^2 never has a remainder of 3 when divided by 4.
9. Prove that if d|m and a ≡ b (mod m), then a ≡ b (mod d).
10. Prove that if a, b, c, and m are integers with m ≥ 2 and c > 0, then a ≡ b (mod m) implies that
ac ≡ bc (mod mc).
11. Perform the Extended Euclidean Algorithm to calculate $\overline{52}^{-1}$ in Z/101Z.
12. Perform the Extended Euclidean Algorithm to calculate $\overline{72}^{-1}$ in Z/125Z.
13. Find the smallest positive integer n such that 2n ≡ 1 (mod 17).
14. Find the smallest positive integer n such that 3n ≡ 1 (mod 19).

15. Show that the powers of 7 in Z/31Z account for exactly half of the elements in U (31).
16. Show that a number is divisible by 11 if and only if the alternating sum of its digits is divisible by
11. (An alternating sum means that we alternate the signs in the sum + − + − . . ..) [Hint: 10 ≡ −1
(mod 11).]
17. Prove that if n is odd then n^2 ≡ 1 (mod 8).
18. Show that the difference of two consecutive cubes (integers of the form n^3 ) is never divisible by 3.
19. Use Fermat’s Little Theorem to determine the remainder of 734171 modulo 13.
20. Find the units digit of 78357 .
21. Let {bn }n≥1 be the sequence of integers defined by b1 = 1, b2 = 11, b3 = 111, and in general
bn = 111 · · · 1 (n digits).

Prove that for all prime numbers p different from 2 or 5, there exists a positive n such that p | bn .
22. Show that 3 | n(n + 1)(n + 2) for all integers n.
23. Let p be a prime. Prove that p divides the binomial coefficient $\binom{p}{k}$ for all k with 1 ≤ k ≤ p − 1. Use
the binomial theorem to conclude that

(a + b)^p ≡ a^p + b^p (mod p) (2.11)

for all integers a, b ∈ Z.


24. Let n1 and n2 be relatively prime. Prove that

x ≡ a1 (mod n1 ) and x ≡ a2 (mod n2 ) ⇐⇒ x ≡ a1 t1 n2 + a2 t2 n1 (mod n1 n2 )

where t1 ≡ n2^(−1) (mod n1 ) and t2 ≡ n1^(−1) (mod n2 ).
25. Apply the result of Exercise 2.2.24 to solve the system
(
x ≡ 2 (mod 9)
x ≡ 4 (mod 11).

26. Apply the result of Exercise 2.2.24 to solve the system


(
x ≡ 15 (mod 17)
x ≡ 10 (mod 169).

27. Prove that if ac ≡ bc (mod m) then a ≡ b (mod m/d) where d = gcd(m, c).
28. Consider the sequence of integers {cn }n≥0 defined by

c0 = 1, c1 = 101, c2 = 10101, c3 = 1010101, ...

Prove that for all integers n ≥ 2, the number cn is composite.

2.3
Mathematical Induction
2.3.1 – Weak Induction
In logic, a predicate P (x) is a statement that is true or false depending on the specific instance of
the variable x. For example, the algebraic expression “x ≥ 1” is a predicate because it is neither
true nor false in itself but has a truth value that depends on the numerical value of x. We say that a
predicate P (x) is instantiated when x is given a value.
2.3. MATHEMATICAL INDUCTION 61

Figure 2.1: Falling dominoes illustrating the principle of induction

Theorem 2.3.1 (Principle of Induction)


Let P (n) be a predicate about integers. Suppose that P (n0 ) is true for some integer n0 and
that for all n ≥ n0 , if P (n) then P (n + 1). Then the statement P (n) is true for all n ≥ n0 .

Proof. Without loss of generality, we prove Theorem 2.3.1 with the assumption that n0 = 0. A
linear shift in the meaning of the predicate P then establishes the general statement with arbitrary
n0 .
Let P (n) be a predicate on the integers such that P (0) is true and such that P (n) implies
P (n + 1). Let S = {n ∈ N | P (n) is true} and consider the complement S̄ = N − S.
Suppose that S̄ ≠ ∅. Then by the well-ordering property, S̄ contains a least element m. Note
that m ≠ 0 because 0 ∈ S. Then m − 1 ∈ S so P (m − 1) is true. But then P (m) = P ((m − 1) + 1)
is true, so m ∈ S. This is a contradiction. Hence, S̄ = ∅, S = N and P (n) is true for all n ≥ 0. 

Section 2.1.1 pointed out that there are a few different but ultimately equivalent sets of axioms
for the integers. Some formulations, for example Peano’s axioms for the integers [59, Section 2.1],
have the principle of induction as an axiom and prove the well-ordered property on the nonnegative
integers as a theorem.
A common mental image for induction is a chain of dominoes that begins at a certain spot but
continues ad infinitum. Suppose that the n0 th domino falls and suppose also that if one domino falls
then the subsequent one falls. Then the n0 th domino and all dominoes after it fall. In Figure 2.1,
the first domino to fall is the second one, but all subsequent dominoes fall as well.
The Principle of Induction as stated above is also called weak induction in contrast to strong
induction, discussed in Section 2.3.2. In an induction proof where n0 is given, we call the step
of proving P (n0 ) the basis step. The basis step is usually easy, especially when it requires just a
calculation check. The part of an induction proof that involves proving P (n) → P (n + 1) for all
n ≥ n0 is called the induction step. During the induction step, one commonly refers to P (n) as the
induction hypothesis.

Example 2.3.2. Prove that for all positive integers n,

∑_{i=1}^{n} i^2 = n(n + 1)(2n + 1)/6. (2.12)

Set n = 1. The left-hand side of (2.12) is 1^2 = 1 while the right-hand side is (1 × 2 × 3)/6 = 1. The formula
holds. (This is the basis step.) Now suppose that (2.12) is true for some n ≥ 1. Then

∑_{i=1}^{n+1} i^2 = (∑_{i=1}^{n} i^2) + (n + 1)^2 = n(n + 1)(2n + 1)/6 + (n + 1)^2 ,

where the last equality holds by the induction hypothesis. Then,

∑_{i=1}^{n+1} i^2 = ((n + 1)/6)(n(2n + 1) + 6(n + 1)) = ((n + 1)/6)(2n^2 + 7n + 6) = (n + 1)(n + 2)(2n + 3)/6
= (n + 1)((n + 1) + 1)(2(n + 1) + 1)/6.

This proves the induction step. Hence, by induction (2.12) is true for all integers n ≥ 1. 4
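Although only induction proves (2.12) for all n, a quick numerical spot-check builds confidence in the formula. A sketch (the function name is ours):

```python
def sum_of_squares(n: int) -> int:
    """Closed form n(n+1)(2n+1)/6 for the sum 1^2 + 2^2 + ... + n^2."""
    return n * (n + 1) * (2 * n + 1) // 6
```

Comparing `sum_of_squares(n)` against the literal sum `sum(i*i for i in range(1, n+1))` agrees for every n tried.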
Example 2.3.3 (Fibonacci Numbers). Consider the sequence of Fibonacci numbers defined by
f0 = 0, f1 = 1 and satisfying fn+2 = fn+1 + fn for all n ≥ 0. The first few terms of the sequence
are as follows.
n 0 1 2 3 4 5 6 7 8 9 10 11
fn 0 1 1 2 3 5 8 13 21 34 55 89
The Fibonacci sequence has many interesting properties. In this example, we prove that if 5 | n,
then 5 | fn .
We need to rephrase the problem in a manner that is suitable for an induction proof. We propose
to prove that 5 divides f5k for all nonnegative integers k. The basis step occurs with k = 0, in which
5 certainly divides f0 = 0. Now, suppose that 5 | f5k for some nonnegative integer k. Then f5k = 5m
for some integer m. Furthermore,
f5(k+1) = f5k+5 = f5k+4 + f5k+3
= (f5k+3 + f5k+2 ) + f5k+3 = 2f5k+3 + f5k+2
= 2(f5k+2 + f5k+1 ) + f5k+2 = 3f5k+2 + 2f5k+1
= 3(f5k+1 + f5k ) + 2f5k+1 = 5f5k+1 + 3f5k
= 5f5k+1 + 3 · 5m = 5(f5k+1 + 3m).
Hence, 5 divides f5(k+1) . By induction on k, the Fibonacci number f5k is divisible by 5 for all k,
which we can restate as 5|n =⇒ 5|fn . 4
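The divisibility property just proved can likewise be checked by computing Fibonacci numbers directly. An iterative sketch (the function name is ours):

```python
def fib(n: int) -> int:
    """n-th Fibonacci number with f(0) = 0, f(1) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```

Checking `fib(5*k) % 5 == 0` for the first twenty values of k agrees with the induction argument.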
Example 2.3.4. We prove that 2^n ≥ n^2 for all n ≥ 4. For the basis step, we notice that 2^4 = 16
and 4^2 = 16 so equality holds.
For the induction step, we assume that for some n ≥ 4, the inequality 2^n ≥ n^2 holds. Then
2^(n+1) = 2 · 2^n ≥ 2n^2 by the induction hypothesis. We would be done if we knew that 2n^2 ≥ (n + 1)^2 .
We consider the following logical equivalences

2n^2 ≥ (n + 1)^2 ⇐⇒ 2n^2 − (n^2 + 2n + 1) ≥ 0 ⇐⇒ n^2 − 2n − 1 ≥ 0 ⇐⇒ (n − 1)^2 − 2 ≥ 0
⇐⇒ n − 1 ≤ −√2 or n − 1 ≥ √2 ⇐⇒ n ≤ 1 − √2 or n ≥ √2 + 1. (2.13)

Since we are under the assumption that n ≥ 4, then n ≥ 1 + √2 and hence 2n^2 ≥ (n + 1)^2 . We
conclude that
2^(n+1) ≥ (n + 1)^2
and, by induction, we conclude that 2^n ≥ n^2 for all n ≥ 4.
By checking the validity of the inequality for n = 0, 1, 2, 3, we find that the inequality holds for
all integers n ≠ 3. If we had attempted to establish the result for n ≥ 0 with a proof by induction,
then the necessary inequality (2.13) would have shown that P (n) → P (n + 1) fails if n = 1 or 2. 4

2.3.2 – Strong Induction

Theorem 2.3.5 (Principle of Strong Induction)


Let Q(n) be a predicate about integers. Suppose that Q(n0 ) is true for some integer n0
and that Q(k) for all k with n0 ≤ k ≤ n implies Q(n + 1). Then the statement Q(n) is true
for all n ≥ n0 .

At first glance, the principle of strong induction appears more powerful than the first principle
of induction. The conjunctive statement
Q(n0 ) and Q(n0 + 1) and · · · and Q(n)
is false not only when Q(n) is false but when any of the other instantiated predicates are false. A
conditional statement p → q, meaning “if p then q” or “p implies q,” is false when p is true but q is
false and is true otherwise. Hence, the conditional statement
 
Q(n0 ) and Q(n0 + 1) and · · · and Q(n) implies Q(n + 1)

will be false if all of the Q(k) with n0 ≤ k ≤ n are true and Q(n + 1) is false, whereas
Q(n) implies Q(n + 1)
is false only when Q(n) is true and Q(n + 1) is false. Thus, the induction step of strong induction
is less likely to occur than the induction step of weak induction.
However, strong induction and weak induction are in fact equivalent. Any proof by weak induction
is automatically a proof by strong induction, since the strong induction hypothesis includes the weak
one. On the other hand, by setting the predicate P(n) to be "Q(k) is true for all integers k with
n0 ≤ k ≤ n," we see that any application of strong induction is simply a problem in weak induction.
Example 2.3.6. Consider the sequence {a_n}_{n≥0} defined by a_0 = 0, a_1 = 1, and a_{n+2} = a_{n+1} + 2a_n
for all n ≥ 0. We prove that for all n ≥ 0,

    a_n = (1/3)(2^n − (−1)^n).   (2.14)

Notice that (2.14) holds for n = 0 and n = 1; these serve as our basis step. For the induction
step, assume that (2.14) is true for all indices k with 0 ≤ k ≤ n. Since the cases n = 0 and n = 1
were verified directly, a general argument is needed only for n ≥ 1. According to the induction
hypothesis,

    a_n = (1/3)(2^n − (−1)^n)   and   a_{n−1} = (1/3)(2^{n−1} − (−1)^{n−1}).

Thus,

    a_{n+1} = a_n + 2a_{n−1} = (1/3)(2^n − (−1)^n) + (2/3)(2^{n−1} − (−1)^{n−1})
            = (1/3)(2^n + 2 · 2^{n−1} − (−1)^n − 2(−1)^{n−1}) = (1/3)(2^n + 2^n − (−1)^n + 2(−1)^n)
            = (1/3)(2^{n+1} + (−1)^n) = (1/3)(2^{n+1} − (−1)^{n+1}).

This establishes the induction step and hence, by strong induction, (2.14) is true for all n ≥ 0. ♦
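As a quick check on Example 2.3.6 (my own, not part of the text), the following snippet compares the closed form (2.14) with the recurrence for the first couple of hundred terms.

```python
def a_recursive(n_max):
    """Generate a_0, ..., a_{n_max + 1} from a_{n+2} = a_{n+1} + 2 a_n."""
    seq = [0, 1]
    for _ in range(n_max):
        seq.append(seq[-1] + 2 * seq[-2])
    return seq

def a_closed(n):
    """Closed form (2.14): a_n = (2^n - (-1)^n) / 3, always an integer."""
    return (2**n - (-1)**n) // 3

seq = a_recursive(200)
assert all(seq[n] == a_closed(n) for n in range(200))
```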
In Example 2.3.6, the proof of the induction step only required the formulas for a_n and a_{n−1}
to establish the formula for a_{n+1}. However, since the proof used more than just the one previous
case, it is not a (weak) induction proof.
The following proposition is important in its own right for future sections. We provide it here
because its proof relies on strong induction.

Proposition 2.3.7
Let S be a set and let ⋆ be a binary operation on S that is associative. In an operation
expression with a finite number of terms,

    a_1 ⋆ a_2 ⋆ · · · ⋆ a_n   with n ≥ 3,   (2.15)

all possible orders in which we pair operations (i.e., parentheses orders) are equal.
64 CHAPTER 2. NUMBER THEORY

Proof. Before starting the proof, we define a temporary but useful notation. Given a sequence
a_1, a_2, . . . , a_k of elements in S, by analogy with the Σ notation, we define

    F_{i=1}^{k} a_i  :=  (· · · ((a_1 ⋆ a_2) ⋆ a_3) ⋆ · · · ⋆ a_{k−1}) ⋆ a_k.

In this notation, we perform the operations in (2.15) from left to right. Note that if k = 1, the
expression is equal to the element a_1.

We prove by (strong) induction on n that every operation expression in (2.15) is equal to F_{i=1}^{n} a_i.
The basis step, with n = 3, is precisely the assumption that ⋆ is associative.

We now assume that the proposition is true for all integers k with 3 ≤ k ≤ n. Consider an
operation expression (2.15) involving n + 1 terms and suppose that the last operation performed
occurs between the jth and (j + 1)th terms, i.e.,

    q = (operation expression in a_1, . . . , a_j) ⋆ (operation expression in a_{j+1}, . . . , a_{n+1}).

Since both operation expressions involve n terms or fewer, by the induction hypothesis,

    q = (F_{i=1}^{j} a_i) ⋆ (F_{i=j+1}^{n+1} a_i).

If j = n, then q = (F_{i=1}^{n} a_i) ⋆ a_{n+1} = F_{i=1}^{n+1} a_i by the definition of F, so assume j ≤ n − 1.
Then

    q = (F_{i=1}^{j} a_i) ⋆ (a_{j+1} ⋆ F_{i=j+2}^{n+1} a_i)       by the induction hypothesis
      = ((F_{i=1}^{j} a_i) ⋆ a_{j+1}) ⋆ (F_{i=j+2}^{n+1} a_i)     by associativity
      = (F_{i=1}^{j+1} a_i) ⋆ (F_{i=j+2}^{n+1} a_i).

Repeating this argument n − j − 1 more times, we conclude that

    q = F_{i=1}^{n+1} a_i.

The proposition follows. □
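Proposition 2.3.7 can be illustrated computationally. The sketch below (my own illustration, not from the text) evaluates a product of 2 × 2 integer matrices, an associative but noncommutative operation, under randomly chosen parenthesizations, and compares each result with the left-to-right fold playing the role of F.

```python
import random
from functools import reduce

def matmul(A, B):
    """Multiply 2x2 matrices given as tuples of row tuples."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def random_paren(terms):
    """Evaluate the terms under a uniformly chosen random parenthesization."""
    if len(terms) == 1:
        return terms[0]
    j = random.randrange(1, len(terms))  # position of the last operation performed
    return matmul(random_paren(terms[:j]), random_paren(terms[j:]))

random.seed(0)
terms = [((1, 1), (0, 1)), ((1, 0), (1, 1))] * 4  # eight matrices
left_fold = reduce(matmul, terms)  # the F notation: fold left to right
assert all(random_paren(terms) == left_fold for _ in range(100))
```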

2.3.3 – Recursive Set Definitions


We conclude this section with a few paragraphs on a recursive way to define subsets of a given
set. This method occurs frequently in later chapters, so we present the idea here not as a curio but
as a precursor to a common practice. We introduce the concept of a recursive set definition with an
example.
Example 2.3.8. Let a, b ∈ Z and define the subset S ⊆ Z as follows: 0 ∈ S and if x ∈ S, then so
are x − a, x + a, x − b, x + b. We will prove that S = gcd(a, b)Z, i.e., all multiples of gcd(a, b).
We first show by induction that sa ∈ S for all s ∈ Z. Obviously, this is true
when s = 0. Suppose that sa and −sa are in S for some s ∈ N. Then sa + a = (s + 1)a ∈ S and
−sa − a = (−s − 1)a ∈ S. Hence, by induction on s in N, all integers of the form sa with s ∈ Z are
in S.
We now show that all integers of the form sa + tb, for s, t ∈ Z are in S. Given any s, we know
that sa ∈ S. Now suppose that sa + tb and sa − tb for some t ∈ N are in S. Then by the recursive
definition of S,

(sa + tb) + b = sa + (t + 1)b ∈ S and (sa − tb) − b = sa − (t + 1)b ∈ S.

Hence, by induction on t in N, all integers of the form sa + tb with s, t ∈ Z are in S. By Proposition 2.1.12, the set of integers of the form sa + tb with s, t ∈ Z is precisely gcd(a, b)Z. We have
shown that gcd(a, b)Z ⊆ S.

We show the reverse inclusion S ⊆ gcd(a, b)Z as follows. Note that 0 ∈ gcd(a, b)Z. Furthermore,
if x ∈ gcd(a, b)Z, then by properties of divisibility, all four integers x + a, x − a, x + b, x − b will be
divisible by gcd(a, b). Hence, every integer in S satisfies the property of being divisible by gcd(a, b),
and thus S ⊆ gcd(a, b)Z.
We can now conclude that S = gcd(a, b)Z. ♦
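The set S of Example 2.3.8 can be computed within a finite window by iterating the recursion step to a fixed point. The following sketch (an illustration I am adding, with a = 6, b = 10 as hypothetical inputs) confirms that, inside the window, S consists exactly of the multiples of gcd(a, b) = 2.

```python
from math import gcd

def closure(a, b, N):
    """Smallest subset of [-N, N] containing 0 and closed under x -> x ± a, x ± b."""
    S = {0}
    frontier = {0}
    while frontier:
        new = set()
        for x in frontier:
            for y in (x + a, x - a, x + b, x - b):
                if -N <= y <= N and y not in S:
                    new.add(y)
        S |= new
        frontier = new
    return S

a, b, N = 6, 10, 50
S = closure(a, b, N)
g = gcd(a, b)  # g = 2
assert S == {m for m in range(-N, N + 1) if m % g == 0}
```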

We should note that the definition of S in Example 2.3.8 differs from both of the standard ways
to define sets as presented in Section 1.1.1. A recursive definition of a subset S in a context set U
neither explicitly lists all the elements of S nor provides a property that can be immediately tested
on each element of U . Instead a recursive definition contains a basis step and a recursive step of the
following form.
Basis Step. a1 , a2 , . . . , ak ∈ S (for some specific elements in U );
Recursion Step. if x1 , x2 , . . . , xm ∈ S, then

f1 (x1 , x2 , . . . , xm ), f2 (x1 , x2 , . . . , xm ), . . . , fn (x1 , x2 , . . . , xm )

are also in S, where fi are functions U m → U .


From a recursive definition of a set, it is sometimes difficult to provide a nonrecursive definition
of the set and thereby decide whether an element is in or not in the set. In fact, the careful reader
may wonder whether a recursive definition of a subset S of a set U is well-defined, namely always
determines one specific subset of U . Recursive definitions of subsets are indeed well-defined in the
following sense. Let

C = {A ∈ P(U ) | a1 , a2 , . . . , ak ∈ A and A satisfies the recursion step}.

Then the recursively defined set S is

    S = ⋂_{A ∈ C} A,

which we can also think of as the smallest set (by inclusion) that satisfies both the basis step and
the recursion step of the recursive definition.

Exercises for Section 2.3


1. Prove that Σ_{i=1}^{n} i^3 = n^2 (n + 1)^2 / 4 for all integers n ≥ 1.

2. Use mathematical induction to prove the geometric summation formula:

       Σ_{i=0}^{n} A r^i = A (r^{n+1} − 1) / (r − 1)   where r ≠ 1,

   for all nonnegative integers n.


3. Prove that for every positive integer n,

       1 · 2 + 2 · 3 + · · · + n(n + 1) = n(n + 1)(n + 2) / 3.

4. Prove that 1 + nh ≤ (1 + h)^n for all real numbers h ≥ −1 and all integers n ≥ 0.
5. Prove that 5 | (n^5 − n) for all nonnegative integers n in the following two ways:
(a) Using Fermat’s Little Theorem.
(b) By induction on n.
6. Prove by induction that Σ_{i=0}^{n} (2i + 1) = (n + 1)^2.

In Exercises 2.3.7 through 2.3.11, let {f_n}_{n≥0} be the sequence of Fibonacci numbers.

7. Prove that f_{n−1} f_{n+1} − f_n^2 = (−1)^n for all n ≥ 1.

8. Prove that f_n^2 + f_{n−1}^2 = f_{2n−1} for all n ≥ 1.

9. Prove that Σ_{i=0}^{n} f_i = f_{n+2} − 1.

10. Prove that 2 | f_n if and only if 3 | n.

11. Prove that Σ_{i=0}^{n} f_i^2 = f_n f_{n+1}.
12. Prove that for all real numbers r ≠ 1,

        Σ_{k=0}^{n} k r^k = (((r − 1)n − 1) r^{n+1} + r) / (r − 1)^2.

13. Prove that 13 divides 3^{n+1} + 4^{2n−1} for all n ≥ 1.
14. A set of lines in the plane is said to be in general position if no two lines are parallel and no three lines
    intersect at a single point. Prove that for any set {L_1, L_2, . . . , L_n} of lines in R^2 in general position,
    the complement R^2 − (L_1 ∪ L_2 ∪ · · · ∪ L_n) consists of (n^2 + n + 2)/2 disjoint regions in the plane.
15. Let H_n = 1 + 1/2 + 1/3 + · · · + 1/n be the nth harmonic number. Prove that H_{2^n} ≤ 1 + n.
16. Prove that n! ≤ n^{n−1} for all positive integers n.
17. Show that Σ_{i=1}^{n} i (i!) = (n + 1)! − 1.
18. Show that any amount of postage of value 48 cents or higher can be formed using just 5-cent and
12-cent stamps.
19. Let A1 , A2 , . . . , An and B be sets. Use mathematical induction to prove that

(A_1 − B) ∩ (A_2 − B) ∩ · · · ∩ (A_n − B) = (A_1 ∩ A_2 ∩ · · · ∩ A_n) − B.


20. Let α be any real number such that α + 1/α ∈ Z. Prove that for all nonnegative integers n,

        α^n + 1/α^n ∈ Z.
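Several of the identities above are easy to spot-check numerically before attempting an induction proof. The snippet below (my own sanity check, not part of the exercises) verifies Exercises 1, 9, and 17 for small n.

```python
from math import factorial

def fib(n_max):
    """Return [f_0, ..., f_{n_max}] with f_0 = 0, f_1 = 1."""
    f = [0, 1]
    for _ in range(n_max - 1):
        f.append(f[-1] + f[-2])
    return f

f = fib(30)
for n in range(1, 25):
    # Exercise 1: sum of cubes.
    assert sum(i**3 for i in range(1, n + 1)) == n**2 * (n + 1)**2 // 4
    # Exercise 9: sum of the first Fibonacci numbers.
    assert sum(f[i] for i in range(n + 1)) == f[n + 2] - 1
    # Exercise 17: sum of i * i!.
    assert sum(i * factorial(i) for i in range(1, n + 1)) == factorial(n + 1) - 1
```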

2.4 Projects
Project I. Sums of Powers and Divisibility. Revisit Exercise 2.3.13. Attempt to generalize
this result in as many ways as possible.

Project II. A Diophantine Equation. It is not hard to show that √2 is not a rational number.
(See Exercise 2.1.22.) In other words, the equation

        a^2 − 2b^2 = 0

has no solutions with a and b nonzero integers. In this project, consider instead a modified
equation. Attempt to find integer solutions to

        a^2 − 2b^2 = 1   or   a^2 − 2b^2 = −1   or   |a^2 − 2b^2| = 1.

Try to find some individual solutions. It is known that there exists an infinite number of
solutions to any of the three equations. Attempt to find an infinite number of solutions by
giving a pattern that generates some. Try to prove that you have all of them. Consider other
modifications to the problem as you deem interesting.
[Note: A Diophantine equation is an equation for which we seek all solutions in which the
variables are restricted to integer values.]
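A brute-force search is a natural first step for Project II. The sketch below (my own starting point, not a solution from the text) lists the small positive solutions of |a^2 − 2b^2| = 1 and checks one pattern worth investigating: the map (a, b) → (3a + 4b, 2a + 3b) appears to carry each solution to a larger one.

```python
# Brute-force search for small positive solutions of |a^2 - 2b^2| = 1.
solutions = [(a, b) for a in range(1, 100) for b in range(1, 100)
             if abs(a * a - 2 * b * b) == 1]
print(solutions[:4])  # [(1, 1), (3, 2), (7, 5), (17, 12)]

# A candidate pattern: (a, b) -> (3a + 4b, 2a + 3b) preserves a^2 - 2b^2.
for a, b in solutions:
    a2, b2 = 3 * a + 4 * b, 2 * a + 3 * b
    assert a2 * a2 - 2 * b2 * b2 == a * a - 2 * b * b
```

Proving that this pattern produces all solutions is left to the project.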
Project III. Prime Factorization. Without consulting other sources, use the theorems in this
section to write an algorithm to obtain the prime factorization of a positive integer. Attempt
to make your algorithm the most time-efficient as possible and argue why it is time-efficient.
Project IV. Properties of ϕ(2^n − 1). Try to discern patterns in the sequence of numbers ϕ(2^n − 1).
If you find a pattern, attempt to generalize to numbers of the form ϕ(a^n − 1). If you find
conjectures, attempt to prove them. Can you generalize your results even further?

Project V. Strong Divisibility Sequences. A strong divisibility sequence is a sequence
(a_n)_{n≥1} of positive integers satisfying

        gcd(a_m, a_n) = a_{gcd(m,n)}   for all m, n ≥ 1.

(1) Show that the Fibonacci sequence is a strong divisibility sequence.


(2) Are there any geometric sequences or arithmetic sequences that are strong divisibility
sequences?
(3) Consider sequences (a_n)_{n≥1} defined by fixing a_0 and a_1 and which satisfy

        a_n = A a_{n−1} + B a_{n−2}   (2.16)
for some integers A and B. The relation (2.16) is called a second-order recurrence relation.
Try to find other sequences defined by second-order recurrence relations that are strong
divisibility sequences.
(4) Try to find an infinite family of strong divisibility sequences that satisfy second-order recurrence relations.
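For part (1) of Project V, one can first confirm the identity numerically. Here is a quick check (mine, not the book's) of gcd(f_m, f_n) = f_{gcd(m,n)} for the Fibonacci numbers:

```python
from math import gcd

fib = [1, 1]  # f_1 = f_2 = 1; the list is indexed from 1 via f() below
for _ in range(40):
    fib.append(fib[-1] + fib[-2])

def f(n):
    """The nth Fibonacci number, n >= 1."""
    return fib[n - 1]

assert all(gcd(f(m), f(n)) == f(gcd(m, n))
           for m in range(1, 41) for n in range(1, 41))
```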
3. Groups

As a field in mathematics, group theory did not develop in the order that this and subsequent
chapters follow. The first definition of a group is generally credited to Évariste Galois. As he
studied methods to find the roots of polynomials of high degree, he considered polynomials with
symmetries in their roots and functions on the set of roots that preserved those symmetries. Galois’
approach to studying polynomials turned out to be exceedingly fruitful and this book covers Galois
theory in Chapter 11.
As the dust settled and mathematicians separated the concept of a group from Galois’ application,
mathematicians realized two things. First, groups occur naturally in many areas of mathematics.
Second, group theory presents many challenging problems and profound results in its own right.
Like modern calculus texts that slowly lead to the derivative concept after a rigorous definition of
the limit, we begin with the definition of a group and methodically prove results with a view towards
as many applications as possible, rather than a single application.
In comparison to some algebraic structures presented later in the book, groups are particularly
easy to define. Whereas a poset involved a set and a relation with certain properties, a group involves
a set and one binary operation with certain properties. It may come as a surprise to some readers
that despite the brevity of the definition of a group, group theory is a vast branch of mathematics
with applications to many other areas.
In Section 3.1 we precede a general introduction to groups with an interesting example from
geometry. Section 3.2 introduces the axioms for groups and presents many elementary examples.
Section 3.3 presents some elementary properties of groups and introduces the notion of a classification
theorem. Section 3.4 introduces the symmetric group, a family of groups that plays a central role
in group theory.
Section 3.5 studies subgroups, how to describe them or how to prove that a given subset is a
subgroup, while Section 3.6 borrows from the Hasse diagrams of a partial order to provide a visual
representation of subgroups within a group. Section 3.7 introduces the concept of a homomorphism
between groups, functions that preserve the group structure. Section 3.8 introduces a particular
method to describe the content and structure of a group and introduces the fundamental notion of
a free group.
The last three sections are optional to a concise introduction to group theory. Sections 3.9
and 3.10 provide two applications of group theory, one to patterns in geometry and the other to
information security. Finally, Section 3.11 introduces the concept of a monoid, offering an example
of another not uncommon algebraic structure, similar to groups but with fewer properties.

3.1 Symmetries of the Regular n-gon
3.1.1 – Dihedral Symmetries
Let n ≥ 3 and consider a regular n-sided polygon, Pn . Recall that by regular we mean that all the
edges of the polygon have the same length and all the interior angles at the vertices are equal.
The set of vertices V = {v_1, v_2, . . . , v_n} of the regular polygon P_n is a subset of the Euclidean
plane R2 . For simplicity, assume that the center of Pn is at the origin and that one of its vertices is
on the x-axis.
A symmetry of a regular n-gon is a bijection σ : V → V that is the restriction of a bijection
F : R^2 → R^2 that leaves the overall vertex-edge structure of P_n in place, i.e., if the pair (v_i, v_j) is
an edge of the regular n-gon, then the pair (σ(v_i), σ(v_j)) is also an edge of the n-gon.


Consider, for example, a regular hexagon P6 and the bijection σ : V → V such that σ(v1 ) = v2 ,
σ(v2 ) = v1 and σ stays fixed on all the other vertices. Then σ is not a symmetry of P6 . Figure 3.1
shows that σ fails to preserve the vertex-edge structure of the hexagon because for example, the
segment joining σ(v2 ) and σ(v3 ) is not an edge of the original hexagon.

Figure 3.1: Not a hexagon symmetry

In contrast, consider the bijection τ on the vertices of the regular hexagon such that
τ(v_1) = v_2, τ(v_2) = v_1, τ(v_3) = v_6, τ(v_4) = v_5, τ(v_5) = v_4, τ(v_6) = v_3.
This bijection on the vertices is a symmetry of the hexagon because it preserves the edge structure of
the hexagon. Figure 3.2 shows that τ can be realized as the reflection through the line L as drawn.

Figure 3.2: A reflection symmetry of the hexagon

Definition 3.1.1
We denote by Dn the set of symmetries of the regular n-gon and call it the set of dihedral
symmetries.

To count the number of bijections on the set V = {v_1, v_2, . . . , v_n}, we note that a bijection
f : V → V can map

    f(v_1) to any element in V,
    f(v_2) to any element in V − {f(v_1)},
    . . .
    f(v_n) to any element in V − {f(v_1), f(v_2), . . . , f(v_{n−1})}.

Hence, there are

    n × (n − 1) × (n − 2) × · · · × 2 × 1 = n!

distinct bijections on V. However, a symmetry σ ∈ D_n can map v_1 to any element in V (n
options) but then σ must map v_2 to a vertex adjacent to σ(v_1) (2 options). Once σ(v_1) and σ(v_2)
are chosen, all σ(v_i) for 3 ≤ i ≤ n are determined. For example, σ(v_3) must be the vertex adjacent
to σ(v_2) that is not σ(v_1); σ(v_4) must be the vertex adjacent to σ(v_3) that is not σ(v_2); and so on.
This reasoning leads to the following proposition.

Proposition 3.1.2
The cardinality of Dn is |Dn | = 2n.

It is not difficult to identify these symmetries by their geometric meaning. Half of the symmetries
in D_n are rotations that shift vertices k spots counterclockwise, for k ranging from 0 to n − 1. We
denote by R_α the rotation symmetry of angle α. On the set of vertices, the rotation R_{2πk/n} of
angle 2πk/n performs

    R_{2πk/n}(v_i) = v_{((i−1+k) mod n)+1}   for all 1 ≤ i ≤ n.

Note that R_0 is the identity function on V.
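The index formula for R_{2πk/n} is easy to exercise in code. This small sketch (my own, using 1-based vertex labels as in the text) reproduces the vertex images of R_{2π/3} on the hexagon that appear in the composition examples later in this section.

```python
def rotate(k, i, n):
    """Image of vertex v_i under R_{2*pi*k/n}: v_{((i-1+k) mod n) + 1}."""
    return ((i - 1 + k) % n) + 1

# On the hexagon (n = 6), R_{2*pi/3} shifts vertices k = 2 spots counterclockwise:
assert [rotate(2, i, 6) for i in range(1, 7)] == [3, 4, 5, 6, 1, 2]
# R_0 (k = 0) is the identity on the vertices:
assert all(rotate(0, i, 6) == i for i in range(1, 7))
```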
As remarked above, regular n-gons possess symmetries that correspond to reflections through
lines that pass through the center of the polygon. Assuming the regular n-gon is centered at the origin
and with a vertex on the x-axis, then there are n distinct reflection symmetries, each corresponding
to a line through the origin and making an angle of πk/n with the x-axis, for 0 ≤ k ≤ n − 1. The
reflection symmetry in Figure 3.2 is through a line that makes an angle of π/6 with respect to the
x-axis. We denote by Fβ the reflection symmetry through the line that makes an angle of β with
the x-axis.
Since |Dn | = 2n, the rotations and reflections account for all dihedral symmetries.
If two bijections on V preserve the polygon structure, then their composition does as well. Con-
sequently, the function composition of two dihedral symmetries is again another dihedral symmetry
and thus ◦ is a binary operation on Dn . However, having listed the dihedral symmetries as rota-
tions or reflections, it is interesting to determine the result of the composition of two symmetries as
another symmetry.
First, it is easy to see that rotations compose as follows:

Rα ◦ Rβ = Rα+β ,

where we subtract 2π from α + β if α + β ≥ 2π. However, the composition of a given rotation and
a given reflection or the composition of two reflections is not as obvious. There are various ways
to calculate the compositions. The first is to determine how the function composition acts on the
vertices. For example, let n = 6 and consider the compositions of R2π/3 and Fπ/6 .

vi v1 v2 v3 v4 v5 v6
R2π/3 (vi ) v3 v4 v5 v6 v1 v2
Fπ/6 (vi ) v2 v1 v6 v5 v4 v3
(Fπ/6 ◦ R2π/3 )(vi ) v6 v5 v4 v3 v2 v1
(R2π/3 ◦ Fπ/6 )(vi ) v4 v3 v2 v1 v6 v5

From this table and by inspection on how the compositions act on the vertices, we determine that

Fπ/6 ◦ R2π/3 = F5π/6 and R2π/3 ◦ Fπ/6 = Fπ/2 .

We notice with this example that the composition operation is not commutative.
Another approach to determining the composition of elements comes from linear algebra. Rotations about the origin by an angle α and reflections through a line through the origin making an
angle β with the x-axis are linear transformations. With respect to the standard basis, these two
types of linear transformations respectively correspond to the following 2 × 2 matrices:

    R_α : [ cos α   −sin α ]        F_β : [ cos 2β    sin 2β ]
          [ sin α    cos α ]              [ sin 2β   −cos 2β ]     (3.1)

For example, let n = 6 and consider the composition of F_{π/6} and F_{π/3}. The composition
symmetry corresponds to the matrix product

    F_{π/6} ◦ F_{π/3} :  [ cos(π/3)   sin(π/3) ] [ cos(2π/3)   sin(2π/3) ]
                         [ sin(π/3)  −cos(π/3) ] [ sin(2π/3)  −cos(2π/3) ]

      = [ cos(π/3)cos(2π/3) + sin(π/3)sin(2π/3)   cos(π/3)sin(2π/3) − sin(π/3)cos(2π/3) ]
        [ sin(π/3)cos(2π/3) − cos(π/3)sin(2π/3)   sin(π/3)sin(2π/3) + cos(π/3)cos(2π/3) ]

      = [ cos(−π/3)  −sin(−π/3) ]
        [ sin(−π/3)   cos(−π/3) ],

by the angle-difference identities.

This matrix corresponds to a rotation and shows that

Fπ/6 ◦ Fπ/3 = R−π/3 = R5π/3 .
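The matrix computation above, and the relations of Proposition 3.1.3, can be verified numerically. The following sketch (mine, not from the text; 2 × 2 matrices as nested tuples) checks F_{π/6} ◦ F_{π/3} = R_{−π/3} entrywise.

```python
from math import cos, sin, pi, isclose

def R(alpha):
    """Matrix of the rotation R_alpha, as in (3.1)."""
    return ((cos(alpha), -sin(alpha)), (sin(alpha), cos(alpha)))

def F(beta):
    """Matrix of the reflection F_beta, as in (3.1)."""
    return ((cos(2 * beta), sin(2 * beta)), (sin(2 * beta), -cos(2 * beta)))

def matmul(A, B):
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))

def close(A, B):
    return all(isclose(A[i][j], B[i][j], abs_tol=1e-12)
               for i in range(2) for j in range(2))

assert close(matmul(F(pi / 6), F(pi / 3)), R(-pi / 3))
# One instance of Proposition 3.1.3: R_alpha o F_beta = F_{alpha/2 + beta}.
assert close(matmul(R(2 * pi / 3), F(pi / 6)), F(pi / 3 + pi / 6))
```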

Whether we use one method or the other, the following table gives the composition a ◦ b for the
symmetries of the hexagon.
a\b R0 Rπ/3 R2π/3 Rπ R4π/3 R5π/3 F0 Fπ/6 Fπ/3 Fπ/2 F2π/3 F5π/6
R0 R0 Rπ/3 R2π/3 Rπ R4π/3 R5π/3 F0 Fπ/6 Fπ/3 Fπ/2 F2π/3 F5π/6
Rπ/3 Rπ/3 R2π/3 Rπ R4π/3 R5π/3 R0 Fπ/6 Fπ/3 Fπ/2 F2π/3 F5π/6 F0
R2π/3 R2π/3 Rπ R4π/3 R5π/3 R0 Rπ/3 Fπ/3 Fπ/2 F2π/3 F5π/6 F0 Fπ/6
Rπ Rπ R4π/3 R5π/3 R0 Rπ/3 R2π/3 Fπ/2 F2π/3 F5π/6 F0 Fπ/6 Fπ/3
R4π/3 R4π/3 R5π/3 R0 Rπ/3 R2π/3 Rπ F2π/3 F5π/6 F0 Fπ/6 Fπ/3 Fπ/2
R5π/3 R5π/3 R0 Rπ/3 R2π/3 Rπ R4π/3 F5π/6 F0 Fπ/6 Fπ/3 Fπ/2 F2π/3
F0 F0 F5π/6 F2π/3 Fπ/2 Fπ/3 Fπ/6 R0 R5π/3 R4π/3 Rπ R2π/3 Rπ/3
Fπ/6 Fπ/6 F0 F5π/6 F2π/3 Fπ/2 Fπ/3 Rπ/3 R0 R5π/3 R4π/3 Rπ R2π/3
Fπ/3 Fπ/3 Fπ/6 F0 F5π/6 F2π/3 Fπ/2 R2π/3 Rπ/3 R0 R5π/3 R4π/3 Rπ
Fπ/2 Fπ/2 Fπ/3 Fπ/6 F0 F5π/6 F2π/3 Rπ R2π/3 Rπ/3 R0 R5π/3 R4π/3
F2π/3 F2π/3 Fπ/2 Fπ/3 Fπ/6 F0 F5π/6 R4π/3 Rπ R2π/3 Rπ/3 R0 R5π/3
F5π/6 F5π/6 F2π/3 Fπ/2 Fπ/3 Fπ/6 F0 R5π/3 R4π/3 Rπ R2π/3 Rπ/3 R0
(3.2)

From this table, we can answer many questions about the composition operator on D_6. For
example, if asked what f ∈ D_n satisfies R_{2π/3} ◦ f = F_{5π/6}, we simply look in the row corresponding
to a = R_{2π/3} for the b that gives F_{5π/6} as the composition. A priori, without any further theory,
such an f need not exist, but in this case it does and f = F_{π/2}.
Some other properties of the composition operation on D_n are not as easy to identify directly
from the table in (3.2). For example, by Proposition 1.1.15, ◦ is associative on D_n. Verifying
associativity from the table in (3.2) would require checking 12^3 = 1,728 equalities. Also, ◦ has an
identity on D_n, namely R_0. Indeed R_0 is the identity function on V. Finally, every element in D_n
has an inverse: the inverse to R_{2πk/n} is R_{2π(n−k)/n} and the inverse to F_{πk/n} is itself. We leave the
proof of the following proposition as an exercise.

Proposition 3.1.3
Let n be a fixed integer with n ≥ 3. Then the dihedral symmetries Rα and Fβ satisfy the
following relations:

Rα ◦ Fβ = Fα/2+β , Fα ◦ Rβ = Fα−β/2 , and Fα ◦ Fβ = R2(α−β) .

Proof. (See Exercise 3.1.7.) 

3.1.2 – Abstract Notation


We introduce a notation that is briefer and aligns with the abstract notation that we will regularly
use in group theory.
Given any integer n ≥ 3, denote by r the rotation of angle 2π/n and by s the reflection through
the x-axis. In other words,
r = R2π/n and s = F0 .
3.1. SYMMETRIES OF THE REGULAR N-GON 73

In abstract notation, similar to our habit of notation for multiplication of real variables, we simply
write ab to mean a ◦ b for any two elements a, b ∈ Dn . Since ◦ is associative, by Proposition 2.3.7,
an expression such as rrsr is well-defined, regardless of the order in which we pair terms to perform
the composition. In this example, still with n = 6,

rrsr = Rπ/3 ◦ Rπ/3 ◦ F0 ◦ Rπ/3 = R2π/3 ◦ F0 ◦ Rπ/3 = Fπ/3 ◦ Rπ/3 = Fπ/6 .

To simplify notation further, if a ∈ D_n and k is a positive integer, then we write a^k to represent

    a^k = aaa · · · a   (k times).
Hence, we could write r^2 s r for rrsr. It is important to note that since composition ◦ is not
commutative, r^3 s is not necessarily equal to r^2 s r. Finally, note that R_0 is the identity function so
R_0 ◦ f = f ◦ R_0 = f for all f ∈ D_n. Consequently, we will denote R_0 by ι to stand for the identity
function. Though we use multiplicative notation, we must continue to think of the symbols as functions
and not as representing real variables.
It is not hard to see that

    r^k = R_{2π/n} ◦ R_{2π/n} ◦ · · · ◦ R_{2π/n}   (k times)   = R_{2πk/n}.

Furthermore, by looking at the F_0 column in (3.2), we suspect that in general

    r^k s = F_{πk/n},

where k satisfies 0 ≤ k ≤ n − 1. The result of Exercise 3.1.7 proves this. Consequently, as a set,

    D_n = {ι, r, r^2, . . . , r^{n−1}, s, rs, r^2 s, . . . , r^{n−1} s}.

The symbols r and s have a few interesting properties. First, r^n = ι and s^2 = ι. These are
obvious as long as we do not forget the geometric meaning of the functions r and s. Less obvious is
the equality in the following proposition.

Proposition 3.1.4
Let n be an integer with n ≥ 3. Then in D_n equipped with the composition operation,

    sr = r^{n−1} s.

Proof. We first prove that rsr = s. By Exercise 3.1.7, the composition of a rotation with a reflection
is a reflection. Hence, (rs)r is a reflection. For any n, r maps v_1 to v_2, then s maps v_2 to v_n, and
then r maps v_n to v_1. Hence, rsr is a reflection that keeps v_1 fixed. There is only one reflection
in D_n that keeps v_1 fixed, namely s.
Since rsr = s, we multiply both sides by r^{n−1} and get

    r^n sr = r^{n−1} s  =⇒  ι sr = sr = r^{n−1} s.  □

Corollary 3.1.5
Consider the dihedral symmetries D_n with n ≥ 3. Then

    s r^k = r^{n−k} s.

Proof. This follows by a repeated application of Proposition 3.1.4. □
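The relations r^n = ι, s^2 = ι, and sr = r^{n−1}s can also be checked by modeling r and s as permutations of the vertex labels {0, 1, . . . , n − 1}. This is my own sketch, not from the text; composition of permutations-as-tuples plays the role of ◦, and a closure computation confirms |D_n| = 2n.

```python
def compose(p, q):
    """(p o q)(i) = p(q(i)) for permutations given as tuples."""
    return tuple(p[q[i]] for i in range(len(p)))

def dihedral_check(n):
    identity = tuple(range(n))
    r = tuple((i + 1) % n for i in range(n))   # rotation by 2*pi/n
    s = tuple((-i) % n for i in range(n))      # reflection fixing vertex 0

    def power(p, k):
        result = identity
        for _ in range(k):
            result = compose(result, p)
        return result

    assert power(r, n) == identity                       # r^n = identity
    assert compose(s, s) == identity                     # s^2 = identity
    assert compose(s, r) == compose(power(r, n - 1), s)  # s r = r^{n-1} s

    # The closure of {r, s} under composition has exactly 2n elements.
    group = {identity}
    frontier = {r, s}
    while frontier:
        group |= frontier
        frontier = {compose(a, b) for a in group for b in group} - group
    assert len(group) == 2 * n

for n in range(3, 10):
    dihedral_check(n)
```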



3.1.3 – Geometric Objects with Dihedral Symmetry


Regular polygons are not the only geometric objects that possess dihedral symmetry. A geometric
pattern is said to have dihedral Dn symmetry if the pattern remains identical when the plane is
transformed by all the rotations and reflections in Dn in reference to a given center C and an axis
L.
For example, if we ignore the color, both six-leafed shapes in Figure 3.3 possess D6 dihedral
symmetry.

Figure 3.3: Shapes with D6 symmetry

In Figure 3.4, the shape on the left displays D7 symmetry while the shape on the right displays
D5 symmetry.

Figure 3.4: D7 and D5 symmetry

On the other hand, consider the parametric plot in Figure 3.5 (caption: Only rotational
symmetry). There does not exist an axis through which the shape is preserved under a reflection.
Consequently, the shape does not possess D_3 symmetry. It only has rotational symmetry, with
smallest rotation angle 2π/3.

If we know that a geometric pattern has a certain dihedral symmetry, we only need to
draw a certain portion of the shape before it is possible to determine the rest of the object. Let F
be a set of bijections of the plane and let S be a subset of the plane that is preserved by all
the functions in F, i.e., f(S) = S for all f ∈ F. We say that a subset S′ generates S by F if

    S = ⋃_{f ∈ F} f(S′).

Furthermore, a subset S_0 ⊆ S is a minimal generating subset of S if S_0 generates S by F, and S_0
is minimal by inclusion among all subsets of S that generate S by F. A minimal generating subset
is more commonly called a fundamental region of S under the set of transformations F.
For example, consider the shape shown in Figure 3.6. In the figure on the left, the dark gray
subset S0 is a generating subset for the overall set S but it is not a minimal generating subset. On
the other hand, in the figure on the right, the dark gray subset S0 is a minimal generating subset.

Figure 3.6: D_5 symmetry with generating subsets

Exercises for Section 3.1


1. Use diagrams to describe all the dihedral symmetries of the equilateral triangle.
2. Write down the composition table for D4 .
3. Determine what r^3 s r^4 s r corresponds to as a dihedral symmetry of D_8.
4. Determine what s r^6 s r^5 s r s corresponds to as a dihedral symmetry of D_9.
5. Let n be an even integer with n ≥ 4. Prove that in D_n, the element r^{n/2} satisfies r^{n/2} w = w r^{n/2} for
   all w ∈ D_n.
6. Let n be an arbitrary integer n ≥ 3. Show that an expression of the form

       r^a s^b r^c s^d · · ·

   is a rotation if and only if the sum of the powers on s is even.


7. Using (3.1) and linear algebra prove that

Rα ◦ Fβ = Fα/2+β , Fα ◦ Rβ = Fα−β/2 , and Fα ◦ Fβ = R2(α−β) .

8. Describe the symmetries of an ellipse with unequal half-axes.


9. List all the symmetries of the circle and describe the compositions between them.
10. List all the symmetries and describe the compositions between them for the infinitely long sine curve
shown below:

... ...

11. List all the symmetries and describe the compositions between them for the infinitely long pattern
shown below:

... ...

12. List all the symmetries and describe the compositions between them for the infinitely long pattern
shown below:

... ...

13. Determine the set of symmetries for each of the following shapes (ignoring shading):

(c)
(a)
(b)

(d) (f)
(e)

14. Sketch a pattern/shape (possibly a commonly known logo) that has D8 symmetry but does not have
Dn symmetry for n > 8.
15. Sketch a pattern/shape (possibly a commonly known logo) that has rotational symmetry of angle π/2
    but does not have full D_4 symmetry.
16. Consider a regular tetrahedron. We call a rigid motion of the tetrahedron any rotation or composition
of rotations in R3 that map a regular tetrahedron back into itself, though possibly changing specific
vertices, edges, and faces. Rigid motions of solids do not include reflections through a plane.
(a) Prove that there are exactly 12 rigid motions of the tetrahedron. Call this set R.
(b) Using a labeling of the tetrahedron, explicitly list all rigid motions of the tetrahedron.
(c) Explain why function composition ◦ is a binary operation on R.
(d) Write down the composition table of ◦ on R.

17. Consider the hexagonal tiling pattern on the plane drawn below.

Define r as the rotation by 60° about O and t as the translation that moves the whole plane one
hexagon to the right. We denote by ◦ the operation of composition on functions R^2 → R^2 and we
denote by r^{−1} and t^{−1} the inverse functions to r and t.

(a) Show that r ◦ t is not equal to t ◦ r.
(b) Show that r ◦ t ◦ r ◦ t^{−1} ◦ r^{−1} has the effect of rotating the plane by 60° about the point P.
(c) Prove or disprove that there exists a composition of functions involving only r, t, and their
    inverses that has the effect of translating the whole plane in the direction of the vector OP.
    If there is, give it.
18. Consider the diagram S′ below. Sketch the diagram S that has S′ as a fundamental region with (a)
    D_4 symmetry; (b) only rotational square symmetry. [Assume reflection through the x-axis is one of
    the reflections.]

    [diagram S′]

19. Consider the diagram S′ below. Sketch the diagram S that has S′ as a fundamental region with (a)
    D_6 symmetry; (b) D_3 symmetry. [Assume reflection through the x-axis is one of the reflections.]

    [diagram S′]

3.2 Introduction to Groups
In the preface, we claimed that abstract algebra does not study properties of just one particular
algebraic object but rather studies properties of all objects with a given algebraic structure. An alge-
braic structure typically consists of a set equipped with various properties: a relation with specified
properties, a binary operation with specified properties, or some other set theoretic construction. In
Section 1.4, we presented posets as an algebraic structure. A group is an algebraic structure that
involves a set and one binary operation with certain properties.
At first glance, it may seem arbitrary that we deem a certain set of properties more
important than another. However, the long list of examples we will develop, the numerous connections
to other branches of mathematics, and the fruitful areas of research in group theory have given groups
a place of prominence in mathematics.

3.2.1 – Group Axioms

Definition 3.2.1
A group is a pair (G, ∗) where G is a set and ∗ is a binary operation on G that satisfies the
following properties:
(1) associativity: (a ∗ b) ∗ c = a ∗ (b ∗ c) for all a, b, c ∈ G;

(2) identity: there exists e ∈ G such that a ∗ e = e ∗ a = a for all a ∈ G;


(3) inverses: for all a ∈ G, there exists b ∈ G such that a ∗ b = b ∗ a = e.

Proposition 1.2.7 showed that if any binary operation has an identity, then that identity is unique.
Similarly, any element in a group has exactly one inverse element.

Proposition 3.2.2
Let (G, ∗) be a group. Then for all a ∈ G, there exists a unique inverse element to a.

Proof. Let a ∈ G be arbitrary and suppose that b1 and b2 satisfy the properties of the inverse axiom
for the element a. Then
b1 = b1 ∗ e by identity axiom
= b1 ∗ (a ∗ b2 ) by inverse axiom
= (b1 ∗ a) ∗ b2 by associativity
= e ∗ b2 by definition of b1
= b2 by identity axiom.
Therefore, for all a ∈ G there exists a unique inverse. 
Since every group element has a unique inverse, our notation for inverses can reflect this. We
denote the inverse element of a by a−1 .
The defining properties of a group are often called the group axioms. In logic, one often uses the
term “axiom” to mean a truth that is self-evident or for which there can exist no further justification.
That is not the sense in which we use the term axiom in this case. In algebra, when we say that
such and such are the axioms of a given algebraic structure, we mean the defining properties of the
algebraic structure.
In the group axioms, there is no assumption that the binary operation ∗ is commutative. We say
that two particular elements a, b ∈ G commute (or commute with each other) if a ∗ b = b ∗ a. The
following property is named after Niels Abel, one of the founders of group theory.

Definition 3.2.3
A group (G, ∗) is called abelian if for all a, b ∈ G, a ∗ b = b ∗ a.

Usually, the groups we encounter possess a binary operation with a natural description. Some-
times, however, it is useful or even necessary to list out all operation pairings. If (G, ∗) is a finite
group and if we label all the elements as G = {g1 , g2 , . . . , gn }, then the group Cayley table (also
operation table) is the n × n array in which the (i, j)th entry is the result of the operation gi ∗ gj .
When listing the elements in a group it is customary that g1 be the identity element of the group.
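As a side illustration (my own sketch, not from the text), building a Cayley table is mechanical enough to script; the function name and the choice of the group (Z/4Z, +) are mine:

```python
def cayley_table(elements, op):
    """Build the Cayley table: entry (i, j) is op(elements[i], elements[j]).

    By convention, elements[0] should be the identity element g1.
    """
    return [[op(g, h) for h in elements] for g in elements]

# Example: (Z/4Z, +), listing the identity 0 first.
table = cayley_table([0, 1, 2, 3], lambda a, b: (a + b) % 4)
for row in table:
    print(row)
```

Listing the identity first puts the first row and first column of the table into the standard form described above.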

3.2.2 – A Few Examples


It is important to develop a long list of examples of groups that show the breadth and restriction of
the group axioms.
Example 3.2.4. The pairs (Z, +), (Q, +), (R, +), and (C, +) are groups. In each case, addition is
associative and has 0 as the identity element. For a given element a, the additive inverse is −a. 4
Example 3.2.5. The pairs (Q∗ , ×), (R∗ , ×), and (C∗ , ×) are groups. Recall that A∗ means A − {0}
when A is a set that includes 0. In each group, 1 is the multiplicative identity, and, for a given
element a, the (multiplicative) inverse is 1/a. Note that (Z∗ , ×) is not a group because it fails the
inverse axiom. For example, there is no nonzero integer b such that 2b = 1.
On the other hand, (Q>0 , ×) and (R>0 , ×) are groups. Multiplication is a binary operation on
Q>0 and on R>0 , and it satisfies all the axioms.
4
Example 3.2.6. A vector space V is a group under vector addition with ~0 as the identity. The
(additive) inverse of a vector ~v is −~v . Note that the scalar multiplication of a vector space has no
bearing on the group properties of the addition. 4
3.2. INTRODUCTION TO GROUPS 79

Example 3.2.7. In Section 2.2, we introduced modular arithmetic. Recall that Z/nZ represents
the set of congruence classes modulo n and that U (n) is the subset of Z/nZ of elements with
multiplicative inverses. Given any integer n ≥ 2, both (Z/nZ, +) and (U (n), ×) are groups. The
element 0 is the identity in Z/nZ and the element 1 is the identity in U (n).
The tables for addition in (2.9) and (2.10) are the Cayley tables for (Z/5Z, +) and (Z/6Z, +).
By ignoring the column and row for 0 in the multiplication table in Equation (2.9), we obtain the
Cayley table for (U (5), ×).
×   1 2 3 4
1   1 2 3 4
2   2 4 1 3
3   3 1 4 2
4   4 3 2 1
4

All the examples so far are of abelian groups. We began this chapter by introducing dihedral
symmetries precisely because it offers an example of a nonabelian group.

Example 3.2.8 (Dihedral Groups). Let n ≥ 3 be an integer. The pair (Dn , ◦), where Dn is the
set of dihedral symmetries of a regular n-gon and ◦ is function composition, is a group. We call
(Dn , ◦) the nth dihedral group. Since rs = sr−1 and r−1 ≠ r for any n ≥ 3, the group (Dn , ◦) is
not abelian. The table given in Equation (3.2) is the Cayley table for D6 . 4

Example 3.2.9. The pair (R3 , ×), where × is the vector cross product, is not a group. First of
all, × is not associative. Indeed, if ~i, ~j, and ~k are respectively the unit vectors in the x-, y-, and
z-directions, then
~i × (~i × ~j) = ~i × ~k = −~j ≠ (~i × ~i) × ~j = ~0 × ~j = ~0.
Furthermore, × has no identity element. For any nonzero vector ~a and any other vector ~v , the
product ~a × ~v is perpendicular to ~a or is ~0. Hence, for no vector ~v do we have ~a × ~v = ~a. 4

Example 3.2.10. Let S be a set with at least 1 element. The pair (P(S), ∪) is not a group. The
union operation ∪ on P(S) is associative and has an identity ∅. However, if A ≠ ∅, there does
not exist a set B ⊆ S such that A ∪ B = ∅. Hence, (P(S), ∪) does not have inverses. 4

Example 3.2.11 (Matrix Groups). Let n be a positive integer. The set of n × n invertible
matrices with real coefficients is a group with the multiplication operation. In this group, the
identity is the identity matrix and the inverse of a matrix A is the matrix inverse A−1 . This group
is called the nth general linear group over R and is denoted by GLn (R).
In Section 5.3, we discuss properties of matrices in more generality. However, without yet pro-
viding the full algebraic theory, we point out that matrix operations are well-defined if we consider
matrices with only rational coefficients, matrices with complex coefficients or even matrices with
coefficients from Fp , modular arithmetic modulo a prime number p. In fact, matrix addition, matrix
multiplication, matrix inversion, and the Gauss-Jordan elimination algorithm only require that the
coefficients are in a field. (See Definition 5.1.22.)
Suppose that F represents Q, R, C, or Fp . Of key importance is the fact that an n × n matrix A
with coefficients in F is invertible if and only if the columns are linearly independent if and only if
det(A), the determinant of A, is nonzero. We denote by GLn (F ) the nth general linear group over
F and we always denote the identity matrix by I.
As an explicit example, consider GL2 (F5 ). The number of 2 × 2 matrices over F5 is 5⁴ = 625.
However, not all are invertible. To determine the cardinality of GL2 (F5 ), we consider the columns of a
matrix A, which must be linearly independent. The only condition on the first column is that it is
not 0. Hence, there are 5² − 1 = 24 options for the first column. Given the first column, the only
necessary condition on the second column is that it is not an F5 -multiple of the first column. This
accounts for 5² − 5 = 20 (all columns minus the 5 multiples of the first column) options. Hence,
GL2 (F5 ) has 24 × 20 = 480 elements.
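This counting argument can be double-checked by brute force. The sketch below (my own, not from the text) enumerates all 2 × 2 matrices over F5 and keeps those with nonzero determinant:

```python
from itertools import product

p = 5
# A 2x2 matrix over F_p is invertible iff its determinant ad - bc is nonzero mod p.
count = sum(1 for a, b, c, d in product(range(p), repeat=4)
            if (a * d - b * c) % p != 0)
print(count)                        # the brute-force count
print((p**2 - 1) * (p**2 - p))      # the counting argument: 24 * 20 = 480
```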

An example of multiplication in GL2 (F5 ) is

    [ 1̄ 3̄ ] [ 1̄ 1̄ ]   [ 1̄ + 9̄    1̄ + 6̄ ]   [ 0̄ 2̄ ]
    [ 2̄ 4̄ ] [ 3̄ 2̄ ] = [ 2̄ + 12   2̄ + 8̄ ] = [ 4̄ 0̄ ]

while the following illustrates calculating the inverse of a matrix:

    [ 3̄ 3̄ ]−1                      [ 1̄  −3̄ ]         [ 1̄ 2̄ ]   [ 1̄ 2̄ ]
    [ 4̄ 1̄ ]    = (3̄ · 1̄ − 3̄ · 4̄)−1 [ −4̄  3̄ ] = (1̄)−1 [ 1̄ 3̄ ] = [ 1̄ 3̄ ].

The reader should verify that all the matrices involved in the above calculations have a determinant
in F5 that is different from 0̄. 4

The above examples give us an initial repertoire of groups from which to draw intuition. Oc-
casionally, we will also encounter methods to create new groups from old ones. The following
construction is one such method.

Definition 3.2.12 (Direct Sum)


Let (G1 , ∗1 ) and (G2 , ∗2 ) be two groups. The direct sum of the groups is a new group
(G1 × G2 , ∗) where the operation ∗ is defined componentwise by

(a1 , a2 ) ∗ (b1 , b2 ) = (a1 ∗1 b1 , a2 ∗2 b2 ).

The direct sum is denoted by G1 ⊕ G2 .

The direct sum generalizes to any finite number of groups. For example, the group (R3 , +) is
the triple direct sum of the group (R, +) with itself.

3.2.3 – Notation for Arbitrary Groups


The common terminology in group theory is often not rigorous. Some authors give the definition
of a group as “a set G is a group with respect to the binary operation ∗ if” and then state the
axioms. This phrasing is precise and can be used rigorously, though it is important to remember
that a group consists of two pieces of data: the set and the binary operation on it. To borrow from
an object-oriented programming paradigm, a group is an object that has the underlying set as an
attribute and the binary operation (satisfying Definition 3.2.1) as a method.
In group theory, we will regularly discuss the properties of an arbitrary group. In this case,
instead of writing the operation as a ∗ b, where ∗ represents some unspecified binary operation, it
is common to write the generic group operation as ab. With this convention of notation, it is also
common to indicate the identity in an arbitrary group as 1 instead of e. In this chapter, however,
we will continue to write e for the arbitrary group identity in order to avoid confusion. Finally, with
arbitrary groups, we denote the inverse of an element a as a−1 .
With these conventions of notation, we regularly say “Let G be a group,” without providing a
symbol for a generic binary operation. The notation for the direct sum G1 ⊕ G2 of two groups G1
and G2 reflects the lack of rigor in the terminology. The set corresponding to G1 ⊕ G2 is G1 × G2 ,
while the operation on G1 ⊕ G2 is the componentwise operation.
By a similar abuse of language, we often refer, for example, to “the dihedral group Dn ,” as
opposed to “the dihedral group (Dn , ◦).” In this expression, the operation of composition is under-
stood. Similarly, when we talk about “the group Z/nZ,” we mean (Z/nZ, +) and when we refer to
“the group U (n),” we mean the group (U (n), ×). We will explicitly list the pair of set and binary
operation if there could be confusion as to which binary operation the group refers. Furthermore,
even if a group is equipped with a natural operation, we often just write ab to indicate that operation.
Following the analogy with multiplication, in a group G, if a ∈ G and k is a positive integer,
by ak we mean

    ak = aa · · · a    (k times).

We extend the power notation so that a0 = e and a−k = (a−1 )k , for any positive integer k.
Groups that involve addition give an exception to the above habit of notation. In that case, we
always write a + b for the operation, −a for the inverse, and, if k is a positive integer,

    k · a = a + a + · · · + a    (k times). (3.3)

We refer to k·a as a multiple of a instead of as a power. Again, we extend the notation to nonpositive
“multiples” just as above with powers.

Proposition 3.2.13
Let G be a group and let x ∈ G. For all n, m ∈ Z, the following identities hold:
(a) xm xn = xm+n ;   (b) (xm )n = xmn .

Proof. (Left as an exercise for the reader. See Exercise 3.2.17.) 

The process of simply considering the successive powers of an element gives rise to an important
class of groups.

Definition 3.2.14
A group G is called cyclic if there exists an element x ∈ G such that every element of G is
a power of x. The element x is called a generator of G.

For example, we notice that for all integers n ≥ 2, the group Z/nZ (with addition as the
operation) is a cyclic group because all elements of Z/nZ are multiples of 1. As we saw in Section 2.2,
one of the main differences with usual arithmetic is that n · 1 = 0. The intuitive sense that the powers
of an element cycle back motivates the terminology. The group Z (with addition) is also a cyclic
group because every element of Z is n · 1 with n ∈ Z.

Example 3.2.15 (Finite Cyclic Groups). Let n be a positive integer. We denote by Zn the
group with elements {e, x, x2 , . . . , xn−1 }, where x has the property that xn = e. We point out two
things about this notation.
First, we do not define this group as existing in any previously known arithmetic context. The
element x does not represent some complex number or matrix or any other object; we have simply
defined how it operates symbolically.
Second, whether we use the variable name x or any other letter of the alphabet, the group is
the same. In fact, if a certain discussion involves two groups Zm and Zn with m 6= n, then we will
commonly use different letters for this variable name for which all other elements exist as powers.
For example, if we work with the group Z4 ⊕ Z2 we might write

Z4 ⊕ Z2 = {(xm , y n ) | x4 = e and y 2 = e}.

We take the opportunity in this example to point out that Z4 ⊕ Z2 is not a cyclic group.
If we were to consider all the powers of all the elements (xi , y j ) with 0 ≤ i ≤ 3 and 0 ≤ j ≤ 1, we
would find that no element of the group is such that the set of all its powers gives the whole group.
For example, if g = (x3 , y), then

g = (x3 , y), g 2 = (x6 , y 2 ) = (x2 , e), g 3 = (x5 , y) = (x, y), g 4 = (x4 , y 2 ) = (e, e).

Hence, the set of powers {e, g, g 2 , . . .} contains only 4 elements and not all eight elements of Z4 ⊕Z2 .4
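A short computation (my own sketch, modeling Z4 ⊕ Z2 as pairs of integers under componentwise addition) confirms that no element has all eight group elements among its powers:

```python
def powers(g, op, e):
    """Return the set {e, g, g^2, ...} of all powers of g in a finite group."""
    seen, x = {e}, g
    while x not in seen:
        seen.add(x)
        x = op(x, g)
    return seen

e = (0, 0)
op = lambda u, v: ((u[0] + v[0]) % 4, (u[1] + v[1]) % 2)
group = [(i, j) for i in range(4) for j in range(2)]

# No element generates all 8 elements, so Z4 ⊕ Z2 is not cyclic.
assert not any(len(powers(g, op, e)) == 8 for g in group)
assert len(powers((3, 1), op, e)) == 4   # matches the element g = (x^3, y) above
```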

3.2.4 – First Properties

Proposition 3.2.16
Let (G, ∗) be a group.
(1) The identity in G is unique.
(2) For each a ∈ G, the inverse of a is unique.

(3) For all a ∈ G, (a−1 )−1 = a.


(4) For all a, b ∈ G, (a ∗ b)−1 = b−1 ∗ a−1 .
(5) For any a1 , a2 , . . . , an ∈ G, the value of a1 ∗ a2 ∗ · · · ∗ an is independent of how you
place the parentheses.

Proof. We have already seen (1) and (2) in Proposition 1.2.7 and Proposition 3.2.2, respectively.
For (3), by definition of the inverse of a we have a ∗ (a−1 ) = (a−1 ) ∗ a = e. However, this shows
that a satisfies the inverse axiom for the element a−1 .
For (4), we have
(a ∗ b)−1 ∗ (a ∗ b) = e ⇐⇒ ((a ∗ b)−1 ∗ a) ∗ b = e                  (associativity)
⇐⇒ (((a ∗ b)−1 ∗ a) ∗ b) ∗ b−1 = e ∗ b−1                            (operate on right by b−1 )
⇐⇒ ((a ∗ b)−1 ∗ a) ∗ (b ∗ b−1 ) = b−1                               (assoc. and identity)
⇐⇒ ((a ∗ b)−1 ∗ a) ∗ e = b−1                                        (inverse axiom)
⇐⇒ (a ∗ b)−1 ∗ a = b−1                                              (identity axiom)
⇐⇒ (a ∗ b)−1 = b−1 ∗ a−1                                            (operate on right by a−1 ).

We proved (5) for any associative operation in Proposition 2.3.7. 

Proposition 3.2.17 (Cancellation Law)


A group G satisfies the left and right cancellation laws, namely

au = av =⇒ u = v (Left cancellation)
ub = vb =⇒ u = v. (Right cancellation)

Proof. If au = av, then multiplying on the left by a−1 , we obtain


a−1 au = a−1 av =⇒ eu = ev =⇒ u = v.
Similarly, if ub = vb, then by multiplying the equality on the right by b−1 , we obtain
ubb−1 = vbb−1 =⇒ ue = ve =⇒ u = v. 
It is important to note that Proposition 3.2.17 does not claim that au = va implies u = v. In
fact, au = va implies u = v if and only if a commutes with u or v.
The Cancellation Law leads to an interesting property about the Cayley table for a finite group.
In combinatorics, a Latin square is an n × n array, filled with n different symbols in such a way
that each symbol appears exactly once in each column and exactly once in each row. Since au = av
implies u = v, in the row corresponding to a, each distinct column has a different group element
entry. Hence, each row of the Cayley table contains n different group elements. Similarly, right
cancellation implies that in a given column, different rows have different group elements. This
shows that the Cayley graph for every group is a Latin square.

3.2.5 – Useful CAS Commands


Various computer algebra systems implement procedures for working with group theory.
In Maple version 16 or below, the command with(group); accesses the appropriate package.
In Maple version 17 or higher, the group package was deprecated in favor of with(GroupTheory);.
Mathematica also boasts a number of commands dedicated to permutation groups (subgroups of Sn ).
A freeware package called GAP (which stands for Groups, Algorithms, Programming) implements
algorithms for computational discrete algebra, with an emphasis on group theory.
Because of the abstract nature of group theory, the available commands in various CAS implement
computations ranging from elementary to specialized for group theorists.

Exercises for Section 3.2

In Exercises 3.2.1 through 3.2.14, decide whether the given set and the operation pair forms a group. If it is,
prove it. If it is not, decide which axioms fail. You should always check that the symbol is in fact a binary
operation on the given set.
1. The pair (N, +).
2. The pair (Q − {−1}, ∗), where ∗ is defined by a ∗ b = a + b + ab.
3. The pair (Q − {0}, ÷), with a ÷ b = a/b.
4. The pair (A, +), where A = {x ∈ Q | |x| < 1}.
5. The pair (Z × Z, ∗), where (a, b) ∗ (c, d) = (ad + bc, bd).
6. The pair ([0, 1), ∗), where x ∗ y = x + y − ⌊x + y⌋.
7. The pair (A, +), where A is the set of rational numbers that when reduced have a denominator of 1
or 3.

8. The pair (A, +), where A = {a + b√5 | a, b ∈ Q}.

9. The pair (A, ×), where A = {a + b√5 | a, b ∈ Q}.

10. The pair (A, ×), where A = {a + b√5 | a, b ∈ Q and (a, b) ≠ (0, 0)}.
11. The pair (U (20), +).
12. The pair (P(S), △), where S is any set and △ is the symmetric difference of two sets.
13. The pair (G, ×), where G = {z ∈ C | |z| = 1}.

14. The pair (D, ∗), where D is the set of open disks in R2 , including the empty set ∅, and where D1 ∗ D2
is the unique open disk of least radius that encloses both D1 and D2 .
15. Show that Z5 ⊕ Z2 is cyclic.
16. Show that Z4 ⊕ Z2 is not cyclic.
17. Prove Proposition 3.2.13. [Hint: Pay careful attention to when powers are negative, zero, or positive.]
18. Is U (11) a cyclic group?
19. Is U (10) a cyclic group?
20. Prove that (Q, +) is not a cyclic group.
21. Construct the Cayley table for U (15).
22. Construct the Cayley table for Z3 ⊕ Z3 .
23. Prove that a group is abelian if and only if its Cayley table is symmetric across the main diagonal.
24. Prove that S = {2a 5b | a, b ∈ Z} as a subset of rational numbers is a group under multiplication.
25. Prove that the set {1, 13, 29, 41} is a group under multiplication modulo 42.
26. Prove that if xn = e, then x−1 = xn−1 .
27. Let A and B be groups. Prove that the direct sum A ⊕ B is abelian if and only if A and B are both
abelian.
28. Prove that if a group G satisfies x2 = e for all x ∈ G, then G is abelian.
29. Prove that if a group G satisfies (xy)−1 = x−1 y −1 for all x, y ∈ G, then G is abelian.

30. Let g1 , g2 , g3 ∈ G. What is (g1 g2 g3 )−1 ? Generalize your result.


31. Prove that every cyclic group is abelian.
32. Prove that GLn (Fp ) contains
(pn − 1)(pn − p)(pn − p2 ) · · · (pn − pn−1 )
elements. [Hint: Use the fact that GLn (Fp ) consists of n × n matrices with coefficients in Fp whose
columns are linearly independent.]
33. Write out the Cayley table for GL2 (F2 ).
34. In the given general linear group, for the given matrices A and B, calculate the products A2 , AB, and
B −1 .
(a) GL2 (F3 ) with A = [ 0 2 ; 2 1 ] and B = [ 1 1 ; 1 2 ].
(b) GL2 (F5 ) with A = [ 1 3 ; 4 1 ] and B = [ 0 1 ; 2 3 ].
(c) GL2 (F7 ) with A = [ 4 6 ; 3 2 ] and B = [ 5 4 ; 3 2 ].
35. Let F be Q, R, C, or Fp . The Heisenberg group with coefficients in F is

    H(F ) = { [ 1 a b ; 0 1 c ; 0 0 1 ] ∈ GL3 (F ) | a, b, c ∈ F }.

(a) Show that H(F ) is a group under matrix multiplication.
(b) Explicitly show the inverse of an element in H(F ).

3.3 Properties of Group Elements
As we progress through group theory, we will encounter more and more internal structure to groups
that is not readily apparent from the three axioms for groups. This section introduces a few ele-
mentary properties of group operations.

3.3.1 – Order of Elements

Definition 3.3.1
Let G be a group.
(1) If G is finite, we call the cardinality |G| the order of the group.
(2) Let x ∈ G. If xk = e for some positive integer k, then we call the order of x, denoted
|x|, the smallest positive value of n such that xn = e. If there exists no positive n
such that xn = e, then we say that the order of x is infinite.

Note that the order of a group element g is |g| = 1 if and only if g is the group’s identity element.
As a reminder, we list the orders of a few groups we have encountered so far:
|Dn | = 2n, (Proposition 3.1.2)
|Zn | = n,
|U (n)| = φ(n), (Corollary 2.2.10)
| GLn (Fp )| = (pn − 1)(pn − p)(pn − p2 ) · · · (pn − pn−1 ). (Exercise 3.2.32)

Example 3.3.2. Consider the group G = (Z/20Z, +). We calculate the orders of 5̄ and 3̄.
For 5̄, we calculate directly that

2 · 5̄ = 10, 3 · 5̄ = 15, 4 · 5̄ = 20 = 0̄.

Hence, |5̄| = 4. However, for 3̄ we find that

k · 3̄ = 3k for 0 ≤ k ≤ 6,
k · 3̄ = 3(k − 7) + 1 for 7 ≤ k ≤ 13,
k · 3̄ = 3(k − 14) + 2 for 14 ≤ k ≤ 20.

This shows that the first positive integer k such that k · 3̄ = 0̄ is 20. Hence, |3̄| = 20. 4
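Order computations like these are easy to automate; the function below (my own sketch, not from the text) finds the order of an element of (Z/nZ, +) by direct search:

```python
from math import gcd

def additive_order(a, n):
    """Smallest positive k with k*a ≡ 0 (mod n)."""
    k, total = 1, a % n
    while total != 0:
        total = (total + a) % n
        k += 1
    return k

assert additive_order(5, 20) == 4     # matches |5| = 4 above
assert additive_order(3, 20) == 20    # matches |3| = 20 above
# Sanity check: in (Z/nZ, +) the order of a is n / gcd(a, n).
assert all(additive_order(a, 20) == 20 // gcd(a, 20) for a in range(1, 20))
```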

Example 3.3.3. Consider the group GL2 (F3 ). We calculate the order of

    g = [ 2 1 ]
        [ 1 0 ]

by evaluating successive powers of g:

    g  = [ 2 1 ]    g2 = [ 2 2 ]    g3 = [ 0 2 ]    g4 = [ 2 0 ]
         [ 1 0 ],        [ 2 1 ],        [ 2 2 ],        [ 0 2 ],

    g5 = [ 1 2 ]    g6 = [ 1 1 ]    g7 = [ 0 1 ]    g8 = [ 1 0 ]
         [ 2 0 ],        [ 1 2 ],        [ 1 1 ],        [ 0 1 ].

From these calculations, we determine that the order of g is |g| = 8. 4
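The same kind of computation can be scripted; the sketch below (my own, not from the text) repeatedly multiplies by g over F3 until it reaches the identity:

```python
def mat_order_2x2(g, p):
    """Order of g in GL2(F_p): least k >= 1 with g^k = I."""
    def mul(A, B):
        (a, b), (c, d) = A
        (e, f), (g2, h) = B
        return (((a*e + b*g2) % p, (a*f + b*h) % p),
                ((c*e + d*g2) % p, (c*f + d*h) % p))
    identity = ((1, 0), (0, 1))
    k, x = 1, g
    while x != identity:
        x = mul(x, g)
        k += 1
    return k

assert mat_order_2x2(((2, 1), (1, 0)), 3) == 8   # the element g of Example 3.3.3
```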

The orders of elements in groups will become a particularly useful property to consider in a
variety of situations. We present a number of propositions concerning the powers and orders of
elements.

Proposition 3.3.4
Let G be any group and let x ∈ G. Then |x−1 | = |x|.

Proof. (Left as an exercise for the reader. See Exercise 3.3.7.) 

Proposition 3.3.5
Let x ∈ G with xn = e and xm = e. Then xd = e, where d = gcd(m, n).

Proof. From Proposition 2.1.12, the greatest common divisor d = gcd(m, n) can be written as a
linear combination sm + tn = d, for some s, t ∈ Z. Then

xd = xsm+tn = (xm )s (xn )t = es · et = e. 

Corollary 3.3.6
Suppose that x is an element of a group G with xm = e. Then the order |x| divides m.

Proof. If |x| = n, then xn = xm = e. By Proposition 3.3.5, xgcd(m,n) = e. However, n is the least
positive integer k such that xk = e, so gcd(m, n) ≥ n. Since also gcd(m, n) ≤ n, we conclude that
gcd(m, n) = n. This implies that n divides m. 

Proposition 3.3.7
Let G be a group, let x ∈ G, and let a ∈ N∗ . Then we have the following results about orders:

(1) If |x| = ∞, then |xa | = ∞.

(2) If |x| = n < ∞, then |xa | = n/ gcd(n, a).

Proof. For (1), suppose that xa had finite order k. Then (xa )k = xak = e with ak positive, which
contradicts |x| = ∞. Hence, xa has infinite order.
For (2), let y = xa and d = gcd(n, a). Writing n = db and a = dc, by Proposition 2.1.11, we
know that gcd(b, c) = 1. Then

y b = xab = xbcd = xnc = ec = e.

Suppose that |y| = k. By Corollary 3.3.6, k divides b. Conversely, since y k = xak = e, we have
n|ak. Thus db|dck, which implies that b|ck. However, gcd(b, c) = 1, so we conclude that b|k. Since
b|k and k|b and b, k ∈ N∗ , we conclude that b = k.
Hence, |y| = |xa | = b = n/d = n/ gcd(a, n). 

Proposition 3.3.7 presents two noteworthy cases. First, if gcd(a, n) = 1, then |xa | = n. Second,
if a divides n, then |xa | = n/a.
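Both special cases, and the general formula, can be spot-checked in a cyclic group; the sketch below (my own, not from the text) verifies |xa | = n/ gcd(n, a) for n = 12, realizing the cyclic group as (Z/12Z, +) with xa corresponding to the multiple a · 1:

```python
from math import gcd

n = 12
for a in range(1, n + 1):
    # order of the element a in (Z/12Z, +), found by direct search
    order = next(k for k in range(1, n + 1) if (k * a) % n == 0)
    assert order == n // gcd(n, a)

# The two noteworthy cases:
assert next(k for k in range(1, 13) if (k * 5) % 12 == 0) == 12   # gcd(5, 12) = 1
assert next(k for k in range(1, 13) if (k * 4) % 12 == 0) == 3    # 4 divides 12: 12/4 = 3
```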
It is important to point out that in a general group, there is very little that can be said about the
relationship between |xy|, |x|, and |y| for two elements x, y ∈ G. For example, consider the dihedral
group Dn . Both s and rs refer to reflections through lines and hence have order 2. However,
s(sr) = r, and r has order n. Thus, given any integer n, there exists a group where |g1 | = 2 and
|g2 | = 2 and |g1 g2 | = n.
Example 3.3.8 (Infinite Dihedral Group). As a more striking example, consider the group D∞
that contains elements labeled x and y with x2 = ι and y 2 = ι and no other conditions. Since
there are no other relations between the elements, the element xy has infinite order. Hence, all
elements of the group are of the form

(xy)n , y(xy)n , (xy)n x, or y(xy)n x

for n ∈ Z. In fact, by Exercise 3.3.19, all elements can be written as (xy)n or (xy)n x. This group
is called the infinite dihedral group, which motivates the notation D∞ . 4

There is a particular case in which we can calculate the orders of a product of elements from the
orders of the original elements.

Theorem 3.3.9
Let G1 , G2 , . . . , Gn be groups and let (g1 , g2 , . . . , gn ) ∈ G1 ⊕ G2 ⊕ · · · ⊕ Gn be an element
in the direct sum. Then the order of (g1 , g2 , . . . , gn ) is

|(g1 , g2 , . . . , gn )| = lcm(|g1 |, |g2 |, . . . , |gn |).

Proof. We have (g1 , g2 , . . . , gn )m = (e1 , e2 , . . . , en ) if and only if gim = ei for all i = 1, 2, . . . , n,
where each symbol ei represents the identity in the group Gi . By Corollary 3.3.6, |gi | divides m
for all i = 1, 2, . . . , n. Thus, lcm(|g1 |, |g2 |, . . . , |gn |) divides m.

Suppose now that k is the order of (g1 , g2 , . . . , gn ), namely the least positive integer m such
that (g1 , g2 , . . . , gn )m = (e1 , e2 , . . . , en ). Then lcm(|g1 |, |g2 |, . . . , |gn |) divides k = |(g1 , g2 , . . . , gn )|.
However, lcm(|g1 |, |g2 |, . . . , |gn |) is a multiple of |gi | for each i, and hence,
(g1 , g2 , . . . , gn )lcm(|g1 |,|g2 |,...,|gn |) = (e1 , e2 , . . . , en ).
Hence, by Corollary 3.3.6, k divides lcm(|g1 |, |g2 |, . . . , |gn |). Since lcm(|g1 |, |g2 |, . . . , |gn |) and k are
positive numbers that divide each other, they are equal. The theorem follows. 
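Theorem 3.3.9 can be confirmed on a small example; the sketch below (my own, using Z/6Z ⊕ Z/4Z with additive notation) checks the formula for every element of the direct sum:

```python
from math import gcd

def order_mod(a, n):
    """Order of a in the group (Z/nZ, +)."""
    return next(k for k in range(1, n + 1) if (k * a) % n == 0)

def lcm(a, b):
    return a * b // gcd(a, b)

def pair_order(g1, g2):
    """Order of (g1, g2) in Z/6Z ⊕ Z/4Z, by direct search up to lcm(6, 4) = 12."""
    return next(m for m in range(1, 13) if (m * g1) % 6 == 0 and (m * g2) % 4 == 0)

for g1 in range(6):
    for g2 in range(4):
        assert pair_order(g1, g2) == lcm(order_mod(g1, 6), order_mod(g2, 4))
```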

3.3.2 – Classification Theorems


Throughout mathematics, a classification theorem is a theorem that identifies all objects with a
given structure that satisfy a certain property. For example, Greek mathematicians knew that
there existed only five convex regular polyhedra: tetrahedron, cube, octahedron, dodecahedron,
and icosahedron—the Platonic solids. We could call this result the “classification of convex regular
polyhedra,” where polyhedron is the structure and “convex and regular” is the property. Depending
on the mathematical structure and the particular property, such theorems may be quite profound
and consequently challenging to prove.
We give a very simple classification theorem by way of example.
Example 3.3.10 (Groups of Order 4). We propose to find all groups of order 4. Instead of
listing out all the groups of order 4 we have seen so far, we try to fill out a Cayley table for a group
of order 4. Suppose that G = {e, a, b, c} with a, b, and c distinct nonidentity group elements. A
priori, all we know about the Cayley table is the first column and first row.
e a b c
e e a b c
a a
b b
c c
Note that if a group G contains an element g of order n, then {e, g, g 2 , . . . , g n−1 } is a subset of n
distinct elements. (See Exercise 3.3.24.) Hence, a group of order 4 cannot contain an element of
order 5 or higher.
Suppose that G contains an element of order 4, say, the element a. Then G = {e, a, a2 , a3 }.
Without loss of generality, we can call b = a2 and c = a3 and the Cayley table becomes the
following.
e a b c
e e a b c
a a b c e
b b c e a
c c e a b
We recognize this table as corresponding to the cyclic group Z4 .
Suppose that G does not contain an element of order 4 but contains one of order 3. So assume
that |a| = 3. Then G = {e, a, a2 , c}, with all elements distinct. We now try to determine ac. The
element ac cannot be equal to ak for any k for then c = ak−1 , a contradiction. We cannot have
ac = c for then a = e, again a contradiction. Hence, a group of order 4 cannot contain an element
of order 3.
Suppose now that all the elements in G have order 2. We must have ab = c. This is because ab
cannot be e because a is its own inverse and a 6= b; ab cannot be a because this would imply that
b = e and ab cannot be b because this would imply that a = e. The same reasoning applies to all
other pairings and we conclude that the Cayley table is
e a b c
e e a b c
a a e c b
b b c e a
c c b a e

This group is often denoted by V4 and called the Klein-4 group. Our approach covered all possible
cases for orders of elements in a group of order 4 so we conclude that Z4 and V4 are the only two
groups of order 4. 4

The conclusion of the previous example might seem striking at first. In particular, we already
know two groups of order 4 that have different properties: Z4 and Z2 ⊕ Z2 . Consequently, V4 and
Z2 ⊕ Z2 must in some sense be the same group. However, we do not yet have the background to
fully develop an intuitive concept of sameness for groups. We return to that issue in Section 3.7.
We will discuss classification theorems then and at various points in later sections.
Though not fully cast as a classification question, the next two examples illustrate how we can
discover new groups through abstract reasoning as used in the previous example.

Example 3.3.11. Suppose that G is a group of order 8 that contains an element x of order 4. Let
y be another element in G that is distinct from any power of x. With these criteria, we know so far
that G contains the distinct elements e, x, x2 , x3 , y. The element xy cannot be
e because that would imply y = x3 ;
x because that would imply y = e;
x2 because that would imply y = x;
x3 because that would imply y = x2 ;
y because that would imply x = e.

So xy is a new element of G. By similar reasonings which we leave to the reader, the elements x2 y
and x3 y are distinct from all the others. Hence, G must contain the 8 distinct elements

{e, x, x2 , x3 , y, xy, x2 y, x3 y}. (3.4)

Now let us assume that |y| = 2. Consider the question of the value of yx. By the identical
reasoning by cases provided above, yx cannot be e, x, x2 , x3 , or y. Thus, there are three cases: (1)
yx = x3 y; (2) yx = xy; and (3) yx = x2 y.

Case 1. If yx = x3 y, then the group is in fact D4 , the dihedral group of the square, where x serves
the role of r and y serves the role of s.

Case 2. If yx = xy, then xs y t = y t xs for all s, t ∈ Z, and so G is abelian. We leave it up to the
reader to show that this group is Z4 ⊕ Z2 .

Case 3. If yx = x2 y then yxy = x2 . Hence,

x = y 2 xy 2 = y(yxy)y = yx2 y = yxy 2 xy = (yxy)(yxy) = x4 = e.

We conclude that x = e, which contradicts the assumption that x has order 4. Hence, there
exists no group of order 8 with an element x of order 4 and an element y of order 2 with
yx = x2 y.

Assume now that |y| = 3. Consider the element y 2 in G. A quick proof by cases shows that y 2
cannot be any of the eight distinct elements listed in (3.4). Hence, there exists no group of order 8
containing an element of order 4 and one of order 3.
Assume now that |y| = 4. Again, we consider the possible value of y 2 . If there exists a group with
all the conditions we have so far, then y 2 must be equal to an element in (3.4). Now |y 2 | = 4/2 = 2
so y 2 cannot be e, x, x3 , or y, which have orders 1, 4, 4, 4, respectively. Furthermore, y 2 cannot be
equal to xy, (respectively x2 y or x3 y) because that would imply x = y (respectively x2 = y or
x3 = y), which is against the assumptions on x and y. We have not ruled out the possibility that
y 2 = x2 .
We focus on this latter possibility, namely a group G containing x and y with |x| = 4, |y| = 4,
y ∉ {e, x, x2 , x3 } and x2 = y 2 . If we now consider possible values of the element yx, we can quickly
eliminate all possibilities except xy and x3 y. If G = Z4 ⊕ Z2 = {(z, w) | z 4 = e and w2 = e}, then

setting x = (z, e) and y = (z, w), it is easy to check that Z4 ⊕ Z2 satisfies x2 = y 2 and yx = xy. On
the other hand, if yx = x3 y, then G is a nonabelian group in which x, x3 , y, y 3 = x2 y are elements
of order 4. However, D4 is the only nonabelian group of order 8 that we have encountered so far
and in D4 only r and r3 have order 4. Hence, G must be a new group. 4

We now introduce the new group identified in this example but using the symbols traditionally
associated to it.
Example 3.3.12 (The Quaternion Group). The Quaternion group, denoted by Q8 , contains
the following eight elements:
1, −1, i, −i, j, −j, k, −k.
The operations on the elements are in part inspired by how the imaginary number operates on itself.
In particular, 1 is the identity element and multiplication by (−1) changes the sign of any element.
We also have
i2 = −1,  i3 = −i,  i4 = 1,
j 2 = −1,  j 3 = −j,  j 4 = 1,
k 2 = −1,  k 3 = −k,  k 4 = 1,
ij = k,  jk = i,  ki = j,
ji = −k,  kj = −i,  ik = −j.
Matching symbols to Example 3.3.11, note that i4 = j 4 = 1, i2 = −1 = j 2 , and ji = −k = (−i)j =
i3 j. This shows that Q8 is indeed the new group of order 8 discovered at the end of the previous
example. 4
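These relations can be verified concretely. Q8 admits a faithful representation by 2 × 2 complex matrices; the particular matrices below are a standard choice, supplied here as my own illustration rather than taken from the text:

```python
# 1, i, j, k of Q8 as 2x2 complex matrices (tuples of rows).
one = ((1, 0), (0, 1))
i = ((1j, 0), (0, -1j))
j = ((0, 1), (-1, 0))
k = ((0, 1j), (1j, 0))

def mul(A, B):
    """Ordinary 2x2 matrix multiplication over the complex numbers."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a*e + b*g, a*f + b*h), (c*e + d*g, c*f + d*h))

def neg(A):
    """Multiplication by -1, which changes the sign of every entry."""
    return tuple(tuple(-x for x in row) for row in A)

assert mul(i, i) == neg(one)                     # i^2 = -1
assert mul(i, j) == k and mul(j, i) == neg(k)    # ij = k, ji = -k
assert mul(j, k) == i and mul(k, i) == j         # jk = i, ki = j
```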

A bigger classification question would involve finding all the groups of order 8. We could solve this
problem at this point, but we will soon encounter theorems that establish more internal structure on
groups that would make such questions easier. Consequently, we delay most classification questions
until later.

Exercises for Section 3.3


1. Find the orders of 5̄ and 6̄ in (Z/21Z, +).
2. Find the orders of all the elements in (Z/18Z, +).
3. Find the orders of all the elements in U (21).
4. Find the orders of all the elements in GL2 (F3 ).
5. Find the orders of all the elements in Q8 .
6. Find the orders of all the elements in Z4 ⊕ Z2 .
7. Prove Proposition 3.3.4.
8. Show that for all integers n ≥ 1, the rotation matrix

   R = ( cos(2π/n)   −sin(2π/n) )
       ( sin(2π/n)    cos(2π/n) )

   in GL2 (R) has order n.


9. Calculate the order of 20 in Z/52Z.
10. Calculate the order of 285 in the group Z/360Z.
11. Calculate the order of r16 in D24 .
12. What is the largest order of an element in Z75 ⊕ Z100 ? Illustrate with a specific element.
13. Find an element of order 3 in GL3 (F2 ).
 
14. Revisit Example 3.3.3. Calculate the order of the matrix

    ( 0̄ 2̄ )
    ( 2̄ 2̄ )

    in GL2 (F3 ) without performing a single matrix operation.
15. Find all the generators of the cyclic group Z40 .
16. Find all the generators of the cyclic group Z/36Z.

17. Let p be an odd prime.
    (a) Use the Binomial Theorem to prove that (1 + p)^{p^n} ≡ 1 (mod p^{n+1}) for all positive integers n.
    (b) Prove also that (1 + p)^{p^{n−1}} ≢ 1 (mod p^{n+1}) for all positive integers n.
    (c) Conclude that 1 + p has order p^n in U (p^{n+1}).
18. Let G be a group such that for all a, b, c, d, x ∈ G, the identity axb = cxd implies ab = cd. Prove that
G is abelian.
19. Consider the infinite dihedral group as presented in Example 3.3.8.
(a) Show that y(xy)^n = (xy)^{−n−1} x for all integers n.
(b) Conclude that every element in D∞ can be written as (xy)^n or (xy)^n x for some n ∈ Z.
20. Let a and b be elements in a group G. Prove that if a and b commute, then the order of ab divides
lcm(|a|, |b|).
21. Find a nonabelian group G and two elements a, b ∈ G such that |ab| does not divide lcm(|a|, |b|).
22. Let G and H be two finite groups. Prove that G ⊕ H is cyclic if and only if G and H are both cyclic
with gcd(|G|, |H|) = 1.
23. Let G1 , G2 , . . . , Gn be n finite groups. Prove that G1 ⊕ G2 ⊕ · · · ⊕ Gn is cyclic if and only if Gi is cyclic
for all i = 1, 2, . . . , n and gcd(|Gi |, |Gj |) = 1 for all i ≠ j. [Hint: Use Exercise 3.3.22 and induction.]
24. Let x ∈ G be an element of finite order n. Prove that e, x, x2 , . . . , xn−1 are all distinct. Deduce that
|x| ≤ |G|.
25. Prove that for elements x and y in any group G we have |x| = |yxy −1 |.
26. Use the preceding exercise to show that for any elements g1 and g2 in a group |g1 g2 | = |g2 g1 |, even if
g1 and g2 do not commute.
27. Prove that the group of rigid motions of the cube has order 24.
28. Prove that the group of rigid motions of the octahedron has order 24.
29. Prove that the group of rigid motions of a classic soccer ball has order 60.

30. Find all groups of order 5.


31. We consider groups of order 6. We know that Z6 is a group of order 6. We now look for all the others.
Let G be any group of order 6 that is not Z6 , i.e., does not contain an element of order 6.
(a) Show that G cannot have an element of order 7 or greater.
(b) Show that G cannot have an element of order 5.
(c) Show that G cannot have an element of order 4.
(d) Show that the nonidentity elements of G have order 2 or 3.
(e) Conclude that there exist only two groups of order 6. In particular, there exists one abelian group
of order 6 (the cyclic group Z6 ) and one nonabelian group of order 6 (D3 is such a group).
[Comment: We will encounter a number of nonabelian groups of order 6 but the result of this exercise
establishes that they are all in some sense the same. We will make precise this notion of sameness in
Section 3.7.3.]
32. Let G = {e, v, w, x, y, z} be a group of order 6. For the following partial table, decide if it can be
completed to the Cayley table of some G and if so fill it in. [Hint: You may need to use associativity.]
e v w x y z
e − − − − − −
v − − − − − w
w − − − z e −
x − y − − − −
y − − − − − −
z − − − − − −

33. Let G = {e, t, u, v, w, x, y, z} be a group of order 8. For the following partial table, decide if it can be
completed to the Cayley table of some G and if so fill it in. [Hint: You may need to use associativity.]
e t u v w x y z
e − − − − − − − −
t − − − − − − − e
u − − e − − y x t
v − − − u − t − −
w − x v − − − − y
x − − − z − − − −
y − − − t z − − −
z − − − − x − − u
34. Let {Gi }i∈I be a collection of groups, indexed by a set I that is not necessarily finite. We define the
direct sum of this collection, denoted by ⊕_{i∈I} Gi , as the set of choice functions f that for each
i ∈ I associate an element f (i) ∈ Gi , such that f (i) = 1 for all but a finite number of indices i. We
define an operation · on ⊕_{i∈I} Gi by letting f · g be the choice function such that (f · g)(i) = f (i)g(i)
in each group Gi . Prove that this direct sum of the collection {Gi }i∈I is itself a group. [In the special
case that I = N, the direct sum consists of infinite sequences (g0 , g1 , g2 , . . .) such that gi ∈ Gi and
gi = 1 for all but a finite number of indices i. In the case that I is a finite set, then this definition is
identical to Definition 3.2.12 and its generalization to a finite number of groups.]

3.4 Symmetric Groups
Symmetric groups play a key role in group theory and applications of group theory. This section
introduces the terminology and elementary properties of symmetric groups.

3.4.1 – Permutations

Definition 3.4.1
Let A be a nonempty set. Define SA as the set of all bijections from A to itself.

Proposition 3.4.2
The pair (SA , ◦) is a group, where the operation ◦ is function composition.

Proof. The composition of two bijections is a bijection so composition is a binary operation on SA .


Proposition 1.1.15 establishes that ◦ is associative in SA .
The function idA : A → A such that idA (x) = x for all x ∈ A is the group identity.
Since f ∈ SA is a bijection, there exists an inverse function, denoted f −1 : A → A. By definition
of the inverse function,
f ◦ f −1 = f −1 ◦ f = idA .
Hence, (SA , ◦) has inverses. SA satisfies all the axioms of a group. 

We call SA the symmetric group on A. If A = {1, 2, . . . , n}, then we write Sn
instead of the cumbersome S{1,2,...,n} . We call the elements of SA permutations of A.

In Section 3.1, we discussed the symmetries of a regular n-gon in the plane. Though we described
the symmetries (reflections, rotations) with terms pertaining to the whole plane, we also described
the transformations as the symmetries on the set of vertices that preserve the n-gon incidence
structure. The symmetric group on a given set is simply the group of all bijections on that set
without imposing any conditions. In contrast to the regular n-gon that is not preserved by all of
Sn , the complete graph on n vertices (see Figure 3.7 with n = 6) is preserved as a graph under any
permutation in Sn .

3 2

4 1

5 6

Figure 3.7: The complete graph on six vertices

Proposition 3.4.3
|Sn | = n!.

Proof. A function from {1, 2, . . . , n} to another set is injective if and only if the range has n elements.
Hence, a function from {1, 2, . . . , n} to itself is a bijection if and only if it is an injection.
The order of Sn is the number of distinct bijections on {1, 2, . . . , n}. To enumerate the bijections,
we count the injections from {1, 2, . . . , n} to itself. Note that there are n options for f (1). Since
f (2) ≠ f (1), for each choice of f (1), there are n − 1 choices for f (2). Given values for f (1) and f (2),
there are n − 2 possible choices for f (3) and so on. Since an enumeration of injections requires an
n-part decision, we use the product rule. Hence,

|Sn | = n(n − 1)(n − 2) · · · 3 · 2 · 1 = n! . 
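The count in Proposition 3.4.3 can be confirmed by brute force for small n; the following Python check (ours) enumerates all bijections of {1, . . . , n} and compares the count with n!.

```python
from itertools import permutations
from math import factorial

# Count the bijections of {1, ..., n} directly and compare with n!.
for n in range(1, 7):
    assert len(list(permutations(range(1, n + 1)))) == factorial(n)
```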

Symmetric groups arise in a variety of natural contexts. In a 100-meter Olympic race, eight
runners are given lane numbers. The function from the runner’s lane number to the rank they place
in the race is a permutation in S8 . A cryptogram is a word game in which someone writes a message,
replacing each letter of the alphabet with another letter, and a second person attempts to recover
the original message. The first person’s choice of how to scramble the letters of the alphabet is a
permutation in Sa , where a is the number of letters in the alphabet used. When someone shuffles a
deck of 52 cards, the resulting reordering of the cards represents a permutation in S52 .
We need a few convenient ways to visualize and represent a permutation on {1, 2, . . . , n}.

Directed Graph. A visual method of representing a permutation σ ∈ Sn involves using a directed


graph. Each element of {1, 2, . . . , n} is written as a point on the plane and we draw an arrow
from a to b if σ(a) = b. In this way, a permutation will create a directed graph in which one
arrow leaves each point and arrives at each point. See Figure 3.8 for an example.

Chart Notation. Another way of writing a permutation is to record in a chart or matrix the
outputs like
 
σ = ( 1      2      · · ·   n    )
    ( σ(1)   σ(2)   · · ·   σ(n) ).

3
4 2

5 1

6 8
7

Figure 3.8: A permutation as a directed graph

Using the chart notation, the permutation in Figure 3.8 is written as


 
σ = ( 1 2 3 4 5 6 7 8 )
    ( 3 8 7 4 6 2 1 5 ).

n-tuple. If the value n is clear from context, then the top row of the chart notation is redundant.
Hence, we can represent the permutation σ by the n-tuple (σ(1), σ(2), . . . , σ(n)). Using the
n-tuple notation, the permutation in Figure 3.8 is written as σ = (3, 8, 7, 4, 6, 2, 1, 5).
Cycle Notation. A different notation turns out to be more useful for the purposes of group theory.
In cycle notation, the expression

σ = (a1 a2 · · · am1 )(am1 +1 am1 +2 · · · am2 ) · · · (amk−1 +1 amk−1 +2 · · · amk ),

where the aℓ are distinct elements in {1, 2, . . . , n}, means that for any index i,

σ(ai ) = ai+1 if i ≠ mj for all j, and σ(ai ) = amj−1 +1 if i = mj for some j,

where m0 = 0. Any of the expressions (amj−1 +1 amj−1 +2 · · · amj ) is called a cycle because σ
“cycles” through these elements in order as σ iterates.
Using the cycle notation for the permutation in Figure 3.8, we note that
σ(1) = 3, σ(3) = 7, and σ(7) = 1;
then σ(2) = 8, σ(8) = 5, σ(5) = 6, σ(6) = 2;
and then σ(4) = 4.
Therefore, in cycle notation, the permutation in Figure 3.8 is written as σ = (1 3 7)(2 8 5 6)(4).
There are many different ways of expressing a permutation using the cycle notation. For example,
as cycles, (1 3 7) = (3 7 1) = (7 1 3). Standard cycle notation imposes four additional habits. (1) If
σ is the identity function, we just write σ = id. (Advanced texts commonly refer to the identity
permutation as 1 but, for the moment, we will use id or idn in order to avoid confusion.) (2) We
write each cycle of σ starting with the lowest integer in the cycle. (3) The order in which we list
the cycles of σ is such that initial elements of each cycle are in increasing order. (4) Finally, we
omit any cycle of length 1. We say that a permutation in Sn is written in standard cycle notation
if it satisfies these requirements. The standard cycle notation for the permutation in Figure 3.8 is
σ = (1 3 7)(2 8 5 6).
An m-cycle is a permutation that in standard cycle notation consists of only one cycle of length
m. Two cycles are called disjoint if they involve no common integers. By the construction of the
standard cycle notation for a permutation, we notice that the cycles of σ must be disjoint. A 2-cycle
is also called a transposition because it simply interchanges (transposes) two elements and leaves
the rest fixed.
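The conventions for standard cycle notation translate naturally into an algorithm: repeatedly pick the smallest integer not yet visited and follow σ until the cycle closes, omitting cycles of length 1. A possible Python sketch (the function name is ours), using the n-tuple convention perm[i − 1] = σ(i):

```python
def standard_cycles(perm):
    """Standard cycle notation for a permutation in n-tuple form,
    where perm[i-1] = sigma(i).  Cycles of length 1 are omitted, each
    cycle starts at its smallest element, and cycles are listed by
    increasing first element."""
    n = len(perm)
    seen, cycles = set(), []
    for start in range(1, n + 1):
        if start in seen:
            continue
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = perm[x - 1]
        if len(cycle) > 1:
            cycles.append(tuple(cycle))
    return cycles

# The permutation of Figure 3.8 in n-tuple notation:
sigma = (3, 8, 7, 4, 6, 2, 1, 5)
assert standard_cycles(sigma) == [(1, 3, 7), (2, 8, 5, 6)]
```

The result matches the standard cycle notation σ = (1 3 7)(2 8 5 6) computed in the text.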

Example 3.4.4. To illustrate the cycle notation, we list all the permutations in S4 in standard
cycle notation:

(1 2 3 4), (1 2 4 3), (1 3 2 4), (1 3 4 2), (1 4 2 3), (1 4 3 2),


(1 2 3), (1 3 2), (1 2 4), (1 4 2), (1 3 4), (1 4 3), (2 3 4), (2 4 3),
(1 2), (1 3), (1 4), (2 3), (2 4), (3 4),
(1 2)(3 4), (1 3)(2 4), (1 4)(2 3),
id .

We can verify that we have all the 3-cycles by calculating how many we should have. Each cycle
consists of 3 integers. The number of ways of choosing 3 from 4 integers is the binomial coefficient
C(4, 3) = 4. For each selection of 3 integers, we list the least one first in the cycle. Then, there are
2 options for how to order the remaining two integers in the 3-cycle. Hence, there are 2 · C(4, 3) = 8
three-cycles in S4 . 4

The cycle type of a permutation describes how many disjoint cycles of a given length make up the
standard cycle notation of that permutation. Hence, we say that (1 3)(2 4) is of cycle type (a b)(c d).
Example 3.4.5. As another example, consider the symmetric group S6 . There are 6! = 720 ele-
ments in S6 . We count how many permutations there are in S6 of a given cycle type. In order to
count the 6-cycles, note that every integer from 1 to 6 appears in the cycle notation of a 6-cycle. In
standard cycle notation, we write 1 first and then all 5! = 120 orderings of {2, 3, 4, 5, 6} give distinct
6-cycles. Hence, there are 120 6-cycles in S6 .
We now count the permutations in S6 of the form σ = (a1 a2 a3 )(a4 a5 a6 ). The standard cycle
notation of a permutation has a1 = 1. To choose the values in a2 and a3 , there are 5 choices for
a2 and then 4 remaining choices for a3 . With a1 , a2 , and a3 chosen, we know that {a4 , a5 , a6 } =
{1, 2, 3, 4, 5, 6} − {a1 , a2 , a3 }. The value of a4 must be the minimum value of {a4 , a5 , a6 }. Then there
are two ways to order the two remaining elements in the second 3-cycle. Hence, there are 5 · 4 · 2 = 40
permutations that consist of the product of two disjoint 3-cycles.

cycle type            number of elements
id                    1
(a b)                 C(6, 2) = 15
(a b c)               2 · C(6, 3) = 40
(a b c d)             3! · C(6, 4) = 90
(a b c d e)           4! · C(6, 5) = 144
(a b c d e f )        5! = 120
(a b c d)(e f )       3! · C(6, 4) = 90
(a b c)(d e f )       40
(a b c)(d e)          2 · C(6, 3) · C(3, 2) = 120
(a b)(c d)            (1/2) · C(6, 2) · C(4, 2) = 45
(a b)(c d)(e f )      C(6, 2) · C(4, 2)/3! = 15
Total:                720

(Here C(n, k) denotes the binomial coefficient n choose k.)

The above table counts all the different permutations in S6 , organized by lengths of cycles in standard
cycle notation. 4
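The table can be verified by brute force. The sketch below (our own check, again with the n-tuple convention perm[i − 1] = σ(i)) classifies all 720 elements of S6 by their multiset of cycle lengths.

```python
from itertools import permutations
from collections import Counter

def cycle_lengths(perm):
    """Sorted tuple of cycle lengths of a permutation on {1..n}."""
    n = len(perm)
    seen, lengths = set(), []
    for start in range(1, n + 1):
        if start in seen:
            continue
        length, x = 0, start
        while x not in seen:
            seen.add(x)
            x = perm[x - 1]
            length += 1
        lengths.append(length)
    return tuple(sorted(lengths))

counts = Counter(cycle_lengths(q) for q in permutations(range(1, 7)))
assert counts[(1, 1, 1, 1, 1, 1)] == 1    # identity
assert counts[(1, 1, 1, 1, 2)] == 15      # transpositions
assert counts[(6,)] == 120                # 6-cycles
assert counts[(3, 3)] == 40               # two disjoint 3-cycles
assert sum(counts.values()) == 720
```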

3.4.2 – Operations in Cycle Notation


When calculating operations in the symmetric group, we must remember that permutations are
bijective functions and that function composition is read from right to left. Because of this, if
σ, τ ∈ SA , then the composition στ (short for σ ◦ τ ) means the bijection where we apply τ first and
then σ.
Consider the following example in which we determine the cycle notation for a product. Suppose
that we are in S6 and σ = (1 4 2 6)(3 5) and τ = (2 6 3). We write

στ = (1 4 2 6)(3 5)(2 6 3)

and read from right to left how στ maps the integers as a composition of cycles, not necessarily
disjoint now. Tracking each integer through the cycles from right to left:

στ (1): (2 6 3) fixes 1, (3 5) fixes 1, and (1 4 2 6) sends 1 to 4;  so στ = (1 4 . . .

στ (4): (2 6 3) fixes 4, (3 5) fixes 4, and (1 4 2 6) sends 4 to 2;  so στ = (1 4 2 . . .

στ (2): (2 6 3) sends 2 to 6, (3 5) fixes 6, and (1 4 2 6) sends 6 to 1;  so στ = (1 4 2)(3 . . .

Note that since στ (2) = 1, we closed the first cycle and start a new cycle with the smallest integer
not already appearing in any previous cycle of στ .

στ (3): (2 6 3) sends 3 to 2, (3 5) fixes 2, and (1 4 2 6) sends 2 to 6;  so στ = (1 4 2)(3 6 . . .

στ (6): (2 6 3) sends 6 to 3, (3 5) sends 3 to 5, and (1 4 2 6) fixes 5;  so στ = (1 4 2)(3 6 5).
And we are done. We closed the cycle at the end of 5 because all integers 1 through 6 already appear
in the standard cycle notation of στ so the cycle must be closed. However, it is a good practice to
verify by the same method that στ (5) = 3.
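In n-tuple notation the right-to-left reading becomes a one-line function: στ sends i to σ(τ (i)). The following Python check (ours) reproduces the computation above.

```python
def compose(sigma, tau):
    """Composition sigma∘tau in n-tuple form: apply tau first, then sigma."""
    return tuple(sigma[tau[i] - 1] for i in range(len(tau)))

# In S6: sigma = (1 4 2 6)(3 5) and tau = (2 6 3), in n-tuple form.
sigma = (4, 6, 5, 2, 3, 1)
tau = (1, 6, 2, 4, 5, 3)
prod = compose(sigma, tau)
assert prod == (4, 1, 6, 2, 3, 5)   # this is (1 4 2)(3 6 5)
```

Running compose(tau, sigma) instead gives a different permutation, which previews Proposition 3.4.9.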
It should be obvious that the cycle notation expresses a permutation as the composition of
disjoint cycles.

Proposition 3.4.6
Disjoint cycles in Sn commute.

Proof. Let σ, τ ∈ Sn be disjoint cycles. Let i ∈ {1, 2, . . . , n}.


If i is one of the integers in the cycle of τ , then τ (i) is another integer in the cycle of τ . Since the
cycles are disjoint, σ(τ (i)) = τ (i). On the other hand, since i is not in the cycle σ, τ (σ(i)) = τ (i).
If i is one of the integers in the cycle of σ, then σ(i) is another integer in the cycle of σ. Since the
cycles are disjoint, τ (σ(i)) = σ(i). On the other hand, since i is not in the cycle τ , σ(τ (i)) = σ(i).
If i is neither one of the integers in the cycle of σ nor one of the integers in the cycle of τ , then
σ(τ (i)) = σ(i) = i and τ (σ(i)) = τ (i) = i.
Hence, σ(τ (i)) = τ (σ(i)) for all i ∈ {1, 2, . . . , n} so στ = τ σ. 

The fact that disjoint cycles commute implies that to understand powers and inverses of permuta-
tions, understanding of how powers and inverses work on cycles is sufficient. Indeed, if τ1 , τ2 , . . . , τk
are disjoint cycles and σ = τ1 τ2 · · · τk , then

σ m = τ1m τ2m · · · τkm , for all m ∈ Z, and in particular σ −1 = τ1−1 τ2−1 · · · τk−1 .

Consequently, some properties about a permutation σ and its powers depend only on the cycle type
of σ. We leave a number of these results for the exercises but we state one proposition here because
of its importance.

Proposition 3.4.7
For all σ ∈ Sn , the order |σ| is the least common multiple of the lengths of the disjoint
cycles in the standard cycle notation of σ.

Proof. (Left as an exercise for the reader. See Exercise 3.4.18.) 

The cycle notation also makes it easy to find the inverse of a permutation. The inverse function
to a permutation σ simply involves reading the cycle notation backwards.
Example 3.4.8. Let σ = (1 3 7)(2 5 4)(6 10) in S10 . We propose to calculate σ −1 and then to
determine the order of σ by calculating all the powers of σ.
To calculate σ −1 , we read the cycles backwards so

σ −1 = (7 3 1)(4 5 2)(10 6) = (1 7 3)(2 4 5)(6 10).

The second equality follows by rewriting the cycle with the lowest integer first. This is equivalent to
starting at the lowest integer in the cycle and reading the cycle backwards.
For the powers of σ we have

σ = (1 3 7)(2 5 4)(6 10),


σ 2 = (1 3 7)(2 5 4)(6 10)(1 3 7)(2 5 4)(6 10) = (1 7 3)(2 4 5),
σ 3 = σ 2 σ = (1 7 3)(2 4 5)(1 3 7)(2 5 4)(6 10) = (6 10),
σ 4 = σ 3 σ = (6 10)(1 3 7)(2 5 4)(6 10) = (1 3 7)(2 5 4),
σ 5 = σ 4 σ = (1 3 7)(2 5 4)(1 3 7)(2 5 4)(6 10) = (1 7 3)(2 4 5)(6 10),
σ 6 = σ 5 σ = (1 7 3)(2 4 5)(6 10)(1 3 7)(2 5 4)(6 10) = id .

Thus, |σ| = 6 = lcm(3, 3, 2), which illustrates Proposition 3.4.7. 4
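The same computation can be automated; the sketch below (ours) finds the order of σ by repeated composition and compares it with the least common multiple of the cycle lengths, as Proposition 3.4.7 predicts. (It uses math.lcm, available in Python 3.9 and later.)

```python
from math import lcm

# sigma = (1 3 7)(2 5 4)(6 10) in S10, in n-tuple form: sigma[i-1] = sigma(i).
sigma = (3, 5, 7, 2, 4, 10, 1, 8, 9, 6)

def compose(a, b):
    return tuple(a[b[i] - 1] for i in range(len(b)))

identity = tuple(range(1, 11))
order, power = 1, sigma
while power != identity:
    power = compose(power, sigma)
    order += 1

assert order == 6 == lcm(3, 3, 2)   # lcm of the cycle lengths 3, 3, 2
```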

We briefly consider the product of cycles that are not disjoint. Some of the simplest products
involve two transpositions,

(1 2)(1 3) = (1 3 2) and (1 3)(1 2) = (1 2 3).

The fact that these products are different establishes the following proposition.

Proposition 3.4.9
The group Sn is nonabelian for all n ≥ 3.

3.4.3 – Even and Odd Permutations


Permutations and the symmetric group arise in many areas of mathematics, especially in combina-
torics, a field that studies techniques for counting the possible arrangements in any kind of discrete
structure.
As one example, suppose that we consider 5 events in history and attempt to remember the
order in which they occurred. There are 5! = 120 possible orderings of this time line. Suppose that
we number the events in historical order as E1 , E2 , E3 , E4 , E5 and suppose that someone guesses
the historical order as G1 , G2 , G3 , G4 , G5 . Any guess about their historical order corresponds to a
permutation σ ∈ S5 via
Gσ(i) = Ei for all i ∈ {1, 2, 3, 4, 5}.
This means that the person guessed the actual ith historical event to be the σ(i)th event in chrono-
logical order.
Suppose that someone guesses the chronological order of the births of five mathematicians and
puts them in the following order.

Correct order Guessed order

Niels Abel Carl Jacobi

Carl Jacobi Evariste Galois

Evariste Galois Niels Abel

Karl Weierstrass Ada Lovelace

Ada Lovelace Karl Weierstrass

The corresponding permutation is σ = (1 3 2)(4 5).


What is a natural way to evaluate how incorrect the guess is? If a guess was correct except for
interchanging the first two, i.e. σ = (1 2), that should not be considered egregious. The worst guess
would reverse the chronological order, i.e. σ = (1 5)(2 4). A measure of incorrectness for the guessed
ordering is to count the number of inversions.

Definition 3.4.10
Let n be an integer with n ≥ 2. Define Tn as the set

Tn = {(i, j) ∈ {1, 2, . . . , n}^2 | i < j}.

The number of inversions of σ ∈ Sn is

inv(σ) = |{(i, j) ∈ Tn | σ(i) > σ(j)}|.

In other words, Tn consists of all possible pairs (i, j) of indices, where the first index is less than
the second and inv(σ) is the number of times σ would reverse the order of the pair. The set Tn has
cardinality
|Tn | = ∑_{i=1}^{n−1} (n − i) = n(n − 1) − ∑_{i=1}^{n−1} i = n(n − 1) − (n − 1)n/2 = (n − 1)n/2.

Hence, 0 ≤ inv(σ) ≤ (1/2)n(n − 1).


In the above example about birth order of five mathematicians, σ acts on the pairs of T5 as
follows.
(i, j) (1, 2) (1, 3) (1, 4) (1, 5) (2, 3) (2, 4) (2, 5) (3, 4) (3, 5) (4, 5)
(σ(i), σ(j)) (3, 1) (3, 2) (3, 5) (3, 4) (1, 2) (1, 5) (1, 4) (2, 5) (2, 4) (5, 4)

Hence, we find that inv(σ) = 3.
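Counting inversions is a direct translation of Definition 3.4.10. A short Python check (ours), in n-tuple notation:

```python
from itertools import combinations

def inv(perm):
    """Number of inversions: pairs i < j with sigma(i) > sigma(j)."""
    return sum(1 for i, j in combinations(range(len(perm)), 2)
               if perm[i] > perm[j])

# The birth-order guess sigma = (1 3 2)(4 5) in n-tuple form.
sigma = (3, 1, 2, 5, 4)
assert inv(sigma) == 3
assert inv((1, 2, 3, 4, 5)) == 0     # identity: a perfect guess
assert inv((5, 4, 3, 2, 1)) == 10    # worst case: n(n-1)/2 = 10
```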

Definition 3.4.11
A permutation σ ∈ Sn is called even (resp. odd ) if inv(σ) is an even (resp. odd) integer.
The designation even or odd is called the parity of the permutation σ.

We conclude this section with a few propositions about the parity of permutations.

Proposition 3.4.12
The number of inversions of a transposition is odd. More precisely, inv((a b)) = 2(b − a) − 1.

Proof. (Left as an exercise for the reader. See Exercise 3.4.32.) 
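Before attempting the proof (Exercise 3.4.32), one can test the formula numerically. The sketch below (ours) checks inv((a b)) = 2(b − a) − 1 for every transposition in S8; the helper names are our own.

```python
from itertools import combinations

def inv(perm):
    return sum(1 for i, j in combinations(range(len(perm)), 2)
               if perm[i] > perm[j])

def transposition(a, b, n):
    """The 2-cycle (a b) in S_n, in n-tuple form (assumes a < b <= n)."""
    perm = list(range(1, n + 1))
    perm[a - 1], perm[b - 1] = b, a
    return tuple(perm)

n = 8
assert all(inv(transposition(a, b, n)) == 2 * (b - a) - 1
           for a in range(1, n + 1) for b in range(a + 1, n + 1))
```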

Proposition 3.4.13
Let σ, τ ∈ Sn . Then
inv(στ ) ≡ inv(σ) + inv(τ ) (mod 2).

Proof. Define the following numbers that depend on σ and τ :

k11 = |{(i, j) ∈ Tn | τ inverts (i, j) and σ inverts (τ (i), τ (j))}| ,
k12 = |{(i, j) ∈ Tn | τ inverts (i, j) and σ does not invert (τ (i), τ (j))}| ,
k21 = |{(i, j) ∈ Tn | τ does not invert (i, j) and σ inverts (τ (i), τ (j))}| ,
k22 = |{(i, j) ∈ Tn | τ does not invert (i, j) and σ does not invert (τ (i), τ (j))}| .

Notice that inv(στ ) = k12 + k21 , inv(σ) = k11 + k21 , and inv(τ ) = k11 + k12 . Hence,

inv(σ) + inv(τ ) = k12 + k21 + 2k11 ≡ k12 + k21 ≡ inv(στ ) (mod 2). 
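The congruence can also be confirmed exhaustively for a small symmetric group; the following Python check (ours) verifies it for all 576 pairs of permutations in S4.

```python
from itertools import combinations, permutations

def inv(perm):
    return sum(1 for i, j in combinations(range(len(perm)), 2)
               if perm[i] > perm[j])

def compose(a, b):
    return tuple(a[b[i] - 1] for i in range(len(b)))

# Exhaustively verify inv(sigma∘tau) ≡ inv(sigma) + inv(tau) (mod 2) in S4.
S4 = list(permutations(range(1, 5)))
assert all((inv(compose(s, t)) - inv(s) - inv(t)) % 2 == 0
           for s in S4 for t in S4)
```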

The following theorem about how the parity of permutations relates to composition is an imme-
diate corollary of Proposition 3.4.13.

Theorem 3.4.14
The composition of two even permutations or of two odd permutations is an even permu-
tation. The composition of an odd and an even permutation is an odd permutation.

Example 3.4.15 (Vandermonde Polynomials). The Vandermonde polynomial of n variables


x1 , x2 , . . . , xn is the multivariable polynomial
∏_{1≤i<j≤n} (xj − xi ).

This product has C(n, 2) = n(n − 1)/2 terms (and hence has degree C(n, 2)). Note that each term in the product
corresponds uniquely to one pair in Tn .


For a given σ ∈ Sn and a given pair (i, j) ∈ Tn , the term (xσ(j) − xσ(i) ) is equal to ±(xj′ − xi′ )
for some other pair (i′ , j′ ) ∈ Tn and with the sign being negative if and only if σ inverts the pair
(i, j). Since sign(σ) = (−1)^{inv(σ)} , the sign of a permutation satisfies

∏_{1≤i<j≤n} (xσ(j) − xσ(i) ) = sign(σ) ∏_{1≤i<j≤n} (xj − xi ).

Some authors give this property of the Vandermonde polynomial as the definition for sign(σ), without
reference to inversions. 4
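Because the two Vandermonde products differ only by the factor sign(σ), evaluating both at the sample points x_i = i recovers the sign. A Python illustration (the function names are ours):

```python
from itertools import combinations
from math import prod

def inv(perm):
    return sum(1 for i, j in combinations(range(len(perm)), 2)
               if perm[i] > perm[j])

def sign_via_vandermonde(perm):
    """Evaluate both Vandermonde products at the points x_i = i; their
    ratio is sign(perm).  perm is in n-tuple form."""
    n = len(perm)
    pairs = list(combinations(range(1, n + 1), 2))            # (i, j) with i < j
    top = prod(perm[j - 1] - perm[i - 1] for i, j in pairs)   # x_{s(j)} - x_{s(i)}
    bottom = prod(j - i for i, j in pairs)                    # x_j - x_i
    return top // bottom                                      # always exactly +1 or -1

sigma = (3, 1, 2, 5, 4)   # the permutation (1 3 2)(4 5) from the birth-order example
assert sign_via_vandermonde(sigma) == (-1) ** inv(sigma) == -1
```

The ratio is exact because σ merely permutes the sample points, so the two products agree up to sign.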

We conclude this section with a characterization of even and odd permutations that further
justifies the terminology.

Theorem 3.4.16
A permutation is even (resp. odd) if and only if it can be written as a product of an even
(resp. odd) number of transpositions.

Proof. By Exercise 3.4.11, every permutation σ ∈ Sn can be expressed as a product of transpositions

σ = τ1 τ2 · · · τm .

By Proposition 3.4.13,

inv(σ) ≡ inv(τ1 ) + inv(τ2 ) + · · · + inv(τm ) (mod 2).

By Proposition 3.4.12, inv(τi ) is odd for all i and hence inv(σ) is even if and only if m is even. The
theorem follows. 

Exercises for Section 3.4


1. Write the standard cycle notation for the following permutations expressed in chart notation.

   (a) σ = ( 1 2 3 4 5 6 7 )
           ( 6 3 2 4 7 1 5 )

   (b) τ = ( 1 2 3 4 5 6 7 )
           ( 3 5 7 2 1 6 4 )
2. Suppose the permutation σ ∈ S8 given in n-tuple notation is (3, 2, 6, 4, 5, 8, 7, 1). Depict σ with a
directed graph and express it in standard cycle notation.
3. Suppose the permutation σ ∈ S9 given in n-tuple notation is (4, 6, 5, 2, 3, 1, 8, 7, 9). Depict σ with a
directed graph and express it in standard cycle notation.
4. In S6 , with σ = (1 3 5)(2 6) and τ = (1 3 4 5 6), calculate: a) στ ; b) τ σ; c) σ 2 ; d) τ −1 ; e) στ σ −1 .
5. In S7 , with σ = (1 4)(2 6)(3 5 7) and τ = (1 6 7), calculate: a) στ ; b) τ σ; c) τ −1 σ 2 ; d) στ σ −1 .
6. List all the cycle types in S7 .
7. Let σ = (a0 a1 a2 · · · am−1 ) be an m-cycle in Sn . Prove that σ k (ai ) = a(i+k) mod m . Conclude that
the order of σ is m.
8. Let σ be an m-cycle in Sn . Prove that σ k is an m-cycle if and only if gcd(k, m) = 1.
9. Suppose that ai are distinct positive integers for i = 1, 2, . . . , m. Prove that (a1 a2 a3 · · · am ) =
(a1 am ) · · · (a1 a4 )(a1 a3 )(a1 a2 ). Use this to show that an m-cycle is even if and only if m is odd.
10. Suppose that ai are distinct positive integers for i = 1, 2, . . . , m. Prove that (a1 a2 a3 · · · am ) =
(a1 a2 ) · · · (am−2 am−1 )(am−1 am ). Use this to show that an m-cycle is even if and only if m is odd.
11. Use Exercise 3.4.9 (or Exercise 3.4.10) to show that every permutation can be written as a product of
transpositions.
12. Describe the Shell Game using the concept of products of transpositions in S3 .
13. Let σ be an m-cycle and suppose that d < m divides m. Prove that the standard cycle notation of σ^d
is the product of d disjoint cycles of length m/d.
14. What is the highest possible order for permutations in S11 ? Illustrate your answer with a specific
element having that order.
15. What is the highest possible order of an element in each of the following groups. Illustrate with a
specific element.
(a) S5 ⊕ D11
(b) S5 ⊕ S5
16. What is the highest possible order of an element in S7 ⊕ S7 ⊕ S7 ? Illustrate your answer with a specific
element having that order.
17. Prove that a permutation σ ∈ Sn satisfies σ −1 = σ if and only if σ is the identity or, in standard cycle
notation, consists of a product of disjoint 2-cycles.
18. Prove Proposition 3.4.7. [Hint: Use Exercise 3.4.7.]
19. In some Sn , find two elements σ and τ such that |σ| = |τ | = 2 and |στ | = 3.
20. In some Sn , find two elements σ and τ such that |σ| = |τ | = 2 and |στ | = 4.
21. How many permutations of order 4 does S7 have?
22. How many even permutations of order 5 does S8 have? Odd permutations?
23. Suppose that m ≤ n. Prove that the number of m-cycles in Sn is n(n − 1)(n − 2) · · · (n − m + 1)/m.

24. Suppose that n ≥ 4. Prove that the number of permutations of cycle type (a b)(c d) in Sn is
n(n − 1)(n − 2)(n − 3)/8.

25. Show that the function f : Z/10Z → Z/10Z defined by f (a) = a3 is a permutation on Z/10Z and write
f in cycle notation as an element of S10 . (Use the bijection g : {1, 2, . . . , 10} → Z/10Z with g(a) = ā
to set up numerical labels for elements in Z/10Z.)

26. Repeat the previous question with Z/11Z instead of Z/10Z.

27. Calculate the set {|σ| : σ ∈ S7 }, i.e., the set of orders of elements in S7 .

28. We work in the group Sn for n ≥ 3. Prove or disprove that if σ1 and σ2 have the same cycle type
and τ1 and τ2 have the same cycle type, then σ1 τ1 has the same cycle type as σ2 τ2 . Does your answer
depend on n?

29. In a six-contestant steeple race, the horses arrived in the order C, B, F, A, D, E. Suppose someone
predicted they would arrive in the order F, E, C, B, D, A. How many inversions are in the guessed
ordering?

30. In S5 , count the number of inversions of the following permutations: (a) σ = (1 4 2 5); (b) τ =
(1 4 3)(2 5); (c) ρ = (1 5)(2 3).

31. In S6 , count the number of inversions of the following permutations: (a) σ = (1 3 5 6 2); (b) τ =
(1 6)(2 3 4); (c) ρ = (1 3 5)(2 4 6).

32. Consider a 2-cycle in Sn of the form τ = (a b). Prove that inv(τ ) = 2(b − a) − 1.

33. Let A be an n × n matrix and let σ ∈ Sn . Suppose that A0 (respectively A00 ) is the matrix obtained
from A by permuting the columns (respectively rows) of A according to the permutation σ. Prove
that det(A0 ) = det(A00 ) = sign(σ) det(A).

34. (Challenge) Prove that the sum of inversions of all the permutations in Sn is n! n(n − 1)/4. In other
words, prove that

∑_{σ∈Sn} inv(σ) = n! n(n − 1)/4.

35. Show by example that a permutation can be written in more than one way as a product of transposi-
tions. Prove that if σ = τ1 τ2 · · · τm and σ = ε1 ε2 · · · εn are two different expressions of σ as a product
of transpositions, then m and n have the same parity.

36. Show that for all σ ∈ Sn , the number of inversions inv(σ −1 ) = inv(σ). Conclude that σ and σ −1 have
the same parity.

37. Show that for all σ, τ ∈ Sn , the element στ σ −1 has the same parity as τ .

3.5 Subgroups
In any algebraic structure, it is common to consider a subset that carries the same algebraic struc-
ture. In linear algebra, for example, we encounter subspaces of a vector space. In Section 1.4, we
encountered subposets. This section presents subgroups.

3.5.1 – Subgroup: Definition and Examples

Definition 3.5.1
Let G be a group. A nonempty subset H ⊆ G is called a subgroup if
(1) ∀x, y ∈ H, xy ∈ H (closed under operation);
(2) ∀x ∈ H, x−1 ∈ H (closed under taking inverses).

If H is a subgroup of G, we write H ≤ G.

Since a subgroup H of a group G must be nonempty, there exists some x ∈ H. Since H is


closed under taking inverses, then x−1 ∈ H. Since H is closed under the group operation, then
e = xx−1 ∈ H. Hence, a subgroup contains the identity element. The property of associativity
is inherited from associativity in G and so since e ∈ H and H is closed under taking inverses, H
equipped with the binary operation on G is a group in its own right. (As a point of terminology, it
is important to understand that we do not say that “a group is closed under an operation.” Such
a statement is circular since a binary operation on G by definition maps any pair of elements in G
back into G. The terminology of “closed under an operation” applies to strict subsets of G.)
Example 3.5.2. With the usual addition operation, Z ≤ Q ≤ R ≤ C. With the multiplication
operation we have Q∗ ≤ R∗ ≤ C∗ . However, Z∗ is not a subgroup of Q∗ , written Z∗ ≰ Q∗ , because
even though Z∗ is closed under multiplication, it is not closed under taking multiplicative inverses.
For example, 2^{−1} = 1/2 ∉ Z∗ . 4

Example 3.5.3. Any group G always has at least two subgroups, the trivial subgroup {e} and all
of G. 4

Example 3.5.4. If G = Dn then R = {ι, r, r2 , . . . , rn−1 } is a subgroup. This is the subgroup of


rotations. Also for all integers i between 0 and n − 1, the subsets Hi = {ι, sri } are subgroups. These
subgroups of two elements correspond to reflection about various lines of symmetry. 4

Example 3.5.5. Let G = Sn and consider the subset of permutations that leave the elements
{m + 1, m + 2, . . . , n} fixed. This is a subgroup of Sn that can be identified with Sm . 4

Example 3.5.6 (Alternating Group). Theorem 3.4.14 shows that the set of even permutations
in Sn is closed under composition. Furthermore, a permutation σ ∈ Sn and its inverse σ −1 invert
the same number of pairs in Tn , as defined in Definition 3.4.10. Hence, the subset of even permutations is
closed under taking inverses. Thus, the set of even permutations is a subgroup of Sn . The subset of even
permutations in Sn is called the alternating group on n elements and is denoted by An .
If n = 4, then the elements of A4 are
A4 = {id, (123), (124), (132), (134), (142), (143), (234), (243), (12)(34), (13)(24), (14)(23)}. 4
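A quick computational cross-check (ours): build An as the set of even permutations in n-tuple notation and confirm the closure guaranteed by Theorem 3.4.14 for n = 4.

```python
from itertools import combinations, permutations

def inv(perm):
    """Number of inversions of a permutation in n-tuple form."""
    return sum(1 for i, j in combinations(range(len(perm)), 2)
               if perm[i] > perm[j])

def compose(a, b):
    """a∘b in n-tuple form: apply b first, then a."""
    return tuple(a[b[i] - 1] for i in range(len(b)))

# A4 = even permutations of {1, 2, 3, 4}.
A4 = [q for q in permutations(range(1, 5)) if inv(q) % 2 == 0]
assert len(A4) == 12
assert all(compose(s, t) in A4 for s in A4 for t in A4)   # closed under composition
```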

Example 3.5.7 (A Nonexample). As a nonexample, note that U (n) is not a subgroup of Z/nZ.
Even though U (n) is a subset of Z/nZ, the former involves the multiplication operation in modular
arithmetic while the latter involves the addition operation in modular arithmetic. If we considered
the pair (U (n), +), we have 1 and n − 1 in U (n) but 1 + n − 1 = 0 ∉ U (n). Hence, U (n) is not
closed under addition. 4

The definition of a subgroup has two criteria. It turns out that these two can be combined into
one. This result shortens a number of subsequent proofs.

Proposition 3.5.8 (One-Step Subgroup Criterion)


A nonempty subset H of a group G is a subgroup if and only if ∀x, y ∈ H, xy −1 ∈ H.
102 CHAPTER 3. GROUPS

Proof. (=⇒) If H is a subgroup, then for all x, y ∈ H, the element y −1 ∈ H and hence xy −1 ∈ H.
(⇐=) Suppose that H is a nonempty subset with the condition described in the statement of
the proposition. First, since H is nonempty, ∃x ∈ H. Using the one-step criterion, xx−1 = e ∈ H.
Second, since e and x ∈ H, using the one-step criterion, ex−1 = x−1 ∈ H. We have now proven
that H is closed under taking inverses. Finally, for all x, y ∈ H, we know that y −1 ∈ H so, using
the one-step criterion again, x(y −1 )−1 = xy ∈ H. Hence, H is closed under the group operation. 
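In a finite setting the criterion is easy to test by brute force. The following Python sketch (an illustration, not from the text) checks it for subsets of (Z/nZ, +), where the additive analogue of xy −1 is x − y mod n:

```python
def is_subgroup_one_step(H, n):
    """One-Step Subgroup Criterion in (Z/nZ, +): H is nonempty and
    x - y mod n (the additive analogue of x * y^(-1)) stays in H."""
    return bool(H) and all((x - y) % n in H for x in H for y in H)

print(is_subgroup_one_step({0, 4, 8}, 12))  # True: this is the subgroup generated by 4
print(is_subgroup_one_step({0, 1, 4}, 12))  # False: 1 - 4 = 9 (mod 12) is missing
```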

The following example introduces an important group but uses the One-Step Subgroup Criterion
to prove it is a group.
Example 3.5.9 (Special Linear Group). Let F denote Q, R, C, or Fp . Define the subset of
GLn (F ) by
SLn (F ) = {A ∈ GLn (F ) | det A = 1}.
Obviously, SLn (F ) is not empty because the identity matrix has determinant 1 so it is in SLn (F ).
Furthermore, according to properties of the determinant,

det(AB −1 ) = det(A) det(B −1 ) = det(A) det(B)−1 = 1 for all A, B ∈ SLn (F ).

Hence, according to the One-Step Subgroup Criterion, SLn (F ) ≤ GLn (F ). We call SLn (F ) the
special linear group. 4
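For a concrete (if tiny) instance, the following Python sketch, which is an illustration and not part of the text, counts GL2 (F2 ) and SL2 (F2 ) by brute force; over F2 the only nonzero determinant is 1, so the two groups coincide:

```python
from itertools import product

def det2(A, p):
    """Determinant of a 2x2 matrix A = (a, b, c, d) over F_p."""
    a, b, c, d = A
    return (a * d - b * c) % p

p = 2
GL2 = [A for A in product(range(p), repeat=4) if det2(A, p) != 0]
SL2 = [A for A in GL2 if det2(A, p) == 1]
print(len(GL2), len(SL2))  # 6 6
```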

If the subset H is finite, there is another simplification.

Proposition 3.5.10 (Finite Subgroup Test)


Let G be a group and let H be a nonempty finite subset of G. If H is closed under the
operation, then H is a subgroup of G.

Proof. Given the hypotheses of the proposition, in order to establish that H is a subgroup of G, we
only need to show that it is closed under taking inverses.
Let x ∈ H. Since H is closed under the operation, xn ∈ H for all positive integers n. Thus,
the set S = {xn | n ∈ N∗ } is a subset of H and hence finite. Therefore, there exist m, n ∈ N∗ with
n ≠ m such that xm = xn . Without loss of generality, suppose that n > m. Then n − m is a positive
integer and xn−m = e. Thus, x has finite order, say |x| = k. If x = e, then it is its own inverse. If
x ≠ e, then k ≥ 2. Since k − 1 is a positive integer, xk−1 = x−1 ∈ S ⊆ H. This shows that H is
closed under taking inverses. 

It is also useful to be aware of the interaction between subgroups and subset operations.

Proposition 3.5.11
Let G be a group and let H and K be two subgroups of G. Then H ∩ K ≤ G.

Proof. Note that 1 ∈ H ∩ K so H ∩ K is not empty. Let x, y ∈ H ∩ K. Then since x, y ∈ H, by the
One-Step Subgroup Criterion, xy −1 ∈ H. Similarly for xy −1 ∈ K. Consequently, xy −1 ∈ H ∩ K
and by the One-Step Subgroup Criterion again, we conclude that H ∩ K is a subgroup of G. 

On the other hand, the union of two subgroups is not necessarily another subgroup and the set
difference of two subgroups is never another subgroup. Consequently, in relation to the operations
of union and intersection on subsets, subgroups behave in a similar way as subspaces of a vector
space do: Intersections are again subgroups while unions usually are not and set differences are never
subgroups.
By an induction reasoning, knowing that the intersection of two subgroups is again a subgroup
implies that any intersection of a finite number of subgroups is again a subgroup. However, it is
also true that a general intersection (not necessarily finite) of a collection of subgroups is again a
subgroup. (See Exercise 3.5.25.)
3.5. SUBGROUPS 103

3.5.2 – Abstract Examples


A number of naturally constructed subsets in a group are always subgroups. Many play central roles
in understanding the internal structure of a group so we present a few such subgroups here.

Definition 3.5.12
The center Z(G) is the subset of G consisting of all elements that commute with every
other element in G. In other words,

Z(G) = {x ∈ G | xg = gx for all g ∈ G}.

Proposition 3.5.13
Let G be any group. The center Z(G) is a subgroup of G.

Proof. Note that 1 ∈ Z(G), so Z(G) is nonempty. Let x, y ∈ Z(G). Then


(xy)g = x(yg) = x(gy) = (xg)y = (gx)y = g(xy)
so Z(G) is closed under the operation. Let x ∈ Z(G). By definition, xg = gx for all g ∈ G, so
g = x−1 gx and gx−1 = x−1 g. Thus, x−1 ∈ Z(G) and we conclude that Z(G) is closed under taking inverses. 
Note that Z(G) = G if and only if G is abelian. On the other hand, Z(G) = {1} means that
the identity is the only element that commutes with every other element. Intuitively speaking,
Z(G) gives a measure of how far G is from being abelian. The center itself is an abelian subgroup.
However, Z(G) is not necessarily the largest abelian subgroup of G.
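For small groups the center can be computed by brute force. In this Python sketch (an illustration, not part of the text), S3 is modeled as tuples in one-line notation, and the computation confirms that Z(S3 ) is trivial:

```python
from itertools import permutations

def compose(p, q):
    """(p o q)(i) = p(q(i)) for permutations given as tuples."""
    return tuple(p[q[i]] for i in range(len(q)))

S3 = list(permutations(range(3)))
center = [x for x in S3 if all(compose(x, g) == compose(g, x) for g in S3)]
print(center)  # [(0, 1, 2)]: only the identity commutes with everything
```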
Example 3.5.14. Let F be Q, R, C or Fp (where p is prime). In this example, we prove that
Z(GLn (F )) = {aI | a ≠ 0}, (3.5)
where I is the identity matrix in GLn (F ).
By properties of matrix multiplication, for all matrices B ∈ GLn (F ) we have B(aI) = a(BI) =
aB = (aI)B. Hence, {aI | a ≠ 0} ⊆ Z(GLn (F )). The difficulty lies in proving the reverse inclusion.
Let Eij be the n × n matrix consisting of zeros in all entries except for a 1 in the (i, j)th entry.
The matrix Eij is not in GLn (F ) but I + Eij is. Since BI = IB for all B ∈ GLn (F ), then
B(I + Eij ) = (I + Eij )B if and only if BEij = Eij B. Thus, all B ∈ Z(GLn (F )) satisfy the matrix
product BEij = Eij B for all 1 ≤ i, j ≤ n.
We can show that BEij is the matrix of zeros everywhere except for the ith column of B as its
jth column and that Eij B is the matrix of zeros everywhere except for the jth row of B as its ith
row. (See Exercise 3.5.32.) Thus, for a particular pair (i, j) with i ≠ j, the identity BEij = Eij B implies
that
bki = 0 for k with k ≠ i,
bjl = 0 for l with l ≠ j, and
bii = bjj .
Hence, if B ∈ Z(GLn (F )), then BEij = Eij B for all pairs (i, j). Therefore, all off-diagonal elements
of B are zero and all diagonal elements of B are equal. This establishes Z(GLn (F )) ⊆ {aI | a ≠ 0}
and we deduce (3.5). 4

Definition 3.5.15
The centralizer of A in G is the subset

CG (A) = {g ∈ G | gag −1 = a for all a ∈ A}.

In other words, g ∈ CG (A) if and only if g commutes with every element in A.



The operation gag −1 occurs in many different areas of group theory. The element gag −1 is called
the conjugation of a by g. The condition that gag −1 = a is tantamount to ga = ag. Consequently,
the centralizer consists of all elements in G that commute with every element of the subset A.
The center Z(G) of a group is a particular example of a centralizer, namely Z(G) = CG (G). If
A = {a}, i.e., is a singleton set, then we write CG (a) instead of CG ({a}).

Proposition 3.5.16
For any subset A of G, the set CG (A) is a subgroup of G.

Proof. Since 1 ∈ CG (A), we know CG (A) ≠ ∅.


Let x, y ∈ CG (A) be arbitrary. By definition, xa = ax and ya = ay for all a ∈ A. Hence,

(xy)a = x(ya) = x(ay) since y ∈ CG (A)
= (xa)y = (ax)y since x ∈ CG (A)
= a(xy).

Thus, xy ∈ CG (A).
Let x ∈ CG (A). Then for all a ∈ A, since xax−1 = a, we have xa = ax and a = x−1 ax. Thus,
x−1 ∈ CG (A).
We conclude that CG (A) is a subgroup of G. 

In order to present the next construction that always gives a subgroup, we introduce some
notation. Let A ⊆ G and let g ∈ G. Then we define the subsets gA, Ag, and gAg −1 as

gA = {ga | a ∈ A}, Ag = {ag | a ∈ A}, gAg −1 = {gag −1 | a ∈ A}.

For the set gA (and similarly for the other two sets), the function f : A → gA defined by f (a) = ga
is a bijection with inverse function f −1 (x) = g −1 x. Consequently, gA, Ag and gAg −1 have the same
cardinality as A.

Definition 3.5.17
Let A be any subset of G. We also define the normalizer of A in G as

NG (A) = {g ∈ G | gAg −1 = A}.

Proposition 3.5.18
The normalizer NG (A) is a subgroup of G.

Proof. (Left as an exercise for the reader. See Exercise 3.5.34.) 

It is easy to see that in general, for any subset A of G, we have

Z(G) ≤ CG (A) ≤ NG (A) ≤ G.

Furthermore, if G is abelian then equality holds at each ≤, regardless of the subset A.
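These subgroups can also be computed directly in small cases. The Python sketch below (illustrative only, not part of the text) finds the centralizer and normalizer of the singleton A = {(1 2 3)} inside S3 , with permutations written 0-indexed in one-line notation; here both equal h(1 2 3)i:

```python
from itertools import permutations

def compose(p, q):
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, v in enumerate(p):
        inv[v] = i
    return tuple(inv)

def conjugate(g, x):
    """g x g^(-1), the conjugation of x by g."""
    return compose(compose(g, x), inverse(g))

S3 = list(permutations(range(3)))
A = {(1, 2, 0)}                      # the 3-cycle (1 2 3), 0-indexed
C = [g for g in S3 if all(conjugate(g, x) == x for x in A)]
N = [g for g in S3 if {conjugate(g, x) for x in A} == A]
print(len(C), len(N))  # 3 3
```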

3.5.3 – Subgroups Generated by a Subset


Let S be a subset of a group G. Providing a complete list of elements in S or describing S by
some property are the typical two ways of uniquely specifying S. However, subgroups H ≤ G
carry additional properties, namely that they are closed under the group operation and under taking inverses.
Consequently, if we know that a certain subset S is contained in H, then we also know that all

elements obtained by repeated operations or inverses from elements in S must also be in H. The
concept of generating a subgroup by a subset makes this idea precise.

Definition 3.5.19
Let S be a nonempty subset of a group G. We define hSi as the subset of “words” made
from elements in S, that is to say

hSi = {s1^α1 s2^α2 · · · sn^αn | n ∈ N, si ∈ S, αi ∈ Z}.

The subset hSi is called the subgroup of G generated by S. (Note that the si are not
necessarily distinct.)

It is not hard to see that hSi is indeed a subgroup of G. Since S ≠ ∅ and S ⊆ hSi, then hSi is
not empty. For two expressions x = s1^α1 s2^α2 · · · sn^αn and y = t1^β1 t2^β2 · · · tm^βm in hSi, we have

xy −1 = s1^α1 s2^α2 · · · sn^αn tm^−βm · · · t2^−β2 t1^−β1 .

This product is again an element of hSi so by the One-Step Subgroup Criterion, hSi ≤ G.
Example 3.5.20. Consider the dihedral group on the hexagon G = D6 . The subgroup hri consists
of all powers of the elements r. Hence, hri = {ι, r, r2 , r3 , r4 , r5 }. Notice that hri is the subgroup of
rotations. We can visualize these rotations as follows:

[Figure: six hexagons illustrating the rotations ι, r, r2 , r3 , r4 , r5 .]

The subgroup hsi = {ι, s} consists of only two elements, reflection across the reference axis of
symmetry and the identity transformation. With a similar visualization we have:

[Figure: hexagons illustrating ι and s.]

The subgroup hs, r2 i contains the elements ι, s, r2 , and r4 simply by taking powers of elements in
{s, r2 }. However, hs, r2 i also contains sr2 and sr4 . The defining relation on s and r gives ra s = sr6−a .
Hence, as we apply this relation, the parity of the power of r does not change. Hence,

hs, r2 i = {ι, r2 , r4 , s, sr2 , sr4 }.

Again, a visual representation of this subgroup is the following:

[Figure: hexagons illustrating ι, r2 , r4 , s, sr2 , and sr4 .]

The subgroup hs, sri obviously contains s but also contains r = s(sr). Hence, hs, sri = D6
because it contains all rotations and all reflections. 4

Obviously, for any element a in a group G, the subgroup hai is a cyclic subgroup of G whose
order is precisely the order |a|. It is important to note that distinct sets of generators may give the
same subgroup. In the previous example, we noted that hs, sri = hs, ri. This occurs even with cyclic
subgroups. For example, in D6 , the rotation subgroup is hri = hr5 i. In D6 , we also have hr2 i = hr4 i.

Example 3.5.21. Consider the group S4 . Let H = h(1 3), (1 2 3 4)i. We list out all the elements of
H. By taking powers of the generators, we know that

id, (1 3), (1 2 3 4), (1 2 3 4)2 = (1 3)(2 4), (1 2 3 4)3 = (1 4 3 2)

are all in H. By taking products of generators and their powers, H also contains

(1 3)(1 2 3 4) = (1 2)(3 4), (1 3)(1 2 3 4)2 = (2 4), (1 3)(1 2 3 4)3 = (1 4)(2 3).

It is also easy to calculate that

(1 2 3 4)(1 3) = (1 4)(2 3), (1 3)(1 2 3 4)2 (1 3) = (2 4), (1 2 3 4)3 (1 3) = (1 2)(3 4).

At this point, we may suspect that

H = {id, (1 3), (1 2 3 4), (1 3)(2 4), (1 4 3 2), (2 4), (1 2)(3 4), (1 4)(2 3)}

but we have not yet proven that H does not have any other elements. However, the identity
(1 3)(1 2 3 4) = (1 4 3 2)(1 3) shows that though (1 3) and (1 2 3 4) do not commute, it is possible to
pass (1 2 3 4) to the left of (1 3) by changing the power on the 4-cycle. Hence, every element in H
can be written as (1 2 3 4)a (1 3)b where a = 0, 1, 2, 3 and b = 0, 1. Thus, we have indeed found all
the elements in H. 4
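This kind of closure computation mechanizes well. The Python sketch below (an illustration, not from the text) generates hSi in a finite group by repeatedly multiplying known elements until nothing new appears; by the Finite Subgroup Test, closure under the operation suffices. Applied to (1 3) and (1 2 3 4) in S4 (0-indexed one-line notation), it recovers the eight elements found above:

```python
def compose(p, q):
    """(p o q)(i) = p(q(i)) for permutations given as tuples."""
    return tuple(p[q[i]] for i in range(len(q)))

def generate(gens):
    """Closure of gens under composition; in a finite group this is <gens>."""
    H = {tuple(range(len(gens[0])))}   # start from the identity
    frontier = set(gens)
    while frontier:
        H |= frontier
        frontier = {compose(a, b) for a in H for b in H} - H
    return H

H = generate([(2, 1, 0, 3), (1, 2, 3, 0)])   # (1 3) and (1 2 3 4)
print(len(H))  # 8
```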

Definition 3.5.22
A group (or a subgroup) is called finitely generated if it is generated by a finite subset.

A finite group is always finitely generated. Indeed, a finite group G is generated (not minimally
so) by G itself. On the other hand, in the group (Z, +) we have h1i = Z which gives a simple
example of an infinite group that is finitely generated. It is not hard to find a group that is not
finitely generated.

Example 3.5.23. Let (Q>0 , ×) be the multiplicative group of positive rational numbers and let P
be the set of prime numbers. Every positive rational number r can be written in the form

r = p1^α1 p2^α2 · · · pm^αm

where αi ∈ Z. In our usual way of writing fractions, any prime pi with αi < 0 would be in the
prime factorization of the denominator and with αi > 0 would be in the prime factorization of the
numerator. Consequently, P is a generating set of Q>0 .
This does not yet imply that (Q>0 , ×) is not finitely generated. We now show this by contra-
diction. Assume that it is generated by a finite set {r1 , r2 , . . . , rk } of rational numbers. The prime
factorizations of the numerators and denominators of all the ri (written in reduced form) involve
a finite number of primes, say {p1 , p2 , . . . , pn }. Let p0 be a prime not in {p1 , p2 , . . . , pn }. Then
p0 ∈ Q>0 but p0 ∉ hr1 , r2 , . . . , rk i. Hence, (Q>0 , ×) is not finitely generated. 4

Exercises for Section 3.5

In Exercises 3.5.1 through 3.5.16, prove or disprove that the given subset A of the given group G is a subgroup.
1. G = Z with addition and A are the multiples of 5.
2. G = (Q, +) and A is the set of rational numbers with odd denominators (when written in reduced
form).
3. G = (Q∗ , ×) and A is the set of rational numbers of the form p2 /q 2 .
4. G = (C∗ , ×) and A = {a + ai | a ∈ R}.

5. G = Z/12Z and A = {0, 4, 8}.


6. G = U (11) and A = {1, 2, 9, 10}.
7. G = S5 and A is the set of transpositions.
8. G = D6 and A = {ι, s, r2 , sr2 }.
9. G = D6 and A = {ι, s, r3 , sr3 }.
10. G = (R∗ , ×) and A = { √(p/q) | p/q ∈ Q>0 }.
11. G = (R, +) and A = { √(p/q) | p/q ∈ Q>0 }.

12. G = U (30) and A = {1, 7, 13, 19}.


13. G = GL2 (R) and A is the subset of matrices with integer coefficients.
14. G = SR , i.e., the set of bijections from R to R with the operation of composition, and A is the subset
of functions f (x) = xp/q , where p and q are odd integers.
15. G = SR and let A = {f ∈ G | f (Z) ⊆ Z}.
16. G = SR and let A = {f ∈ G | f (Z) = Z}.
17. Find an example to illustrate that the union of two subgroups is not necessarily another subgroup.
18. Let G be an abelian group that is not necessarily finite. Define the subset Tor(G) to be the subset
of elements that have finite order. Prove that Tor(G) is a subgroup. [Tor(G) is called the torsion
subgroup of G.]
19. Prove that a finite group G with order greater than 2 cannot have a subgroup H with |H| = |G| − 1.
20. Let G1 and G2 be two groups. Prove that {(x, e2 ) | x ∈ G1 } and that {(e1 , y) | y ∈ G2 } are subgroups
of G1 ⊕ G2 .
21. Determine all the finite subgroups of (R∗ , ×).
22. Let F be Q, R, C, or Fp (or more generally any field). Prove that the subset in GLn (F ) of upper
triangular matrices is a subgroup. [Recall that a matrix is upper triangular if all the entries below the
main diagonal are 0.]
23. Let F be as in the previous exercise. Prove that the subset of GLn (F ) of diagonal matrices is an
abelian subgroup. Explain why this abelian subgroup is strictly larger than Z(GLn (F )).
24. Let G be an abelian group. Prove that the following two subsets are subgroups
(a) {g n | g ∈ G}, where n is a fixed integer.
(b) {g ∈ G | g n = e}, where n is a fixed integer.
25. Let G be a group and let {Hi }i∈C be a collection of subgroups of G, not necessarily finite. Prove that
the intersection ⋂i∈C Hi is a subgroup of G.
26. Prove that Z(G) = ⋂a∈G CG (a).

27. Prove that the center Z(Dn ) of the dihedral group is {ι} if n is odd and {ι, rn/2 } if n is even.
28. Prove that for all n ≥ 3, the center of the symmetric group is Z(Sn ) = {id}.
29. In the group Dn , calculate the centralizer and the normalizer for each of the subsets

(a) {s}, (b) {r}, and (c) {ι, r, r2 , . . . , rn−1 }.

30. For the given group G, find the CG (A) and NG (A) of the respective sets A.
(a) G = S3 and A = {(1 2 3)}.
(b) G = S5 and A = {(1 2 3)}.
(c) G = S4 and A = {(1 2)}.
31. For the given group G, find the CG (A) and NG (A) of the respective sets A.
(a) G = D6 and A = {s, r2 }.

[Figure 3.9: A rigid motion of a tetrahedron, namely the 120◦ rotation σ = (1 3 4), with vertices labeled 1, 2, 3, 4.]

(b) G = Q8 and A = {i, j}.


(c) G = S4 and A = {(1 2), (3 4)}.
32. Prove the assertion in Example 3.5.14 that BEij is the matrix of zeros everywhere except that it has
the ith column of B as its jth column, and that Eij B is the matrix of zeros everywhere except that
it has the jth row of B in the ith row.
33. Let V1 ⊆ V2 be subsets of a group G.
(a) Prove that CG (V2 ) ≤ CG (V1 ).
(b) Prove that NG (V2 ) ≤ NG (V1 ).
34. Prove Proposition 3.5.18.
35. In (Z, +), list all the elements in the subgroup h12, 20i.
36. In D8 , list all the elements in the subgroup hsr2 , sr6 i.
   
37. In GL2 (F3 ), list all the elements in h[ 1 1 ; 0 1 ], [ 1 0 ; 1 1 ]i, where each matrix is written row by row.
38. Let G be a group and let S ⊆ G be a subset. Let C be the collection of subgroups of G that contain
S. Prove that hSi = ⋂H∈C H.

[Hint: Use Exercise 3.5.25. From this exercise, we conclude that hSi is the smallest subgroup by
inclusion that contains S.]
39. Prove that if A ⊆ B are subsets in a group G, then hAi ≤ hBi.
40. Prove that in S4 , the subgroup h(1 2 3), (1 2)(3 4)i is A4 .
41. This exercise finds generating subsets of Sn .
(a) Prove that Sn is generated by {(1 2), (2 3), (3 4), . . . , (n − 1 n)}.
(b) Prove that Sn is generated by {(1 2), (1 3), . . . , (1 n)}.
(c) Prove that Sn is generated by {(1 2), (1 2 3 · · · n)}.
(d) Show that S4 is not generated by {(1 2), (1 3 2 4)}.
42. Prove that for any prime number p, the symmetric group Sp is generated by any transposition and
any p-cycle.
43. Label the vertices of a tetrahedron with integers {1, 2, 3, 4}. Prove that the group of rigid motions of
a tetrahedron is A4 . (See Figure 3.9.)
44. Show that if p is prime, then in the alternating group Ap , we have Ap = h(1 2 3), (1 2 3 · · · p)i.
45. Show that (R, +) is not finitely generated.
46. Describe the elements in Tor(C∗ ), the torsion subgroup of C∗ . (See Exercise 3.5.18.) Show that
Tor(C∗ ) is not finitely generated.
47. Let H and K be two subgroups of G. Prove that H ∪ K is a subgroup of G if and only if H ⊆ K or
K ⊆ H.
48. Let H be a subgroup of a group G and let g ∈ G. Prove that if n is the smallest positive integer such
that g n ∈ H, then n divides |g|.

49. Let H be a subgroup of a group G.


(a) Show that H ≤ NG (H).
(b) Show that if A is a subset of G, then A is not necessarily a subset of NG (A).
50. Consider the group (Q, +).
(a) Prove that if H and K are any two nontrivial subgroups of Q then H ∩ K is nontrivial.
(b) Prove that the above result does not necessarily hold if Q is replaced with R.

3.6 Lattice of Subgroups
In order to develop an understanding of the internal structure of a group, listing all the subgroups of
a group has some value. However, showing how these subgroups are related carries more information.
The lattice of subgroups offers a visual representation of relationships among subgroups.
Let Sub(G) be the set of all subgroups of the group G. Note that Sub(G) ⊆ P(G). The
pair (Sub(G), ≤) is a poset, in fact the subposet of (P(G), ⊆) on the subset Sub(G). Indeed, if
H, K ∈ Sub(G), then H ⊆ K if and only if H ≤ K.

Proposition 3.6.1
For all groups G, the poset (Sub(G), ≤) is a lattice.

Proof. We know that (P(G), ⊆) is a lattice in which the least upper bound of any two subsets A and B is A ∪ B
and the greatest lower bound is A ∩ B. By Proposition 3.5.11, for any two subgroups H, K ≤ G,
the set H ∩ K is also a subgroup. Hence, H ∩ K is the greatest lower bound of H and K in the poset
(Sub(G), ≤).
The difficulty is that H ∪ K is not necessarily a subgroup. Consequently, if H and K have
a least upper bound in (Sub(G), ≤), it must be something else. The generating subsets formalism
gives us an answer. By Exercise 3.5.38, hH ∪ Ki is the smallest (by inclusion) subgroup of G that
contains both H and K. Thus, hH ∪ Ki is the least upper bound of H and K in (Sub(G), ≤). Since
every pair of subgroups of G has a least upper bound and a greatest lower bound in (Sub(G), ≤),
the result follows. 

The construction given in the above proof for a least upper bound of H and K, namely hH ∪ Ki
is called the join of H and K.
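In a concrete additive example (a Python sketch for illustration, not part of the text), the join of h4i and h6i in Z/12Z is computed as the subgroup generated by their union; it is h2i, strictly larger than the union itself:

```python
def generated(S, n):
    """Subgroup of (Z/nZ, +) generated by the set S: close S with 0 under addition."""
    H = {0}
    frontier = {s % n for s in S}
    while frontier:
        H |= frontier
        frontier = {(a + b) % n for a in H for b in H} - H
    return H

H, K = generated({4}, 12), generated({6}, 12)
join = generated(H | K, 12)
print(sorted(H), sorted(K), sorted(join))
# [0, 4, 8] [0, 6] [0, 2, 4, 6, 8, 10]
```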
Since (Sub(G), ≤) is a poset, we can create the Hasse diagram for it. By a common abuse of
language, we often say “draw the lattice of G” for “draw the Hasse diagram of the poset (Sub(G), ≤).”
The lattice of a group shows all subgroups and their containment relationships.
The following list of examples gives a flavor of some group lattices.

Example 3.6.2 (Prime Cyclic Groups). The groups with the least internal structure are groups
Zp where p is a prime number. The lattice of Zp is:

Zp

{e}
4

Example 3.6.3. Consider the cyclic group Z8 = hz | z 8 = ei. It has a total of 4 subgroups, namely
{e}, hz 4 i, hz 2 i, and Z8 . The lattice of Z8 is:

Z8

hz 2 i

hz 4 i

{e}

At first pass, one might wonder why we did not consider subgroups generated by other elements, say
for example z 3 . However, hz 3 i = {z 3 , z 6 , z, z 4 , z 7 , z 2 , z 5 , e} so hz 3 i = Z8 . Also hz 6 i = {z 6 , z 4 , z 2 , e} =
hz 2 i. All subgroups of Z8 do appear in the above diagram. 4

Example 3.6.4. The lattice of Z24 is the following:


[Lattice diagram of Z24 : Z24 covers hz 2 i and hz 3 i; hz 2 i covers hz 4 i and hz 6 i; hz 3 i covers hz 6 i; hz 4 i covers hz 8 i and hz 12 i; hz 6 i covers hz 12 i; and hz 8 i and hz 12 i cover {e}.] 4

By looking at the above examples of groups, we notice a trend in the subgroups of a cyclic group.
We make explicit the pattern of subgroups in cyclic groups.

Proposition 3.6.5
Every subgroup of a cyclic group G is cyclic. Furthermore, if G is finite with |G| = n, then
the subgroups of G are precisely hz d i for the divisors d of n, where z is a generator of G.

Proof. Let G be a cyclic group (not necessarily finite) generated by an element z. Let H be a
subgroup of G and let S = {a ∈ N∗ | z a ∈ H}, the set of positive powers of z in H. By the
well-ordering principle, S has a least element, say c. We prove by contradiction that H = hz c i.
Suppose that H contained a z k where c does not divide k. Then by integer division, there
exists an integer q and an integer r with 0 < r < c such that k = cq + r. Then the element
z r = z k−qc = z k (z c )−q is in H, which contradicts the minimality of c in S. Hence,
H = hz c i.
Now consider the case with G finite and |G| = n. Since e = z n ∈ H, using the argument in the
previous paragraph, we see that c must divide n. 

If G is a cyclic group of order n generated by z, then for any d that divides n, we have |z d | = n/d.
So we can also say G contains exactly one subgroup of order k where k is any divisor of n. For
example, in Z/45Z, which is generated by 1, the subgroup h36i has order 45/ gcd(45, 36) = 45/9 = 5.
Thus, h36i = h9i, since this is the subgroup of Z/45Z of 5 elements.
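The computation in Z/45Z is easy to confirm directly (an illustrative Python sketch, not part of the text):

```python
def cyclic(a, n):
    """Cyclic subgroup of (Z/nZ, +) generated by a."""
    H, x = {0}, a % n
    while x != 0:
        H.add(x)
        x = (x + a) % n
    return H

print(sorted(cyclic(36, 45)))            # [0, 9, 18, 27, 36]
print(cyclic(36, 45) == cyclic(9, 45))   # True: h36i = h9i, the subgroup of order 5
```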
We give a few more examples of lattices of noncyclic groups.
Example 3.6.6 (Quaternion Group). The following diagram gives the lattice of Q8 :

Q8

hii hji hki

h−1i

{1}
4

Finite subgroups of equal cardinality will be incomparable. Consequently, though it is not
necessary to do so, when drawing lattices of groups it is common to place the subgroups of equal
cardinality on the same horizontal level.

Example 3.6.7. The lattice of A4 , the alternating group on four elements (which has order 12), is
the following:

A4

h(1 2)(3 4), (1 3)(2 4)i

h(1 2 3)i h(1 2 4)i h(1 3 4)i h(2 3 4)i

h(1 2)(3 4)i h(1 3)(2 4)i h(1 4)(2 3)i

{id}
4

Example 3.6.8. The lattice of D6 , the hexagonal dihedral group, is the following:

D6

hs, r2 i hri hsr, r2 i

hs, r3 i hsr, r3 i hsr2 , r3 i

hr2 i

hsi hsr2 i hsr4 i hr3 i hsri hsr3 i hsr5 i

{ι}
4

As the size or internal complexity of groups increases, the subgroup lattice diagrams become un-
wieldy. Furthermore, in specific examples and in proofs of theoretical results, we are often interested
in only a small number of subgroups. Consequently, we often restrict our attention to a sublattice
of the full subgroup lattice. In the full lattice, an unbroken edge indicates that there
exists no subgroup between the subgroups on either end of the edge. However, in a subdiagram of a
subgroup lattice, an unbroken edge no longer means that there does not exist a subgroup between
the endpoints of that edge. For example, given any two subgroups H, K ≤ G, there is a sublattice
of (Sub(G), ≤) containing the following:

[Sublattice diagram: hH ∪ Ki at the top, above H and K; both contain H ∩ K; {e} at the bottom.]

Having the subgroup lattice of a group G facilitates many calculations related to subgroups.
Calculating intersections and joins of two subgroups is easy. For the intersection of H and K, we
find the highest subgroup L of G for which there is a path up from L to H and a path up from L
to K. For joins, the process is merely reversed. The lattice of subgroups also helps in determining
centralizers and normalizers because we can usually follow a path up or down in the lattice,
testing at each subgroup whether it continues to have the appropriate properties.

Exercises for Section 3.6


1. Determine the subgroup lattice of Z20 .
2. Determine the subgroup lattice of Z/105Z.
3. Determine the subgroup lattice of Z100 .
4. Determine the subgroup lattice of U (20). Also answer whether U (20) is cyclic.
5. Determine the subgroup lattice of U (35). Underline in the lattice the subgroups of U (35) that are
cyclic.
6. Determine the subgroup lattice for Zp2 q where p and q are prime numbers.
7. Determine the subgroup lattice for Zpn where p is a prime number and n is a positive integer.
8. Determine the subgroup lattice of Z3 ⊕ Z9 .
9. Determine the subgroup lattice of D4 .
10. Determine the subgroup lattice of Dp where p is a prime number.
11. Determine the subgroup lattice of D8 .
12. Determine the subgroup lattice of S3 .
13. Determine the subgroup lattice of Z2 ⊕ Z2 ⊕ Z2 .
14. Determine the subgroup lattice of Zp ⊕ Zp where p is a prime number. Also show that all nontrivial
strict subgroups are cyclic.
15. Show that there is no group whose subgroup lattice has the shape of a 3-cube, i.e., has the same lattice
as the poset (P({1, 2, 3}), ⊆).
16. Prove that the lattice of A4 is that which is given in Example 3.6.7.

17. Let m, n ∈ Z where we consider Z as a group equipped with addition.


(a) Find a generator of hmi ∩ hni.
(b) Find a generator of the join of hmi and hni.
18. Determine (without creating the subgroup lattice) the number and order of all cyclic subgroups of
Z5 ⊕ Z15 .
19. Determine (without creating the subgroup lattice) the number and order of all cyclic subgroups of
Z15 ⊕ Z15 .
20. Let G = D6 . Use the lattice in Example 3.6.8 to determine:
(a) CD6 (s)
(b) ND6 (sr)
(c) ND6 (s, r3 )
21. Let g, h ∈ G such that |g| = 12 and |h| = 5. Prove that hgi ∩ hhi = {e}.
22. Let x and y be elements in a group G such that |x| and |y| are relatively prime. Prove that hxi ∩ hyi =
{e}.
23. Let G be a group that has exactly one nontrivial proper subgroup. Prove that G is a cyclic group of
order p2 where p is a prime number.
24. Let G be an abelian group that has exactly two nontrivial proper subgroups, neither of which is
contained in the other. Prove that G is a cyclic group of order pq where p and q are distinct primes.
25. Prove that if G is a group whose entire lattice of subgroups consists of one chain, then G is cyclic and
of order pn where p is prime and n is a positive integer.

3.7 Group Homomorphisms
The concept of a function is ubiquitous in mathematics. However, in different branches we often
impose conditions on the functions we consider. For example, in linear algebra we do not study
arbitrary functions from one vector space to another but limit our attention to linear transformations.
As exhibited in Section 1.4, when studying posets it is common to restrict attention to monotonic
functions.
Given two objects A and B with a particular algebraic structure, if we consider an arbitrary
function f : A → B, a priori, the only type of information that f carries is set theoretic. In other
words, information about algebraic properties of A would be lost under f . However, if we impose
certain properties on the function, it can, intuitively speaking, preserve the structure.

3.7.1 – Homomorphisms

Definition 3.7.1
Let (G, ∗) and (H, •) be two groups. A function ϕ : G → H is called a homomorphism
from G to H if for all g1 , g2 ∈ G,

ϕ(g1 ∗ g2 ) = ϕ(g1 ) • ϕ(g2 ). (3.6)

Note that the operation on the left-hand side is an operation in G while the operation on the
right-hand side occurs in the group H. With abstract group notation, we write (3.6) as

ϕ(g1 g2 ) = ϕ(g1 )ϕ(g2 )

but take care to remember that the group operations occur in different groups.

Example 3.7.2. Fix a positive real number b and consider the function f (x) = bx . Power rules
state that for all x, y ∈ R,
bx+y = bx by .
In the language of group theory, this identity can be restated by saying that the exponential function
f (x) = bx is a homomorphism from (R, +) to (R∗ , ×). 4

Example 3.7.3. The function of inclusion f : (Z, +) → (R, +) given by f (x) = x is a homomor-
phism. 4

Example 3.7.4. The function f : Zn → Zn given by f (x) = x2 is a homomorphism. Let z be a
generator of Zn . Then for all z a , z b ∈ Zn ,
f (z a z b ) = (z a z b )2 = (z a+b )2 = z 2(a+b) = z 2a+2b = z 2a z 2b = f (z a )f (z b ). 4

Example 3.7.5. Consider the direct sum Z2 ⊕ Z2 , where each Z2 has generator z. Consider the
function ϕ : Q8 → Z2 ⊕ Z2 defined by

ϕ(±1) = (e, e) ϕ(±i) = (z, e) ϕ(±j) = (e, z) ϕ(±k) = (z, z).

This is a homomorphism but in order to verify it, we must check that ϕ satisfies (3.6) for all 64
products of terms in Q8 . However, we can cut down the work. First notice that for all terms
a, b ∈ {1, i, j, k}, the products (±a)(±b) = ±(ab) with the sign as appropriately defined. The
following table shows ϕ(ab) with a in the columns and b in the rows.
±1 ±i ±j ±k
±1 (e, e) (z, e) (e, z) (z, z)
±i (z, e) (e, e) (z, z) (e, z)
±j (e, z) (z, z) (e, e) (z, e)
±k (z, z) (e, z) (z, e) (e, e)
All the entries of the table are precisely ϕ(a)ϕ(b), which confirms that ϕ is a homomorphism. 4

Example 3.7.6. Let n be an integer greater than 1. The function ϕ(a) = ā that maps an integer
to its congruence class in Z/nZ is a homomorphism. This holds because of Proposition 2.2.4 and
the definition of addition of congruence classes, ā + b̄ = a + b. 4

Example 3.7.7 (Determinant). Let F be Q, R, or C. The determinant function det : GLn (F ) →


F ∗ (with the multiplication operation on F ) is a homomorphism. This is precisely the content of
the theorem in linear algebra that

det(AB) = det(A) det(B).

This result also applies with modular arithmetic base p, when p is a prime number. The determinant
function det : GLn (Fp ) → U (p) is a homomorphism because of the same identity. 4

Example 3.7.8 (Sign Function). We define the sign function on Sn as

sign(σ) = (−1)inv(σ) .

In other words, sign(σ) = 1 if σ is even and sign(σ) = −1 if σ is odd. Now for all σ, τ ∈ Sn ,

sign(στ ) = (−1)inv(στ ) = (−1)inv(σ)+inv(τ ) = (−1)inv(σ) (−1)inv(τ ) = sign(σ) sign(τ ),

where the second equality holds because of Proposition 3.4.13. Thus, the sign function is a homo-
morphism sign : Sn → ({1, −1}, ×). This sign function plays a crucial role in many applications of
the symmetric group and we will revisit it often. 4
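The multiplicativity of sign can be checked exhaustively for a small n. In the Python sketch below (an illustration, not from the text), sign is computed from the inversion count, and the homomorphism property is verified over every pair in S4 :

```python
from itertools import permutations

def compose(p, q):
    return tuple(p[q[i]] for i in range(len(q)))

def sign(p):
    """(-1) raised to the number of inverted pairs of p."""
    inv = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return (-1) ** inv

S4 = list(permutations(range(4)))
assert all(sign(compose(s, t)) == sign(s) * sign(t) for s in S4 for t in S4)
print("sign is multiplicative on all", len(S4) ** 2, "pairs in S4")
```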

As a first property preserved by homomorphisms, the following proposition shows that homo-
morphisms map powers of elements to powers of corresponding elements.

Proposition 3.7.9
Let ϕ : G → H be a homomorphism of groups.

(1) ϕ(eG ) = eH .
(2) For all x ∈ G, ϕ(x−1 ) = ϕ(x)−1 .
(3) For all x ∈ G and all n ∈ Z, ϕ(xn ) = ϕ(x)n .

Proof. For (1), denote ϕ(eG ) = u. Let g be some element in G. Then


ϕ(g) = ϕ(eG · g) = uϕ(g).
Thus, u = ϕ(g)(ϕ(g))−1 = eH .
For (2), let x ∈ G. Applying ϕ to both sides of eG = xx−1 gives
eH = ϕ(eG ) = ϕ(xx−1 ) = ϕ(x)ϕ(x−1 ).
Similarly, ϕ(x−1 )ϕ(x) = eH . Consequently, by uniqueness of inverses, ϕ(x−1 ) = ϕ(x)−1 .
For (3), note that parts (1) and (2) establish the result for n = 0, −1. For all other integers n, a
simple induction proof gives us the result. 

3.7.2 – Kernel and Image


In linear algebra, the concepts of kernel and image of a linear transformation play a central role
in many applications, including finding solutions to systems of equations, establishing properties of
solutions to differential equations, and others. The concepts of kernel and image play a different but
equally important role in group theory.

Definition 3.7.10
Let ϕ : G → H be a homomorphism between groups.

(1) The kernel of ϕ is Ker ϕ = {g ∈ G | ϕ(g) = eH }, where eH is the identity in H.


(2) The image of ϕ is Im ϕ = {h ∈ H | ∃g ∈ G, ϕ(g) = h}. The image is also called the
range of ϕ.

Proposition 3.7.11
Let ϕ : G → H be a homomorphism of groups. The kernel Ker ϕ is a subgroup of G.

Proof. The kernel Ker ϕ is nonempty since eG ∈ Ker ϕ. Now let x, y ∈ Ker ϕ. Then
ϕ(xy −1 ) = ϕ(x)ϕ(y −1 )    since ϕ is a homomorphism
          = ϕ(x)ϕ(y)−1     by Proposition 3.7.9(2)
          = eH eH −1 = eH   since x, y ∈ Ker ϕ.
Hence, xy −1 ∈ Ker ϕ. Thus, Ker ϕ ≤ G by the One-Step Subgroup Criterion. 

Proposition 3.7.12
Let ϕ : G → H be a homomorphism of groups. The image Im ϕ is a subgroup of H.
116 CHAPTER 3. GROUPS

Proof. (Left as an exercise for the reader. See Exercise 3.7.1.) 


Example 3.7.13. Consider the homomorphism det : GLn (R) → R∗ . (See Example 3.7.7.) The
kernel of the determinant homomorphism is the set of matrices whose determinant is 1, namely
SLn (R), the special linear group. 4
Example 3.7.14. Consider the sign function sign : Sn → ({1, −1}, ×) as defined in Example 3.7.8.
The kernel Ker(sign) is precisely the alternating group An as a subgroup of Sn . The sign function is
surjective, so the image is all of {1, −1}. 4
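This kernel can be computed directly for a small case. The sketch below (our encoding: tuples acting on {0, 1, 2, 3}) extracts Ker(sign) inside S4 and confirms it has the expected 4!/2 = 12 elements and is closed under composition:

```python
from itertools import permutations

# Ker(sign) in S_4 is the alternating group A_4 (0-indexed permutations).
def sign(sigma):
    n = len(sigma)
    invs = sum(1 for i in range(n) for j in range(i + 1, n)
               if sigma[i] > sigma[j])
    return (-1) ** invs

def compose(s, t):
    return tuple(s[t[i]] for i in range(len(t)))

S4 = list(permutations(range(4)))
A4 = [s for s in S4 if sign(s) == 1]

# |A_4| = 4!/2 = 12, and the kernel is closed under the group operation.
assert len(A4) == 12
for s in A4:
    for t in A4:
        assert compose(s, t) in A4
```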
The kernel and the image of a homomorphism are closely connected to whether the homomorphism
is injective or surjective. In fact, in light of the following proposition, the kernel and the image of
a homomorphism describe how far the homomorphism is from being an injection or a surjection.

Proposition 3.7.15
Let ϕ : G → H be a homomorphism of groups. Then
(1) ϕ is injective if and only if Ker ϕ = {eG }.

(2) ϕ is surjective if and only if H = Im ϕ.

Proof. (Left as an exercise for the reader. See Exercise 3.7.11.) 

3.7.3 – Isomorphisms
We have seen some examples where groups, though presented differently, may actually look strikingly
the same. For example, (Zn , ·) and (Z/nZ, +) behave identically, and likewise for (Z, +) and (2Z, +),
where 2Z means the set of all even integers. This raises two questions: (1) when should we call two
groups the same, and (2) what precisely does it mean to call two groups the same?

Definition 3.7.16
Let G and H be two groups. A function ϕ : G → H is called an isomorphism if

(1) ϕ is a homomorphism;
(2) ϕ is a bijection.
If there exists an isomorphism between two groups G and H, then we say that G and H
are isomorphic and we write G ∼= H.

When two groups are isomorphic, they are for all intents and purposes of group theory the
same. We could have defined an isomorphism as a bijection ϕ such that ϕ and ϕ−1 are both
homomorphisms. However, this requirement is heavier than necessary, as the following proposition
shows.

Proposition 3.7.17
If ϕ is an isomorphism (as defined in Definition 3.7.16), then ϕ−1 : H → G is a homomor-
phism.

Proof. (Left as an exercise for the reader. See Exercise 3.7.26.) 


Example 3.7.18. Let G = Z2 ⊕Z3 and H = Z6 . Suppose that Z2 is generated by x, Z3 is generated
by y and Z6 is generated by z. Consider the function f : H → G defined by f (z a ) = (xa , y a ). We
immediately have
f (z a z b ) = f (z a+b ) = (xa+b , y a+b ) = (xa , y a )(xb , y b ) = f (z a )f (z b ),

so f is a homomorphism. By Theorem 3.3.9, the element (x, y) has order lcm(2, 3) = 6. Hence,
Im f = G and so f is surjective. Since f is a surjection between finite sets of the same cardinality,
f is a bijection. We conclude that f is an isomorphism. 4
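Writing both groups additively (our choice of model, not the text's multiplicative notation), the same map becomes a ↦ (a mod 2, a mod 3), and both the homomorphism property and bijectivity can be checked exhaustively:

```python
# Additive model of Example 3.7.18: a -> (a mod 2, a mod 3)
# from Z/6Z to Z/2Z x Z/3Z (additive notation is our choice here).
f = {a: (a % 2, a % 3) for a in range(6)}

# Homomorphism: f(a + b) = f(a) + f(b) componentwise.
for a in range(6):
    for b in range(6):
        expected = ((f[a][0] + f[b][0]) % 2, (f[a][1] + f[b][1]) % 3)
        assert f[(a + b) % 6] == expected

# Bijection: six elements map to six distinct pairs.
assert len(set(f.values())) == 6
```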

Example 3.7.19. Let b be a positive real number. We know that f (x) = bx is a bijection between
R and R>0 with inverse function f −1 (x) = logb x = (ln x)/(ln b). Example 3.7.2 showed that f is
a homomorphism and thus it is an isomorphism between (R, +) and (R>0 , ×). Proposition 3.7.17
implies that f −1 (x) = logb x is a homomorphism from (R>0 , ×) to (R, +). 4

Example 3.7.20. In this example, we provide an isomorphism between GL2 (F2 ) and S3 . Consider
the following function ϕ, where each 2 × 2 matrix over F2 is written by its rows:

ϕ : ( 1 0 ; 0 1 ) −→ id        ( 1 1 ; 0 1 ) −→ (1 2)        ( 1 0 ; 1 1 ) −→ (1 3)
    ( 0 1 ; 1 0 ) −→ (2 3)     ( 1 1 ; 1 0 ) −→ (1 2 3)      ( 0 1 ; 1 1 ) −→ (1 3 2)

If we compare the group table on GL2 (F2 ) and the group table on S3 (see Exercise 3.7.17), we
find that this particular function ϕ preserves how group elements operate, establishing that ϕ is an
isomorphism. 4
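Rather than comparing full group tables by hand, the claim can be brute-force checked. In the sketch below (our encoding: matrices as tuples of rows over F2, permutations as tuples acting on {0, 1, 2}), every product of matrices is compared against the composition of the corresponding permutations:

```python
# Brute-force check of the map phi : GL2(F_2) -> S_3 above.
# Permutations act on {0, 1, 2}, shifting the text's {1, 2, 3} down by one.
phi = {
    ((1, 0), (0, 1)): (0, 1, 2),  # identity
    ((1, 1), (0, 1)): (1, 0, 2),  # (1 2)
    ((1, 0), (1, 1)): (2, 1, 0),  # (1 3)
    ((0, 1), (1, 0)): (0, 2, 1),  # (2 3)
    ((1, 1), (1, 0)): (1, 2, 0),  # (1 2 3)
    ((0, 1), (1, 1)): (2, 0, 1),  # (1 3 2)
}

def matmul(A, B):
    """2x2 matrix product over F_2."""
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2)) % 2
                       for j in range(2)) for i in range(2))

def compose(s, t):
    """(s t)(i) = s(t(i)), matching phi(AB) = phi(A) phi(B)."""
    return tuple(s[t[i]] for i in range(3))

for A in phi:
    for B in phi:
        assert phi[matmul(A, B)] == compose(phi[A], phi[B])
```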

Proposition 3.7.21
Let ϕ : G → H be an isomorphism of groups. Then
(1) |G| = |H|.
(2) G is abelian if and only if H is abelian.
(3) ϕ preserves orders of elements, i.e., |x| = |ϕ(x)| for all x ∈ G.

Proof. Part (1) follows immediately from the requirement that ϕ is a bijection.
For (2), suppose that G is abelian. Let h1 , h2 ∈ H. Since ϕ is surjective, there exist g1 , g2 ∈ G
such that ϕ(g1 ) = h1 and ϕ(g2 ) = h2 . Then

h1 h2 = ϕ(g1 )ϕ(g2 ) = ϕ(g1 g2 ) = ϕ(g2 g1 ) = ϕ(g2 )ϕ(g1 ) = h2 h1 .

Hence, we have shown that if G is abelian, then H is abelian. Repeating the argument with ϕ−1
establishes the converse, namely that if H is abelian, then G is abelian.
For (3), consider first the case in which the order |x| = n is finite. We have already seen
that ϕ(xn ) = ϕ(x)n by Proposition 3.7.9(3). Then 1H = ϕ(1G ) = ϕ(xn ) = ϕ(x)n . Hence, by
Corollary 3.3.6, the order |ϕ(x)| is finite and divides |x|. Applying the same argument to ϕ−1
and the element ϕ(x), we deduce that |x| divides |ϕ(x)|. Hence, since |x| and |ϕ(x)| are
both positive and divide each other, |x| = |ϕ(x)|.
Consider now the case in which the order of x is infinite. Suppose that ϕ(x)m = 1H for some
m > 0. Then ϕ(xm ) = 1H and since ϕ is injective, we again deduce that xm = 1G . Hence, the
order of x is finite. But this is a contradiction so if |x| is infinite then |ϕ(x)| is infinite. Conversely,
applying the same argument to ϕ−1 , establishes that if |ϕ(x)| is infinite then |x| is infinite. 

Proposition 3.7.21 is particularly useful to prove that two groups are not isomorphic. If either
condition (1) or (2) fails, then the groups cannot be isomorphic. Also, if two groups have a different
number of elements of a given order, then Proposition 3.7.21(3) cannot hold for any isomorphism
and thus the two groups are not isomorphic.
However, we underscore that the three conditions in Proposition 3.7.21 are necessary conditions
but not sufficient: just because all three conditions hold, we cannot deduce that ϕ is an isomorphism.
Example 3.7.24 illustrates this.
Remark 3.7.22. If two groups are isomorphic then they have isomorphic (as posets) subgroup
lattices. However, the converse is not true. There are many pairs of nonisomorphic groups with
subgroup lattices that are isomorphic as posets. See Figure 3.10 for an example. 4
        Z/15Z                  Z/21Z

     ⟨3⟩     ⟨5⟩            ⟨3⟩     ⟨7⟩

        {0}                    {0}

Figure 3.10: Nonisomorphic groups with identical lattices

Example 3.7.23. We prove that D4 and Q8 are not isomorphic. They are both of order 8 and
they are both nonabelian. However, in D4 only the elements r and r3 have order 4, while in Q8 the
elements i, −i, j, −j, k, −k are all of order 4. Consequently, there cannot exist a bijection between
D4 and Q8 that satisfies Proposition 3.7.21(3). Hence, D4 ≇ Q8 . 4

Example 3.7.24. The partial order in Example 1.4.8 establishes a bijection between Q>0 and
N∗ . From this bijection, it is easy to prove that there is a bijection between Q and Z. However,
there does not exist an isomorphism between (Z, +) and (Q, +). This result does not follow from
Proposition 3.7.21. Indeed, |Z| = |Q|, Z and Q are both abelian, and all nonzero elements of both
groups have infinite order.
Suppose there does exist an isomorphism f : Q → Z. If r is a rational number and n ∈ Z, then
by Proposition 3.7.9(3) with addition, f (n · r) = n · f (r). Write f (1) = a; note that a is nonzero
since f is injective and f (0) = 0. Then

a = f (1) = f (2a · (1/(2a))) = 2a · f (1/(2a)).

This implies that 1 = 2f (1/(2a)). But this is a contradiction since f (1/(2a)) ∈ Z, and it contradicts
the hypothesis that there exists an isomorphism between Z and Q. 4

The motivating example that Zn ≅ Z/nZ offered at the beginning of this subsection generalizes

Proposition 3.7.25
Two cyclic groups of the same cardinality are isomorphic.

Proof. Recall from Exercise 3.2.31 that cyclic groups are abelian.
First suppose that G and H are finite cyclic groups, both of order n. Suppose that G is generated
by x and H is generated by an element y. Define the function ϕ : G → H by ϕ(xa ) = y a . Since H
is abelian, ϕ is a homomorphism; we need to prove that it is a bijection.
The image of ϕ is {ϕ(xk ) | 0 ≤ k ≤ n − 1} = {y k | 0 ≤ k ≤ n − 1} = H, so ϕ is a surjection. A
surjection between finite sets is a bijection. Hence, ϕ is an isomorphism.

The proof is similar if G and H are infinite cyclic groups. (We leave the proof to the reader. See
Exercise 3.7.27.) 

Because of Proposition 3.7.25, we talk about the cyclic group of order n.


In Section 3.3.2, we introduced the notion of classification theorems of groups. A classification
theorem in group theory is a theorem that, given a property of groups, lists all nonisomorphic groups
with that property. Example 3.3.10 established that every group of order 4 is isomorphic to either
Z4 or Z2 ⊕ Z2 . In this statement, it is implied that Z4 and Z2 ⊕ Z2 are nonisomorphic. (We can
easily see this from the fact that Z4 contains elements of order 4, while Z2 ⊕ Z2 does not.) The
notion of nonisomorphic is the group-theoretic term for “different.”
The question “What is the isomorphism type of G?” means to find a well-known group that is
isomorphic to G. For example, from the result of Example 3.7.20, we could say that the isomorphism
type of GL2 (F2 ) is S3 . The expression isomorphism type is imprecise because there does not exist a
standard list of nonisomorphic groups against which to compare every group. In this text, however,
Section A.2.1 of the appendices provides a list of the first few groups listed by order as a comparison
list.
To understand the internal structure of a group, it is common to determine the isomorphism
type of various subgroups. For example, in D6 all the subgroups of order 2 are isomorphic to Z2 ; the
one subgroup of order 3, namely ⟨r2 ⟩, is isomorphic to Z3 ; the subgroups of order 4, namely ⟨s, r3 ⟩,
⟨sr, r3 ⟩, and ⟨sr2 , r3 ⟩, are all isomorphic to Z2 ⊕ Z2 ; and for the subgroups of order 6 we have

⟨r⟩ ≅ Z6   but   ⟨s, r2 ⟩ ≅ ⟨sr, r2 ⟩ ≅ D3 .

If a homomorphism ϕ : G → H is injective, then when restricting the codomain, the function


ϕ : G → Im ϕ is an isomorphism. Since ϕ sets up an isomorphism between G and a subgroup of H,
then an injective homomorphism from G to H is often called an embedding of G in H. This term
is used in a similar way in the context of other algebraic structures (e.g., monoids, rings, fields, and
so on).

3.7.4 – The Automorphism Group


Consider the group Z7 generated by the element z and the function f : Z7 → Z7 defined by f (g) = g 2 .
Note first that f (z) = z 2 and since gcd(2, 7) = 1, then z 2 generates Z7 . Hence, since z 2 ∈ Im f , then
Im f = Z7 and f is surjective. Since f is a surjective function between sets of the same cardinality,
it is bijective. Example 3.7.4 showed that f is also a homomorphism and thus it is an isomorphism.
Isomorphisms from a group to itself play an important role and therefore carry a particular
terminology.

Definition 3.7.26
A homomorphism φ : G → G of a group into itself is called an endomorphism. An iso-
morphism ψ : G → G is called an automorphism of G. The set of all automorphisms on a
group G is denoted by Aut(G).

Proposition 3.7.27
The set of automorphisms of a group G is a group with the operation of composition. In
fact, Aut(G) ≤ SG .

Proof. Recall that SG denotes the group of bijections on G. The set Aut(G) is a nonempty subset since the
identity function idG is an automorphism. Now suppose that ϕ, ψ ∈ Aut(G) and let x, y ∈ G. Then

(ϕ ◦ ψ −1 )(xy) = ϕ(ψ −1 (xy)) = ϕ(ψ −1 (x)ψ −1 (y))


= ϕ(ψ −1 (x))ϕ(ψ −1 (y)) = (ϕ ◦ ψ −1 )(x)(ϕ ◦ ψ −1 )(y).

Thus, ϕ ◦ ψ −1 ∈ Aut(G) for arbitrary ϕ, ψ ∈ Aut(G) and, by the One-Step Subgroup Criterion,
Aut(G) is a subgroup of SG . 

The automorphism group of a group G provides some description of internal group-theoretic


symmetry within G. It is not always easy to determine Aut(G), but Proposition 3.7.21(3) considerably
restricts the possibilities. Various exercises guide the reader to determine the automorphism group
of some basic groups. As a first example, we mention the following proposition for Aut(Zn ).

Proposition 3.7.28
Let n be an integer greater than 2. Then Aut(Zn ) ≅ U (n).

Proof. (The proof is left as a guided exercise for the reader. See Exercise 3.7.40.) 
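The correspondence behind this proposition can be seen by brute force for one modulus. In the additive model below (n = 12 is an arbitrary choice), every endomorphism of Z/nZ is x ↦ ax mod n, determined by the image of a generator, and it is an automorphism exactly when it is a bijection:

```python
from math import gcd

# Brute-force Aut(Z/nZ) for n = 12, written additively (n is arbitrary).
# Every endomorphism of a cyclic group is x -> a*x mod n (determined by
# the image of a generator); it is an automorphism iff it is a bijection.
n = 12
autos = [a for a in range(n)
         if len(set((a * x) % n for x in range(n))) == n]
U_n = [a for a in range(n) if gcd(a, n) == 1]

assert autos == U_n == [1, 5, 7, 11]
```

The surviving multipliers are precisely the units modulo 12, matching U(12), in line with the guided proof in Exercise 3.7.40.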

3.7.5 – Cayley’s Theorem


Previously, we discussed how, in order to understand the internal structure of a given group, we
often compare subgroups of a group to some group that is already well-known. We could consider
the reverse process: Given a group G, does G exist as (isomorphic to) a subgroup of some natural
family of groups? Cayley’s Theorem answers this question and shows that the complexity of any
finite group resides in the complexity of symmetric groups.

Theorem 3.7.29 (Cayley’s Theorem)


Every finite group G is isomorphic to a subgroup of Sn for some n.

Proof. Write the elements of G as G = {g1 , g2 , . . . , gn }. Define the function ψ : G → Sn by ψ(g) = σ


where
ggi = gσ(i) for all i ∈ {1, 2, . . . , n}.
So ψ(g) is the permutation on G of how left multiplication by g permutes the elements of G. We
need to show that ψ is a homomorphism. Let g, h ∈ G with ψ(g) = σ and ψ(h) = τ . Then for all
1 ≤ i ≤ n, we have ggi = gσ(i) and hgi = gτ (i) . Then

(hg)gi = h(ggi ) = hgσ(i) = gτ (σ(i)) = gτ σ(i) for all i ∈ {1, 2, . . . , n}.

Hence, ψ(hg) = τ σ = ψ(h)ψ(g) and thus ψ is a homomorphism.


We next show that ψ is an injection. Suppose that g, h ∈ G with ψ(g) = ψ(h). Then in particular
gg1 = hg1 . By operating on the right by g1−1 , we deduce that g = h.
If we restrict the codomain of ψ to the subgroup Im ψ in Sn , we see that ψ becomes an isomor-
phism between G and Im ψ. 
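The map ψ from this proof is easy to compute for any concrete finite group. The sketch below uses the Klein four-group as an arbitrary small example, modeled as Z/2Z × Z/2Z, and checks both the homomorphism identity ψ(hg) = ψ(h)ψ(g) used in the proof and injectivity:

```python
from itertools import product

# The map psi from the proof of Cayley's Theorem, for the Klein four-group
# (modeled as Z/2Z x Z/2Z; any small group's multiplication rule works).
G = [(0, 0), (0, 1), (1, 0), (1, 1)]

def op(g, h):
    return ((g[0] + h[0]) % 2, (g[1] + h[1]) % 2)

def psi(g):
    """Left multiplication by g, recorded as a permutation of indices into G."""
    return tuple(G.index(op(g, G[i])) for i in range(len(G)))

def compose(s, t):
    return tuple(s[t[i]] for i in range(len(t)))

# psi(hg) = psi(h) psi(g), as in the proof, and psi is injective.
for g, h in product(G, repeat=2):
    assert psi(op(h, g)) == compose(psi(h), psi(g))
assert len({psi(g) for g in G}) == len(G)
```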

The proof of Cayley’s Theorem shows that G is isomorphic to a subgroup of Sn where |G| = n.
It is important to realize that the order |G| is not necessarily the least integer m such that G is
isomorphic to a subgroup of Sm . See the second half of Example 3.7.30.
The proof of Cayley’s Theorem is not particularly profound. However, it has important conse-
quences especially in regards to using a computer algebra system (CAS) to perform group theoretic
calculations. Implementing function composition of bijections on finite sets is simple to program.
Consequently, defining a group G as some subgroup of some Sn makes it possible (if not easy) to
perform the group operations on G in a CAS.

Example 3.7.30. Let G = D4 , the dihedral group on the square. The proof of Cayley’s Theorem
establishes an isomorphism between D4 and a subgroup of S8 . To find one such isomorphism ϕ,
label the elements of D4 with

g1 = ι, g2 = r, g3 = r2 , g4 = r3 , g5 = s, g6 = sr, g7 = sr2 , g8 = sr3 .



We easily calculate that

rg1 = r = g2 ,    rg2 = r2 = g3 ,    rg3 = r3 = g4 ,    rg4 = ι = g1 ,
rg5 = rs = g8 ,   rg6 = rsr = g5 ,   rg7 = rsr2 = g6 ,  rg8 = rsr3 = g7 .

Hence, ϕ(r) = (1 2 3 4)(5 8 7 6). Furthermore,

sg1 = s = g5 , sg2 = sr = g6 , sg3 = sr2 = g7 , sg4 = sr3 = g8 ,


sg5 = ι = g1 , sg6 = r = g2 , sg7 = r2 = g3 , sg8 = r3 = g4 .

Hence, ϕ(s) = (1 5)(2 6)(3 7)(4 8). Consequently, we deduce that

D4 ≅ ⟨(1 2 3 4)(5 8 7 6), (1 5)(2 6)(3 7)(4 8)⟩.

This is not the only natural isomorphism between D4 and a subgroup of a symmetric group.
Consider how we introduced Dn in Section 3.1.2 as depicted below.

2 r 2

s
3 1 3 1

4 4

By considering how r and s operate on the vertices, we can view r and s as elements in S4 . The
appropriate way to understand this perspective is to define a homomorphism ϕ : D4 → S4 by
ϕ(r) = (1 2 3 4) and ϕ(s) = (2 4). For all elements ra sb ∈ D4 , the function ϕ is

ϕ(ra sb ) = (1 2 3 4)a (2 4)b .

By exhaustively checking all operations, we can verify that this function is a homomorphism. How-
ever, we know that r and s were originally defined as the functions (1 2 3 4) and (2 4) so this ho-
momorphism is natural. The homomorphism ϕ is injective and hence establishes an isomorphism
between D4 and the subgroup ⟨(1 2 3 4), (2 4)⟩ in S4 . 4
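That ⟨(1 2 3 4), (2 4)⟩ really has order 8 = |D4| can be checked by closing the generating set under composition. In the sketch below (0-indexed, so r = (0 1 2 3) and s = (1 3)):

```python
# The images phi(r) = (1 2 3 4) and phi(s) = (2 4) generate a subgroup of
# S_4 of order 8 = |D_4| (0-indexed: r = (0 1 2 3), s = (1 3)).
r = (1, 2, 3, 0)   # the 4-cycle: i -> i + 1 mod 4
s = (0, 3, 2, 1)   # the transposition swapping 1 and 3

def compose(a, b):
    return tuple(a[b[i]] for i in range(4))

identity = (0, 1, 2, 3)
subgroup = {identity}
frontier = {identity}
while frontier:
    # Repeatedly multiply by the generators until nothing new appears.
    frontier = {compose(g, x) for g in frontier for x in (r, s)} - subgroup
    subgroup |= frontier

assert len(subgroup) == 8
```

Since the group is finite, closing under products of the generators automatically picks up inverses, so this loop computes the full generated subgroup.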

Exercises for Section 3.7


1. Let ϕ : G → H be a homomorphism between groups. Prove that Im ϕ ≤ H.
2. Find all homomorphisms from Z15 to Z20 .
3. Find all homomorphisms from Z4 to Z2 ⊕ Z2 .
4. Prove that the function ϕ : G → G defined by ϕ(g) = g −1 is a homomorphism if and only if G is
abelian.
5. Let Z33 be generated by an element y and Z12 by an element z. Suppose that ϕ : Z33 → Z12 is a
homomorphism satisfying ϕ(y 7 ) = z 8 . Find ϕ(x); determine Ker ϕ; and determine Im ϕ.
6. Consider the function f : U (12) ⊕ (Z/16Z) → Z/4Z defined by f (a, b) = ab with elements considered
now modulo 4. Prove that f is a well-defined function. Determine whether or not f is a homomorphism.
If it is, determine Ker f ; if it is not, give a counterexample.
7. What common fallacy in elementary algebra is addressed by the statement that “the function f :
(R, +) → (R, +) with f (x) = x2 is not a homomorphism”?
8. Let G be a group. Determine whether or not the function ϕ : G ⊕ G → G defined by ϕ(x, y) = x−1 y
is a homomorphism.
9. Find all homomorphisms Z5 → S5 . Also find all homomorphisms S5 → Z5 .

10. Let ϕ : G → H be a homomorphism. Prove that for all g ∈ G, the order |ϕ(g)| divides the order |g|.
11. Prove Proposition 3.7.15.
12. Prove that ϕ : Z⊕Z → Z defined by ϕ(x, y) = 2x+3y is a homomorphism. Determine Ker ϕ. Describe
the fiber ϕ−1 (6).
13. Let ϕ and ψ be homomorphisms between two groups G and H. Prove or disprove that

{g ∈ G | ϕ(g) = ψ(g)}

is a subgroup of G.
14. Consider the function f : U (33) → U (33) defined by f (x) = x2 . Show that f is a homomorphism and
find the kernel and the image of f .
15. Prove that the function f : GL2 (R) → GL3 (R) with

         ( a  b )   (  a2     2ab     b2 )
       f (      ) = (  ac   ad + bc   bd )
         ( c  d )   (  c2     2cd     d2 )

    is a homomorphism and find the kernel of f .


16. Consider the multiplicative group R× and consider the function f : R× → R>0 × {1, −1} given by
f (x) = (|x|, sign(x)). We assume that all groups have multiplication as their operation. Show that f
is an isomorphism.
17. Construct both the Cayley tables for both S3 and GL2 (F2 ) to show that the function given in Exam-
ple 3.7.20 establishes an isomorphism between S3 and GL2 (F2 ).
18. Find all homomorphisms Z → Z. Determine which ones are isomorphisms.
19. Prove that Zm ⊕ Zn ≅ Zmn if m and n are relatively prime.
20. Prove that Z9 ⊕ Z3 ≇ Z27 .
21. Prove that Zm ⊕ Zn ≅ Zlcm(m,n) ⊕ Zgcd(m,n) . [Hint: Use Exercise 3.7.19.]
22. Let G1 and G2 be two arbitrary groups. Prove that G1 ⊕ G2 ≅ G2 ⊕ G1 .
23. Let G1 and G2 be two arbitrary groups. Prove that the “projection” maps p1 : G1 ⊕ G2 → G1 and
p2 : G1 ⊕ G2 → G2 defined by p1 (x, y) = x and p2 (x, y) = y are homomorphisms. Find the kernels
and images of p1 and p2 .
24. Prove that if a homomorphism ϕ : G → H is injective, then G ≅ ϕ(G).
25. Prove that for every homomorphism f : (R, +) → (R>0 , ×) there exists a positive real number a
that satisfies f (q) = aq for all q ∈ Q.
26. Prove Proposition 3.7.17.
27. Prove that if two groups G and H are infinite and cyclic, then G ≅ H. [Hint: See the proof of
Proposition 3.7.25. Since G and H are infinite, you must prove separately that f is surjective and
injective.]
28. Prove that Z2 ⊕ Z2 ⊕ Z2 and Q8 are not isomorphic.
29. Prove that D6 and A4 are not isomorphic.
30. Prove that D12 and S4 are not isomorphic.
31. Prove that (R∗ , ×) and (C∗ , ×) are not isomorphic.
32. Prove or disprove that U (7) and U (9) are isomorphic.
33. Prove or disprove that U (44) and U (25) are isomorphic.
34. Let d be a divisor of n. Consider the function ϕ : U (n) → U (d) defined by ϕ(a) = a mod d. Determine
the kernel and the image of ϕ.
35. Prove that U (21) ≅ Z2 ⊕ Z6 .
36. Let G be the group of rigid motions of a cube. In Exercise 3.3.27 we saw that G has order 24. Prove
that each rigid motion of the cube corresponds uniquely to a permutation of the 4 main diagonals
through the cube. Conclude that G ≅ S4 .
37. Let G be a group, let H be a subgroup, and let g ∈ G.
3.8. GROUP PRESENTATIONS 123

(a) Prove that gHg −1 = {ghg −1 | h ∈ H} is a subgroup of G.


(b) Prove that the function ϕg : H → gHg −1 by ϕg (h) = ghg −1 is an isomorphism between H and
gHg −1 .
[The subgroup gHg −1 is said to be a conjugate to H.]
38. Let G be a group and let g ∈ G be any group element. Define the function ψg : G → G by
ψg (x) = gxg −1 .

(a) Prove that ψg is an automorphism of G with inverse (ψg )−1 = ψg−1 .


(b) Prove that the function Ψ : G → Aut(G) defined by Ψ(g) = ψg is a homomorphism.

[An automorphism of the form ψg is called an inner automorphism. The image of Ψ in Aut(G) is
called the group of inner automorphisms and is denoted Inn(G).]
39. Prove that Aut((Q, +)) ≅ (Q∗ , ×).
40. This exercise determines the automorphism group Aut(Zn ). Suppose that Zn is generated by the
element z.
(a) Prove that every homomorphism ψ : Zn → Zn is completely determined by where it sends the
generator z.
(b) Prove that every homomorphism ψ : Zn → Zn is of the form ψ(g) = g a for some integer a
with 0 ≤ a ≤ n − 1. For the scope of this exercise, denote by ψa the homomorphism such that
ψa (g) = g a .
(c) Prove that ψa ∈ Aut(Zn ) if and only if gcd(a, n) = 1.
(d) Show that the function Ψ : U (n) → Aut(Zn ) with Ψ(a) = ψa is an isomorphism to conclude that
U (n) ≅ Aut(Zn ).

3.8
Group Presentations
In the organizing principles outlined in the preface, one of the objectives for each algebraic structure
involved how to “conveniently describe an object with the given (algebraic) structure.” For the
groups we encountered so far, we have introduced individual notation for each one. This was
reasonable because these groups arose in a natural manner, from a context that already possessed
some notational habits. In order to study all groups, and not just those we encounter by happenstance,
it is desirable to have some form of consistent notation to describe elements and operations in
arbitrary groups.

3.8.1 – The Notation for the Identity


Up to this point in the textbook, we used the notation e to refer to the identity element of an
arbitrary group. We did this in order to avoid confusion. However, it is common in the literature
to use the symbol 1 for the identity of an arbitrary group so we adopt this practice from now on.
Now some groups have binary operations that are called some form of addition, multiplication,
or composition. If the group has an operation that is considered a multiplication, it is common
to denote the identity element by 1. If the group has an operation that is related to function
composition, we often denote the identity by 1 or by id. So, for example, most texts denote the
identity in Dn as 1 and the identity in Sn as 1. On the other hand, if the group operation is some
form of addition, it is also common to refer to that identity as 0. For example, if we consider the
addition group R3 , we will usually refer to the identity as 0.

3.8.2 – Generators and Relations


In Section 3.1.2, where we introduced the abstract notation for the dihedral group Dn , we saw that
all dihedral symmetries can be obtained from r (the rotation by angle 2π/n) and s (the reflection
through the x-axis) by various compositions. We also know that rn = 1 because r has order n, that
s2 = 1 because s has order 2, and that r and s also satisfy sr = rn−1 s. Since rn = 1, we can rewrite
this last identity as sr = r−1 s.
Since every element in Dn can be created from operations on r and s, we say that r and s
generate Dn and that rn = 1, s2 = 1, and sr = r−1 s are relations on r and s. Furthermore, though
there may exist other relations on r and s, they can be deduced from the three given relations. Such
relations are sometimes called implicit relations. In group theory, we write

Dn = ⟨r, s | rn = s2 = 1, sr = r−1 s⟩ (3.7)

to express that every element in Dn can be obtained by a finite number of repeated operations
between r and s and that every algebraic relation between r and s can be deduced from the relations
rn = 1, s2 = 1, and sr = r−1 s. The expression (3.7) is the standard presentation of Dn .
The relation sr = r−1 s shows that in any term sk rl it is possible to “move” the s to the left of the
term by appropriately changing the power on r. Hence, in any expression involving the generators r
and s, it is possible, by appropriate changes on powers, to move all the powers of s to the right and
all the powers of r to the left. Hence, every expression in the r and s can be rewritten as rl sk with
0 ≤ k ≤ 1 and 0 ≤ l ≤ n − 1. Though we already knew |Dn | = 2n from geometry, this reasoning
shows that there are at most 2n terms in the group given by this presentation.
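The rewriting argument above can be carried out mechanically. In the sketch below (our model, with n = 5 an arbitrary choice), each normal form r^l s^k is stored as a pair (l, k), and multiplication uses only the relation sr = r^(−1)s to move every s to the right:

```python
# The normal form r^l s^k in D_n, realized directly from the relations
# r^n = 1, s^2 = 1, s r = r^{-1} s (a sketch; n = 5 is an arbitrary choice).
n = 5

def mult(x, y):
    """Multiply (l1, k1) * (l2, k2), where (l, k) stands for r^l s^k."""
    l1, k1 = x
    l2, k2 = y
    # Moving s^{k1} past r^{l2} inverts r^{l2} when k1 is odd.
    l2 = l2 if k1 % 2 == 0 else -l2
    return ((l1 + l2) % n, (k1 + k2) % 2)

r, s = (1, 0), (0, 1)
assert mult(s, s) == (0, 0)        # s^2 = 1
assert mult(s, r) == (n - 1, 1)    # s r = r^{-1} s = r^{n-1} s

# The 2n normal forms are closed under mult, so the presentation
# describes a group with at most 2n elements.
elements = [(l, k) for l in range(n) for k in range(2)]
for x in elements:
    for y in elements:
        assert mult(x, y) in elements
```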

Definition 3.8.1
Let G be a group. A presentation of G is an expression

G = ⟨g1 , g2 , . . . , gk | R1 , R2 , . . . , Rs ⟩

where each Ri is an equation in the elements g1 , g2 , . . . , gk . This means that every element
in G can be obtained as a combination of operations on the generators g1 , g2 , . . . , gk and
that any relation between these elements can be deduced from the relations R1 , R2 , . . . , Rs .

Example 3.8.2 (Cyclic Groups). The nth order cyclic group has the presentation

Zn = ⟨z | z n = 1⟩.

The distinct elements are {1, z, z 2 , . . . , z n−1 } and the operation is z a z b = z a+b , where z n = 1. 4

Example 3.8.3. Consider an alternate presentation for D5 , namely ⟨a, b | a2 = b2 = 1, (ab)5 = 1⟩,
where a = rs and b = s. The relation (ab)5 = 1 corresponds to r5 = 1 and the relation a2 = 1 is
equivalent to rsrs = 1, which can be rewritten as sr = r−1 s. This shows that
D5 = ⟨a, b | a2 = b2 = 1, (ab)5 = 1⟩. 4

This last example illustrates two important properties. First, Example 3.8.3 and the standard
presentation for D5 give two distinct presentations for the same group. Second, in the standard
presentation of the dihedral group D5 , the relations r5 = 1 and s2 = 1 coupled with the fact that
|D5 | = 10, may lead a reader to speculate that the order of a group is equal to the product of the
orders of the generators. This is generally not the case as Example 3.8.3 shows.
In the examples given so far, we began with a well-defined group and gave a presentation of it.
More importantly, we can define a group by a presentation. However, before providing examples,
it is useful to introduce free groups. Free groups possess interesting properties but force us to be
precise in how we use symbols in abstract groups.

3.8.3 – Free Groups


In the previous paragraphs, we transitioned subtly from a group having a presentation, to a group
being defined by a presentation. This is a subtle difference but we must put the latter concept on a
rigorous footing. In order to do so, we introduce free groups. We note that the following discussion
about strings and words of symbols appears technical at first pass, but it is the foundation for how
students are taught to deal with symbols early in their algebra experience.
Let S be a set. We usually think of S as a set of symbols. A word in S consists either of the
expression 1, called the empty word, or an expression of the form

w = s1^α1 s2^α2 · · · sm^αm ,                (3.8)

where m ∈ N∗ , si ∈ S not necessarily distinct and αi ∈ Z − {0} for 1 ≤ i ≤ m. (The order of the si
matters.) If the word is not the empty word 1, we call m the length of the word. For example, if
S = {x, y, z}, then
x^2 y^−4 z x^13 x^−2 yyyz^2 ,    yzzzyzx^−2 ,    yz^2 y^−3 z
are examples of words in S. We call a word reduced if si+1 ≠ si for all 1 ≤ i ≤ m − 1. In the above
examples of words from {x, y, z}, only the last word is reduced.
We define F (S) as the set of all reduced words of S. We define the operation · of concatenation of
reduced words by concatenating the expressions and then eliminating adjacent symbols with powers
that collect or cancel. More precisely, for all w ∈ F (S), w · 1 = 1 · w = w and for two nonempty
reduced words a = s1^α1 s2^α2 · · · sm^αm and b = t1^β1 t2^β2 · · · tn^βn , the concatenation

(s1^α1 s2^α2 · · · sm^αm ) · (t1^β1 t2^β2 · · · tn^βn )

is as follows:

Case 1. the empty string 1 if m = n, sm+1−i = ti and βi = −αm+1−i for all i with 1 ≤ i ≤ m;

Case 2. the reduced word s1^α1 s2^α2 · · · sm−k^αm−k tk+1^βk+1 tk+2^βk+2 · · · tn^βn if sm−k ≠ tk+1
and if sm+1−i = ti and βi = −αm+1−i for all i with 1 ≤ i ≤ k;

Case 3. the reduced word s1^α1 s2^α2 · · · sm−k^αm−k tk^{αm+1−k + βk} tk+1^βk+1 · · · tn^βn if sm+1−i = ti
for all i with 1 ≤ i ≤ k and if βk ≠ −αm+1−k but βi = −αm+1−i for all i with 1 ≤ i ≤ k − 1.

We call k the overlap of the pair (a, b). A few examples of this concatenation-reduction are

(x^3 y^−2 ) · (x^10 y^2 ) = x^3 y^−2 x^10 y^2          case 2 with k = 0,
(x y^2 z^−3 ) · (z^3 y x^−1 ) = x y^3 x^−1             case 3 with k = 1,
(x y^3 x^−2 z) · (z^−1 x^2 y^−3 z y^2 ) = x z y^2      case 2 with k = 3,
(z y x^−2 ) · (x^2 y^−1 z^−1 ) = 1                     case 1.

Theorem 3.8.4
The operation · of concatenation-reduction on F (S) is a binary operation and the pair
(F (S), ·) is a group with identity 1, and for w ≠ 1 expressed as in (3.8),

w^−1 = sm^−αm · · · s2^−α2 s1^−α1 .

Proof. In all three cases for the definition of concatenation of reduced words, the resulting word is
such that no two successive symbols are equal. Hence, the word is reduced and · is a binary operation
on F (S).
That the empty word is the identity is built into the definition of concatenation. Furthermore,
Case 1 establishes that the inverse of s1^α1 s2^α2 · · · sm^αm is sm^−αm · · · s2^−α2 s1^−α1 .

The difficulty of the proof resides in proving that concatenation is associative. For the rest of
this proof, we denote the length of a word w ∈ F (S) as L(w). Let a, b, and c be three arbitrary
reduced words in F (S). Let h be the overlap of (a, b) and let k be the overlap of (b, c).
First, suppose that h + k < L(b). Then we can write a = a′a″, b = b′b″b‴, and c = c′c″, where
a″b′ are the symbols reduced out or reduced to a word of length 1 in the concatenation a · b, and
b‴c′ are the symbols reduced out or reduced to a word of length 1 in the concatenation b · c. We
write a · b = a′[a″b′]b″b‴, where [a″b′] stands for removed or reduced to a word of length 1
depending on Case 2 or Case 3 of the concatenation-reduction. Then, as reduced words

(a · b) · c = (a′[a″b′]b″b‴) · c = a′[a″b′]b″[b‴c′]c″ = a · (b′b″[b‴c′]c″) = a · (b · c).

Suppose next that h + k = L(b). We write a = a′a″, b = b′b″, and c = c′c″ so that a · b = a′[a″b′]b″
and b · c = b′[b″c′]c″. Then

(a · b) · c = (a′[a″b′]b″) · c = (a′[a″b′]) · ([b″c′]c″) = a · (b′[b″c′]c″) = a · (b · c).

Finally, suppose that h + k > L(b). Now we subdivide each reduced word into three parts as
a = a′a″a‴, b = b′b″b‴, and c = c′c″c‴, where

a · b = a′[a″a‴b′b″]b‴   and   b · c = b′[b″b‴c′c″]c‴.

Now any of these subwords can be the empty word. However, in order for the reductions to occur
as these subwords are defined, a few relations must hold. For the lengths of the subwords, we must
have

L(b″) = L(a″) = L(c″) = h + k − L(b),
L(a‴) = L(b′) = h − L(b″),
L(c′) = L(b‴) = k − L(b″).

Furthermore, we must have b′ = (a‴)^−1 and c′ = (b‴)^−1 and

a″ = s1^α1 s2^α2 · · · sm−1^αm−1 sm^αm ,
b″ = sm^−αm sm−1^−αm−1 · · · s2^−α2 s1^β ,
c″ = s1^−β s2^α2 · · · sm−1^αm−1 sm^γ .

Then we can calculate that

(a · b) · c = (a′ s1^{α1 + β} b‴) · c = (a′ s1^{α1 + β} b‴) · (c′ s1^−β s2^α2 · · · sm−1^αm−1 sm^γ c‴)
            = a′ s1^α1 s2^α2 · · · sm−1^αm−1 sm^γ c‴
            = (a′ s1^α1 s2^α2 · · · sm−1^αm−1 sm^αm a‴) · (b′ sm^{−αm + γ} c‴) = a · (b′ sm^{−αm + γ} c‴)
            = a · (b · c).

In all three possible cases of combinations of overlap, associativity holds.
We conclude that (F (S), ·) is a group. 

Definition 3.8.5
Let S be a set of symbols. The set F (S), equipped with the operation · of concatenation, is
called the free group on S. The cardinality of S is called the rank of the free group F (S).

The term “free” refers to the fact that its generators do not have any relations among them
besides the relations imposed by power rules,
x^α x^β = x^{α+β}, for all x ∈ G and all α, β ∈ Z. (3.9)
In fact, we defined the concatenation operation on reduced words as we did in order to satisfy (3.9).
Because it has no relations, the free group on a given set of symbols is as complicated as a group
can get when generated by those symbols. All free groups are infinite because each symbol in a
reduced word can carry powers of any nonzero integer and also because there exist reduced words
of arbitrary length.
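The concatenation-with-reduction operation lends itself to a direct computational sketch. The following Python model is our own illustration (not from the text): a reduced word is represented as a list of (symbol, exponent) pairs, and concatenation cancels or merges powers at the seam exactly as the power rule (3.9) dictates.

```python
def concat(u, v):
    """Concatenate two reduced words, cancelling/merging at the seam.

    A reduced word is a list of (symbol, exponent) pairs with no two
    successive symbols equal and no zero exponents.
    """
    w = list(u)
    for s, a in v:
        if w and w[-1][0] == s:
            b = w[-1][1] + a          # power rule: s^b s^a = s^(b+a)
            w.pop()
            if b != 0:                # drop the pair entirely if it cancels
                w.append((s, b))
        else:
            w.append((s, a))
    return w

# x y^2 concatenated with y^-2 x^-1 reduces to the empty word (the identity)
w1 = [("x", 1), ("y", 2)]
w2 = [("y", -2), ("x", -1)]
print(concat(w1, w2))   # → []
```

Running `concat` on small triples of words also lets one spot-check the associativity argument of the proof above.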
3.8. GROUP PRESENTATIONS 127

3.8.4 – Defining Groups from a Presentation


A group defined by a presentation is similar to a free group in that it is first understood through
its symbols, rather than the symbols representing some function, matrix, or number. In a group
defined by a presentation, the elements are simply reduced words in the generators but the relations
impose additional simplifications beyond just the power rules that hold in any group.

Example 3.8.6. Consider the following three groups. To illustrate the result of various sets of
relations, we consider presentations that have the same number of generators, of the same respective
orders.

G1 = ⟨x, y | x^3 = y^7 = 1, xy = yx⟩,
G2 = ⟨a, b | a^3 = b^7 = 1, ab = b^2 a⟩,
G3 = ⟨u, v | u^3 = v^7 = 1, uv = v^2 u^2⟩.

In G1, from the relation xy = yx, we have y^n x^k = y^{n-1} x y x^{k-1} = · · · = y^{n-1} x^k y = · · · = x^k y^n. Consequently, in any expression of the generators, all the x symbols can be moved to the left. Thus, all elements in G1 can be written as x^k y^n. Furthermore, since x^3 = y^7 = 1, the elements x^i y^j with 0 ≤ i ≤ 2 and 0 ≤ j ≤ 6 give all the elements of the group. All 21 of these elements are distinct because

x^k y^ℓ = x^m y^n ⟺ x^{k−m} = y^{n−ℓ}

and this equality only holds when 3 divides k − m and 7 divides n − ℓ. Hence, G1 is a group of order 21 in which the elements operate as (x^k y^ℓ)(x^m y^n) = x^{k+m} y^{ℓ+n}. In fact, it is easy to see that G1 ≅ Z3 ⊕ Z7 and, by Exercise 3.7.19, we deduce that G1 ≅ Z21.
In G2, from the relation ab = b^2 a, we see that all the a symbols may be moved to the right of any b symbols, though possibly changing the power on b. In particular,

a^n b = a^{n-1} b^2 a = a^{n-2}(ab)ba = a^{n-2} b^2 aba = a^{n-2} b^4 a^2 = · · · = b^{2^n} a^n,

and also

a^n b^k = b^{2^n} a^n b^{k-1} = b^{2^n} b^{2^n} a^n b^{k-2} = · · · = b^{k·2^n} a^n.

Thus, every element in G2 can be written as b^m a^n. Also, since a^3 = b^7 = 1, the elements b^i a^j with 0 ≤ i ≤ 6 and 0 ≤ j ≤ 2 give all the elements of the group. The same reasoning used for G1 shows that all 21 of these elements are distinct. Hence, G2 is a group of order 21 but in which the group elements operate according to

(b^k a^ℓ)(b^m a^n) = b^{k+m·2^ℓ} a^{ℓ+n}.
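This multiplication law can be checked by brute force. The following Python sketch is our own illustration (not from the text); it encodes the element b^m a^n of G2 as the hypothetical pair (m, n) and implements the product rule just derived.

```python
# Model the order-21 group G2: the element b^m a^n is the pair (m, n),
# with m taken mod 7 and n mod 3, multiplied by the rule derived above:
# (b^k a^l)(b^m a^n) = b^(k + m*2^l) a^(l + n).
def mult(x, y):
    k, l = x
    m, n = y
    return ((k + m * pow(2, l, 7)) % 7, (l + n) % 3)

elements = [(m, n) for m in range(7) for n in range(3)]
print(len(elements))          # 21 distinct elements

# a * b = b^2 a, matching the defining relation ab = b^2 a
print(mult((0, 1), (1, 0)))   # → (2, 1)
```

A triple loop over all 21³ products confirms associativity, which reflects the fact that 2³ ≡ 1 (mod 7) makes this a well-defined semidirect product of Z7 by Z3.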
In G3, the relation uv = v^2 u^2 does not readily show that in every expression of u and v, the v's can be moved to the left of the u's. Consider the following equalities coming from the relation:

u^2 v = u(uv) = uv^2 u^2 = (uv)vu^2 = v^2 u^2 vu^2 = v^2 uv^2 u^4 = v^4 u^2 vu^4 = v^4 uv^2 u^6 = v^6 u^2 vu^6.

Now u^6 = 1, so we have u^2 v = v^6 u^2 v, which leads to v^6 = 1. By Proposition 3.3.5, we conclude that v = 1. Then the relation uv = v^2 u^2 becomes u = u^2, which implies that u = 1. Hence, we conclude that G3 is the trivial group G3 = {1}.
We should not surmise that a group generated by one element of order 3 and one element of order 7 must have order at most 21. Exercise 3.5.44 with p = 7 gives an example of a group of order 2520 obtained from two generators of orders 3 and 7, respectively. △

The presentation of G3 in Example 3.8.6 shows that the combination of relations in a presentation
can lead to implicit relations.
The following example offers an even more extreme illustration of how the size of a group can relate to the orders of its generators.

Example 3.8.7. Recall the infinite dihedral group D∞. Example 3.3.8 described a presentation of D∞ as

D∞ = ⟨x, y | x^2 = y^2 = 1⟩.

The element xy has infinite order. Setting z = xy, we have xz = y = y^{-1} = y^{-1}x^{-1}x = z^{-1}x. Hence, since the set {x, y} can be obtained from the set {x, z} by group operations and vice versa, another presentation of D∞ is

D∞ = ⟨x, z | x^2 = 1, xz = z^{-1}x⟩.

This presentation resembles the standard presentation of a dihedral group, except that z (which is similar to r) has infinite order. △
It is always possible to find a presentation of any finite group. We can take the set of generators
to be the set of all elements in the group and the set of relations as all the calculations in the Cayley
table. More often than not, however, we are interested in describing the group with a small list
of generators and relations. Depending on the group, it may be a challenging problem to find a
minimal generating subset.
A profound result illustrating the possible complexity of working with generators and relations, the Novikov-Boone Theorem [50, 10] states that there exist finite presentations for which no algorithm can decide, for two given words w1 and w2, whether w1 = w2. For certain specific presentations, however, it may be possible to decide whether two words are equal. For example, with the dihedral group, there is
an algorithm to reduce any word to one in a complete list of distinct words.
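Such a rewriting procedure for the dihedral group is easy to implement. The sketch below is our own illustration under our own encoding (not from the text): it reduces any word in r and s to the normal form s^a r^b, using only r^n = s^2 = 1 and the consequence r^b s = s r^{-b} of the relation rs = sr^{-1}.

```python
def normal_form(word, n):
    """Reduce a word in the generators of the dihedral group D_n to the
    normal form s^a r^b, returned as (a, b) with 0 <= a < 2 and 0 <= b < n.
    `word` is a list of (generator, exponent) pairs, generator "r" or "s".
    """
    a, b = 0, 0                     # current element is s^a r^b
    for g, e in word:
        if g == "r":
            b = (b + e) % n         # s^a r^b * r^e = s^a r^(b+e)
        else:                       # r^b s = s r^(-b), from rs = sr^(-1)
            a = (a + e) % 2
            if e % 2:
                b = (-b) % n
    return (a, b)

# two words represent the same element of D_4 iff their normal forms agree
print(normal_form([("s", 1), ("r", 1), ("s", 1)], 4))  # → (0, 3): s r s = r^-1
```

Comparing normal forms then decides word equality in Dn, which is exactly the kind of algorithm that the Novikov-Boone Theorem rules out for arbitrary presentations.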

3.8.5 – Presentations and Homomorphisms


Suppose that G has a presentation ⟨g1, g2, . . . , gk | R1, R2, . . . , Rs⟩. Every element w ∈ G is a word in the generators, w = u1^{α1} u2^{α2} · · · uℓ^{αℓ}, with ui ∈ {g1, g2, . . . , gk}, so for a homomorphism ϕ : G → H we have

ϕ(w) = ϕ(u1)^{α1} ϕ(u2)^{α2} · · · ϕ(uℓ)^{αℓ}.
Hence, ϕ is entirely determined by the values of ϕ(g1 ), ϕ(g2 ), . . . , ϕ(gk ).
When trying to construct a homomorphism from G to a group H, it is not possible to associate
arbitrary elements in H to the generators of G and always obtain a homomorphism. The following
theorem makes this precise.

Theorem 3.8.8 (Extension Theorem on Generators)


Let G = ⟨g1, g2, . . . , gk | R1, R2, . . . , Rs⟩ and let h1, h2, . . . , hk ∈ H be elements that
also satisfy the relations satisfied by the generators of G when replacing gi with hi for
i = 1, 2, . . . , k. Then the function {g1, g2, . . . , gk} → H that maps gi ↦ hi for i = 1, 2, . . . , k
can be extended to a unique homomorphism ϕ : G → H, satisfying ϕ(gi ) = hi .

Proof. We define the function ϕ : G → H by ϕ(gi) = hi for i = 1, 2, . . . , k and, for each element g ∈ G, if g = u1^{α1} u2^{α2} · · · uℓ^{αℓ} with uj ∈ {g1, g2, . . . , gk}, then

ϕ(g) := ϕ(u1)^{α1} ϕ(u2)^{α2} · · · ϕ(uℓ)^{αℓ}.
By construction, ϕ satisfies the homomorphism property ϕ(xy) = ϕ(x)ϕ(y) for all x, y ∈ G. However,
since different words can be equal, we have not yet determined if ϕ is a well-defined function.
Two words v and w in the generators g1 , g2 , . . . , gk are equal if and only if there is a finite
sequence of words w1, w2, . . . , wn such that v = w1, w = wn, and each wi is related to wi+1 by either one application of a power rule (as given in Proposition 3.2.13) or one application
of a relation Rj . Since the elements h1 , h2 , . . . , hk ∈ H satisfy the same relations R1 , R2 , . . . , Rs as
g1 , g2 , . . . , gk , then the same equalities apply between the words ϕ(wi ) and ϕ(wi+1 ) as between wi
and wi+1 . This establishes the chain of equalities
ϕ(v) = ϕ(w1 ) = ϕ(w2 ) = · · · = ϕ(wn ) = ϕ(w).

Hence, if v = w are words in G, then ϕ(v) = ϕ(w). Thus, ϕ is a well-defined function and hence is
a homomorphism. 
Example 3.8.9. We use Theorem 3.8.8 to prove that the subgroup ⟨(1 2 3)(4 5), (1 2)⟩ of S5 is isomorphic to D6. We set up a function from {r, s}, the standard generators of D6, to S5 by

r ↦ (1 2 3)(4 5) and s ↦ (1 2).

Obviously r^6 = 1 and ((1 2 3)(4 5))^6 = 1, while s^2 = 1 and (1 2)^2 = 1. In D6, we also have the relation rs = sr^{-1}, while in S5,

(1 2 3)(4 5)(1 2) = (1 3)(4 5) and (1 2)((1 2 3)(4 5))^{-1} = (1 2)(1 3 2)(4 5) = (1 3)(4 5).

Thus, (1 2 3)(4 5) and (1 2) satisfy the same relations as r and s. Hence, by Theorem 3.8.8, this mapping on generators extends to a homomorphism ϕ : D6 → S5. Obviously, ϕ(D6) = ⟨(1 2 3)(4 5), (1 2)⟩. However, it is not hard to verify that ⟨(1 2 3)(4 5), (1 2)⟩ consists of exactly 12 elements. Hence, ϕ is injective and, by Exercise 3.7.24, D6 ≅ ϕ(D6) = ⟨(1 2 3)(4 5), (1 2)⟩. △
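This kind of verification is easily mechanized. The sketch below (our illustration, not from the text) uses SymPy's permutation-group tools; since SymPy permutations act on {0, . . . , 4}, each entry of the cycles is shifted down by 1.

```python
from sympy.combinatorics import Permutation, PermutationGroup

# SymPy permutations are zero-indexed, so (1 2 3)(4 5) becomes (0 1 2)(3 4).
r = Permutation([[0, 1, 2], [3, 4]])   # corresponds to (1 2 3)(4 5)
s = Permutation([[0, 1]], size=5)      # corresponds to (1 2)

G = PermutationGroup(r, s)
print(G.order())             # → 12, the order of D6
print(r.order(), s.order())  # → 6 2
# the dihedral relation: conjugating r by s inverts it
assert s * r * s**-1 == r**-1
```

With the order and the relations confirmed, the extension theorem does the rest of the work of identifying the subgroup with D6.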

3.8.6 – Cayley Graph of a Presentation


The Cayley graph of a presentation of a group is another visual tool to understand the internal
structure of a group, particularly a group of small order. The vertices of the Cayley graph are the
elements of the group G and the edges are pairs {x, y} if there is a generator g such that y = gx. One
variant of the Cayley graph colors the edges to distinguish which generator corresponds
to which edge. Yet another variant is a directed graph that places an arrow from x to y if there is
a generator g such that y = gx.
Figure 3.11 gives the Cayley graph of the standard presentation ⟨r, s | r^6 = s^2 = 1, rs = sr^{-1}⟩
of D6. The graph shown uses directed edges and draws a double edge for the generator r and a
single edge for the generator s. At each vertex, there is an in-edge and out-edge for every generator.
Hence, Cayley graphs possess a high degree of symmetry. Furthermore, a Cayley graph for a group
presentation involving two generators can sometimes be drawn as a polyhedron. For example, the
Cayley graph for the standard presentation of D6 can be drawn as a hexagonal prism.

Figure 3.11: Cayley graphs for D6

The Cayley graph of a group presentation depends only on the set of generators of the group.
Hence, for a known group G, a set of generators (without necessarily supplying all the relations
between them) is sufficient to define a Cayley graph of G. For example, it is not hard to show
that S4 = ⟨(1 2 3), (1 2 3 4)⟩. Figure 3.12 shows the Cayley graph for S4 using these generators. The
double edges correspond to left multiplication by (1 2 3) and the single edges to left multiplication
by (1 2 3 4). This Cayley graph has the adjacency structure of the Archimedean solid named a
rhombicuboctahedron.
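One can tabulate this graph directly. The following Python sketch (our illustration, not from the text) enumerates the vertices of S4 as zero-indexed tuples and collects the undirected edges {x, gx} for the two generators; the counts match the 24 vertices and 48 edges of the rhombicuboctahedron.

```python
from itertools import permutations

# Model S4 as tuples: p[i] is the image of i; compose(p, q) applies q, then p.
def compose(p, q):
    return tuple(p[q[i]] for i in range(len(q)))

# zero-indexed generators: (1 2 3) becomes (0 1 2), (1 2 3 4) becomes (0 1 2 3)
a = (1, 2, 0, 3)   # the 3-cycle (0 1 2)
b = (1, 2, 3, 0)   # the 4-cycle (0 1 2 3)

vertices = list(permutations(range(4)))
# undirected edges {x, gx} for each generator g
edges = {frozenset({x, compose(g, x)}) for x in vertices for g in (a, b)}
print(len(vertices), len(edges))   # → 24 48
```

Since neither generator has order 2 and neither is the inverse of the other, each contributes 24 distinct undirected edges, giving 48 in total.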

Figure 3.12: Cayley graph for S4 = ⟨(1 2 3), (1 2 3 4)⟩



Exercises for Section 3.8


1. Find a set of generators and relations for S3. Prove that D3 ≅ S3.
2. Find a set of generators and relations for the group Z2 ⊕ Z2 ⊕ Z2.
3. Prove that in any group G, given any two elements x, y ∈ G, the subgroups ⟨x, y⟩ and ⟨x, xy⟩ are isomorphic.
4. Consider the group presentation G = ⟨x, y | x^4 = y^3 = 1, xy = y^2 x^2⟩. Prove that G is the trivial group G = {1}.
5. Prove that ⟨i, j | i^4 = j^4 = 1, i^2 = j^2, ij = j^3 i⟩ is a presentation of the quaternion group Q8. In particular, show how to write −1, −i, −j, k, and −k as operations on the generators i and j.
6. Show that the group G = ⟨x, y | x^5 = 1, y^2 = 1, x^2 yxy = 1⟩ is isomorphic to Z2.
7. Prove that D7 and ⟨x, y | x^7 = y^3 = 1, yx = x^2 y⟩ have the same lattice structure.
8. Let G = ⟨a, b | a^6 = b^7 = 1, ab = b^3 a⟩.
(a) Prove that every element in G can be written uniquely as b^m a^n for 0 ≤ m ≤ 6 and 0 ≤ n ≤ 5.
(b) Write the element a^4 b^2 a^{-2} b^5 in the form b^m a^n.
(c) Find the order of ab and list all the powers (ab)^i.
9. The quasidihedral group of order 16 is defined by
QD16 = ⟨a, b | a^8 = b^2 = 1, ab = ba^3⟩.
Show that QD16 has 16 elements and then draw the subgroup lattice of QD16.
10. Prove that if all nonidentity elements in a finite group G have order 2, then there exists a nonnegative integer n such that
G = ⟨a1, a2, . . . , an | ai^2 = 1, ai aj = aj ai for all i, j = 1, 2, . . . , n⟩.
Prove also that G ≅ Z2^n = Z2 ⊕ Z2 ⊕ · · · ⊕ Z2 (n times). [Hint: Use induction on the minimum number of generators of G.]
11. Prove that the subgroup ⟨(1 2 3 4), (2 4)⟩ in S4 is isomorphic to D4. [Hint: See Example 3.7.30.]
12. We work in the group S5 .
(a) Find a subgroup of S5 isomorphic to D5 .
(b) Find a subgroup of S5 isomorphic to D6 .
(c) Prove that S5 does not have any subgroups isomorphic to Dn for n ≥ 7.
13. Consider homomorphisms Q8 → Z4 .
(a) Prove that there exists no homomorphism ϕ : Q8 → Z4 = ⟨z | z^4 = 1⟩ such that ϕ(i) = z and ϕ(j) = 1.
(b) Prove that there exists a homomorphism ψ : Q8 → Z4 = ⟨z | z^4 = 1⟩ such that ψ(i) = z^2 and ψ(j) = 1.
14. Prove that there exists a homomorphism ϕ : D3 → GL2 (F3) such that
ϕ(r) = [1 1; 0 1] and ϕ(s) = [2 0; 0 1],
where [a b; c d] denotes the matrix with rows (a, b) and (c, d).
15. Prove that the subgroup ⟨[0 2; 1 0], [1 1; 1 2]⟩ in GL2 (F3) is isomorphic to Q8.
16. Fix a positive integer n. Show that a function Dn → GL2 (R) that maps
r ↦ [cos(2π/n) −sin(2π/n); sin(2π/n) cos(2π/n)] and s ↦ [1 0; 0 −1]
extends uniquely to a homomorphism ϕ : Dn → GL2 (R). Show also that this ϕ is injective.

Figure 3.13: The Quintrino sculpture (Courtesy of Bathsheba Grossman)

17. Prove that the subgroup ⟨[2 0; 0 1], [1 1; 0 1]⟩ in GL2 (F7) is isomorphic to the group G2 in Example 3.8.6.
18. Sketch the Cayley graph for Z6 ⊕ Z2 using directed edges and colors corresponding to generators.
19. Sketch the Cayley graph of D4 using the generating set {s, sr}.
20. Show that A4 is generated by {(1 2 3), (1 2)(3 4)} and then sketch the Cayley graph of A4 using these
generators.
21. Decide whether some group has a tetrahedron as its Cayley graph for some set of generators.
22. Let S be a set and let F (S) be the free group on S. Prove that two reduced words w1 = s1^{α1} s2^{α2} · · · sm^{αm} and w2 = t1^{β1} t2^{β2} · · · tn^{βn} are equal in F (S) if and only if m = n, si = ti for all 1 ≤ i ≤ n, and αi = βi for all 1 ≤ i ≤ n. (This was not obvious from the definition of reduced words.)
23. Prove that Aut(Z2 ⊕ Z2) ≅ S3. [Hint: Use the presentation Z2 ⊕ Z2 = ⟨x, y | x^2 = y^2 = 1, xy = yx⟩.]
24. This exercise guides the reader to prove that Aut(Q8) ≅ S4.

(a) Let ϕ be an arbitrary element of Aut(Q8 ), that is, an automorphism of Q8 . Prove that


(i) ϕ(i) ∈ {i, −i, j, −j, k, −k};
(ii) ϕ(−1) = −1;
(iii) ϕ(j) ∈ {i, −i, j, −j, k, −k} − {ϕ(i), ϕ(−i)};
(iv) all values of ϕ(x) are determined by ϕ(i) and ϕ(j).
(b) Prove that all functions that are allowed by the above restrictions are valid automorphisms.
Conclude that | Aut(Q8 )| = 24.
(c) By labeling the faces of a cube with {i, −i, j, −j, k, −k} where x and −x are opposite faces, show
that Aut(Q8 ) is isomorphic to the group of rigid motions of a cube.
(d) Use Exercise 3.7.36 to conclude that Aut(Q8) ≅ S4.

25. Consider the sculpture entitled Quintrino depicted in Figure 3.13 and let G be its group of symmetry.
(a) Show that |G| = 60.
(b) Show that G can be generated by two elements σ and τ .
(c) Show that G can be viewed as a subgroup of S12 by writing σ and τ explicitly as elements in
S12 .
(d) (*) Show that G is isomorphic to A5 .
[Other sculptures by the same artist can be found at http://bathsheba.com.]

3.9 Groups in Geometry
Groups arise naturally in many areas of mathematics, often as groups of certain functions.
Geometry in particular offers many examples of groups. The dihedral group, which we introduced
in Section 3.1 as a motivation for groups, comes from geometry. However, there are countless
connections between group theory and geometry, ranging from generalizations of dihedral groups
(reflection groups, e.g., [40]) to applications in advanced differential geometry and topology. Elliptic
curves, a certain family of curves in the plane, themselves possess a natural group structure. In fact,
group theory became so foundational to geometry that in 1872, Felix Klein proposed the Erlangen
program: to classify all geometries using projective geometry and groups of allowed transformations.
In this section, we introduce a few instances in which groups arise in geometry in an elementary
way. This section only offers a glimpse of these topics.

3.9.1 – The Groups of Isometries


In the real plane R2, the Euclidean distance between two points P = (xP , yP ) and Q = (xQ , yQ ) is

d(P, Q) = √((xQ − xP )^2 + (yQ − yP )^2).

The plane equipped with this distance function is called the Euclidean plane. More generally, Euclidean n-space is the set Rn equipped with the Euclidean distance function

d(~x, ~y ) = √((y1 − x1)^2 + · · · + (yn − xn)^2).

Definition 3.9.1
An isometry of Euclidean space is a function f : Rn → Rn that preserves the distance,
namely f satisfies
d(f (~x), f (~y )) = d(~x, ~y ) for all ~x, ~y ∈ Rn . (3.10)

The Greek etymology of isometry is “same measure.” It is possible to broaden the concept of
Euclidean distance to more general notions of distance. This is formalized by metric spaces.

Definition 3.9.2
A metric space is a pair (X, d) where X is a set and d is a function d : X × X → R≥0
satisfying
(1) (identity of equal elements) d(x, y) = 0 if and only if x = y;
(2) (symmetry) d(x, y) = d(y, x) for all x, y ∈ X;

(3) (triangle inequality) d(x, z) ≤ d(x, y) + d(y, z) for all x, y, z ∈ X.

More generally than Definition 3.9.1, an isometry between metric spaces (X, d) and (Y, d′) is a
bijection f : X → Y that satisfies (3.10) for all x, y ∈ X. However, for the rest of this section, we
only discuss isometries of Euclidean spaces.
Some examples of isometries in the plane include a rotation about some fixed point A by an
angle α and reflection about a line L. Many others exist. Figure 3.14 shows an isometry obtained
by reflecting about a line L followed by a translation. However, without knowing all isometries, we
can nonetheless establish the following proposition.

Figure 3.14: An example of an isometry

Proposition 3.9.3
The set of isometries on Euclidean space Rn is a group under composition.

The following properties of isometries are helpful for the proof of Proposition 3.9.3.

Proposition 3.9.4
Let f be an isometry of Euclidean space. For arbitrary points A, B, C ∈ Rn we denote by
A0 = f (A), B 0 = f (B), and C 0 = f (C).
(1) f preserves betweenness;
(2) f preserves collinearity;
−→ −−→ −−→ −−−→
(3) if AC = λAB, then A0 C 0 = λA0 B 0 ;
(4) 4A0 B 0 C 0 is congruent to 4ABC;

(5) f maps parallelograms to nondegenerate parallelograms.

Proof. A point B is said to be between two points A and C if d(A, B) + d(B, C) = d(A, C). Since an isometry preserves the distance function, d(A′, B′) + d(B′, C′) = d(A′, C′). Hence, B′ is between A′ and C′.
In Euclidean geometry, three points are collinear if one is between the other two. Since isometries preserve betweenness, they preserve collinearity.
The vector equation →AC = λ→AB with λ ≥ 0 holds when {A, B, C} is a set of collinear points, with A not between B and C, and when d(A, C) = λd(A, B). Since f is an isometry, d(A′, C′) = d(A, C) = λd(A, B) = λd(A′, B′). Hence, →A′C′ = λ→A′B′. When λ ≤ 0, the vector equality means that {A, B, C} is a set of collinear points, with A between B and C, and that d(A, C) = |λ|d(A, B). Then d(A′, C′) = d(A, C) = |λ|d(A, B) = |λ|d(A′, B′), and thus →A′C′ = λ→A′B′.
That △A′B′C′ is congruent to △ABC follows from the Side-Side-Side Theorem of Euclidean geometry.
Let ABCD be a nondegenerate parallelogram. Then △ABD and △BDC are congruent to △A′B′D′ and △B′D′C′, respectively. Hence, A′B′C′D′ is a nondegenerate parallelogram, congruent to ABCD. □

Proof (of Proposition 3.9.3). We provide the proof for the Euclidean plane (n = 2) but the proof
generalizes to arbitrary n.
We first must check that function composition is a binary operation on the set of isometries of
Rn . Let f, g be two isometries of Rn . Then for any P, Q ∈ Rn ,

d((f ◦ g)(P ), (f ◦ g)(Q)) = d(f (g(P )), f (g(Q)))
= d(g(P ), g(Q)) since f is an isometry
= d(P, Q) since g is an isometry.

Thus, f ◦ g is an isometry.
Function composition is always associative (Proposition 1.1.15).
The identity function id : Rn → Rn is an isometry and satisfies the group axioms for an identity
element.
In order to show that the set of isometries is closed under taking inverses, we need to show that
an arbitrary isometry f is a bijection and that the inverse function is again an isometry. Suppose
that f (P ) = f (Q) for two points P, Q ∈ R2 . Then d(f (P ), f (Q)) = 0. Since f is an isometry, then
d(P, Q) = 0 and hence P = Q. This shows that every isometry is injective.
Establishing that an isometry is a surjection requires the most work. Let O be a point in the domain and let A1, A2, . . . , An be points so that →OA1, →OA2, . . . , →OAn form a basis of Rn. Let O′ = f (O) and A′i = f (Ai). By Proposition 3.9.4(5) and an induction argument, we deduce that →O′A′1, →O′A′2, . . . , →O′A′n is a basis of the codomain. Let Q be an arbitrary point in Rn. There exist real numbers λ1, λ2, . . . , λn such that

→O′Q = λ1 →O′A′1 + λ2 →O′A′2 + · · · + λn →O′A′n.

We define a sequence of points P1, P2, . . . , Pn by →OP1 = λ1 →OA1 and

→OPi = →OPi−1 + λi →OAi for 1 < i ≤ n.

Denote P = Pn. By Proposition 3.9.4(3), →O′P′1 = λ1 →O′A′1 and →P′i−1P′i = λi →O′A′i for 1 < i ≤ n. Proposition 3.9.4(5) implies that

→O′P′i = →O′P′i−1 + λi →O′A′i for 1 < i ≤ n.

Then

→O′P′ = →O′P′n = λ1 →O′A′1 + λ2 →O′A′2 + · · · + λn →O′A′n

and hence P′ = f (P ) = Q. This shows that isometries are surjective.
Finally, since an isometry is a bijection, the inverse function f^{-1} exists and, for arbitrary points P′ = f (P ) and Q′ = f (Q) in the codomain, we have

d(P′, Q′) = d(f (P ), f (Q)) = d(P, Q) = d(f^{-1}(P′), f^{-1}(Q′)).

Thus, the inverse function is also an isometry. This proves that the set of isometries is closed under taking inverses. □

Proposition 3.9.4 allows us to give a characterization of all isometries in Rn .



Theorem 3.9.5
A function f : Rn → Rn is an isometry of Euclidean space if and only if

f (~x) = A~x + ~b

for some matrix A such that A> A = I and any constant vector ~b. In particular, isometries
of the Euclidean plane are of the form
   
a1 −sa2 b
~x 7−→ ~x + 1 ,
a2 sa1 b2

where s ∈ {1, −1}, ai ∈ R with a21 + a22 = 1, and bi ∈ R.

Proof. Proposition 3.9.4 (parts 3 and 5) shows that an isometry acts as a linear transformation on the set of displacement vectors (though not on position vectors). Suppose that f : Rn → Rn is an isometry and that f (O) = (b1, b2, . . . , bn) for the origin O. Let P be an arbitrary point with coordinates (x1, x2, . . . , xn) and let P′ = f (P). Then

→OP′ = →OO′ + →O′P′ = ~b + A→OP = ~b + A~x

for some matrix A, since →O′P′ is related to →OP by a linear transformation.
Let P and Q be arbitrary points with coordinates (x1, x2, . . . , xn) and (y1, y2, . . . , yn). For the distance, we have d(P, Q) = ‖~y − ~x‖. Since f is an isometry,

‖~y − ~x‖ = d(P, Q) = d(f (P ), f (Q)) = ‖(A~y + ~b) − (A~x + ~b)‖ = ‖A(~y − ~x)‖.

The usual identity (a − b)^2 = a^2 − 2ab + b^2 generalizes and leads to

~x · ~y = (1/2)(‖~x‖^2 + ‖~y‖^2 − ‖~y − ~x‖^2).

Hence, since A is a matrix satisfying ‖A(~y − ~x)‖ = ‖~y − ~x‖ for arbitrary ~x and ~y, we get

~x · ~y = (A~x) · (A~y) = (A~x)^T (A~y) = ~x^T A^T A ~y.

Since this holds for arbitrary vectors, in particular the basis vectors, we deduce that A^T A = I, the identity matrix.
If n = 2, write A = [a b; c d]. The identity A A^T = I (equivalent to A^T A = I for square matrices) reads

[1 0; 0 1] = [a b; c d][a c; b d] = [a^2 + b^2, ac + bd; ac + bd, c^2 + d^2].

Hence,

a^2 + b^2 = 1,
ac + bd = 0,
c^2 + d^2 = 1.

Note first that if a = 0, then b = ±1, d = 0, and c = ±1, with the signs on b and c independent. If a ≠ 0, then c = −bd/a, and the third equation gives (−bd/a)^2 + d^2 = 1, which gives b^2 d^2 + a^2 d^2 = a^2 and then a^2 = d^2, using the first equation. But then 0 = (a^2 + b^2) − (c^2 + d^2) = b^2 − c^2, so b^2 = c^2 also. Finally, using the second equation, we deduce that d = εa and c = −εb with ε ∈ {−1, 1}, with the condition a^2 + b^2 = 1 still holding. The theorem follows. □

A matrix A ∈ GLn (R) satisfying A^T A = I is called an orthogonal matrix. The set of all real orthogonal n × n matrices is a subgroup of GLn (R), called the orthogonal group and denoted by O(n).

If we set a point as the origin, then by Theorem 3.9.5 we see that O(n) is the subgroup of
isometries that leaves the origin fixed. However, let ~p ∈ Rn be any other point. The set of isometries that leave ~p fixed is the set

t~p O(n) t~p^{-1} = {t~p ◦ f ◦ t~p^{-1} | f ∈ O(n)},

where t~p is a translation by the vector ~p. By Exercise 3.7.37, we see that the subgroup of isometries
that leave a given point fixed is conjugate and hence isomorphic to the subgroup of isometries that
leave the origin fixed.
Because det(A^T) = det(A) for all n × n real matrices, every orthogonal matrix satisfies det(A)^2 = 1. This gives two possibilities for the determinant of an orthogonal matrix, namely 1 or −1.

Definition 3.9.6
An isometry f : Rn → Rn, as described in Theorem 3.9.5, is called direct (resp. indirect)
if det(A) = 1 (resp. det(A) = −1).

Example 3.9.7. In linear algebra, we find that the equations of transformation for rotation by an angle α about the origin are

[x; y] ↦ [cos α −sin α; sin α cos α][x; y].

It is easy to check that rotations are direct isometries. The equations of transformation for reflection through a line through the origin making an angle β with the x-axis are

[x; y] ↦ [cos 2β sin 2β; sin 2β −cos 2β][x; y].

In contrast to rotations, reflections through lines are indirect isometries. △
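These claims are easy to check numerically. In the sketch below (our illustration, not from the text), NumPy verifies for sample angles that both matrices are orthogonal, with determinant 1 for the rotation (direct) and −1 for the reflection (indirect).

```python
import numpy as np

alpha, beta = 0.7, 1.1  # arbitrary sample angles

# rotation by alpha about the origin
R = np.array([[np.cos(alpha), -np.sin(alpha)],
              [np.sin(alpha),  np.cos(alpha)]])
# reflection through the line through the origin at angle beta
S = np.array([[np.cos(2 * beta),  np.sin(2 * beta)],
              [np.sin(2 * beta), -np.cos(2 * beta)]])

for A in (R, S):
    assert np.allclose(A.T @ A, np.eye(2))   # orthogonal: A^T A = I

print(int(round(np.linalg.det(R))), int(round(np.linalg.det(S))))  # → 1 -1
```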
Example 3.9.8. To find the equations of transformation for the rotation by angle α about a point ~p = (p1, p2) other than the origin, we obtain it as the composition of a translation by −~p, followed by the rotation about the origin by angle α, followed by a translation by ~p. The equations then are

[x; y] ↦ [p1; p2] + [cos α −sin α; sin α cos α][x − p1; y − p2]. △
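A quick numerical check of this composition (a sketch, not from the text; the helper name is our own):

```python
import numpy as np

def rotation_about(p, alpha):
    """Return the isometry rotating the plane by alpha about the point p:
    translate by -p, rotate about the origin, then translate back by p."""
    R = np.array([[np.cos(alpha), -np.sin(alpha)],
                  [np.sin(alpha),  np.cos(alpha)]])
    return lambda x: p + R @ (x - p)

p = np.array([3.0, 1.0])
rot = rotation_about(p, np.pi / 2)
print(np.allclose(rot(p), p))      # → True: the center is a fixed point
print(rot(np.array([4.0, 1.0])))   # ≈ [3. 2.]: a quarter turn about p
```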
From Theorem 3.9.5 applied to the case of the Euclidean plane, we can deduce (see Exercise 3.9.6)
that an isometry is uniquely determined by how it maps a triangle △ABC into its image △A′B′C′.
In other words, knowing how an isometry maps three noncollinear points is sufficient to determine
the isometry uniquely.
The orthogonal group is an important subgroup of the group of isometries. In the rest of the
section, we consider two other types of subgroups of the group of isometries of the Euclidean plane.

3.9.2 – Frieze Groups


For hundreds of years, people have adorned the walls of rooms with repetitive patterns. Borders as
a crown to a wall, as a chair rail, as molding to a door, or as a frame to a picture are particularly
common artistic and architectural details. Frieze patterns are the patterns of symmetries used in
borders.

Definition 3.9.9
A (discrete) frieze group is a subgroup of the group of Euclidean plane isometries whose
subgroup of translations is isomorphic to Z.

In the usual group of isometries in the plane, the translations form a subgroup isomorphic to R2 .
We sometimes use the description of “discrete” for frieze groups in contrast to “continuous” because
there is a translation of least positive displacement.

Example 3.9.10. Consider for example the following pattern and let G be the group of isometries
of the plane that preserve the structure of the pattern.

[Frieze pattern figure: points P, Q, and Q3 lie along a horizontal strip; L1 and L2 are vertical lines through P and Q.]

The subgroup of translations of G consists of all translations by an integer multiple of 2→PQ.
Some other transformations in G include

• reflections through the vertical line L1 through P or any line parallel to L1 displaced by an integer multiple of →PQ;

• reflection through the horizontal line L3 = ←→PQ;

• rotations by an angle of π about P, Q, or any point translated from P by an integer multiple of →PQ.

It is possible to describe G with a presentation. Let si be the reflection through Li, for i = 1, 2, 3. We claim that

G = ⟨s1, s2, s3 | s1^2 = s2^2 = s3^2 = 1, (s1 s3)^2 = 1, (s2 s3)^2 = 1⟩. (3.11)
In order to prove the claim, we first should check that s1 , s2 , s3 do indeed generate all of G. By
Exercise 3.9.8, s1 s3 corresponds to rotation by π about P and s2 s3 corresponds to rotation by π
about Q. By Exercise 3.9.7, s2 s1 corresponds to a translation by 2→PQ. In order to obtain a reflection
through another vertical line besides L1 or L2 , or a rotation about another point besides P or Q,
we can translate the strip to center it on P and Q, apply the desired transformation (s1 , s2 , s1 s3 ,
or s2 s3), and then translate back. For example, the rotation by an angle of π about Q3 can be described by (s2 s1)^2 s2 s3 (s2 s1)^{-2}. This shows that our choice of generators is sufficient to generate
G.
Next, we need to check that we have found all the relations. The relations si^2 = 1 are obvious. The last two relations come from the fact that s1 s3 and s2 s3 are rotations by π, so they have order 2. Note that ⟨s1 s2⟩ ≅ Z is the infinite subgroup of translations and ⟨s1, s2⟩ ≅ D∞ (see Example 3.3.8). From the relation (s1 s3)^2 = 1, we see that

s1 s3 = (s1 s3)^{-1} = s3^{-1} s1^{-1} = s3 s1

because s1^2 = 1 and s3^2 = 1. Hence, s3 commutes with s1. Similarly, s3 commutes with s2. Hence, all elements in G can be written as an alternating string of s1 and s2, or as an alternating string of s1 and s2 followed by s3. For example, the rotation by π about Q3 is

(s2 s1)^2 s2 s3 (s2 s1)^{-2} = s2 s1 s2 s1 s2 s3 s1 s2 s1 s2 = s2 s1 s2 s1 s2 s1 s2 s1 s2 s3 = (s2 s1)^4 s2 s3.

If we set t = s2 s1, then every element can be written as

• t^k for k ∈ Z (translation by 2k→PQ);
• t^k s1 for k ∈ Z≤0 (reflection through the vertical line displaced by k→PQ from L1);
• t^k s2 for k ∈ Z≥0 (reflection through the vertical line displaced by k→PQ from L2);
• t^k s3 for k ∈ Z (reflection through L3 composed with a translation by 2k→PQ);
• t^k s1 s3 for k ∈ Z≤0 (rotation by π about the point displaced by k→PQ from P); or
• t^k s2 s3 for k ∈ Z≥0 (rotation by π about the point displaced by k→PQ from Q).

This gives us a full description of all elements in G, and it also shows that there are no relations in G not implied by those in the presentation (3.11). △
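The relations above can be mimicked numerically. Here is a small Python sketch (an illustration under our own coordinates, not from the text), taking →PQ as one horizontal unit so that L1 is the line x = 0 and L2 the line x = 1; each group element acts on the plane as (x, y) ↦ (a·x + k, b·y).

```python
# Model the frieze group: an element acts on (x, y) as (a*x + k, b*y)
# with a, b in {1, -1}; represent it by the triple (a, k, b).
def compose(f, g):          # f after g
    a1, k1, b1 = f
    a2, k2, b2 = g
    return (a1 * a2, a1 * k2 + k1, b1 * b2)

IDENT = (1, 0, 1)
s1 = (-1, 0, 1)    # reflection through the vertical line x = 0 (L1)
s2 = (-1, 2, 1)    # reflection through the vertical line x = 1 (L2)
s3 = (1, 0, -1)    # reflection through the horizontal axis (L3)

# every relation in presentation (3.11) says some element squares to 1
for w in (s1, s2, s3, compose(s1, s3), compose(s2, s3)):
    assert compose(w, w) == IDENT

t = compose(s2, s1)
print(t)   # → (1, 2, 1): translation by two units, generating the Z subgroup
```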

Following the terminology in Section 3.1.3, if a pattern has a frieze group G of symmetry, we call a subset of the pattern a fundamental region (or fundamental pattern) if the entire pattern is obtained from the fundamental region by applying elements from G and this region is minimal
among all subsets that generate the whole pattern. For example, a fundamental region for the
pattern in Example 3.9.10 is the following.

Frieze patterns are ubiquitous in artwork and architecture throughout the world. Figures 3.15
through 3.18 show a few such patterns.

Figure 3.15: A traditional Celtic border

Figure 3.16: A 17th century European chair rail

Figure 3.17: A modern wood molding

3.9.3 – Wallpaper Groups


Though wallpaper designers could use a frieze pattern to cover an entire wall, it is more common
to use a fundamental pattern that is bounded (or, in practice, small with respect to the size of the
wall). To cover the entire plane, we then require two independent directions of translation.

Figure 3.18: The border on a Persian rug

(a) Starfish (b) Flowers

Figure 3.19: Some wallpaper patterns

Definition 3.9.11
A wallpaper group is a subgroup of the group of Euclidean plane isometries whose subgroup
of translations is isomorphic to Z ⊕ Z.

Consider, for example, Figures 3.19a and 3.19b. These two figures illustrate patterns covering
the Euclidean plane, each preserved by a different wallpaper group.
To first see that each is a wallpaper group, notice that in Figure 3.19a, the fundamental region
consisting of two starfish

can be translated to an identical pattern along any translation ~t = a~ı + b~ȷ, where a, b ∈ Z and ~ı
corresponds to one horizontal unit translation and ~ȷ to one vertical unit. Clearly, the translation
subgroup of isometries preserving the pattern is Z ⊕ Z. In Figure 3.19b, a flower can be translated
into any other flower (ignoring shading) by

a( (√3/2) ~ı + (1/2) ~ȷ ) + b ~ȷ   for a, b ∈ Z.

The two unit translation vectors involved are linearly independent and again the translation subgroup
for this pattern of symmetry is isomorphic to Z ⊕ Z.
To see that the wallpaper groups for Figures 3.19a and 3.19b are not isomorphic, observe that in
Figure 3.19b, the pattern is preserved under a rotation by π/3 around any center of a flower, while
Figure 3.19a is not preserved under any isometry of order 6.
We point out that Figure 3.19a displays an interesting isometry, called a glide reflection. We can
pass from one fundamental region to another copy thereof via a reflection through a line, followed
by a vertical translation.

The glide reflection composed with itself is equal to the vertical translation with distance equal
to the least vertical gap between identical regions. Note that in a glide reflection, the translation
(glide) is always assumed to be parallel to the line through which the reflection occurs. One must
still specify the length of the translation. For example, a glide reflection f through the x-axis with
a translation distance of +2 has for equations

$$f\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} 2 \\ 0 \end{pmatrix} = \begin{pmatrix} x+2 \\ -y \end{pmatrix}.$$
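To see concretely that composing this glide reflection with itself yields a pure translation by twice the glide vector, here is a small Python sketch (the function name is ours):

```python
def glide(p):
    """Glide reflection through the x-axis with translation distance +2:
    reflect (x, y) to (x, -y), then translate by (2, 0)."""
    x, y = p
    return (x + 2.0, -y)

# Applying the glide twice is the translation by (4, 0): the two reflections cancel.
```

For instance, glide(glide((0.5, 1.0))) returns (4.5, 1.0), a horizontal translation by 4.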

The Dutch artist M. C. Escher (1898–1972) is particularly well-known for his exploration of
interesting artwork involving wallpaper symmetry groups.1 Part of the genius of his artwork resided
in that he devised interesting and recognizable patterns that were tessellations, patterns in which,
unlike Figure 3.19a, there is no blank space. In a tessellation, the fundamental region tiles and
completely covers R2 .
It is possible to classify all wallpaper groups. Since this section only offers a glimpse of applica-
tions of group theory to geometry, we do not offer a proof here or give the classification.

Theorem 3.9.12 (Wallpaper Group Theorem)


There exist exactly 17 nonisomorphic wallpaper groups.

Over the centuries, the study of symmetry patterns of the plane has drawn considerable interest
both by artists and mathematicians alike. Many books study this topic from a variety of directions.
From a geometer’s perspective, [32] offers a careful and encyclopedic analysis of planar symmetry
patterns.
Frieze groups and wallpaper groups are examples of crystallographic groups. A crystallographic
group of Rn is a subgroup G of the group of isometries on Rn such that the subgroup of translations
in G is isomorphic to Zn . The adjective “crystallographic” is motivated by the fact that regular
crystals fill Euclidean space in a regular pattern whose translation subgroup is isomorphic to Z3 .

Exercises for Section 3.9


1. Calculate the equations of transformation for the rotation by π/3 about the point (3, 2).
2. Determine the equations of transformation for the reflection through the line that makes an angle of
π/6 with the x-axis and goes through the point (3, 4).
3. Determine the equations of transformation for the reflection through the line 2x + 3y = 6.
4. Determine the equations of the isometry of reflection-translation through the line x = y with a
translation distance of 1 (in the positive x-direction).
1 M. C. Escher’s official website: http://www.mcescher.com/.

5. We work in R3 . Show by direct matrix calculations that rotation by θ about the x-axis, followed by
rotation by α about the y-axis is not generally equal to the rotation by α about the y-axis followed
by rotation by θ about the x-axis.
6. Use Theorem 3.9.5 to prove the claim that it suffices to know how an isometry f maps three non-
collinear points to know f uniquely.
7. Let L1 and L2 be two parallel lines in the Euclidean plane. Prove that reflection through L1 followed
by reflection through L2 is a translation of vector ~v that is twice the perpendicular displacement from
L1 to L2 .
8. Let L1 and L2 be two lines in the plane that are not parallel. Prove that reflection through L1 followed
by reflection through L2 is a rotation about their point of intersection of an angle that is double the
angle from L1 to L2 (in a counterclockwise direction).
9. Let A = (0, 0), B = (1, 0), and C = (0, 1). Determine the isometry obtained by composing a rotation
about A of angle 2π/3, followed by a rotation about B of angle 2π/3, followed by a rotation about C
of angle 2π/3. Find the equations of transformation and describe it in simpler terms.
10. Let f be the isometry of rotation by α about the point A = (a1 , a2 ) and let g be the isometry of
rotation by β about the point B = (b1 , b2 ). Show that f ◦ g may be described by a rotation by α + β
about B followed by a translation and give this translation vector.
11. Let f be the plane Euclidean isometry of rotation by α about a point A and let t be a translation by a
vector ~v . Prove that the conjugate f ◦ t ◦ f −1 is a translation and determine this translation explicitly.
12. Let g be the plane Euclidean isometry of reflection through a line L and let t be a translation by a
vector ~v . Prove that the conjugate g ◦ t ◦ g −1 is a translation and determine this translation explicitly.
13. Prove that orthogonal matrices have determinant 1 or −1. Prove also that

SO(n) = {A ∈ O(n) | det(A) = 1}

is a subgroup of O(n) and hence of GLn (R). [The subgroup SO(n) is called the special orthogonal
group.]
14. Prove that SO(2) (see Exercise 3.9.13) consists of the matrices

$$\left\{ \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \;\middle|\; \theta \in [0, 2\pi) \right\}.$$

15. Prove that the function dt : R2 × R2 → R given by

dt ((x1 , y1 ), (x2 , y2 )) = |x2 − x1 | + |y2 − y1 |

satisfies the conditions of a metric on R2 as defined in Definition 3.9.2. Prove also that the set of
surjective isometries for dt is a group and show that it is not equal to the group of Euclidean isometries.
[Hint: This metric on R2 is called the taxi metric because it calculates the distance a taxi would
travel between two points if it could only drive along north-south and east-west streets.]
16. Find a presentation for the frieze group associated to the pattern in Figure 3.15.
17. Find a presentation for the frieze group associated to the pattern in Figure 3.16.
18. Find a presentation for the frieze group associated to the pattern in Figure 3.18. Show that it is the
same as the frieze group associated to the pattern in Figure 3.17.
19. Find a presentation for the frieze group associated to the following pattern and then sketch the
fundamental pattern.

20. (*) Prove that there are only 7 nonisomorphic frieze groups.
21. Prove that the wallpaper group for the following pattern is not isomorphic to the wallpaper groups
for either Figure 3.19a or 3.19b.

22. Prove that the wallpaper group for the following pattern is not isomorphic to the wallpaper groups
for either Figure 3.19a or 3.19b.

For Exercises 3.9.23 through 3.9.26, sketch a reasonable portion of the pattern generated by the following fun-
damental pattern and the group indicated in each exercise. Assume the minimum distance for any translation
is 2.

23. The wallpaper group corresponding to Figure 3.19a.


24. The wallpaper group corresponding to Figure 3.19b.
25. The wallpaper group corresponding to the pattern in Exercise 3.9.29.
26. The wallpaper group corresponding to the pattern in Exercise 3.9.21.
27. Find a presentation for the wallpaper group corresponding to Figure 3.19a.
28. Find a presentation for the wallpaper group corresponding to the pattern in Exercise 3.9.22.
29. Find a presentation for the wallpaper group corresponding to the wallpaper pattern here below.

3.10 Diffie-Hellman Public Key
3.10.1 – A Brief Background on Cryptography
In this section, we will study an application of group theory to cryptography, the science of keeping
information secret.
Cryptography has a long history, with one of the first documented uses of cryptography attributed
to Caesar. When writing messages he wished to keep in confidence, the Roman general would shift
each letter by 3 to the right, assuming the alphabet wraps around. In other words, he would
replace A with D, B with E, and so forth, down to replacing Z with C. To anyone
who intercepted the modified message, it would look like nonsense. This was particularly valuable
if Caesar thought there existed a chance that an enemy could intercept orders sent to his military
commanders.
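As a toy illustration of this shift cipher (not from the text; the function name and default shift are ours), in Python:

```python
def caesar(message, shift=3):
    """Shift each letter `shift` places to the right, wrapping Z back around to A.

    Non-letter characters are left unchanged; decryption is the same map
    with the opposite shift.
    """
    out = []
    for ch in message:
        if ch.isalpha():
            base = ord('A')
            out.append(chr((ord(ch.upper()) - base + shift) % 26 + base))
        else:
            out.append(ch)
    return ''.join(out)
```

For example, caesar("ATTACK AT DAWN") gives "DWWDFN DW GDZQ", and caesar("DWWDFN DW GDZQ", -3) recovers the plaintext.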
After Caesar’s cipher, there came letter wheels in the early Renaissance, letter codes during the
American Civil War, the Navajo windtalkers during World War II, the Enigma machine used by
the Nazis, and then a whole plethora of techniques since then. Military uses, protection of financial
data, and safety of intellectual property have utilized cryptographic techniques for centuries. For
a long time, the science of cryptography remained the knowledge of a few experts because both
governments and companies held that keeping their cryptographic techniques secret would make it
even harder for “an enemy” to learn one’s information security tactics.
Today, electronic data storage, telecommunication, and the Internet require increasingly complex
cryptographic algorithms. Activities that are commonplace like conversing on a cellphone, opening
a car remotely, purchasing something online, all use cryptography so that a conversation cannot be
intercepted, someone else cannot easily unlock your car, or an eavesdropper cannot intercept your
credit card information.
Because of the proliferation of applications of cryptography in modern society, no one should
assume that the cryptographic algorithm used in any given instance remains secret. In fact, modern
cryptographers do not consider an information security algorithm at all secure if part of its effective-
ness relies on the algorithm remaining secret. But not everything about a cryptographic algorithm
can be known to possible eavesdroppers if parties using the algorithm hope to keep some message
secure. Consequently, most, if not all, cryptographic techniques involve an algorithm but also a
“key,” which can be a letter, a number, a string of numbers, a string of bits, a matrix or some other
mathematical object. The security of the algorithm does not depend on the algorithm staying secret
but rather on the key remaining secret. Users can change keys from time to time without changing
the algorithm and have confidence that their messages remain secure.
A basic cryptographic system involves the following objects.

(1) A message space M. This can often be an n-tuple of elements from some alphabet A (so
M = An ) or any sequence from some alphabet A (so M = AN ). The original message is
called plaintext.

(2) A ciphertext space C. This is the set of all possible hidden messages. It is not uncommon for
C to be equal to M.

(3) A keyspace K that provides the set of all possible keys to be used in a cryptographic algorithm.

(4) An encryption procedure E ∈ Fun(K, Fun(M, C)) such that for each key k ∈ K, there is an
injective function Ek : M → C.

(5) A decryption procedure D ∈ Fun(K, Fun(C, M)) such that for each key k ∈ K, there exists a
key k′ ∈ K, with a function Dk′ : C → M satisfying

Dk′ (Ek (m)) = m for all m ∈ M.

In many algorithms, k′ = k but that is not necessarily the case. (The requirement that Ek be
injective makes the existence of Dk′ possible.)

In an effective cryptographic algorithm, it should be very difficult to recover the keys k or k′ given
just ciphertext c = Ek(m) (called a “ciphertext only attack”) or even given ciphertext c = Ek(m)
and the corresponding plaintext m (called a “ciphertext and known plaintext attack”).
From the mathematician’s viewpoint, it is interesting that all modern cryptographic techniques
rely on number theory and advanced abstract algebra that is beyond the understanding of the vast
majority of people. Companies involved in designing information security products or protocols
must utilize advanced mathematics.
Now imagine that you begin a communication with a friend at a distance (electronically or
otherwise) and that other people can listen in on everything that you communicate to each other.
Would it be possible for that communication to remain secret? More specifically, would it be
possible to choose a key k together (for use in a subsequent cryptographic algorithm) so that people
who are eavesdropping on the whole communication do not know what that key is? It seems very
counterintuitive that this should be possible, but such algorithms do exist and are called public key
cryptography techniques.
In this section, we present the Diffie-Hellman protocol. Published in 1976, it was one of the first
public key algorithms. An essential component of the effectiveness of the Diffie-Hellman protocol is
the Fast Exponentiation Algorithm.

3.10.2 – Fast Exponentiation


Let G be a group, let g be an element in G, and let n be a positive integer. To calculate the power
g^n, one normally must calculate

g^n = g · g ⋯ g   (n times),

which involves n − 1 operations. (In fact, when we implement this in a computer algorithm, since
we must take into account the operation of incrementing a counter, the above direct calculation
takes a minimum of 2n − 1 computer operations.) If the order |g| and the power n are large, one
may not notice any patterns in the powers of g that would give us any shortcuts to determining g^n
with fewer than n − 1 group operations.
The Fast Exponentiation Algorithm allows one to calculate g n with many fewer group operations
than n, thus significantly reducing the calculation time.

Algorithm 3.10.1: FastExponentiation(g, n)

    (bk bk−1 ⋯ b1 b0)_2 ← ConvertToBinary(n)
    x ← g
    for i ← (k − 1) downto 0 do
        if bi = 0 then x ← x^2
        else x ← x^2 g
    return (x)

The reason that x has the value of g^n at the end of the for loop is because when the algorithm
terminates,

x = g^(bk·2^k + bk−1·2^{k−1} + ··· + b1·2 + b0),

which is precisely g^n. Note that if we write n in binary with bits n = (bk bk−1 ⋯ b1 b0)_2, then there
is an assumption that bk = 1.
Each time through the for loop, we do either one or two group operations. Hence, we do at
most 2k = 2⌊log2 n⌋ group operations. In practice, when implementing this algorithm, getting the
binary expansion for n takes k + 1 = ⌊log2 n⌋ + 1 operations (integer divisions) and the operation of
decrementing the counter i takes a total of k operations. This gives a total of at most 4⌊log2 n⌋ + 1
computer operations.
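Algorithm 3.10.1 transcribes directly into Python; this sketch (our naming) takes the group operation as a function `mul`, so it works in any group:

```python
def fast_exponentiation(g, n, mul):
    """Compute g**n with the square-and-multiply scheme of Algorithm 3.10.1.

    `g` is a group element, `n` a positive integer, and `mul` the group
    operation.  Python's bin(n) yields the bits b_k ... b_0 with b_k = 1.
    """
    bits = bin(n)[2:]          # binary expansion of n; leading bit is b_k = 1
    x = g                      # x <- g accounts for the leading bit
    for b in bits[1:]:         # i = k - 1 downto 0
        x = mul(x, x)          # x <- x^2
        if b == '1':
            x = mul(x, g)      # x <- x^2 * g
    return x
```

For instance, with mul = lambda a, b: a * b % 311, fast_exponentiation(7, 39, mul) returns 105, agreeing with Python’s built-in pow(7, 39, 311).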

Example 3.10.1. Let G = U(311). Note that 311 is a prime number. We propose to calculate 7̄^39.
The binary expansion of 39 is 39 = (100111)_2 = 2^5 + 2^2 + 2^1 + 2^0. Following the steps of the
algorithm, we

• assign x := 7̄;

• for i = 4, since b4 = 0, assign x := x^2 = 7̄^2 = 49;

• for i = 3, since b3 = 0, assign x := x^2 = 49^2 = 224;

• for i = 2, since b2 = 1, assign x := 7̄x^2 = 7̄ × 224^2 = 113;

• for i = 1, since b1 = 1, assign x := 7̄x^2 = 7̄ × 113^2 = 126;

• for i = 0, since b0 = 1, assign x := 7̄x^2 = 7̄ × 126^2 = 105.

We conclude that in U(311), we have 7̄^39 = 105. △

In the above example, we performed 8 group calculations as opposed to the 38 required had
we simply multiplied 7̄ by itself 38 times. This certainly sped up the process for calculating 7̄^39 by
hand. However, running a for loop with 39 iterations obviously does not come close to straining a
computer’s capabilities.

Example 3.10.2. Let G = U(435465768798023). Again, 435465768798023 is a prime number
(though that is not necessary for the algorithm). We propose to calculate

379^1234567890123

in U(435465768798023). Using the standard way of finding powers, it would take 1234567890122
operations in G, a number that begins to require a significant computing time. However, using the
Fast Exponentiation Algorithm, we only need to do at most 2(⌊log2 1234567890123⌋ + 1) = 82 group
operations. For completeness, we give the result of the calculation

379^1234567890123 = 370162048004176. △

Both of the above examples involve groups of the form U (p) but Fast Exponentiation applies in
any group.

3.10.3 – Diffie-Hellman Algorithm


Diffie-Hellman is a protocol (an algorithm involving two or more agents)
for two people who communicate publicly and never get together in secret to select a key in such a
way that an eavesdropper cannot easily determine what that key is.
In order to describe how Diffie-Hellman works, we will introduce three players: Alice, Bob, and
Eve. Alice and Bob want to talk secretly while Eve wants to eavesdrop on their conversation. We
assume that Eve can hear everything that Alice and Bob say to each other. In the following summary,
anything marked [secret] stays secret to the individual; everything else is heard by everyone,
including Eve. The Diffie-Hellman public key protocol works as follows.

(1) Alice and Bob choose a group G and an element g ∈ G.

(2) Alice chooses a ∈ N [secret] and sends g^a to Bob.

(3) Bob chooses b ∈ N [secret] and sends g^b to Alice.

(4) Alice calculates (g^b)^a.

(5) Bob calculates (g^a)^b.

(6) Alice and Bob both use the key g^{ab}.

(1) Alice and Bob settle on a group G and a group element g, called
the base. Ideally, the order of g should be very large. (If you are doing calculations by hand,
the order of g should be in the hundreds. If we are using computers, the order of g is ideally
larger than what a typical for loop runs through, say 10^15 or much more.)

(2) Alice chooses a relatively large integer a and sends to Bob the group element g^a, calculated
with Fast Exponentiation.

(3) Bob chooses a relatively large integer b and sends to Alice the group element g^b, calculated
with Fast Exponentiation.

(4) Alice calculates (g^b)^a = g^{ab} using Fast Exponentiation.

(5) Bob calculates (g^a)^b = g^{ab} using Fast Exponentiation.

(6) Alice and Bob will now use the group element g^{ab} as the key.
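The whole exchange can be sketched in a few lines of Python (function name ours; the built-in three-argument pow is Python’s fast modular exponentiation, so it plays the role of Fast Exponentiation in U(p)):

```python
def dh_shared_key(p, g, a, b):
    """One Diffie-Hellman run in U(p): return (Alice's key, Bob's key).

    Only g^a and g^b travel over the public channel; a and b stay secret.
    """
    ga = pow(g, a, p)          # (2) Alice sends g^a
    gb = pow(g, b, p)          # (3) Bob sends g^b
    key_alice = pow(gb, a, p)  # (4) Alice computes (g^b)^a
    key_bob = pow(ga, b, p)    # (5) Bob computes (g^a)^b
    return key_alice, key_bob  # (6) both values equal g^(ab)
```

The two returned values always agree, since (g^b)^a = g^{ab} = (g^a)^b.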

The reason why Eve cannot easily figure out g^{ab} is simply a matter of how long it would take
her to do so. We should assume that Eve intercepts the group G, the base g, the element g^a, and
the element g^b. However, Eve does not know the integers a or b. In fact, Alice does not know b and
Bob does not know a. The reason why this is safe is that Fast Exponentiation makes it possible to
calculate powers of group elements quickly while, on the other hand, the problem of determining n
from knowing g and g^n could take as long as n operations. If the power n is very large, it simply takes far too
long. Not knowing a or b, Eve cannot quickly determine the key g^{ab} if she only knows g, g^a, and g^b.
The fact that the security of the Diffie-Hellman public key exchange relies on the speed of one
calculation versus the relative slowness of a reverse algorithm may seem unsatisfactory at first.
However, suppose that it takes only microseconds for Alice and Bob to establish the key g^{ab} but
centuries (at the present computing power) for Eve to recover a from g and g^a. The secret message
between Alice and Bob would have lost its relevance long before Eve could recover the key.
The problem of finding n given g and g^n in some group is called the Discrete Log Problem. Many
researchers in cryptography study the Discrete Log Problem, especially in groups of the form U(n)
for n ∈ N. For reasons stemming from modular arithmetic, there are algorithms that calculate n
from g and g^n using fewer than n operations. However, Diffie-Hellman implementations using certain other
groups do not have such weaknesses. A popular technique called elliptic curve cryptography is one such
example of the Diffie-Hellman protocol that does not possess some of the weaknesses of U(n).
In any Diffie-Hellman public key exchange, some choices of group G and base g are wiser than
others. It is preferable to choose a situation in which |g| is high. Otherwise, an exhaustive calculation
of the powers of g would make a brute-force solution to the Discrete Log Problem a possibility for
Eve. Furthermore, given a specific g, some choices of a and b are not wise either. For example, if g^a
(or g^b) has a very low order, then g^{ab} would also have a very low order. Then, even if Eve does not
know the key for certain, she would only need to check a small number of possible keys of the form (g^a)^k.
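To make Eve’s task concrete, here is a naive brute-force attack on the Discrete Log Problem in U(p) (a sketch, names ours); it needs on the order of |g| group operations, versus roughly 2 log2 n operations for Fast Exponentiation:

```python
def discrete_log(g, h, p):
    """Smallest n >= 1 with g^n = h in U(p), found by exhaustive search.

    Runs in O(|g|) multiplications -- hopeless at the group sizes used in
    practice, which is exactly what Diffie-Hellman relies on.
    """
    x, n = g % p, 1
    while x != h % p:
        x = x * g % p
        n += 1
        if n > p:              # h is not a power of g
            raise ValueError("no discrete log exists")
    return n
```

Note that the search returns the *smallest* valid exponent, which may differ from the exponent originally used when |g| is less than it.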

3.10.4 – ElGamal Encryption


ElGamal encryption is a cryptographic protocol based on Diffie-Hellman. We describe it now with the
assumption that Alice wants to send an encrypted message to Bob.

(1) Alice and Bob settle on a group G and on the base, a group element g. The plaintext space
M can be any set, the ciphertext space C will be sequences of elements in G, and the keyspace
is K = G.
(2) Alice and Bob also choose a method of encoding the message into a sequence of elements in
G, i.e., they choose an injective function h : M → Fun(N∗, G).
(3) They run Diffie-Hellman to obtain their public key k = g^{ab}.
(4) Alice encodes her message m as a sequence of group elements h(m) = (m1, m2, . . . , mn).
(5) Alice sends to Bob the group elements Ek(m) = (km1, km2, . . . , kmn) = (c1, c2, . . . , cn). Note
that for each i, we have ci = g^{ab} mi.
(6) To decipher the ciphertext, we have k′ = k = g^{ab}. Bob calculates the mi from mi = ci k^{−1} =
ci (g^{ab})^{−1}.
[We point out that with a group element of large order, it is not always obvious how to
determine the inverse of a group element. We use Corollary 4.1.12, which says that g^{|G|} = 1.
Hence, to calculate g^{−ab}, without knowing a but only knowing g^a, using Fast Exponentiation,
Bob calculates
(g^a)^{|G|−b} = g^{a|G|−ab} = g^{−ab}.]

(7) Since h is injective, Bob can find Alice’s plaintext message m from (m1, m2, . . . , mn).

Example 3.10.3. We use the group G = U (3001) and choose for a base the group element g = 2̄.
(It turns out that |g| = 1500. In general, one does not have to know the order of g, but merely
hope that it is high.) The message space is the set of sequences M = Fun(N∗ , A) where A is the set
consisting of the 26 English letters and the space character.
Alice and Bob decide to encode their messages (define h : M → Fun(N∗ , G)) as follows. Ignore all
punctuation, encode a space with the integer 0 and each letter of the alphabet with its corresponding
ordinal, so A is 1, B is 2, and so on, where we allow for two digits for each letter. Hence, a space
is actually 00 and A is 01. Then group pairs of letters simply by adjoining them to make (up to) a
four-digit number. Thus, “GOOD-BYE CRUEL WORLD” becomes the finite sequence

715, 1504, 225, 500, 318, 2105, 1200, 2315, 1812, 400

where we completed the last pair with a space. We now view these numbers as elements in U (3001).
Alice chooses a = 723 while Bob chooses b = 1238. In binary, a = (1, 0, 1, 1, 0, 1, 0, 0, 1, 1)_2 and
b = (1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0)_2. Using Fast Exponentiation, Alice calculates that g^a = 2̄^723 = 1091
and Bob calculates that g^b = 2̄^1238 = 1056.
Alice just wants to say, “HI BOB.” She first calculates g^{ab} = 1056^723 = 2442. Her corresponding
message in group elements is: 809, 2, 1502. The ciphertext mi g^{ab} for i = 1, 2, 3 is the string of group
elements: 920, 1883, 662.
On his side, Bob now first calculates the element (g^a)^{|G|−b} = 1091^1762 = 102. The deciphered
code is

920 × 102 = 809,   1883 × 102 = 2̄,   662 × 102 = 1502.

Bob then easily recovers “HI BOB” as the original message. △
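The arithmetic in Example 3.10.3 can be reproduced with a short Python sketch (helper names ours; pow(key, -1, p) computes the inverse in U(p) in Python 3.8+):

```python
def encode(text):
    """Pairs of characters -> integers, with space = 00, A = 01, ..., Z = 26."""
    if len(text) % 2:
        text += " "                      # pad odd-length messages with a space
    val = lambda c: 0 if c == " " else ord(c) - ord("A") + 1
    return [100 * val(text[i]) + val(text[i + 1]) for i in range(0, len(text), 2)]

def elgamal_encrypt(p, key, ms):
    """Multiply each message unit by the key g^(ab) in U(p)."""
    return [key * m % p for m in ms]

def elgamal_decrypt(p, key, cs):
    """Multiply each ciphertext unit by the inverse key g^(-ab)."""
    key_inv = pow(key, -1, p)
    return [key_inv * c % p for c in cs]
```

Here encode("HI BOB") gives [809, 2, 1502]; with p = 3001 and key 2442, these encrypt to [920, 1883, 662] and decrypt back again, matching the example.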

In the following exercises, the reader should understand that the situations presented are all
small enough to make it simple for a computer to perform a brute-force attack to find a from g^a
or b from g^b. In real applications, the group G, the base g, and the elements a and b are chosen large
enough by the parties to make the Discrete Log Problem intractable by brute force.

Exercises for Section 3.10


1. Show all your steps as you perform Fast Exponentiation to calculate by hand 5̄^73 in the group U(153).
2. Show all your steps as you perform Fast Exponentiation to calculate by hand 2̄^111 in the group U(501).
3. Show all your steps as you perform Fast Exponentiation to calculate by hand 3̄^795 in the group U(7703).
4. Consider the group G = GL2(F199). Show all your steps as you perform Fast Exponentiation to
calculate by hand

$$\begin{pmatrix} \bar{0} & \overline{23} \\ \overline{10} & \bar{3} \end{pmatrix}^{42}.$$

5. Show all your steps as you perform Fast Exponentiation to calculate by hand

$$\begin{pmatrix} 3 & 2 \\ 1 & 1 \end{pmatrix}^{9}$$

in GL2(R).
6. Suppose that Alice and Bob decide to use the group U (3001) as their group G and the element 5̄ as
the base. If Alice chooses a = 73 and Bob chooses b = 129, then what will be their common key using
the Diffie-Hellman public key exchange algorithm?
7. Suppose that Alice and Bob decide to use the group U (4237) as their group G and the element 11
as the base. If Alice chooses a = 100 and Bob chooses b = 515, then what will be their common key
using the Diffie-Hellman public key exchange algorithm?

In Exercises 3.10.8 to 3.10.11, use the method in Example 3.10.3 to take strings of letters to strings of
numbers. Each time, use the same method to change a letter to a number and collect two letters at a
time to make an integer that is at most 4 digits.
8. Play the role of Alice. Use the group G = U(3001) and the base g = 2̄. Change a and b to a = 437
and b = 1000. Send to Bob the ciphertext for the following message: “MEET ME AT DAWN”
9. Play the role of Alice. Use the group G = U(3001) but use the base g = 7̄. Bob sends you g^b = 2442
and you decide to use a = 2319. Send to Bob the ciphertext for the following message: “SELL ENRON
STOCKS NOW”
10. Play the role of Bob. Use the group G = U(3517) and use the base g = 11. Alice sends you g^a = 1651
and you tell Alice you will use b = 789. You receive the following ciphertext from Alice:

369, 665, 1860, 855, 3408, 1765, 1820, 1496.

Show the steps using fast exponentiation to recover the decryption key g^{−ab} and use this to recover
the plaintext for the message that Alice sent you.
11. Play the role of Bob. Use the group G = U(7522) and use the base g = 3. Alice sends you g^a = 2027
and you tell Alice you will use b = 2013. You receive the following ciphertext from Alice:

4433, 5198, 6996, 3275, 7067, 2568, 1894, 6037, 7208.


Show the steps using fast exponentiation to recover the decryption key g^{−ab} and use this to recover
the plaintext for the message that Alice sent you.
12. We design the following Diffie-Hellman/ElGamal setup to encipher strings of bits (0 or 1). We will
choose a prime number p with 2^10 < p < 2^11 and use the group G = U(p) and choose a base g ∈ G. We
break up the string of bits into blocks of 10 bits and map a block of ten bits into a number between
0 and 1023 by using a binary expression, hence,

(b0, b1, . . . , b9) −→ b0 + b1 · 2 + · · · + bk · 2^k + · · · + b9 · 2^9 = mi.

This is a plaintext unit mi and we view it as an element in U(p). With this setup to convert strings
of bits to elements in U(p), we then apply the usual Diffie-Hellman key exchange and the ElGamal
encryption. As one extra layer, when Alice sends the ciphertext to Bob, she writes it as a bit string,
but with the difference that since the numbers c = m g^{ab} can be expressed uniquely as integers less than
2^11, blocks of 10 bits become blocks of 11 bits.
Here is the exercise. You play the role of Alice. You and Bob decide to use p = 1579 and the base
g = 7. Bob sends you g^b = 993 and you decide to use a = 78. Show all the steps to create the
Diffie-Hellman key g^{ab}. Use this to create the ciphertext as described in the previous paragraph for
the following string of bits:

1011010110 1011111001 1111100010.

Show all the work.
13. Use the setup as described in Exercise 3.10.12 to encipher bit strings. This time, however, you play
the role of Bob. You and Alice use the group U(1777) and the base g = 10, Alice sends you g^a = 235,
and you choose to use b = 1573. Alice has sent you the following string of bits:

00100110110 01110101001 10011011100.

Show all the steps to create the Diffie-Hellman decryption key g^{−ab}. Turn the ciphertext into strings of
elements in U(1777). By multiplying by g^{−ab}, recover the list of elements in U(1777) that correspond
to plaintext. (These should be integers mi with 0 ≤ mi ≤ 1023.) Convert them to binary to recover
the plaintext message in bit strings.
14. We design the following application of Diffie-Hellman. We choose to encrypt 29 characters: 26 letters
of the alphabet, space, the period “.”, and the comma “,”. We associate the number 0 to a space
character, 1 through 26 for each of the letters, 27 to the period, and 28 to the comma. We use the
group G = GL2(F29), the general linear group on modular arithmetic base 29. Given a message in
English, we write the numerical values of the characters in a 2 × n matrix, reading the characters of
the alphabet by successive columns. Hence, “SAY FRIEND AND ENTER” would become the matrix

$$M = \begin{pmatrix} 19 & 25 & 6 & 9 & 14 & 0 & 14 & 0 & 14 & 5 \\ 1 & 0 & 18 & 5 & 4 & 1 & 4 & 5 & 20 & 18 \end{pmatrix} \in M_{2\times 10}(F_{29})$$

where we have refrained from putting the congruence bars over the top of the elements only for brevity.
Then given a key K ∈ GL2(F29), we encrypt the message into ciphertext by calculating the matrix
C = KM. Hence, with

$$K = \begin{pmatrix} 3 & 4 \\ 5 & 9 \end{pmatrix}$$

the ciphertext matrix becomes

$$C = \begin{pmatrix} 3 & 17 & 3 & 18 & 0 & 4 & 0 & 20 & 6 & 0 \\ 17 & 9 & 18 & 3 & 19 & 9 & 19 & 16 & 18 & 13 \end{pmatrix}$$

and “CQQICRRI SDI STPFR M” is the ciphertext message in characters. [Note that this enciphering
scheme is not an ElGamal enciphering scheme as described in Section 3.10.4. Here, C = M and we do
not use a function h, so the enciphering function Ek does not involve products in the group G.]
Here is the exercise. You play the role of Alice. You and Bob decide on the above enciphering scheme.
You will choose the key K in the usual Diffie-Hellman manner. You use the group G = GL2(F29) and
the base

$$g = \begin{pmatrix} 1 & 2 \\ 3 & 5 \end{pmatrix}.$$

Bob sends you

$$g^b = \begin{pmatrix} 27 & 24 \\ 7 & 17 \end{pmatrix}$$

and you choose to use a = 17. Calculate the Diffie-Hellman key and use this to determine the
ciphertext corresponding to “COME HERE, NOW.” (Since there is an odd number of characters in
the message, append a space on the end to make a message of even length.)
15. We use the communication protocol as described in Exercise 3.10.14. You use the same group and the
same base but this time you play the role of Bob. Alice sends you

$$g^a = \begin{pmatrix} 5 & 27 \\ 26 & 1 \end{pmatrix}$$

and you choose to use b = 12. After you send your g^b to Alice, she then creates the public key and
sends you the message “LPSKQIMBW.ECRBHL” in ciphertext. Show all the steps with fast
exponentiation to calculate the deciphering key g^{−ab} and recover the plaintext message. (Note
that since we know how to take inverses of matrices in GL2(F29), it suffices to calculate g^{ab} and
then find the inverse as opposed to calculating (g^a)^{|G|−b}.)
16. We design the following Diffie-Hellman/ElGamal setup. We choose to encrypt 30 characters: 26 letters
of the alphabet, space, the period “.”, the comma “,” and the exclamation point “!”. We associate
the number 0 to a space character, 1 through 26 for each of the letters, 27 to the period, 28 to the
comma and 29 to “!”. We choose to compress triples of characters as follows: (b1 , b2 , b3 ), where each
bi ∈ {0, 1, . . . , 29}, corresponds to the number

b1 × 30² + b2 × 30 + b3 .

The resulting possible numbers are between 0 and 30³ − 1 = 26,999. Now, the smallest prime bigger
than 30³ is p = 27011. We will work in the group U (27011) and we will view messages as sequences
of elements in G encoded as described above. For example: “HI FRANK!” uses the compression of

8 × 30² + 9 × 30 + 0 = 7,470    6 × 30² + 18 × 30 + 1 = 5,941    14 × 30² + 11 × 30 + 29 = 12,959

and hence corresponds to the sequence in G of 7470, 5941, 12959.
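The compression scheme above is mechanical enough to sketch in code. The following Python fragment is our own illustration; the function names encode and decode are not from the text.

```python
# Sketch of the character-compression scheme described above.
# 0 = space, 1-26 = A-Z, 27 = '.', 28 = ',', 29 = '!'.

ALPHABET = " ABCDEFGHIJKLMNOPQRSTUVWXYZ.,!"

def encode(message):
    """Compress each triple of characters into one number in [0, 27000)."""
    values = [ALPHABET.index(c) for c in message]
    while len(values) % 3 != 0:          # pad with spaces to a multiple of 3
        values.append(0)
    return [values[i] * 30**2 + values[i + 1] * 30 + values[i + 2]
            for i in range(0, len(values), 3)]

def decode(numbers):
    """Invert the compression (padding spaces, if any, are kept)."""
    chars = []
    for n in numbers:
        chars += [ALPHABET[(n // 900) % 30],
                  ALPHABET[(n // 30) % 30],
                  ALPHABET[n % 30]]
    return "".join(chars)
```

For instance, encode("HI FRANK!") reproduces the sequence 7470, 5941, 12959 computed above.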


Here’s the exercise. You are Bob. Alice and you use the system described above and we use the base
of g = 2̄. Alice wants to send a message to you. She first sends her g^a = 5,738 ∈ G. You (Bob) choose
the integer b = 10,372 and send back g^b = 255. You receive the following ciphertext from Alice:

11336 8377 17034 688 1031 13929.

Using fast exponentiation, determine the inverse of the public key, g^(−ab). Decipher the sequence of
numbers corresponding to Alice’s plaintext message. From the message coding scheme, determine
Alice’s original message (in English).
17. Use the encoding of text into strings of group elements as described in Exercise 3.10.16. Use the same group
G = U (27011) but use the base g = 5. Play the role of Alice. Select a = 10,000 while Bob sends you
g^b = 15,128. Show all the steps to create the string of elements in G that are the ciphertext for the
message “I WILL SURVIVE.”
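Exercises 15 through 17 all hinge on fast exponentiation in U (27011). As a purely illustrative sketch (ours, not the text's), the square-and-multiply algorithm, together with the Fermat shortcut for inverting the key in U (p) for p prime, can be written as:

```python
# Sketch: fast exponentiation in U(p) and the Diffie-Hellman key steps.
# fast_pow is our own helper name, not from the text.

def fast_pow(g, n, p):
    """Compute g^n mod p by repeated squaring, O(log n) multiplications."""
    result = 1
    g %= p
    while n > 0:
        if n & 1:
            result = (result * g) % p
        g = (g * g) % p
        n >>= 1
    return result

p = 27011
ga = 5738                   # Alice's g^a from Exercise 16
b = 10372                   # Bob's secret exponent
key = fast_pow(ga, b, p)    # the shared key g^(ab)
# Since p is prime, the inverse g^(-ab) is key^(p-2) mod p (Fermat).
key_inv = fast_pow(key, p - 2, p)
```

Multiplying each ciphertext number by key_inv mod p then recovers the plaintext sequence.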

3.11
Semigroups and Monoids
In this section, we present two more algebraic structures closely related to groups. They possess
value in their own right but we present them here to illustrate more examples of algebraic structures.
Monoids in particular are regularly studied in far more detail than our brief overview. Consequently,
for each structure we follow the outline in the preface of this book.

3.11.1 – Semigroups

Definition 3.11.1
A semigroup is a pair (S, ◦), where S is a set and ◦ is an associative binary operation on S.

Having introduced groups already, we can see that a semigroup resembles a group but with only
the associativity axiom. Note that Proposition 2.3.7 holds in any semigroup. Obviously, every group
is a semigroup.
In every semigroup (S, ◦), because of associativity, the order in which we group the operations
in an expression of the form a ◦ a ◦ · · · ◦ a does not change the result. Hence, we denote by a^k the
(unique) element a ◦ a ◦ · · · ◦ a (k times).

Example 3.11.2. The set of positive integers equipped with addition (N>0 , +) is a semigroup. 4

Example 3.11.3. All integers equipped with multiplication (Z, ×) also form a semigroup. The fact
that not all elements have inverses prevents (Z, ×) from being a group, but that does not matter for a
semigroup. 4
152 CHAPTER 3. GROUPS

Figure 3.20: Operation in Example 3.11.5 (two rectangles R1 and R2 and their intersection R1 ∩ R2 )

Example 3.11.4. Let S = Z and suppose that a ◦ b = max{a, b}. It is easy to see that for all
integers a, b, c,

a ◦ (b ◦ c) = max{a, max{b, c}} = max{a, b, c} = max{max{a, b}, c} = (a ◦ b) ◦ c.

Hence, (Z, ◦) is a semigroup. There is no integer e such that ∀a ∈ Z, max{e, a} = a, because e
would need to be less than or equal to every integer. Hence, (Z, ◦) does not have an identity and
consequently it cannot have inverses to elements. 4

Example 3.11.5. Consider the following set of rectangles in the plane

S = {[a, b] × [c, d] ⊆ R² | a, b, c, d ∈ R}.

Note that this includes finite vertical (if a = b) or horizontal (if c = d) lines as well as the empty set
∅ (if b < a or d < c). The intersection of two elements in S is

([a1 , b1 ] × [c1 , d1 ]) ∩ ([a2 , b2 ] × [c2 , d2 ]) = [max(a1 , a2 ), min(b1 , b2 )] × [max(c1 , c2 ), min(d1 , d2 )]

so ∩ is a binary operation on S. (See Figure 3.20.) We know that ∩ is associative so the pair (S, ∩)
is a semigroup. As in Example 3.11.4, there is no identity element. If an identity element U existed,
then U ∩ R = R for all R ∈ S. Hence, R ⊆ U for all R ∈ S and thus,

⋃_{R∈S} R = R² ⊆ U.

Hence, U = R² but R² is not an element of S. 4
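As a quick numerical sanity check of this example (our own sketch, not part of the text), one can verify associativity of ∩ on sample rectangles:

```python
# Sketch: closed rectangles [a,b] x [c,d] under intersection, Example 3.11.5.
# (We do not normalize the many tuples with b < a or d < c down to a single
# empty-set representative, as the text's S does; this is only a spot check.)

def intersect(R1, R2):
    """Intersect axis-aligned rectangles given as (a, b, c, d) for [a,b] x [c,d]."""
    (a1, b1, c1, d1), (a2, b2, c2, d2) = R1, R2
    return (max(a1, a2), min(b1, b2), max(c1, c2), min(d1, d2))

R1 = (0, 4, 0, 3)
R2 = (2, 6, 1, 5)
R3 = (1, 3, 2, 7)

# Associativity: the grouping of the operations does not matter.
left = intersect(intersect(R1, R2), R3)
right = intersect(R1, intersect(R2, R3))
```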

Out of the outline given in the preface for the study of different algebraic structures, we briefly
mention direct sum semigroups, subsemigroups, generators, and homomorphisms.

Definition 3.11.6
Let (S, ◦) and (T, ?) be two semigroups. Then the direct sum semigroup of (S, ◦) and (T, ?)
is the pair (S × T, ·), where S × T is the usual Cartesian product of sets and where

(s1 , t1 ) · (s2 , t2 ) = (s1 ◦ s2 , t1 ? t2 ).

The direct sum of S and T is denoted by S ⊕ T .

Definition 3.11.7
A subset A of a semigroup (S, ◦) is called a subsemigroup if a ◦ b ∈ A for all a, b ∈ A.

A subsemigroup is a semigroup in its own right, using the operation inherited from the containing
semigroup. In the theory of groups, in order for a subset H of a group G to be a group in its own
right, H needed to be closed under the operation and taking inverses. For semigroups, the condition
on taking an inverse does not apply.

Definition 3.11.8
Let (S, ◦) and (T, ?) be two semigroups. A function f : S → T is called a semigroup
homomorphism if
f (a ◦ b) = f (a) ? f (b) for all a, b ∈ S.
A semigroup homomorphism that is also a bijection is called a semigroup isomorphism.

Example 3.11.9. Let p be a prime number and recall the ordp : N∗ → N function from (2.4). The
ordp function is a semigroup homomorphism from (N∗ , ×) to (N, +). This claim means that for all
a, b ∈ N∗ , we have
ordp (ab) = ordp (a) + ordp (b).
This was proven in Exercise 2.1.26. 4

Example 3.11.10. Consider the following four functions fi : {1, 2, 3} → {1, 2, 3} expressed in chart
notation (see Section 3.4.1) as
       
f1 = [ 1 2 3 ; 1 2 2 ],  f2 = [ 1 2 3 ; 3 2 3 ],  f3 = [ 1 2 3 ; 2 2 2 ],  f4 = [ 1 2 3 ; 3 2 2 ].

We claim that the set S = {f1 , f2 , f3 , f4 } together with function composition is a semigroup. We
know that function composition is associative. To prove the claim, we simply need to show that ◦
is a binary operation on S. The table of composition is:

◦    f1   f2   f3   f4
f1   f1   f3   f3   f3
f2   f4   f2   f3   f4
f3   f3   f3   f3   f3
f4   f4   f3   f3   f3

This shows that ◦ is a binary operation on S, making (S, ◦) into a semigroup. 4
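The closure claim in this example can also be checked mechanically. The following Python sketch (ours, not the text's; the convention (f ◦ g)(x) = f (g(x)) matches the table, with rows acting second) recomputes the Cayley table:

```python
# Recomputing the Cayley table of Example 3.11.10.
# Each f_i is stored as a dict on {1, 2, 3}.

f1 = {1: 1, 2: 2, 3: 2}
f2 = {1: 3, 2: 2, 3: 3}
f3 = {1: 2, 2: 2, 3: 2}
f4 = {1: 3, 2: 2, 3: 2}
S = {"f1": f1, "f2": f2, "f3": f3, "f4": f4}

def compose(f, g):
    """Function composition (f o g)(x) = f(g(x)) on {1, 2, 3}."""
    return {x: f[g[x]] for x in (1, 2, 3)}

def name_of(h):
    """Find which element of S a composite equals (the closure check)."""
    return next(name for name, f in S.items() if f == h)

# Entry (row a, column b) of the table holds a o b.
table = {(a, b): name_of(compose(S[a], S[b])) for a in S for b in S}
```

If some composite lay outside S, name_of would fail; since it never does, ◦ is a binary operation on S.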

This last example illustrates the use of a Cayley table to easily see the operations in a semigroup.
Though we introduced Cayley tables in Section 3.2.1 in reference to groups, it makes sense to discuss
a Cayley table in the context of any finite set S equipped with a binary operation. In such a general
Cayley table, the elements that appear on the top row or leftmost column are the elements of the
set S and all the entries of the table must be elements of the set S. However, for semigroups,
we see that there need not be a row and column corresponding to an identity element. Furthermore,
since a semigroup does not necessarily contain inverses to elements, the Cayley table of a semigroup
is not necessarily a Latin square.

3.11.2 – Monoids

Definition 3.11.11
A monoid is a pair (M, ∗), where M is a set and ∗ is a binary operation on M , such that

(1) associativity: a ∗ (b ∗ c) = (a ∗ b) ∗ c for all a, b, c ∈ M ;


(2) identity: there exists e ∈ M such that a ∗ e = e ∗ a = a for all a ∈ M .

Definition 3.11.12
A monoid (M, ∗) is called commutative if a ∗ b = b ∗ a for all a, b ∈ M .

If a monoid is commutative, we typically write the operation with an addition symbol and denote
the identity by 0. Consequently, it is not uncommon to say “let (M, +) be a commutative monoid”
when discussing a commutative monoid.

Definition 3.11.13
A monoid (M, ∗) is said to possess the cancellation property if for all a, b, c ∈ M ,

a ∗ c = b ∗ c ⇐⇒ a = b and
c ∗ a = c ∗ b ⇐⇒ a = b.

It is important to note that a monoid may have the cancellation property without possessing
inverses, as we will see in some examples below.
Example 3.11.14. The nonnegative integers with addition (N, +) form a monoid. The presence of 0
in N gives the nonnegative integers the needed identity. This monoid is commutative. This monoid
also possesses the cancellation property. 4
Example 3.11.15. The positive integers with multiplication (N∗ , ×) form a monoid. This possesses the
necessary identity but not all elements have inverses. Note that (N, ×) is also a monoid: associativity
still holds and, because 1 × 0 = 0, the element 1 is still an identity for all of N. This monoid is commutative.
Observe that (N∗ , ×) has the cancellation property but (N, ×) does not. A counterexample for
the latter is 0 × 1 = 0 × 2 but 1 6= 2. 4
Example 3.11.16. Let A be a set and let F = Fun(A, A) be the set of all functions f : A → A. Since
function composition is associative, (F, ◦) is a (noncommutative) monoid under function composition
◦ with the identity function idA as the identity element. Functions that are bijections have inverses,
but in a monoid not every element must have an inverse. 4
Example 3.11.17. Let (M, ∗) be a monoid. If S, T are subsets of M , then we define
S ∗ T := {s ∗ t | s ∈ S and t ∈ T }.
Then (P(M ), ∗) is a monoid. This result is not obvious and we leave it as an exercise for the reader
(Exercise 3.11.10). This is called the power set monoid. 4
Example 3.11.18 (Free Monoid, Monoid of Strings). Let Σ be a set of characters. Consider
the set of finite strings of elements from Σ, including the empty string, denoted by 1. Authors who
work with this structure in the area of theoretical computer science usually denote this set of strings
by Σ∗ . We define the operation of concatenation · on Σ∗ by
a1 a2 · · · am · b1 b2 · · · bn := a1 a2 · · · am b1 b2 · · · bn .
The pair (Σ∗ , ·) is a monoid where the empty string is the identity. This is called the free monoid
on the set of characters Σ. 4
The concepts of product structures, subobjects, and homomorphisms are nearly identical as with
groups or semigroups. We give an explicit description here for completeness.

Definition 3.11.19
Let (M, ∗) and (N, ◦) be two monoids. Then the direct sum monoid of (M, ∗) and (N, ◦)
is the pair (M × N, ·), where M × N is the usual Cartesian product of sets and where

(m1 , n1 ) · (m2 , n2 ) = (m1 ∗ m2 , n1 ◦ n2 ).

The direct sum of M and N is denoted by M ⊕ N . The identity in M ⊕ N is (1M , 1N ).



We commonly say that the operation is taken componentwise. Note that as with groups and
other algebraic structures, the product structure generalizes immediately to a product of any finite
number of monoids. In fact, it is also possible to make sense of the direct product of an infinite
collection of monoids but this requires some technical care and we leave it for later.

Definition 3.11.20
A nonempty subset A of a monoid (M, ∗) is called a submonoid if a ∗ b ∈ A for all a, b ∈ A
(closed under ∗) and if 1M ∈ A.

A submonoid is a monoid in its own right, using the operation inherited from the containing
monoid. If we compare this definition to Definition 3.5.1, we observe that in Definition 3.5.1 we
required the subgroup to be closed under the operation and closed under taking inverses. Then a
simple proof showed that the identity must be in every subgroup. However, that simple proof involved
taking an inverse. Since elements do not necessarily have inverses in a monoid, the definition of a
submonoid explicitly needed to refer to the identity being in the submonoid.

Definition 3.11.21
Let (M, ∗) and (N, ◦) be two monoids. A function f : M → N is called a monoid homo-
morphism if

(1) f (a ∗ b) = f (a) ◦ f (b) for all a, b ∈ M ;


(2) f (1M ) = 1N .
A monoid homomorphism that is also a bijection is called a monoid isomorphism.

Definition 3.11.22
Let (M, ∗) be a monoid. We define the opposite monoid as the pair (M op , ∗op ) where, as
sets, M op = M and the operation is

a ∗op b = b ∗ a.

Obviously, if (M, ∗) is a commutative monoid then (M op , ∗op ) = (M, ∗). More precisely, the
identity function id : M → M op is a monoid isomorphism if and only if the monoid is commutative.

Example 3.11.23 (State Machine). In theoretical computer science, a state machine or semi-
automaton is a triple (Q, Σ, T ) where

• Q is a nonempty set whose elements are called the states;

• Σ is a nonempty set whose elements are called the input symbols;

• T is a function T : Q × Σ → Q called a transition function.

This set theoretic construct models a machine that can possess various states and depending on an
input from Σ changes from one state to another.
Now for every word w ∈ Σ∗ we define a function Tw : Q → Q as follows:

(1) T1 : Q → Q is the identity function;

(2) if w = s ∈ Σ, then Tw (q) = T (q, s) for all q ∈ Q;

(3) if w = s1 s2 · · · sn for si ∈ Σ, then Tw = Tsn ◦ · · · ◦ Ts2 ◦ Ts1 .

The set of functions gives all possible finite compositions of transition functions arising from tran-
sitions produced from one input symbol at a time. This set of functions is denoted by M (Q, Σ, T ).
It is a monoid under function composition and it is called the transition monoid or input monoid .

The association t : Σ∗ → M (Q, Σ, T )op defined by t(w) = Tw is a surjective monoid homomorphism.
The function t is surjective by definition of M (Q, Σ, T ). Furthermore, if w1 = s1 s2 · · · sm and w2 =
σ1 σ2 · · · σn , then

t(w1 · w2 ) = Ts1 s2 ···sm σ1 σ2 ···σn
           = Tσn ◦ · · · ◦ Tσ2 ◦ Tσ1 ◦ Tsm ◦ · · · ◦ Ts2 ◦ Ts1
           = Tw2 ◦ Tw1
           = Tw1 ◦op Tw2 .

This shows that t is a monoid homomorphism.
Note that the use of the opposite monoid was necessary primarily because we are reading a word
w = s1 s2 · · · sm from left to right, while we apply the functions Tsi from right to left. 4
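A small concrete instance of this example can be sketched in code. The machine below (states, symbols, and transitions are our own invented example, not from the text) illustrates how Tw is built by applying Ts1 first, and why composition reverses the order of the word:

```python
# Sketch: a tiny state machine (Q, Sigma, T) and the induced functions T_w.

Q = {0, 1}                        # the states
T = {(0, "a"): 1, (1, "a"): 0,    # the transition function T: Q x Sigma -> Q
     (0, "b"): 0, (1, "b"): 1}    # "a" swaps the states, "b" fixes them

def T_w(word):
    """The function T_w : Q -> Q induced by a word, returned as a dict.
    T_1 is the identity; for w = s1 s2 ... sn we apply T_{s1} first."""
    f = {q: q for q in Q}
    for s in word:
        f = {q: T[(f[q], s)] for q in Q}
    return f
```

Composing two words concatenates them, but the induced functions compose in the opposite order, which is exactly why the text passes to the opposite monoid.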

3.11.3 – The Grothendieck Construction


We end this section with a description of the Grothendieck construction, which associates to any
commutative monoid M a group G; when M possesses the cancellation property, M is isomorphic to
a submonoid of G. This construction generalizes the algebraic process of passing from the monoid
(N, +) of nonnegative integers to the group (Z, +).
The Grothendieck construction is as follows. Let M be a commutative monoid. Consider the
direct sum monoid M ⊕ M and define an equivalence relation ∼ on M ⊕ M by

(m1 , m2 ) ∼ (n1 , n2 ) ⇐⇒ m1 + n2 + k = m2 + n1 + k for some k ∈ M. (3.12)

We leave the proof that this is an equivalence relation as an exercise. (See Exercise 3.11.19.) The
presence of the “+k for some k ∈ M ” may seem strange but we explain it later. We write [(m1 , m2 )]
as the equivalence class for the pair (m1 , m2 ) and we denote by M̃ = (M ⊕ M )/∼ the set of all
equivalence classes of ∼ on M ⊕ M .
Now suppose that (a1 , a2 ) ∼ (b1 , b2 ) and (c1 , c2 ) ∼ (d1 , d2 ). Then

a1 + b2 + k = a2 + b1 + k
c1 + d2 + ℓ = c2 + d1 + ℓ

for some k, ℓ ∈ M . Thus,

(a1 + c1 ) + (b2 + d2 ) + (k + ℓ) = (a2 + c2 ) + (b1 + d1 ) + (k + ℓ)

and hence (a1 , a2 ) + (c1 , c2 ) ∼ (b1 , b2 ) + (d1 , d2 ). Thus, the definition


[(a1 , a2 )] + [(c1 , c2 )] := [(a1 , a2 ) + (c1 , c2 )] = [(a1 + c1 , a2 + c2 )] (3.13)

is in fact well-defined.

Proposition 3.11.24
The set M̃ equipped with the operation + defined in (3.13) is a group. The identity in M̃
is [(0, 0)] and the inverse of [(m, 0)] is [(0, m)]. Furthermore, if M possesses the cancellation
property, then M is isomorphic to the submonoid {[(m, 0)] ∈ M̃ | m ∈ M }.

Proof. That + is associative on M̃ follows from + being associative on M ⊕ M . According to (3.13),
the element [(0, 0)] is an identity on M̃ . Furthermore, we notice that (0, 0) ∼ (m, m) for all m ∈ M .
Hence,
[(m, n)] + [(n, m)] = [(m + n, m + n)] = [(0, 0)].
Thus, [(n, m)] is the inverse of [(m, n)].

Suppose now that M possesses the cancellation property. Because of (3.13), the function f : M →
M̃ with f (m) = [(m, 0)] is a monoid homomorphism. The image of f is {[(m, 0)] ∈ M̃ | m ∈ M }.
However, f (m) = f (n) is equivalent to (m, 0) ∼ (n, 0), which is equivalent to

m+k =n+k

for some k ∈ M . Thus, m = n by the cancellation property. Then f is an injective function and is
thus a monoid isomorphism between M and {[(m, 0)] ∈ M̃ | m ∈ M }. 

We now see why we needed the “+k for some k ∈ M ” in (3.12). It is possible in M for two
unequal elements m and n to admit a k such that m + k = n + k. In an abelian group,
m + k = n + k implies m = n. Hence, in M̃ , we would need f (m) and f (n) to be the same element.
In the Grothendieck construction applied to (N, +), elements in Ñ are equivalence classes of pairs
(a, b). Since N has the cancellation property, (a, b) ∼ (c, d) if and only if a + d = b + c. We can
think of such a pair as a displacement whose magnitude is the difference of a and b, pointing to the right
(positive) if a ≥ b and to the left (negative) if b ≥ a. Identifying Z = Ñ , we view positive and negative
integers as displacements with a direction.
Though we have illustrated the Grothendieck construction for a simple example of a way to
properly define negative integers, the construction first arose in a far more abstract context. It
appears again in abstract algebra in a variety of interesting places.
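The construction is concrete enough to sketch computationally. The following Python fragment (our own illustration for M = (N, +), where cancellation holds so the ∼ of (3.12) needs no extra k) models classes by representative pairs:

```python
# Sketch: the Grothendieck construction applied to (N, +).
# A pair (m1, m2) stands for the "formal difference" m1 - m2.

def equivalent(p, q):
    """(m1, m2) ~ (n1, n2) iff m1 + n2 = m2 + n1 (cancellation holds in N)."""
    (m1, m2), (n1, n2) = p, q
    return m1 + n2 == m2 + n1

def add(p, q):
    """Componentwise addition; well-defined on equivalence classes."""
    return (p[0] + q[0], p[1] + q[1])

def inverse(p):
    """The class of (m2, m1) inverts the class of (m1, m2)."""
    return (p[1], p[0])

def to_int(p):
    """The identification Z = N-tilde: the class of (m1, m2) maps to m1 - m2."""
    return p[0] - p[1]
```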

Exercises for Section 3.11


1. Prove that (N∗ , gcd) is a semigroup but not a monoid.
2. Prove that (N∗ , lcm) is a semigroup and is a monoid.
3. Let p be a prime number. Recall the ordp function defined in (2.4). In Exercise 3.11.1 you showed
that (N∗ , gcd) is a semigroup. Prove that ordp : (N∗ , gcd) → (N, min) is a semigroup homomorphism.
4. Let (L, 4) be a lattice.
(a) Prove that (L, ◦), where a ◦ b = lub(a, b), is a semigroup.
(b) Prove that (L, ?), where a ? b = glb(a, b), is a semigroup.
5. Let (L, 4) be a lattice and let (Lop , 4op ) be the lattice defined by

a 4op b ⇐⇒ b 4 a.

Prove that the function f : L → Lop defined by f (a) = a is a semigroup isomorphism between (L, ◦),
where a ◦ b gives the least upper bound between a and b in (L, 4) and (Lop , ?), where a ? b gives the
greatest lower bound between a and b in (Lop , 4op ).
6. Let U be a set and P(U ) the power set of U . Prove that (P(U ), ∩) is a monoid but not a group.
7. Let U be a set and P(U ) the power set of U . Prove that (P(U ), ∪) is a monoid but not a group.
8. Let U be a set and consider the function f : P(U ) → P(U ) defined by f (A) = A, set complement.
Prove that f is a monoid isomorphism from (P(U ), ∩) to (P(U ), ∪).
9. Let F be the set of functions from {1, 2, 3, 4} to itself and consider the monoid (F, ◦) with the operation
of function composition. Let S be the smallest subsemigroup of F that contains the functions
       
f1 = [ 1 2 3 4 ; 1 1 2 3 ],  f2 = [ 1 2 3 4 ; 4 2 2 3 ],  f3 = [ 1 2 3 4 ; 4 1 3 3 ],  f4 = [ 1 2 3 4 ; 4 1 2 4 ].

Prove that S is not a monoid. Prove that S contains no bijections.


10. Prove that the power set of a monoid (M, ∗) is indeed a monoid as claimed in Example 3.11.17.
11. Let U be a set and let Fun(U, U ) be the monoid of functions from U to U , with the operation of
composition. Let A be a subset of U and let NA = {f ∈ Fun(U, U ) | f (A) ⊆ A}.
(a) Show that NA is a submonoid of Fun(U, U ).
(b) Give an example that shows that not every submonoid of Fun(U, U ) is NA for some A ⊆ U .

12. Call M the set of nonzero polynomials with real coefficients. The pair (M, ×) is a monoid with the
polynomial 1 as the identity. Decide which of the following subsets of M are submonoids and justify
your answer.
(a) Nonconstant polynomials.
(b) Polynomials whose constant coefficient is 1.
(c) Palindromic polynomials. [A polynomial an xn + · · · + a1 x + a0 is called palindromic if an−i = ai
for all 1 ≤ i ≤ n.]
(d) Polynomials with odd coefficients.
(e) Polynomials with an−1 = 0.
(f) Polynomials with no real roots.
13. Let ϕ : M → N be a monoid homomorphism. We define the kernel of the monoid homomorphism as
Ker ϕ = {m ∈ M | ϕ(m) = 1N }.
Show that Ker ϕ is a submonoid of M .
14. Let ϕ : M → N be a monoid homomorphism. We define the image of the monoid homomorphism as
Im ϕ = {n ∈ N | n = ϕ(m) for some m ∈ M }.
Show that Im ϕ is a submonoid of N .
15. We denote by Z[x] the set of polynomials with coefficients in Z. Consider the monoids (Z[x] − {0}, ×)
and (C, +). Define the function γm : Z[x] − {0} → C by
γm (p(x)) = r1^m + r2^m + · · · + rd^m ,

where deg p(x) = d and r1 , r2 , . . . , rd are the roots of p(x), listed with multiplicity. Note that γm is
only defined on nonzero polynomials and that if p(x) is a constant polynomial then γm (p(x)) = 0
because p(x) has no roots.
(a) Prove that γm is a monoid homomorphism.
(b) (*) Prove that the image of γm is the submonoid (Z, +).
16. Prove that C 0 (R, R), the set of continuous real-valued functions on R, is a submonoid of Fun(R, R)
(equipped with composition).
17. Consider the monoid M = Fun(R, R) and consider the function

f (x) =
  −x     if x < 0
  0      if 0 ≤ x < 1
  1 − x  if 1 ≤ x < 2
  x      if 2 ≤ x.

(a) Prove that f^1 , f^2 , f^3 , f^4 are all distinct.
(b) Prove that f^n = f^4 for n ≥ 4.
(c) This is an example of a sequence of the form (f^k)_{k≥1} that is not constant but eventually constant.
Explain why eventually constant sequences do not occur in groups.
18. Let (S, ·) be a semigroup. Let e be some object not in S and consider the set S′ = S ∪ {e}. Define
the operation ∗ on S′ by a ∗ b = a · b for all a, b ∈ S and a ∗ e = e ∗ a = a for all a ∈ S′. Prove that
(S′, ∗) is a monoid.
19. Prove that the relation ∼ defined in (3.12) is indeed an equivalence relation.
20. Show that the Grothendieck construction applied to the monoid (N∗ , ×) gives the group (Q>0 , ×) of positive rationals.
21. Consider the monoid (N, +). Let M = ⟨4, 7⟩ denote the smallest submonoid that contains the subset
{4, 7}. [We say that M is the submonoid generated by {4, 7}.] List the first 15 elements in M . Prove
that the group obtained from the Grothendieck construction applied to (M, +) is group-isomorphic
to (Z, +).
22. Let A be an abelian group. Prove that the Grothendieck construction applied to A as a monoid is a
group that is (group) isomorphic to A.

3.12
Projects
Project I. Rubik’s Cube. Let K be the group of operations on the Rubik’s Cube.
(1) Determine a set of generators and explore some of the relations among them.
(2) Is K naturally a subgroup of some Sn ? If so, how and for what n?
(3) What are the orders of some elements that are not generators?
(4) Can you determine the size of K either from the previous question or from other reasoning?
(5) Explore any other properties of the Rubik’s Cube group.
Project II. Matrix Groups. Consider the family of groups GLn (Fp ), where n is some positive
integer with n ≥ 2 and p is a prime number. Study some of them. Can you provide generators
for some of them? What is the center? What are some subgroups? Explore any related topics.
Maple has a package, accessible by typing

with(LinearAlgebra[Modular]);

that is optimized for working with linear algebra in modular arithmetic. Refer to the help files
to understand how to use the procedures in this package.
Project III. Shuffling Cards and S52 . A shuffle of a card deck is an operation that changes the
order of a deck and hence can be modeled by an element of S52 . In this project, study patterns
of shuffling cards. Two popular kinds of shuffles are the random riffle shuffle and the overhand
shuffle. A perfect riffle involves cutting the deck in half and interlacing one card from one half
with exactly one card from the other half. How would you model shuffling styles by certain
permutations in S52 ? Might patterns occur in shuffling?
Project IV. Sudoku and Group Theory. If you are not familiar with the Sudoku puzzle, visit
the Wikipedia website found at

http://en.wikipedia.org/wiki/Sudoku

for a description but do not consult other sources. Let S be the set of all possible solutions
to a Sudoku puzzle, i.e., all possible ways of filling out the grid according to the rules. There
exists a group of transformations G on specific numbers, rows, and columns such that given any
solution s and any g ∈ G, we can know for certain that g(s) is another solution. Determine and
describe concisely this group G. Can you describe it as a subgroup of some large permutation
group? Can you find the size of G?
We will call two fillings s1 and s2 of the Sudoku grid equivalent if s2 = g(s1 ) for some g ∈ G.
Explore whether there exist nonequivalent Sudoku fillings.
Project V. The 15 Puzzle. Visit

http://en.wikipedia.org/wiki/Fifteen_puzzle

to learn about the so-called 15-puzzle. Here is an interesting question about this puzzle.
Suppose that you start with tiles 1 through 15 going left to right, top row down to bottom
row, with the empty square in the lower right corner. Is it possible to obtain every theoretical
configuration of tiles on the board? (For reference, label the empty slot as number 16.) If it
is not, try to find the subgroup of S16 or S15 of transformations that you can perform on this
puzzle. Make sure what you work with is a group. Can you generalize?

Project VI. Groups of Functions. Consider the set of functions of the form f (x) = (ax + b)/(cx + d),
where a, b, c, d ∈ Z. (Do not worry about domains of definition or, if you insist, just think of
the domains of these functions as on R − Q.) Show that (with perhaps a restriction on what
a, b, c, d are) this set of functions is a group. Try to find generators and relations for this group
or any other way to describe the group.
Project VII. Group Theory in a CAS. Many computer algebra systems (CAS) have a package
for group theory. In Maple version 16 or below, the appropriate package is accessed with the
command with(group);. In Maple version 17 or higher, this package was deprecated in favor
of with(GroupTheory);. Your CAS should offer help files on the commands provided in the
package. By reading the help files, become familiar with as many commands as possible at the
level available to you. Become familiar with these commands and demonstrate your ability by
doing the following: Be able to define a subgroup of Sn , define a group with generators and
relations, calculate some group orders, and find 10 interesting nonisomorphic subgroups that
have order 40 or greater (that are not symmetric or dihedral groups).
Project VIII. Groups of Rigid Motions of Polyhedra. Let Π be a polyhedron. Call G(Π)
the group of rigid motions in R3 that map Π into Π. For example, if Π is the cone over a
pentagon, then G(Π) = Z5 . Does G(Π) consist of only transformations that are rotations
about an axis? Find G(Π) for the regular polyhedra. Find G(Π) for some irregular polyhedra
that do not have G(Π) = {1}. For a given group G, does there exist a polyhedron Π such that
G(Π) = G?
Project IX. Music and Group Theory. Read the article “Musical Actions of the Dihedral
Group” (see [18]). Using between 5 and 7 pages, summarize the main themes of this article in
your own words, making careful use of the group theory. Offer analysis according to the music
theory described in this article of some musical pieces of your own choosing.
Project X. Permutations and Inversions. In Section 3.4.3, we introduced the notion of the
number of inversions of a permutation to discuss the parity of a permutation. Let n be a fixed
positive integer. Consider the function F : Sn → P(Tn ) such that F (σ) consists of the set of
pairs in Tn that are inverted by σ. Study as many properties about this function as you can.
Here are a few questions to motivate your investigations. Attempt to ask and answer other
questions. Is F injective or surjective? Given an element U in F (Sn ), can you give a concise
method to find all σ ∈ Sn such that F (σ) = U ? Is F (Sn ) closed under taking unions? If so,
can you describe how to calculate w from σ and τ where F (σ) ∪ F (τ ) = F (w)? Repeat the
same two questions with intersections and set complements.

Project XI. Diffie-Hellman and ElGamal. Program and document a computer program that
implements the ElGamal encryption scheme on a file using a key that is created via the Diffie-Hellman
procedure. Use a group G and a base g ∈ G whose order exceeds the number of iterations a
current computer can run through a for loop in a reasonable amount of time. Feel free to choose M
as you think is effective and h : M → Fun(N∗ , G) as you wish.
4. Quotient Groups

One of the most fascinating aspects of group theory is how much internal structure follows by virtue
of the three group axioms. Groups possess much more internal structure than we have yet seen in
Chapter 3. The internal properties will often permit us to create a “quotient group,” which is a
smaller group that retains some of the group information and conflates other information.
The process of describing this type of internal structure and creating a quotient group already
arose in Section 2.2 on modular arithmetic. Consequently, we review modular arithmetic as a
motivating example for defining quotient groups.
Fix a positive integer n greater than 1. Let G = (Z, +) and consider the subgroup H = nZ. We
defined the congruence relation as

a ≡ b ⇐⇒ n | (b − a) ⇐⇒ b − a ∈ H.

We proved that ≡ was an equivalence relation and we defined the congruence class a as the set of
all elements that are congruent to a. Note that

a = {. . . , a − 2n, a − n, a, a + n, a + 2n, . . .} = a + nZ = a + H.

We defined Z/nZ as the set of equivalence classes modulo n. Explicitly, Z/nZ = {0̄, 1̄, 2̄, . . . , n − 1}.
Furthermore, we showed that addition behaves well with respect to congruence, by which we mean
that a ≡ c and b ≡ d imply that a + b ≡ c + d. Then, defining addition on Z/nZ as

ā + b̄ := a + b

is in fact well-defined. From this, we were able to create the group (Z/nZ, +). We will soon say
that Z/nZ is the quotient group of Z by its subgroup nZ.
(Diagram: the quotient process collapses the integers . . . , −2, −1, 0, 1, 2, . . . of Z onto the five congruence classes 0̄, 1̄, 2̄, 3̄, 4̄ of Z/5Z.)
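The quotient process just reviewed can be sketched concretely for n = 5. The following Python fragment is our own illustration (the helper names are not from the text):

```python
# Sketch: congruence classes a + nZ in Z/5Z, with addition via representatives.

n = 5

def cls(a):
    """The canonical representative of the class a-bar (Python's % is nonnegative)."""
    return a % n

def coset(a):
    """A finite window of the coset a + nZ, anchored at the canonical representative."""
    r = a % n
    return {r + n * k for k in range(-3, 4)}

def add_classes(a_bar, b_bar):
    """a-bar + b-bar := (a + b)-bar; the result is independent of representatives."""
    return (a_bar + b_bar) % n
```

Choosing different representatives of the same classes always yields the same sum class, which is the well-definedness the text emphasizes.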

The construction of quotient groups will be similar to the construction of modulo arithmetic.
However, Z has many properties (e.g., abelian, cyclic, infinite) that make the construction simpler
or do not illustrate some important consequences of the general construction of quotient groups.
Section 4.1 introduces the concept of cosets, which immediately leads to Lagrange’s Theorem,
a deep theorem about the internal structure of groups. Section 4.2 presents characterizations of
normal subgroups, which are necessary for the construction of quotient groups. Section 4.3 gives
the construction for quotient groups, provides many examples, and illustrates the connection to
equivalence classes on the group that behave well with respect to the group operation. Section 4.4
develops a number of theorems that illustrate how to understand the internal structure of a group
from knowing the structure of a quotient group. Finally, in Section 4.5, using the quotient process,
we prove a classification theorem for all abelian groups that are finitely generated.


4.1
Cosets and Lagrange’s Theorem
4.1.1 – Cosets
Following the guiding example of modular arithmetic provided at the beginning of the chapter, we
first consider what should play the role of ā = a + nZ in groups in general.

Definition 4.1.1
Let G be a group and let H be a subgroup. For g ∈ G, the set gH = {gh | h ∈ H} (respectively
Hg = {hg | h ∈ H}) is called the left (respectively right) coset of H by g.

Example 4.1.2. Consider G = D4 and consider the subgroups R = ⟨r⟩ = {1, r, r2 , r3 } and H =
{1, s}. The left cosets of R are

1R = {1, r, r2 , r3 }, sR = {s, sr, sr2 , sr3 },


rR = {r, r2 , r3 , 1}, srR = {sr, sr2 , sr3 , s},
r2 R = {r2 , r3 , 1, r}, sr2 R = {sr2 , sr3 , s, sr},
r3 R = {r3 , 1, r, r2 }, sr3 R = {sr3 , s, sr, sr2 }.

Note that 1R = rR = r2 R = r3 R and sR = srR = sr2 R = sr3 R. Hence, there are only two distinct
left cosets of R, namely 1R = R, which consists of all the rotations in D4 , and sR, which consists
of all the reflections. The right cosets of R are
R = {1, r, r2 , r3 }, Rs = {s, rs, r2 s, r3 s} = {s, sr3 , sr2 , sr},
Rr = {r, r2 , r3 , 1}, Rsr = {sr, rsr, r2 sr, r3 sr} = {sr, s, sr3 , sr2 },
Rr2 = {r2 , r3 , 1, r}, Rsr2 = {sr2 , rsr2 , r2 sr2 , r3 sr2 } = {sr2 , sr, s, sr3 },
Rr3 = {r3 , 1, r, r2 }, Rsr3 = {sr3 , rsr3 , r2 sr3 , r3 sr3 } = {sr3 , sr2 , sr, s}.

Note that R = Rr = Rr2 = Rr3 and Rs = Rsr = Rsr2 = Rsr3 . Again, there are only two distinct
right cosets of R, namely R and Rs. We also note that for all g ∈ D4 , we have gR = Rg. This
resembles commutativity, but we must recall that D4 is not commutative. Before we are tempted
to think this happens for all subgroups (and we should be inclined to doubt that it would), let us
do the same calculations with the subgroup H.
The left cosets of H are
1H = {1, s}, sH = {s, 1} = {1, s},
rH = {r, rs} = {r, sr3 }, srH = {sr, srs} = {sr, r3 },
r2 H = {r2 , r2 s} = {r2 , sr2 }, sr2 H = {sr2 , sr2 s} = {sr2 , r2 },
r3 H = {r3 , r3 s} = {r3 , sr}, sr3 H = {sr3 , sr3 s} = {sr3 , r}.

There are four distinct left cosets: H = sH, rH = sr3 H, r2 H = sr2 H, and r3 H = srH. The right
cosets are
H = {1, s}, Hs = {s, 1} = {1, s},
Hr = {r, sr}, Hsr = {sr, r},
Hr2 = {r2 , sr2 }, Hsr2 = {sr2 , r2 },
Hr3 = {r3 , sr3 }, Hsr3 = {sr3 , r3 }.
Notice again that there are four distinct right cosets: H = Hs, Hr = Hsr, Hr2 = Hsr2 , and
Hr3 = Hsr3 . However, with the subgroup H, it is not true that gH = Hg for all g ∈ G. For
example, Hr = {r, sr} while rH = {r, sr3 }. △

A key point to extract from the above example is that, in general, left cosets are not equal to
right cosets, though for certain subgroups every left coset is a right coset. Of course, if G is an
abelian group, then gH = Hg for all H ≤ G and all g ∈ G.
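The computations in Example 4.1.2 are easy to confirm by machine. The following sketch is our own (not the book’s): it models D4 as permutations of the square’s vertices in Python, generates the group from r and s, and reproduces the coset counts, including the observation that every left coset of R = hri is a right coset, while some left coset of H = hsi is not.

```python
from itertools import product

# Elements of D4 as permutations of the vertices {0, 1, 2, 3} of a square,
# stored as tuples p with p[i] = image of vertex i.
def compose(p, q):                      # (p ∘ q)(i) = p(q(i))
    return tuple(p[q[i]] for i in range(4))

e = (0, 1, 2, 3)                        # identity
r = (1, 2, 3, 0)                        # rotation by 90 degrees
s = (1, 0, 3, 2)                        # a reflection

# Close {r, s} under composition to obtain all 8 elements of D4.
G = {e, r, s}
while True:
    new = {compose(a, b) for a, b in product(G, repeat=2)} - G
    if not new:
        break
    G |= new
assert len(G) == 8

R = {e, r, compose(r, r), compose(r, compose(r, r))}   # the subgroup <r>
H = {e, s}                                             # the subgroup <s>

def left_cosets(K):
    return {frozenset(compose(g, k) for k in K) for g in G}

def right_cosets(K):
    return {frozenset(compose(k, g) for k in K) for g in G}

print(len(left_cosets(R)), len(left_cosets(H)))   # 2 distinct cosets of R, 4 of H
print(left_cosets(R) == right_cosets(R))          # True: every gR equals some Rg
print(left_cosets(H) == right_cosets(H))          # False: some gH is not a right coset
```

Comparing the two collections of cosets as sets of sets captures exactly the phenomenon of the example: for R the collections coincide, while for H the left coset rH = {r, sr3} is not among the right cosets.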

Proposition 4.1.3
Let H be a subgroup of a group G and let g ∈ G be arbitrary. Then there exists a bijection
between H and gH and between H and Hg. Furthermore, if H is a finite subgroup of G,
then |H| = |gH| = |Hg|.

Proof. Consider the function f : H → gH defined by f (x) = gx. This function is injective because
f (x1 ) = f (x2 ) implies that gx1 = gx2 so that x1 = x2 . Furthermore, the function is surjective by
definition of gH. Thus, f is a bijection between H and gH. If H is finite, then |H| = |gH|.
Similarly, the function ϕ : H → Hg defined by ϕ(x) = xg is a bijection between H and Hg. 

Recall that in the motivating example of modular arithmetic, the cosets a + nZ corresponded
to the congruence classes modulo n. In general groups, cosets correspond to certain equivalence
relations. However, because groups are not commutative in general, we must consider two equivalence
relations.

Proposition 4.1.4
Let G be a group and let H be a subgroup. The relations ∼1 and ∼2 , defined respectively
as

a ∼1 b ⇐⇒ a−1 b ∈ H,
a ∼2 b ⇐⇒ ba−1 ∈ H,

are equivalence relations. Furthermore, the equivalence classes for ∼1 (resp. ∼2 ) are the
left (resp. right) cosets of H.

Proof. We prove the proposition for ∼1 . The proof for ∼2 is identical in form.
Let G be a group and let H be any subgroup. For all a ∈ G, a−1 a = 1 ∈ H, so ∼1 is reflexive.
Suppose that a ∼1 b. Then a−1 b ∈ H. Since H is closed under taking inverses, we know that
(a−1 b)−1 = b−1 a ∈ H, so b ∼1 a. Thus, ∼1 is symmetric. Now suppose that a ∼1 b and b ∼1 c. By
definition, a−1 b ∈ H and b−1 c ∈ H. Since H is a subgroup, (a−1 b)(b−1 c) = a−1 c ∈ H, which means
that a ∼1 c. Hence, ∼1 is transitive. We conclude that ∼1 is an equivalence relation.
The ∼1 equivalence class of a ∈ G consists of all elements g such that a ∼1 g, i.e., all elements g
such that there exists h ∈ H with a−1 g = h. Thus, g = ah so g ∈ aH. Conversely, if g ∈ aH, then
a−1 g ∈ H and thus a ∼1 g. This shows that the ∼1 equivalence class of a is aH. 

The equivalence relations ∼1 and ∼2 are not necessarily distinct. The relations are identical if
and only if all the left cosets of H match up with right cosets of H.
If the group is abelian, left and right cosets are equal for all subgroups H and hence the relations
∼1 and ∼2 are equal. It is for this reason that we did not need to define two concepts of congruence
relations on Z. Indeed, for all a, b ∈ Z,

−a + b = b − a.

Proposition 4.1.4 leads immediately to the following corollary.

Proposition 4.1.5
Let H be a subgroup of a group G and let g1 , g2 ∈ G. Then

g1 H = g2 H ⇐⇒ g1⁻¹g2 ∈ H ⇐⇒ g2⁻¹g1 ∈ H, and
Hg1 = Hg2 ⇐⇒ g2 g1⁻¹ ∈ H ⇐⇒ g1 g2⁻¹ ∈ H.

It is also possible to deduce Proposition 4.1.5 from the following reasoning. The equality
g1 H = g2 H holds if and only if H = g1⁻¹g2 H, if and only if for all h ∈ H, there exists h′ ∈ H such

[Figure 4.1: Illustration of left and right cosets of H in G]

that g1⁻¹g2 h = h′. This implies that g1⁻¹g2 = h′h⁻¹ ∈ H. Conversely, if g1⁻¹g2 ∈ H, then writing
g1⁻¹g2 = h″, for all h ∈ H, we have g1⁻¹g2 h = h″h ∈ H, so g1⁻¹g2 H = H and thus g2 H = g1 H. A
similar reasoning holds for right cosets.
Because equivalence classes on a set partition that set, Proposition 4.1.4 leads immediately to
the following corollary.

Corollary 4.1.6
Let H be a subgroup of a group G. The set of left (respectively right) cosets of H forms a
partition of G.

Figure 4.1 illustrates how left and right cosets of a subgroup H partition a group G. It is
important to note that the subgroup H is both a left and a right coset, and the remaining left and
right cosets partition G − H. In the figure, the left cosets are shown to overlap the right cosets. In
general, it is possible for each left coset to be a right coset or for only some left cosets to intersect
with some right cosets.
In the example of modular arithmetic, though the original group G = Z was infinite, for any
given modulus n there were exactly n (a finite number of) left cosets, namely 0̄, 1̄, . . . , n − 1. The
same phenomenon can occur in a general group.

Definition 4.1.7
Let H be a subgroup of a group G that is not necessarily finite. If the number of distinct
left cosets is finite then this number is denoted by |G : H| and is called the index of H in
G.

For any subgroup H of a group G, the set of inverses H⁻¹ = {h⁻¹ | h ∈ H} is equal to H.
Consequently, the inversion function f (x) = x⁻¹ on G maps the left coset gH to the right coset
(gH)⁻¹ = H⁻¹g⁻¹ = Hg⁻¹. Thus, inversion gives a bijection between the set of left cosets and the
set of right cosets of H. In particular, if |G : H| is finite, then it also counts the number of right
cosets.

Example 4.1.8. Let G = S5 and let H = {σ ∈ G | σ(5) = 5}. It is not hard to see that H ∼= S4 .
Hence, |G| = 120 and |H| = 24. (Ideally, if studying the cosets of H by hand, we would rather not
list out all the elements in each coset.) Note that the index of H in S5 is |S5 : H| = 5, so whether
we consider left or right cosets, we will find 5 of them. Obviously, H is both a left coset and a right
coset of H. We investigate a few other cosets of H.
Consider the left coset (1 2)H. Since (1 2) ∈ H, we have (1 2)H = H. More generally, any
transposition (a b) with a, b < 5 satisfies (a b)H = H. In contrast, (1 5)H ≠ H because (1 5) ∉ H.
Now consider a third coset (2 5)H. Since (2 5) ∉ H, we know that (2 5)H ≠ H. Furthermore, by
Proposition 4.1.5, since (2 5)⁻¹(1 5) = (1 2 5) ∉ H, we have (2 5)H ≠ (1 5)H. In fact, for any a and b satisfying

1 ≤ a < b ≤ 5, we have (b 5)⁻¹(a 5) = (a b 5) ∉ H, so (a 5)H ≠ (b 5)H for a ≠ b. Each coset consists
of 24 elements and we have found 5 distinct cosets. Since 5 × 24 = 120 = |S5 |, we have found all the
left cosets, which implies that (a 5)H for 1 ≤ a ≤ 5 is a complete list of left cosets of H.
Now let σ = (1 2 3 4 5) and consider the cosets σ^i H with i = 0, 1, 2, 3, 4. According to Proposition 4.1.5,

σ^i H = σ^j H ⇐⇒ σ^(i−j) ∈ H ⇐⇒ i − j ≡ 0 (mod 5).

Since i, j ∈ {0, 1, 2, 3, 4}, the congruence i ≡ j (mod 5) holds if and only if i = j. Thus, the cosets
σ^i H with i = 0, 1, 2, 3, 4 are all distinct. Together, these five cosets account for 120 distinct
elements of S5 and hence give all the left cosets. Interestingly enough, we can characterize the
cosets σ^i H by
(1 2 3 4 5)^i H = {τ ∈ S5 | τ(5) = i}. △
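Proposition 4.1.5 gives a practical membership test: to decide whether g1H = g2H there is no need to list either coset; it suffices to check whether g1⁻¹g2 ∈ H. The sketch below is our own encoding of S5 as tuples in Python; it confirms the count of five distinct left cosets found in Example 4.1.8.

```python
from itertools import permutations

# S5 as tuples p of length 5 with p[i] = image of i; index 4 plays the role of "5".
S5 = set(permutations(range(5)))

def compose(p, q):
    return tuple(p[q[i]] for i in range(5))

def inverse(p):
    inv = [0] * 5
    for i, v in enumerate(p):
        inv[v] = i
    return tuple(inv)

H = {p for p in S5 if p[4] == 4}        # the stabilizer of the last point; H ≅ S4
assert len(S5) == 120 and len(H) == 24

def same_left_coset(g1, g2):
    # Proposition 4.1.5: g1 H = g2 H if and only if g1⁻¹ g2 ∈ H.
    return compose(inverse(g1), g2) in H

# Representatives: the identity together with the transpositions (a 5), a = 1, ..., 4.
reps = [tuple(range(5))]
for a in range(4):
    t = list(range(5))
    t[a], t[4] = t[4], t[a]
    reps.append(tuple(t))

# The five representatives lie in pairwise distinct left cosets,
# and 5 cosets of size 24 exhaust all 120 elements of S5.
assert all(not same_left_coset(g1, g2)
           for i, g1 in enumerate(reps) for g2 in reps[i + 1:])
assert 5 * len(H) == len(S5)
```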
Previously, we emphasized that though both Z and its subgroup nZ are infinite, the index
|Z : nZ| = n is finite. The following example illustrates some other possibilities of subgroup indices
in the context of an infinite group.
Example 4.1.9. Consider the multiplicative group of nonzero reals, (R∗ , ×). The subset H = R>0
of positive real numbers is a subgroup. The subgroup R>0 has only two distinct cosets, namely R>0
and (−1)R>0 = R<0 . Hence, the index of R>0 in R∗ is |R∗ : R>0 | = 2. As a point of terminology,
it is not proper to say that “R>0 consists of half of the real numbers” because R>0 has the same
cardinality as R. However, the concept of index makes precise our intuitive sense that “R>0 consists
of half of the real numbers.”
As an alternate example, consider the subgroup Q∗ . We show by contradiction that the index of
Q∗ in R∗ is not only infinite, but uncountable as well. Assume that Q∗ has a countable number of
cosets in R∗ . Let S be a complete set of distinct representatives of the partition formed by the cosets.
Then, since the cosets of Q∗ partition R∗ , every nonzero real number can be written uniquely as
sq with s ∈ S and q ∈ Q∗ . This creates a bijection between R∗ and S × Q∗ . By Exercise 1.2.7,
if S is countable, then S × Q∗ is countable. Since R∗ is uncountable, this is a
contradiction. △

4.1.2 – Lagrange’s Theorem


Proposition 4.1.3 leads to a profound theorem in group theory. A perceptive student might have
conjectured the following result from some examples, but none of the theorems provided in Chapter 3
lead to it. With the concept of cosets, this theorem is hiding in plain sight. We uncover it now.

Theorem 4.1.10 (Lagrange’s Theorem)

Let G be a finite group. If H ≤ G, then |H| divides |G|. Furthermore, |G|/|H| = |G : H|.

Proof. By Proposition 4.1.3, each left coset of H has cardinality |H|. Since the set of left cosets
partitions G, the sum of the cardinalities of the distinct cosets equals |G|. But since each coset has
cardinality |H|, we have
|G| = |H| · |G : H|,
and the theorem follows. 
In the language of posets, Lagrange’s Theorem can be rephrased by saying that if G is a finite
group, then the cardinality function from (Sub(G), ≤) to (N∗ , |) is monotonic.
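Lagrange’s Theorem is easy to confirm exhaustively for a small group. The sketch below is our own brute-force enumeration in Python (not an efficient algorithm): it finds every subgroup of D4 by testing all subsets containing the identity for closure, and checks that each subgroup order divides 8.

```python
from itertools import combinations, product

def compose(p, q):
    return tuple(p[q[i]] for i in range(4))

e, r, s = (0, 1, 2, 3), (1, 2, 3, 0), (1, 0, 3, 2)

# Generate D4 (order 8) by closing {r, s} under composition.
G = {e, r, s}
while True:
    new = {compose(a, b) for a, b in product(G, repeat=2)} - G
    if not new:
        break
    G |= new

# In a finite group, a subset containing the identity and closed under the
# operation is a subgroup: each element a has a^m = e, so a⁻¹ = a^(m-1) is
# already in the set.
def is_subgroup(S):
    return e in S and all(compose(a, b) in S for a in S for b in S)

orders = sorted({len(S) for k in range(1, len(G) + 1)
                 for S in map(set, combinations(G, k)) if is_subgroup(S)})
print(orders)        # [1, 2, 4, 8] -- every subgroup order divides |D4| = 8
assert all(len(G) % d == 0 for d in orders)
```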
A number of corollaries follow from Lagrange’s Theorem.

Corollary 4.1.11
For every element g in a finite group G, the order |g| divides |G|.

Proof. The order |g| is the order of the subgroup hgi. Hence, |g| divides |G| by Lagrange’s Theorem.

Corollary 4.1.12
For every element g in a finite group G, we have g |G| = 1.

Proof. By Corollary 4.1.11, if |g| = k, then k divides |G|. So |G| = km for some m ∈ Z. Since
g k = 1 by definition of order, then g |G| = g km = (g k )m = 1m = 1. 
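Corollary 4.1.12 is the group-theoretic heart of Euler’s Theorem (see Exercise 33). As a quick numerical check, our sketch below builds the group U(n) of units modulo n for the arbitrary choice n = 20 and raises every element to the power |U(n)|, recovering the identity each time.

```python
from math import gcd

n = 20
# The group U(20) of units modulo 20 under multiplication.
U = [a for a in range(1, n) if gcd(a, n) == 1]
order = len(U)              # |U(20)| = 8
assert order == 8

# Corollary 4.1.12: g^|G| = 1 for every g in a finite group G.
assert all(pow(a, order, n) == 1 for a in U)
```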

Lagrange’s Theorem and its corollaries put considerable restrictions on the possible subgroups of a
group G. For example, in Exercise 3.3.31 concerning the classification of groups of order 6, the fact
that G contains no elements of order 4 or of order 5 follows immediately from Lagrange’s Theorem.
As a simple application, knowing that the order of a subgroup can only be a divisor of the order of
the group may tell us whether or not we have found all the elements in a subgroup. In particular,
as the following example illustrates, if a subgroup H has |H| strictly greater than the largest proper
divisor of |G|, then we can deduce that |H| = |G| and hence that H = G.

Example 4.1.13. Consider the group A4 and the subgroup H = h(1 2 3), (1 2 4)i. Since |A4 | = 12,
by Lagrange’s Theorem, the subgroups of A4 can only be of order 1, 2, 3, 4, 6, or 12. By taking
powers of the generators of H, we know that

1, (1 2 3), (1 3 2), (1 2 4), (1 4 2)

are in H. Furthermore, (1 2 3)(1 4 2) = (1 4 3) is in H, as must be its square (1 3 4). This shows that
H contains at least 7 elements. Since |H| is greater than 6, by Lagrange’s Theorem, |H| = 12 and
hence H = A4 . △
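The deduction in Example 4.1.13 can also be confirmed directly: closing the two generators under composition produces all twelve elements of A4. The sketch below is our own encoding in Python, with permutations written 0-indexed, so the cycle (1 2 3) becomes the tuple mapping 0 → 1 → 2 → 0.

```python
from itertools import permutations, product

def compose(p, q):
    return tuple(p[q[i]] for i in range(4))

def is_even(p):
    # Parity via inversion count: even permutations belong to A4.
    return sum(p[i] > p[j] for i in range(4) for j in range(i + 1, 4)) % 2 == 0

A4 = {p for p in permutations(range(4)) if is_even(p)}
assert len(A4) == 12

def generated(gens):
    # Close the generators under composition to obtain the generated subgroup.
    S = {(0, 1, 2, 3)} | set(gens)
    while True:
        new = {compose(a, b) for a, b in product(S, repeat=2)} - S
        if not new:
            break
        S |= new
    return S

a = (1, 2, 0, 3)   # the 3-cycle (1 2 3), written on the points {0, 1, 2}
b = (1, 3, 2, 0)   # the 3-cycle (1 2 4), written on the points {0, 1, 3}
H = generated([a, b])
assert H == A4     # as in Example 4.1.13: |H| = 12, so H = A4
```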

Lagrange’s Theorem also leads immediately to the following important classification theorem.

Proposition 4.1.14
Let p be a prime number and suppose that G is a group such that |G| = p. Then G ∼= Zp .

Proof. Let g ∈ G be a nonidentity element. Then hgi is a subgroup of G that has at least 2 elements.
By Lagrange’s Theorem, |g| = |hgi| divides p. Hence, |g| = p and hgi = G. Therefore, G is cyclic.
The proposition follows from Proposition 3.7.25. 

4.1.3 – Partial Converses to Lagrange’s Theorem


Given the profound nature of Lagrange’s Theorem, it is natural to wonder whether the theorem or
any of its corollaries have converses. Consider the following questions. (1) Given a finite group G,
if d is a divisor of |G|, does there necessarily exist a subgroup of order d? (2) Given a finite group
G, if d is a divisor of |G|, does there necessarily exist an element of order d? The answer to both of
these questions is “no.”
The answer to question (2) is obvious in that if every group contained an element of order |G|,
then every group would be cyclic, which we know to be false. For question (1), the smallest
counterexample occurs with A4 . In Example 3.6.7, we drew the lattice of A4 and found that it
possesses subgroups of order 1, 2, 3, 4, and 12, but none of order 6.
Some partial converses to Lagrange’s Theorem and its corollaries do exist. What we mean by
this is that, for some divisors of |G|, we may be able to know whether there exists a subgroup of
that order or an element of that order with no other information besides order considerations. We
mention two such theorems as examples, but their proofs require techniques that we will develop
later. Cauchy’s Theorem (Theorem 8.4.5) states that if p is a prime number dividing |G|, then G has
an element of order p. Another partial converse to Lagrange’s Theorem, Sylow’s Theorem (Theorem

8.5.6), guarantees that if p is a prime divisor of |G|, then G has a subgroup of order p^n, where p^n is
the highest power of p that divides |G|.
Even without Cauchy’s Theorem, it is sometimes possible to determine whether a group G
contains elements of certain orders by virtue of Corollary 4.1.11. The following example illustrates
the reasoning.
Example 4.1.15. Let G be a group of order 35. We prove that G must have an element of order
5 and an element of order 7. If G contains an element z of order 35 (which would imply that G is
cyclic), then z^5 has order 7 and z^7 has order 5.
Assume that G has no elements of order 7. By Corollary 4.1.11, the only possible orders of
elements would be 1 and 5. Obviously, the identity is the only element of order 1. If two
elements a and b of order 5 are not powers of each other, then hai ∩ hbi = {1}. Hence, each
nonidentity element would lie in exactly one cyclic subgroup of order 5, and each such subgroup
contains 4 nonidentity elements. If there are k such subgroups, then we would have 4k + 1 = 35,
which is impossible since 4 does not divide 34. This is a contradiction. Hence, G
must have an element of order 7.
Similarly, assume that G has no elements of order 5. Again, by Corollary 4.1.11, the only possible
orders of elements would be 1 and 7. Any element of order 7 generates a cyclic subgroup containing
the identity element and 6 elements of order 7. If there are h such subgroups, then we would have
6h + 1 = 35. Again, this is a contradiction. Hence, G must contain an element of order 5. △

4.1.4 – Products of Subgroups


As another application of cosets, we consider subsets of a group G of the form HK, where H and
K are two subgroups of G. We define the subset HK as

HK = {hk | h ∈ H and k ∈ K}.

This subset is, in general, not a subgroup of G. It is, however, a union of certain cosets, in particular
a union of right cosets of H and also a union of left cosets of K via

HK = ⋃_{k∈K} Hk = ⋃_{h∈H} hK.    (4.1)

In either of the above expressions, it is possible that many terms in the union are redundant as some
of the cosets may be equal. By an analysis of cosets of H and of H ∩ K, we can prove the following
proposition.

Proposition 4.1.16
If H and K are finite subgroups of a group G, then

|HK| = |H| |K| / |H ∩ K|.

Proof. Consider HK as the union of left cosets of K as given in (4.1). Each left coset of K has |K|
elements, so |HK| is a multiple of |K|. We simply need to count the number of distinct left cosets
of K in HK. By Proposition 4.1.5, h1 K = h2 K if and only if h2⁻¹h1 ∈ K. Since h2⁻¹h1 ∈ H, then
h2⁻¹h1 ∈ H ∩ K, which again by Proposition 4.1.5 is equivalent to h1 (H ∩ K) = h2 (H ∩ K).
However, H ∩ K ≤ H. By the above reasoning, the number of distinct left cosets of K in
HK is the number of distinct left cosets of H ∩ K in H. By Lagrange’s Theorem, this number is
|H|/|H ∩ K|. Thus,

|HK| = (|H| / |H ∩ K|) · |K| = |H| |K| / |H ∩ K|. 

Recall that the join hH ∪ Ki of H and K is the smallest (by inclusion) subgroup of G that
contains both H and K. Obviously, hH ∪ Ki must contain all products of the form hk with h ∈ H

and k ∈ K but perhaps much more. Hence, HK is a subset of the join of H and K. If HK happens
to be a subgroup of G, then HK = hH ∪ Ki. By Lagrange’s Theorem, we can deduce that

|H| |K| / |H ∩ K| ≤ |hH ∪ Ki|   and   |hH ∪ Ki| divides |G|.    (4.2)
Note that when HK is not a subgroup of G, we cannot use Lagrange’s Theorem to deduce that
|HK| divides |hH ∪ Ki|.
Example 4.1.17. As an application of the result in (4.2), let G = S5 , let H be the subgroup
of G that fixes 4 and 5, and let K be the subgroup of G that fixes 2 and 3. Note that H ∼= S3 and
K ∼= S3 , so |H| = |K| = 6. Furthermore, if σ ∈ H ∩ K, then σ fixes 2, 3, 4, and 5 and hence must
also fix 1. Thus, H ∩ K = {1}. By Proposition 4.1.16, |HK| = 36. Since 36 does not divide
120 = |S5 |, by Lagrange’s Theorem, HK is not a subgroup of S5 . By (4.2), we can also deduce that
the join of H and K has order at least 36 and dividing 120. Thus, given this information, we know
that |hH ∪ Ki| is 40, 60, or 120. △
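Both claims of Example 4.1.17, the order |HK| = 36 and the fact that HK is not a subgroup, can be verified by direct computation. The following sketch uses our own tuple encoding of S5 in Python (0-indexed, so the fixed points 4 and 5 become indices 3 and 4, and the points 2 and 3 become indices 1 and 2).

```python
from itertools import permutations

def compose(p, q):
    return tuple(p[q[i]] for i in range(5))

S5 = set(permutations(range(5)))
H = {p for p in S5 if p[3] == 3 and p[4] == 4}   # fixes 4 and 5; H ≅ S3
K = {p for p in S5 if p[1] == 1 and p[2] == 2}   # fixes 2 and 3; K ≅ S3
assert len(H) == len(K) == 6
assert H & K == {tuple(range(5))}                # H ∩ K = {1}

HK = {compose(h, k) for h in H for k in K}
assert len(HK) == 36                             # |H||K| / |H ∩ K| = 6 · 6 / 1

# HK is not closed under composition, hence not a subgroup of S5
# (consistent with Lagrange: 36 does not divide 120).
assert any(compose(a, b) not in HK for a in HK for b in HK)
```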

Exercises for Section 4.1


1. List all the distinct left cosets of H = h3̄i in Z/21Z and list all the elements in each coset.
2. Suppose that Z20 is generated by the element z. List all the distinct left cosets of hz 5 i and list all the
elements in each coset.
3. In the group Q8 , list the distinct left cosets of the following subgroups and in each case list all the
elements in each coset: (a) hii; (b) h−1i.
4. List all the distinct left cosets of H = h4̄i in U (35) and list all elements in each coset.
5. List all the distinct left cosets of H = h7̄, 47i in U (48) and list all elements in each coset.
6. List all the distinct left cosets of H = h(1 2 3)i in A4 and list all elements in each coset. Following the
interpretation given in Exercise 3.5.43 of A4 as the group of rigid motions of a tetrahedron, describe
each left coset by what all the elements of the coset do to the tetrahedron.
7. Consider the group G = Z2 ⊕ Z2 ⊕ Z2 with Z2 = hz | z^2 = 1i. There is a bijection between the elements
of G and the (vertices of the) unit cube C via f (z^a , z^b , z^c ) = (a, b, c), where a, b, and c are 0 or 1.
(See Figure 4.2.) For each of the following subgroups, list all the distinct left cosets, describe them as
subsets of the unit cube (via the mapping f ), and label the cosets on a sketch of the unit cube.
(a) H = h(1, z, 1)i
(b) K = h(z, z, z)i
(c) L = h(z, 1, 1), (1, z, 1)i
8. Consider the group C× with multiplication and define the subgroup H = {z ∈ C× : |z| = 1}. Describe
geometrically (as a subset of the plane) each coset of H.
9. Consider the group C× with multiplication and consider the subgroup H = R× . Describe geometrically
(as a subset of the plane) each coset of H.
10. Prove the last claim in Example 4.1.8. With the group G = S5 and the same subgroup H, prove also
that the right cosets H(1 2 3 4 5)^i correspond to

H(1 2 3 4 5)^i = {τ ∈ S5 | τ(5 − i) = 5}.

11. Consider the group (Q, +).


(a) Show that |Q : Z| is infinite.
(b) Show that there is no proper subgroup of Q of finite index.
12. Consider the cosets of H = hsi in D4 as described in Example 4.1.2. We write AB = {ab | a ∈
A and b ∈ B} for any subsets A, B ⊆ G. Prove that (rH)(r2 H) is a left coset of H, but that
(rH)(r3 H) is not.
13. Let H be a subgroup of G. Suppose one attempted to define the function F from the set of left cosets
of H to the set of right cosets of H via F (gH) = Hg. Explain why F is not a function. [Hint: We
also say that F is not well-defined.]

[Figure 4.2: Cayley graph of Z2 ⊕ Z2 ⊕ Z2 , with the eight vertices labeled by the triples
(z^a , z^b , z^c ) for a, b, c ∈ {0, 1}]

14. Show that a complete set of distinct representatives of ∼1 is not necessarily a complete set of distinct
representatives of ∼2 .
15. Let G be a group and do not assume it is finite. Prove that if H ≤ K ≤ G, then

|G : H| = |G : K| · |K : H|.

16. Let G be a group and let H and K be subgroups with |G : H| = m and |G : K| = n finite. Prove that
(a) |G : H ∩ K| ≤ mn;
(b) lcm(m, n) ≤ |G : H ∩ K|.
[Hint: Use Exercise 4.1.15.]
17. Let ϕ : G → H be a group homomorphism.
(a) Prove that the left cosets of Ker ϕ are the fibers of ϕ. [Recall that a fiber of a function f : A → B
is a subset of A of the form f −1 (b) for some b ∈ B. See (1.4).]
(b) Deduce that for all g ∈ G, the left coset g(Ker ϕ) is equal to the right coset (Ker ϕ)g.
18. Consider the Cayley graph for S4 given in Figure 3.12.
(a) Prove that the triangles (whose edges are double edges) correspond to right cosets of h(1 2 3)i.
(b) Prove that the squares with all single edges correspond to the right cosets of h(1 2 3 4)i.
(c) Prove that the squares with mixed edge styles are not the left or right cosets of any subgroup.
19. Suppose that a group G has order |G| = 105. List all possible orders of subgroups of G.
20. Suppose that a group G has order |G| = 48. List all possible orders of subgroups of G.
21. Prove that n − 1 ∈ U (n) for all integers n ≥ 3. Apply Lagrange’s Theorem to hn − 1i to deduce that
Euler’s totient function φ(n) is even for all n ≥ 3.
22. Prove or disprove that Z6 ⊕ Z10 has
(a) a subgroup of order 4;
(b) a subgroup isomorphic to Z4 .
23. Suppose that G is a group with |G| = pq, where p and q are primes, not necessarily distinct. Prove
that every proper subgroup of G is cyclic.
24. Let G = GL2 (F5 ).
(a) Use Lagrange’s Theorem to determine all the possible sizes of subgroups of G.
(b) Show that the orders of the following elements are respectively 3, 4, and 5:

A = [ 0 2 ]        B = [ 3 0 ]        C = [ 1 1 ]
    [ 2 4 ] ,          [ 0 3 ] ,          [ 0 1 ] .

(c) Determine the order of AB and BC without performing any matrix calculations.

25. Let G = GL2 (F5 ) and let H be the subgroup of upper triangular matrices. [See Exercise 3.5.22.] Prove
that |G : H| = 6 and find 6 different matrices g1 , g2 , . . . , g6 such that the cosets gi H for i = 1, 2, . . . , 6
are all the left cosets of H.
26. Let p be a prime number. Prove that the subgroups of Zp ⊕ Zp consist of {1}, Zp ⊕ Zp and p + 1
subgroups that are cyclic and of order p.
27. Let G be a group of order 21. (a) Prove that G must have an element of order 3. (b) By the strategy
of Example 4.1.15, can we determine whether G must have an element of order 7?
28. Let G be a group of order 3p, where p is a prime number. Prove that G has an element of order 3.
29. Let G be a group of order pq, where p and q are distinct odd primes such that (p − 1) - (q − 1) and
(q − 1) - (p − 1). Prove that G contains an element of order p and an element of order q.
30. Let G be a group and let H, K ≤ G. Prove that if gcd(|H|, |K|) = 1, then H ∩ K = {1}.
31. Let G be a group and let H be a subgroup with |G : H| = p, a prime number. Prove that if K is a
subgroup of G that strictly contains H, then K = G.
32. Let G be a group of order pqr, where p, q, r are distinct primes. Let A be a subgroup of order pq and
let B be a subgroup of order qr. Prove that AB = G and that |A ∩ B| = q.
33. Use Lagrange’s Theorem applied to U (n) to prove Euler’s Theorem (a generalization of Fermat’s Little
Theorem), which states that if gcd(a, n) = 1 then

aϕ(n) ≡ 1 (mod n).

34. Show that there exists a subgroup of order d for each divisor d of |S4 | = 24 and give an example of a
subgroup for each divisor.
35. Classification of Groups of Order 2p. Let p be a prime number. This exercise guides the proof that a
group of order 2p is isomorphic to Z2p or Dp . Let G be an arbitrary group with |G| = 2p.
(a) Without using Cauchy’s Theorem, prove that G contains an element a of order 2 and an element
b of order p.
(b) Prove that if ab = ba, then G ∼= Z2p .
(c) Prove that if a and b do not commute, then aba = b⁻¹. Deduce in this case that G ∼= Dp .
36. Let G be a group and let H, K ≤ G. Prove that HK ≤ G if and only if HK = KH as sets.

4.2 Conjugacy and Normal Subgroups
In the previous section, we discussed left cosets and right cosets in a group G. By considering simple
examples, we found that a left coset of a subgroup H is generally not equal to a right coset. However,
in Example 4.1.2 we saw that every left coset of hri in D4 is a right coset. This property of a
subgroup, called normality, turns out to play a vital role in the construction of quotient groups.
This section studies normal subgroups. Some of the constructions or criteria developed in this
section may at first seem unnecessarily complicated if we are simply trying to generalize the
construction employed to create modular arithmetic. The important difference between general
groups and (Z, +) is that the latter is abelian, while groups in general are not. All the difficulty in
constructing quotient groups stems from the possible noncommutativity of a group. As we shall see in
Section 4.3, normality of a subgroup is a necessary property for generalizing the modular arithmetic
construction.

4.2.1 – Normal Subgroups


Let G be a group and let H ≤ G be a subgroup. If a left coset gH is equal to a right coset Hg′,
then since 1 ∈ H, we know that g ∈ Hg′. But if g ∈ Hg′, then Hg′ = Hg as right cosets. Hence, the
criterion that every left coset of a subgroup is equal to a right coset can be summarized
in the following definition.

Definition 4.2.1
Let G be a group. A subgroup N ≤ G is called normal if gN = N g for all g ∈ G. If N is
a normal subgroup of G, we write N E G.

In Example 4.1.2, we saw that while hri E D4 , in contrast hsi is not a normal subgroup of D4 .
In notation, we write hsi ⋬ D4 .
The criterion for a normal subgroup is equivalent to a variety of other conditions on the subgroup.
Before we list these conditions in Theorem 4.2.4, we mention a few results that are immediate from
the definition. The first observation is that every group G has at least two normal subgroups: the
trivial group {1} and itself G. The next proposition gives another sufficient condition for a subgroup
to be normal.

Proposition 4.2.2
Let G be a group (not necessarily finite). If H is a subgroup such that |G : H| = 2, then
H E G.

Proof. By definition, if |G : H| = 2, then H has two left cosets, just as it has two right cosets. Now,
H = 1H is a left coset. Since the collection of left cosets forms a partition of G, the set G − H is the
other left coset. Similarly, H = H1 is a right coset and, by the same reasoning, G − H is the
other right coset. Hence, every left coset is equal to a right coset, and thus H is a normal subgroup
of G. 

Example 4.2.3. From Proposition 4.2.2, we immediately see that hri, hr2 , si, and hr2 , rsi are nor-
mal subgroups of D4 , simply because each of these subgroups has order 4, and hence index 2, since |D4 | = 8. △

Theorem 4.2.4
Let N be a subgroup of G. The following are equivalent:
(1) N E G.
(2) gN g −1 = N for all g ∈ G.

(3) NG (N ) = G.
(4) For all g ∈ G and all n ∈ N , gng −1 ∈ N .
(5) ∼1 and ∼2 as defined in Proposition 4.1.4 are equal relations.

Proof. (1)⇐⇒(2): If gN = N g for all g ∈ G, then by multiplication on the right by g −1 , we obtain


gN g −1 = N . Conversely, given the condition gN g −1 = N for all g ∈ G, by multiplying on the right
by g, we recover gN = N g for all g ∈ G.
(2)⇐⇒(3): This is automatic from the definition of the normalizer.
(2)⇐⇒(4): If gN g⁻¹ = N for all g ∈ G, then for all n ∈ N , we know that gng⁻¹ ∈ N . Conversely,
assuming (4), we can immediately conclude that gN g⁻¹ ⊆ N for all g ∈ G. Given g ∈ G,
since g⁻¹ must also satisfy condition (4), we also know that g⁻¹N g ⊆ N . By multiplying on the left
by g and on the right by g⁻¹, we deduce that N ⊆ gN g⁻¹. Since we already had gN g⁻¹ ⊆ N , we
conclude that gN g⁻¹ = N .

(1)⇐⇒(5): The condition that N is normal in G means that every left coset is a right coset.
By Proposition 4.1.4, this is equivalent to saying that every ∼1 -equivalence class is equal to a ∼2 -
equivalence class. By Proposition 1.3.14, the equivalence relations ∼1 and ∼2 are equal. 

We underscore that the condition gN g −1 = N does not imply gng −1 = n for all g ∈ G and all
n ∈ N . It merely implies that the process of operating on the left by g and right by g −1 produces a
bijection on N . We explore this issue more in Section 4.2.2.
In practice, as we explore properties of normal subgroups, the various criteria of Theorem 4.2.4
may be more useful in some contexts than in others. Proposition 4.2.2 illustrated a situation where
the original definition is sufficiently convenient to establish the result. The following two propositions,
which are important in themselves, illustrate situations where a different criterion is more
immediately useful in the proof.

Proposition 4.2.5
If G is abelian, then every subgroup H ≤ G is normal.

Proof. For all g ∈ G and h ∈ H, we have ghg −1 = gg −1 h = h ∈ H. By Theorem 4.2.4(4), H E G.

This proposition hints at why generalizing the construction of modular arithmetic to all groups
poses some subtleties that were not apparent in modular arithmetic: (Z, +) is abelian. By a similar
reason, we can also conclude the more general proposition.

Proposition 4.2.6
Let G be a group. Any subgroup H in the center Z(G) is a normal subgroup H E G.

Proof. (Left as an exercise for the reader. See Exercise 4.2.5.) 

Proposition 4.2.7
Let ϕ : G → H be a homomorphism between groups. Then Ker ϕ E G.

Proof. Let n ∈ Ker ϕ and let g ∈ G. Then

ϕ(gng⁻¹) = ϕ(g)ϕ(n)ϕ(g)⁻¹   by homomorphism properties
         = ϕ(g) 1 ϕ(g)⁻¹    since n ∈ Ker ϕ
         = 1H .

By Theorem 4.2.4(4), we conclude that Ker ϕ E G. 

This proposition, though easy to prove, leads to some natural and profound consequences. For
example, if n ≥ 3, there exists no homomorphism ϕ : Dn → G, where G is any group, such that
Ker ϕ = hsi, since hsi is not normal in Dn . Proposition 4.2.7 thus establishes strong restrictions on
how homomorphisms can map from one group to another. In particular, a fiber of a homomorphism
ϕ : G → H, i.e., a set ϕ⁻¹({h}), is either the empty set or a coset of Ker ϕ. If ϕ(g) = h, then
ϕ⁻¹({h}) = g(Ker ϕ). Note that if G is finite, then all nonempty fibers have the same cardinality,
namely | Ker ϕ|.
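The claim about fibers can be seen concretely with the sign homomorphism sgn : S4 → {±1}, whose kernel is A4. The sketch below is our own parity computation in Python; it checks that the two nonempty fibers are the two cosets of the kernel and that the kernel is normal, as Proposition 4.2.7 predicts.

```python
from itertools import permutations

def compose(p, q):
    return tuple(p[q[i]] for i in range(4))

def inverse(p):
    inv = [0] * 4
    for i, v in enumerate(p):
        inv[v] = i
    return tuple(inv)

def sign(p):
    # The sign homomorphism S4 -> {1, -1}, via the inversion count.
    inversions = sum(p[i] > p[j] for i in range(4) for j in range(i + 1, 4))
    return (-1) ** inversions

S4 = list(permutations(range(4)))
ker = {p for p in S4 if sign(p) == 1}       # Ker(sgn) = A4
fiber = {p for p in S4 if sign(p) == -1}    # the only other nonempty fiber

# Both fibers have cardinality |Ker| = 12, and the second is a coset g(Ker).
assert len(ker) == len(fiber) == 12
g = next(iter(fiber))
assert {compose(g, n) for n in ker} == fiber

# Proposition 4.2.7: the kernel is normal (every g n g⁻¹ stays in the kernel).
assert all(compose(compose(g, n), inverse(g)) in ker for g in S4 for n in ker)
```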
When attempting to determine whether a subgroup N of a group G is normal, using the original
definition of a normal subgroup or Theorem 4.2.4(4) may require a large number of calculations.
For example, using the latter criterion, we would need to perform |G| · |N | calculations of gng⁻¹ to
determine whether N E G. However, if a finite group and its subgroup are both
presented by generators, the following theorem provides a quick shortcut.

Theorem 4.2.8
Let G be a finite group generated by a subset T . Let N = hSi be the subgroup generated
by the subset S. Then N E G if and only if for all t ∈ T and all s ∈ S, tst−1 ∈ N .

Proof. (=⇒) This direction is obvious as a consequence of Theorem 4.2.4(4).


(⇐=) Every element in N can be written as a finite product sε11 sε22 · · · sεkk , where si ∈ S and
εi = ±1. Then

g(sε11 sε22 · · · sεkk )g −1 = (gsε11 g −1 )(gsε22 g −1 ) · · · (gsεkk g −1 )


= (gs1 g −1 )ε1 (gs2 g −1 )ε2 · · · (gsk g −1 )εk .

We see that if gSg −1 ⊆ N , then the product on the right is a product of elements of N , and hence g ∈ NG (N ).
Suppose now that we only know that for all t ∈ T and all s ∈ S, tst−1 ∈ N . Every element g ∈ G
can be written as a product g = t1 t2 · · · tl , for ti ∈ T , possibly with repetitions. Note that since G is
finite, the inverse to any t ∈ T is t^(n−1) , where |t| = n. We prove that N is normal by induction on l.
By what was said above, if all t ∈ T satisfy tSt−1 ⊆ N , then T ⊆ NG (N ). Now suppose
that every product of length l − 1 of elements in T is in NG (N ). Consider a product of length l,
namely t1 t2 · · · tl and let n ∈ N .

(t1 t2 · · · tl )n(t1 t2 · · · tl )−1 = t1 [(t2 t3 · · · tl )n(t2 t3 · · · tl )−1 ]t1 −1 = t1 n′ t1 −1

for some element n′ ∈ N because t2 t3 · · · tl ∈ NG (N ). However, t1 ∈ NG (N ), so t1 n′ t1 −1 ∈ N and
hence t1 t2 · · · tl ∈ NG (N ). By induction, we conclude that every finite product t1 t2 · · · tl ∈ NG (N ).
Hence, G ⊆ NG (N ) and thus G = NG (N ) and N is normal in G. 

Example 4.2.9. Consider the group D8 and the subgroup H = hr4 , si. We test whether H is a
normal subgroup of D8 . Notice first that H = {1, r4 , s, sr4 }. We only need to perform four
calculations:
r(r4 )r−1 = r4 ,    s(r4 )s−1 = r4 ,    rsr−1 = sr6 ,    sss−1 = s.
By Theorem 4.2.8, the third calculation rsr−1 = sr6 ∉ H shows that H is not a normal subgroup. 4
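This kind of check is easy to automate. The sketch below (our own illustration; the encoding of dihedral elements as pairs (b, k) meaning s^b r^k is an assumption of the snippet, not the book's notation) brute-forces conjugates in D8 and finds the same failing conjugate rsr−1 = sr6 .

```python
# Brute-force normality check for H = {1, r^4, s, sr^4} inside D8, the
# dihedral group with r of order 8 (group order 16).  Elements are pairs
# (b, k) standing for s^b r^k, with the relation r s = s r^{-1}.

N = 8  # order of the rotation r

def mul(x, y):
    (b1, k1), (b2, k2) = x, y
    # s^b1 r^k1 s^b2 r^k2 = s^(b1+b2) r^(±k1 + k2); the sign flips when b2 = 1
    return ((b1 + b2) % 2, ((-1) ** b2 * k1 + k2) % N)

def inv(x):
    b, k = x
    return (b, k) if b == 1 else (0, (-k) % N)   # reflections are involutions

r, s = (0, 1), (1, 0)
D8 = [(b, k) for b in (0, 1) for k in range(N)]
H = [(0, 0), (0, 4), (1, 0), (1, 4)]             # {1, r^4, s, s r^4}

conj = lambda g, x: mul(mul(g, x), inv(g))

# The failing conjugate: r s r^{-1} = s r^6, which is not in H.
assert conj(r, s) == (1, 6) and (1, 6) not in H
assert any(conj(g, h) not in H for g in D8 for h in H)   # H is not normal
```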

It is important to remark that, in contrast to the subgroup relation ≤ on the set of subgroups
of G, the relation of normal subgroup E is not transitive, and hence is not a partial order on
Sub(G). The easiest illustration comes from the dihedral group D4 . By Proposition 4.2.2, we see
that hr2 , si E D4 and hsi E hr2 , si. However, hsi is not a normal subgroup of D4 . Therefore,

K E H E G 6=⇒ K E G.

One intuitive reason for this failure of transitivity is that even if hKh−1 ⊆ K for all h ∈ H, the
condition gKg −1 ⊆ K for all g ∈ G is strictly stronger and need not hold.
We can now put into context the terminology of “normalizer” of a subgroup. Recall that

NG (H) = {g ∈ G | gHg −1 ⊆ H}.

Consequently, NG (H) is the largest subgroup K of G such that H E K. The normalizer gives some
way of measuring how far a subgroup H is from being normal. For all H ≤ G, we have

H ≤ NG (H) ≤ G,

with NG (H) = G if and only if H E G. Intuitively speaking, we can say that H is farthest from
being normal when NG (H) = H.
Exercise 4.1.36 establishes that if H, K ≤ G are two subgroups, then HK is a subgroup of G if
and only if HK = KH as sets.
174 CHAPTER 4. QUOTIENT GROUPS

Corollary 4.2.10
If H ≤ NG (K), then HK is a subgroup of G. In particular, if K E G, then HK is a
subgroup of G for all H ≤ G.

Proof. (Left as an exercise for the reader. See Exercise 4.2.11.) 

4.2.2 – Conjugacy
The expression in criterion (4) of Theorem 4.2.4 is not new. We encountered it before when discussing
centralizers and normalizers. (See Definitions 3.5.15 and 3.5.17.) We remind the reader of a definition
given in Section 3.5.

Definition 4.2.11
Let G be a group and let g ∈ G.
• If x ∈ G, then the element gxg −1 is called the conjugate of x by g.
• If S ⊆ G is any subset, then the subset gSg −1 is also called the conjugate of S by g.

We have seen the conjugate of a group element in other contexts previously. In Example 3.9.10,
we found a presentation for the Frieze group of a certain pattern. In that example, we noted that
rotation by π about Q3 is equal to trt−1 , where t corresponds to translation along the vector QQ3 and r is
rotation by π about Q. In linear algebra, one encounters the change of basis formula. If A is the
n × n matrix associated to a linear transformation T : Rn → Rn with respect to a basis B, and if B
is the matrix associated to the same linear transformation, but with respect to another basis B′ , then

B = M AM −1 ,

where M is the coordinate transition matrix from B to B′ coordinates.
and B need not be invertible, but, if T is an isomorphism, then A and B are invertible and the
conjugation B = M AM −1 occurs entirely in the group GLn (R).
The above examples give us the intuitive sense that conjugation corresponds to a change of origin,
a change of basis, or some change of perspective more generally. Consider the conjugation rsr−1 in
the dihedral group Dn . Explicitly, rsr−1 = sr−2 , which is the reflection through the line L0 that is
related to the s-reflection line L by a rotation by r. (See Figure 4.3.)

Figure 4.3: A conjugacy operation in Dn (the reflection s through the line L, and the reflection rsr−1 through the rotated line L0 )



Proposition 4.2.12
Let G be a group and define the relation ∼c on G by x ∼c y if y = gxg −1 for some g ∈ G.
Then ∼c is an equivalence relation. The relation ∼c is called the conjugacy relation.

Proof. (Left as an exercise for the reader. See Exercise 4.2.10.) 

The conjugacy class of an element x ∈ G is the set [x] = {gxg −1 | g ∈ G}. By Proposition 4.2.12
and properties of equivalence relations, the conjugacy classes in G partition G.
Example 4.2.13 (Conjugacy Classes in Sn ). In order to determine all the conjugacy classes in
Sn , we first prove the following claim. For all m-cycles (a1 a2 · · · am ) and for all permutations
σ ∈ Sn ,
σ(a1 a2 · · · am )σ −1 = (σ(a1 ) σ(a2 ) · · · σ(am )). (4.3)
Write τ = σ(a1 a2 · · · am )σ −1 and let i = 1, 2, . . . , m − 1. The permutation τ applied to σ(ai ) gives

σ(a1 a2 · · · am )σ −1 σ(ai ) = σ(a1 a2 · · · am ) · ai = σ(ai+1 ).

If i = m, then we similarly see that τ applied to σ(am ) gives τ (σ(am )) = σ(a1 ). We have seen that τ
permutes the set {σ(a1 ), σ(a2 ), . . . , σ(am )}. If b ∈ {1, 2, . . . , n} but b ∉ {σ(a1 ), σ(a2 ), . . . , σ(am )},
then there exists c ∈ {1, 2, . . . , n} − {a1 , a2 , . . . , am } such that b = σ(c). Then

τ (b) = σ(a1 a2 · · · am )σ −1 σ(c) = σ(a1 a2 · · · am ) · c = σ(c) = b.

We have calculated where τ sends all the elements and found that τ is given by (4.3).
Now every element ω ∈ Sn is a product of disjoint cycles, ω = τ1 τ2 · · · τm . Furthermore, we can
write
σωσ −1 = (στ1 σ −1 )(στ2 σ −1 ) · · · (στm σ −1 ), (4.4)
where each στi σ −1 is calculated from (4.3).
As a numerical example, only using (4.3) and (4.4) we determine that

(1 3 5 4)(1 4 2)(3 5)(1 3 5 4)−1 = (3 1 2)(5 4) = (1 2 3)(4 5).

Consequently, we can see that if a permutation ω has a given cycle type, then for all σ ∈ Sn ,
σωσ −1 will have the same cycle type. Conversely, if two permutations ω1 , ω2 ∈ Sn have the same
cycle type, then by using (4.3) and (4.4), we can find a σ ∈ Sn such that ω2 = σω1 σ −1 . Thus,
the conjugacy classes in Sn are precisely the sets of permutations that have the same cycle type.
Therefore, the table in Example 3.4.5 gives the conjugacy classes in S6 and gives the cardinality of
each. 4
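Formulas (4.3) and (4.4) are mechanical enough to verify by computer. The following Python sketch (our own illustration, not from the text) conjugates the permutation of the numerical example and confirms both the resulting permutation and the preservation of cycle type.

```python
# Conjugation in S_n relabels cycles and therefore preserves cycle type.
# Permutations on {1, ..., n} are stored as dicts i -> image of i.

def from_cycles(cycles, n):
    p = {i: i for i in range(1, n + 1)}
    for c in cycles:
        for a, b in zip(c, c[1:] + c[:1]):
            p[a] = b
    return p

def compose(p, q):                     # (p o q)(i) = p(q(i))
    return {i: p[q[i]] for i in p}

def inverse(p):
    return {v: k for k, v in p.items()}

def cycle_type(p):
    seen, lengths = set(), []
    for i in p:
        if i not in seen:
            j, length = i, 0
            while j not in seen:
                seen.add(j); j = p[j]; length += 1
            lengths.append(length)
    return tuple(sorted(lengths, reverse=True))

n = 5
sigma = from_cycles([(1, 3, 5, 4)], n)
omega = from_cycles([(1, 4, 2), (3, 5)], n)
conj = compose(compose(sigma, omega), inverse(sigma))   # sigma omega sigma^{-1}

# The numerical example from the text: the conjugate is (1 2 3)(4 5).
assert conj == from_cycles([(1, 2, 3), (4, 5)], n)
assert cycle_type(conj) == cycle_type(omega) == (3, 2)
```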

Conjugacy classes and normal subgroups are closely related in the following way. If N is a normal
subgroup of a group G and x ∈ N , then by Theorem 4.2.4, gxg −1 ∈ N for all g ∈ G. Consequently,
the conjugacy class of x is in N . This leads to the following proposition.

Proposition 4.2.14
A subgroup H ≤ G is normal if and only if it is a union of conjugacy classes.

In Exercise 3.7.37, we proved that if H is a subgroup of a group G and g ∈ G, then gHg −1 is
also a subgroup of G that is isomorphic to H. Furthermore, define the relation ∼c on the set of
subgroups Sub(G) by H ∼c K if and only if K = gHg −1 for some g ∈ G. It is easy to show
that ∼c is an equivalence relation on Sub(G). If H is a normal subgroup, then gHg −1 = H for all
g ∈ G. In the language of equivalence classes, H E G if and only if the ∼c -equivalence class of H is
the single element set {H}.
By Theorem 4.2.4, the conjugation operation on a normal subgroup by an element g is a function
N → N . However, more can be said.

Proposition 4.2.15
Let H be any subgroup of a group G. Then for all g ∈ NG (H), the function ψg : H → H
defined by ψg (h) = ghg −1 is an automorphism of H. Furthermore, the association Ψ :
NG (H) → Aut(H) defined by Ψ(g) = ψg is a homomorphism.

Proof. Let h1 , h2 ∈ H. Then

ψg (h1 h2 ) = gh1 h2 g −1 = gh1 g −1 gh2 g −1 = ψg (h1 )ψg (h2 ).

This proves that ψg is a homomorphism. It is easy to check that the inverse function of ψg is ψg−1 (conjugation by g −1 ), so for all g ∈ NG (H),
the function ψg is a bijection and, hence, an automorphism of H.
Now let a, b ∈ NG (H) be arbitrary. Then for all h ∈ H,

ψab (h) = (ab)h(ab)−1 = abhb−1 a−1 = a(bhb−1 )a−1 = ψa (ψb (h)).

Hence, Ψ(ab) = Ψ(a) ◦ Ψ(b), which establishes that Ψ : NG (H) → Aut(H) is a homomorphism. 

The above proposition applies in particular when N E G, in which case NG (N ) = G.

4.2.3 – Simple Groups


Every group has at least two normal subgroups, namely the trivial subgroup {1} and the group G itself.
It is possible for a group to contain no other normal subgroups. For example, if p is prime,
then Zp has only the two subgroups {1} and Zp , so these are its only normal subgroups.

Definition 4.2.16
A group G is called simple if it contains no normal subgroups besides {1} and itself.

We discuss simple groups in more detail in Section 9.1. By what we said above, Zp is a simple
group whenever p is a prime number. Determining if a group is simple is not always a “simple”
task. We have encountered one other family of simple groups, namely An with n ≥ 5.
Exercise 4.2.27 guides the reader to prove that A5 is simple. The proof that An is simple for n ≥ 6
is more challenging, but we will see a proof in Theorem 9.2.7.

Exercises for Section 4.2


1. Prove that An is a normal subgroup of Sn .
2. Determine whether hr2 i is normal in D8 .
3. Determine whether h(1 2)(3 4), (1 3)(2 4)i is normal in A4 .
4. Find all normal subgroups of D6 . [Hint: See Example 3.6.8.]
5. Prove that H E G for all subgroups H ≤ Z(G).
6. Prove that every subgroup of Q8 is a normal subgroup. [This shows that the converse to Proposition 4.2.5 is false.]
7. Let F = Q, R, C, or Fp where p is prime. Prove that for all positive integers n,

SLn (F ) E GLn (F ).

8. Let F = Q, R, C, or Fp where p is prime. Denote by Tn (F ) ≤ GLn (F ) the upper triangular matrices in GLn (F ).
(a) Prove that the function ϕ : Tn (F ) → (F × )n defined by

ϕ(A) = (a11 , a22 , . . . , ann )

is a homomorphism, where by (F × )n we mean the nth direct sum of the multiplicative group F × = F − {0}.

(b) Conclude that the subgroup of Tn (F ) consisting of matrices with 1s down the diagonal is a
normal subgroup.
9. Let n be a positive integer and let G = GLn (R).
(a) Prove that H = GLn (Q) is a subgroup that is not a normal subgroup.
(b) Define the subset K as
K = {A ∈ GLn (R) | det(A) ∈ Q}.
Prove that K is a normal subgroup that contains H.
10. Prove Proposition 4.2.12.
11. Prove Corollary 4.2.10.
12. Let G be a group. Prove that if H ≤ G is the unique subgroup of a given order n, then H E G.
13. Let ϕ : G → H be a group homomorphism and let N E H. Prove that ϕ−1 (N ) E G. [Note that this
generalizes the fact that kernels of homomorphisms are normal subgroups of the domain group.]
14. Let G be a group, H ≤ G and N E G. Prove that H ∩ N E H.
15. Prove that the intersection of two normal subgroups N1 , N2 of a group G is again a normal subgroup
N1 ∩ N2 E G.
16. Let N1 and N2 be normal subgroups in G. Prove that N1 N2 is the join of N1 and N2 and that it is a
normal subgroup in G.
17. Let {Ni }i∈I be a collection of normal subgroups of G. Prove that the intersection ∩i∈I Ni is a normal subgroup. Do not assume that I is finite.


18. Suppose that a subgroup H ≤ G is such that if h ∈ H with |h| = n, then H contains all the elements
in G of order n. Prove that H is a normal subgroup.
19. Let A be any subset of a group G. Prove that CG (A) E NG (A).
20. We say that a subgroup H of a group G is a characteristic subgroup if ψ(H) = H for all automorphisms
ψ ∈ Aut(G). Prove that every characteristic subgroup is a normal subgroup.
21. Prove that if g ∈ Z(G), then the conjugacy class of g is the singleton set {g}.
22. Prove that in an abelian group, all the conjugacy classes consist of a single element.
23. Prove that the conjugacy classes of Dn are:
(a) if n is even: {1}, {rn/2 }, {ra , r−a } for 1 ≤ a ≤ n/2 − 1, {s, sr2 , . . . , srn−2 }, and {sr, sr3 , . . . , srn−1 };
(b) if n is odd: {1}, {ra , r−a } for 1 ≤ a ≤ (n − 1)/2, and {s, sr, sr2 , . . . , srn−1 }.
24. List all the conjugacy classes in A4 .
25. List all the conjugacy classes in the group G2 of Example 3.8.6.
26. Let G be a group. Consider the group of automorphisms Aut(G). Prove that the group Inn(G) of
inner automorphisms (see Exercise 3.7.38) is a normal subgroup of Aut(G).
27. A5 is Simple. In this exercise, we guide the reader to prove that A5 is a simple group.
(a) Prove that, as a partition of A5 , the set of conjugacy classes in A5 is a strict refinement of the
partition of A5 into cycle types.
(b) Prove that A5 has 5 conjugacy classes: the identity, the whole subset of permutations with
cycle type (2, 2), the whole subset of 3-cycles, and 2 classes consisting of equal parts of the subset
of 5-cycles.
(c) After determining the sizes of all of the conjugacy classes of A5 , use Proposition 4.2.14 and
Lagrange’s Theorem to conclude that A5 has no normal subgroups besides {1} and the whole
group.

4.3 Quotient Groups
We began Chapter 4 by proposing to generalize to all groups the construction that led from (Z, +)
to addition in modular arithmetic, (Z/nZ, +). Our discussion sent us far afield, but we never
constructed something analogous to modular arithmetic. As promised in Section 4.2, we are in a
position to generalize the modular arithmetic construction to general groups.

4.3.1 – Quotient Groups


One of the key points in allowing us to construct Z/nZ was the innocuous result that if a ≡ c (mod n) and
b ≡ d (mod n), then a + b ≡ c + d (mod n). We can rephrase this as addition of sets by saying (a + nZ) + (b + nZ) =
(a + b) + nZ. In order to generalize the modular arithmetic construction, we need a similar result
for groups in general.
Let G be a group and let ∼ be an equivalence relation on G. We will informally say that the
equivalence relation behaves well with respect to the operation if for all g1 , g2 , h1 , h2 ∈ G,

g1 ∼ g2 and h1 ∼ h2 =⇒ g1 h1 ∼ g2 h2 . (4.5)

Let ∼ be an equivalence relation on a group G that behaves well with respect to the operation.
Then on the quotient set G/ ∼, i.e., the set of ∼-equivalence classes, we can define the operation ·
by
def
[x] · [y] = [xy]. (4.6)

By virtue of condition (4.5), this operation is well-defined. We leave as an exercise for the reader to
show that (G/ ∼, ·) is a group. (See Exercise 4.3.13.)

Proposition 4.3.1
Suppose that ∼ is an equivalence relation on G that behaves well with respect to the
operation. Then the equivalence class of 1 is a normal subgroup N . Furthermore, all
equivalence classes of ∼ are of the form gN .

Proof. Let N = [1]. Obviously, N is a nonempty subset of G. Let x, y ∈ N . Then x ∼ 1 and


y ∼ 1. By (4.5), xy ∼ 1, so N is closed under the operation. Since x−1 ∼ x−1 , by (4.5) we have
xx−1 ∼ 1 x−1 , which implies that 1 ∼ x−1 . Consequently, N is closed under taking inverses. This
establishes that N ≤ G.
Now, let n ∈ N and let g be any group element. By definition, n ∼ 1. Using (4.5) twice, we first
see that gn ∼ g and then gng −1 ∼ gg −1 = 1. By Theorem 4.2.4(4), N E G.
Finally, consider the equivalence class [g] of an arbitrary element g ∈ G. Let h ∈ [g]. Then h ∼ g
and since g −1 ∼ g −1 , by (4.5) we find that g −1 h ∼ 1, so g −1 h ∈ N and hence h ∈ gN . The reverse
reasoning also holds so that h ∈ gN implies that h ∼ g. Thus, [g] = gN . 

This proposition establishes that an equivalence relation that behaves well with respect to the
group operation defines a normal subgroup. The converse is also true.

Proposition 4.3.2
Let N be a normal subgroup of a group G. The left cosets of N form a partition of G,
which corresponds to an equivalence relation ∼ that behaves well with respect to the group
operation.
4.3. QUOTIENT GROUPS 179

Proof. We already know that the left cosets of N (which are also right cosets because N is normal)
partition G and that a partition defines a unique equivalence relation on G. (See Proposition 1.3.14.)
Let g1 , g2 be in the same left coset of N and let h1 , h2 be in the same left coset of N . Then
g2 −1 g1 ∈ N and h2 −1 h1 ∈ N . Then

h1 −1 g2 −1 g1 h1 ∈ N                       because g2 −1 g1 ∈ N and N E G
=⇒ (h2 −1 h1 )(h1 −1 g2 −1 g1 h1 ) ∈ N      because h2 −1 h1 ∈ N
=⇒ h2 −1 g2 −1 g1 h1 ∈ N
=⇒ (g2 h2 )−1 (g1 h1 ) ∈ N
and we conclude that g2 h2 and g1 h1 are in the same left coset of N . Consequently, the equivalence
relation defined by the partition of left cosets of N behaves well with respect to the group operation.
Proposition 4.3.2 can be restated in the following way.

Corollary 4.3.3
Let G be a group and N a normal subgroup. The set of left cosets of N with the operation
def
(xN )(yN ) = (xy)N (4.7)

has the structure of a group with identity N and inverses given by (xN )−1 = x−1 N .

Proof. The expression (4.7) is precisely the property (4.6) of an equivalence relation that is well
behaved with respect to the group operation. Associativity follows immediately from associativity
in G and (4.7). It is obvious that (gN )(N ) = (gN )(1N ) = gN , which shows that N is the identity.
Finally, for all g ∈ G, (4.7) gives (gN )(g −1 N ) = (gg −1 )N = N so that gN has an inverse (gN )−1 =
g −1 N . 

Definition 4.3.4
Let G be a group and let N be a normal subgroup. The group defined as the set of left
cosets (which are the same as right cosets since N E G) with the operation defined in
Corollary 4.3.3, is called the quotient group of G by N , and is denoted by G/N .

The importance of Proposition 4.3.1 is that a quotient group G/N , where N E G, is the only
manner in which any quotient set G/ ∼ of G can be made into a group with an operation inherited
from G via (4.6).

4.3.2 – Examples of Quotient Groups


Example 4.3.5. As a first example, observe that (Z/nZ, +) is the quotient group of Z by nZ. Since
Z is abelian, every subgroup is normal. In fact, the notation for modular arithmetic inspired the
notation for quotient groups in general. 4
Example 4.3.6. Consider the dihedral group on a square, D4 , and consider R = hri, the subset of
rotations. Since R has index 2, RED4 . The quotient group consists of two elements D4 /R = {R, sR}.
Following the habit of notation with modular arithmetic, it is not uncommon to write 1̄ for R and
s̄ for sR.
Consider the Cayley tables of D4 and the quotient group D4 /R. When the Cayley table of a
group G is organized so that elements in the same coset of N are contiguous, then blocks of size
|N | × |N | in the original Cayley table will correspond to a single entry in the Cayley table of the
quotient group G/N . (See Figure 4.4.)
In this particular situation, the Cayley table of D4 /R carries an interesting geometric interpre-
tation. Since the coset R corresponds to all the rotations in D4 and since sR corresponds to all the
reflections through lines of symmetry, we can interpret the above Cayley table for D4 /hri as:

D4 :
        1    r    r2   r3   s    sr   sr2  sr3
  1     1    r    r2   r3   s    sr   sr2  sr3
  r     r    r2   r3   1    sr3  s    sr   sr2
  r2    r2   r3   1    r    sr2  sr3  s    sr
  r3    r3   1    r    r2   sr   sr2  sr3  s
  s     s    sr   sr2  sr3  1    r    r2   r3
  sr    sr   sr2  sr3  s    r3   1    r    r2
  sr2   sr2  sr3  s    sr   r2   r3   1    r
  sr3   sr3  s    sr   sr2  r    r2   r3   1

D4 /hri :
        1̄    s̄
  1̄     1̄    s̄
  s̄     s̄    1̄

Figure 4.4: Cayley tables of the quotient D4 /hri

·            rotation     reflection
rotation     rotation     reflection
reflection   reflection   rotation

So every rotation composed with a rotation is a rotation; every rotation composed with a reflec-
tion is a reflection, and so forth. 4
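The coset arithmetic behind this table can be sketched computationally. In the Python illustration below (our own, not from the text; cosets are modeled as frozensets and dihedral elements as pairs (b, k) meaning s^b r^k), the product of two cosets is computed as a product set, and it lands exactly on a coset because hri is normal.

```python
# A sketch of the quotient construction for D4 / <r>.  Cosets are frozensets;
# the product of cosets is the set {ab : a in xN, b in yN}, which equals the
# coset (xy)N because N = <r> is normal.

N4 = 4  # order of the rotation r in D4

def mul(x, y):                       # dihedral multiplication, elements s^b r^k
    (b1, k1), (b2, k2) = x, y
    return ((b1 + b2) % 2, ((-1) ** b2 * k1 + k2) % N4)

D4 = [(b, k) for b in (0, 1) for k in range(N4)]
R = frozenset((0, k) for k in range(N4))             # the rotations <r>
cosets = {frozenset(mul(g, n) for n in R) for g in D4}
assert len(cosets) == 2                              # D4/<r> has order 2

def coset_mul(A, B):
    products = frozenset(mul(a, b) for a in A for b in B)
    assert products in cosets        # well-defined: the product set is a coset
    return products

sR = next(C for C in cosets if C != R)
assert coset_mul(R, R) == R          # rotation * rotation   = rotation
assert coset_mul(R, sR) == sR        # rotation * reflection  = reflection
assert coset_mul(sR, sR) == R        # reflection * reflection = rotation
```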

As illustrated in the previous two examples, it is not uncommon to mimic the notation used in
modular arithmetic and denote a coset gN in the quotient group G/N by ḡ. In modular arithmetic,
the modulus is understood by context. Similarly, when we use this notation ḡ, the normal subgroup
is understood by context.

Example 4.3.7. As another example, consider the subgroup N = h−1i in Q8 . By Proposition 4.2.6,
since N is the center of Q8 , it is a normal subgroup. The elements in the quotient group Q8 /N are
{1̄, ī, j̄, k̄}. It is easy to see that ī2 = −1 = 1̄ (since −1 ∈ N ), and similarly for j̄ and k̄. Hence, all the nonidentity
elements have order 2. Consequently, we can conclude that Q8 /N ∼ = Z2 ⊕ Z2 . 4
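A direct verification of this example is possible by machine. The sketch below (our own illustration; the encoding of Q8 elements as sign-unit pairs is an assumption of the snippet) builds the four cosets of h−1i and checks that every coset squares to the identity coset, which forces Q8 /N ∼= Z2 ⊕ Z2 .

```python
# Verifying Q8 / <-1>: a group of order 4 in which every element has
# order dividing 2, hence isomorphic to Z2 + Z2.  Elements of Q8 are
# pairs (sign, unit) with sign in {1, -1} and unit in {'1','i','j','k'}.

table = {('1','1'):('+','1'), ('1','i'):('+','i'), ('1','j'):('+','j'), ('1','k'):('+','k'),
         ('i','1'):('+','i'), ('i','i'):('-','1'), ('i','j'):('+','k'), ('i','k'):('-','j'),
         ('j','1'):('+','j'), ('j','i'):('-','k'), ('j','j'):('-','1'), ('j','k'):('+','i'),
         ('k','1'):('+','k'), ('k','i'):('+','j'), ('k','j'):('-','i'), ('k','k'):('-','1')}

def mul(x, y):
    (s1, u1), (s2, u2) = x, y
    s3, u3 = table[(u1, u2)]
    return ({'+': 1, '-': -1}[s3] * s1 * s2, u3)

Q8 = [(s, u) for s in (1, -1) for u in '1ijk']
N = frozenset([(1, '1'), (-1, '1')])                 # the center <-1>
cosets = {frozenset(mul(g, n) for n in N) for g in Q8}
assert len(cosets) == 4                              # |Q8/N| = 4

# Each coset squares to the identity coset N, so every element of Q8/N
# has order at most 2 and the quotient is Z2 + Z2.
for C in cosets:
    assert frozenset(mul(a, b) for a in C for b in C) == N
```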

Example 4.3.8. Consider the group G = hx, y | x3 = y 7 = 1, xyx−1 = y 2 i. In Example 3.8.6, we


saw that this group has order 21 and that every element in G can be written as xa y b with a = 0, 1, 2
and b = 0, 1, 2, . . . , 6. By Theorem 4.2.8, since xyx−1 = y 2 ∈ hyi and yyy −1 = y ∈ hyi, then hyi is
normal in G.
The cosets of hyi are 1̄ = hyi, x̄ = xhyi, and x̄2 = x2 hyi. The corresponding quotient group is

G/hyi = {1̄, x̄, x̄2 } ∼= Z3 . 4

Definition 4.3.9
Let N E G. The function π : G → G/N defined by π(g) = gN is called the canonical
projection of G onto G/N .

By the definition of the operation on cosets in (4.7), the canonical projection is a homomorphism.
In Proposition 4.2.7, we saw that kernels of homomorphisms are normal subgroups; the converse is
in fact true, which leads to the following proposition.

Proposition 4.3.10
A subgroup N of G is normal if and only if it is the kernel of some homomorphism.

Proof. This follows from Proposition 4.2.7 and the fact that N is the kernel of the canonical projec-
tion π : G → G/N . 

By the result of Exercise 3.7.10, this proposition implies that for all g ∈ G, the order of gN in
G/N divides the order of g in G. We can prove this same result in another fashion and obtain more
precise information. By Exercise 4.3.14,
|gN | = |hgiN | / |N |.

By Proposition 4.1.16, in G we have

|hgiN | = |g| |N | / |hgi ∩ N |,

which implies that

|g| = |hgi ∩ N | · |hgiN | / |N | = |hgi ∩ N | · |gN |.
The subgroup order |hgi ∩ N | is precisely the factor by which |g| exceeds |gN |.
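The relation |g| = |hgi ∩ N | · |gN | can be spot-checked in a small example. The snippet below (our own illustration; the additive group Z/12Z with N = h4i = {0, 4, 8} is our choice, not the book's) verifies the formula for every element.

```python
# Check |g| = |<g> ∩ N| * |gN| in G = Z/12Z (written additively)
# with the normal subgroup N = <4> = {0, 4, 8}.

n = 12
N = {0, 4, 8}

for g in range(1, n):
    cyclic_g = {(g * k) % n for k in range(n)}                    # <g>
    order_g = next(k for k in range(1, n + 1) if (g * k) % n == 0)
    # order of the coset g + N in G/N: least k with k*g in N
    order_coset = next(k for k in range(1, n + 1) if (g * k) % n in N)
    assert order_g == len(cyclic_g & N) * order_coset
```

For instance, g = 2 has order 6, the intersection h2i ∩ N = {0, 4, 8} has order 3, and the coset 2 + N has order 2 in G/N ∼= Z4 , so 6 = 3 · 2.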
As a final set of examples of quotient groups, by Proposition 4.2.6, Z(G) E G for all groups G. In
Exercise 4.3.21, the reader is asked to prove the important result that if G/Z(G) is cyclic, then G is
abelian. In some intuitive sense, the quotient group G/Z(G) “removes” any elements that commute
with everything else. This intuitive manner of thinking may be misleading because the center of
the quotient group G/Z(G) is not necessarily trivial (Exercise 4.3.20). Nonetheless, given a group
G, the quotient group G/Z(G) often tells us something important about the group G. As specific
examples, we mention the so-called projective linear groups.
Example 4.3.11 (Projective Linear Groups). Suppose that F is C, R, Q or Fp , where p is
prime. In Example 3.5.14, we proved that the center of GLn (F ) consists of matrices of the form
aI, where a ∈ F × = F − {0}. A similar result still holds for the center of SLn (F ), except not all
diagonal matrices of the form aI are in SLn (F ).
The projective general linear group of order n is

PGLn (F ) = GLn (F )/Z(GLn (F )),

while the projective special linear group of order n is


PSLn (F ) = SLn (F )/Z(SLn (F )). 4

4.3.3 – Direct Sum Decomposition


Early on, we introduced the concept of a direct sum of a finite number of groups. Recall that if
G1 , G2 , . . . , Gn are groups, then the direct sum G = G1 ⊕ G2 ⊕ · · · ⊕ Gn is the group in which
the set is the Cartesian product of the groups and in which the operation on n-tuples is performed
component-wise.
By a generalization of Exercise 4.3.25, for all i, the subset

G̃i = {(1, 1, . . . , 1, gi , 1, . . . , 1) | gi ∈ Gi } ⊆ G1 ⊕ G2 ⊕ · · · ⊕ Gn

is isomorphic to Gi and is a normal subgroup of G. By the result of Exercise 4.2.16, the product set
of normal subgroups is the join of the subgroups. In this situation, as a join of subgroups,

G = G̃1 G̃2 · · · G̃n



and for all k = 1, 2, . . . , n − 1,

G̃1 G̃2 · · · G̃k ∩ G̃k+1 = {(1, 1, . . . , 1)}.

Furthermore,

(G1 ⊕ G2 ⊕ · · · ⊕ Gn )/G̃i ∼= G1 ⊕ · · · ⊕ Gi−1 ⊕ Ĝi ⊕ Gi+1 ⊕ · · · ⊕ Gn ,

where the Ĝi notation indicates that the corresponding term is omitted.
In the above discussion, we assumed that we started with a collection of groups, constructed
the direct sum group, and studied some of its properties. In contrast, suppose that we encounter a
group G either from some natural context, as a quotient group, as a presentation or by any other
means. It is possible that G is isomorphic to a direct sum of groups G1 ⊕ G2 ⊕ · · · ⊕ Gn . If it is,
then these groups Gi would be isomorphic to subgroups of G and possess the properties mentioned
above. The following theorem states when and how a group G may be isomorphic to a direct sum
of its own subgroups.

Theorem 4.3.12 (Direct Sum Decomposition)


Let N1 , N2 , . . . , Nk be a finite collection of normal subgroups of a group G satisfying:
(1) G = N1 N2 · · · Nk ;
(2) N1 N2 · · · Ni ∩ Ni+1 = {1} for all i = 1, 2, . . . , k − 1.

Then G ∼= N1 ⊕ N2 ⊕ · · · ⊕ Nk .

Proof. First, we show that for any indices i ≠ j, the elements in Ni commute with the elements in
Nj . Let ni ∈ Ni and nj ∈ Nj and consider the element ni −1 nj −1 ni nj . This element is called the
commutator of ni and nj and is denoted by [ni , nj ]. (See Exercise 4.3.24.) Since nj −1 ∈ Nj E G,
then ni −1 nj −1 ni ∈ Nj and [ni , nj ] ∈ Nj . Similarly, Ni E G, so nj −1 ni nj ∈ Ni , so again [ni , nj ] ∈ Ni .
Thus, [ni , nj ] ∈ Ni ∩ Nj . However, by Condition 2, Ni ∩ Nj = {1}, so

ni −1 nj −1 ni nj = 1 =⇒ nj −1 ni nj = ni =⇒ ni nj = nj ni .

Second, we show that if 1 = n1 n2 · · · nk , with ni ∈ Ni , then n1 = n2 = · · · = nk = 1. Suppose


that 1 = n1 n2 · · · nk for ni ∈ Ni . Then n1 n2 · · · nk−1 = nk −1 . Since n1 n2 · · · nk−1 ∈ N1 N2 · · · Nk−1
and nk −1 ∈ Nk , by Condition 2, we deduce that nk = 1 and n1 n2 · · · nk−1 = 1. By an induction
argument, we can continue this reasoning and prove that n1 = n2 = · · · = nk = 1.
We now claim that each element in G can be expressed uniquely as a product of elements from
the Ni normal subgroups. Condition 1 implies that every g ∈ G can be expressed as a product of
elements in Ni as g = n1 n2 · · · nk . Suppose that g = n1 n2 · · · nk = n′1 n′2 · · · n′k with ni , n′i ∈ Ni for
all i = 1, 2, . . . , k. Since elements in Ni and Nj commute whenever i ≠ j, we have

n1 n2 · · · nk = n′1 n′2 · · · n′k =⇒ n2 n3 · · · nk = n1 −1 n′1 n′2 · · · n′k
                                 =⇒ n3 · · · nk = n1 −1 n′1 n2 −1 n′2 n′3 · · · n′k
                                 =⇒ 1 = (n1 −1 n′1 )(n2 −1 n′2 ) · · · (nk −1 n′k ).

Since ni −1 n′i ∈ Ni , by our previous remark, we deduce that ni −1 n′i = 1 and hence n′i = ni for all
i = 1, 2, . . . , k. Hence, g can be expressed uniquely as a product of elements in the subgroups
N1 , N2 , . . . , Nk .
Finally, consider the function ψ : G → N1 ⊕ N2 ⊕ · · · ⊕ Nk defined by

ψ(g) = (n1 , n2 , . . . , nk ) when g = n1 n2 · · · nk .



This is well-defined precisely because g can be expressed uniquely as a product of elements in the
Ni . Let g = n1 n2 · · · nk and h = n′1 n′2 · · · n′k be arbitrary elements in G, where ni , n′i ∈ Ni . From
the first claim in this proof, we have

gh = (n1 n2 · · · nk )(n′1 n′2 · · · n′k ) = n1 n′1 n2 n′2 · · · nk n′k .

It is important to note that gh is still not necessarily hg because ni and n′i need not commute in
Ni . Then
ψ(gh) = ψ(n1 n′1 n2 n′2 · · · nk n′k ) = (n1 n′1 , n2 n′2 , . . . , nk n′k ) = ψ(g)ψ(h).
Hence, ψ is a homomorphism. It is surjective since for all (n1 , n2 , . . . , nk ) ∈ N1 ⊕ N2 ⊕ · · · ⊕ Nk , we
have ψ(n1 n2 · · · nk ) = (n1 , n2 , . . . , nk ). It is also injective, since Ker ψ = {1}. Consequently, ψ is an
isomorphism and the theorem follows. 

Example 4.3.13. As an example of the Direct Sum Decomposition Theorem, consider the group
U (35). This is an abelian group with φ(35) = 24 elements. Again, recall that in any abelian group,
all subgroups are normal. Consider the subgroups N1 = h6i and N2 = h2i. For elements, we have

h6i = {1, 6},


h2i = {1, 2, 4, 8, 16, 32, 29, 23, 11, 22, 9, 18}.

We note that N1 ∩ N2 = {1}. Furthermore, by Proposition 4.1.16, |N1 N2 | = 24, so N1 N2 = G. The


subgroups N1 and N2 satisfy the conditions of the Direct Sum Decomposition Theorem. Hence,

U (35) ∼= h6i ⊕ h2i ∼= Z2 ⊕ Z12 .
The group U (35) can be decomposed even further. Consider the three subgroups h6i, h8i, and
h16i. We leave it to the reader (Exercise 4.3.26) to verify that these subgroups satisfy the conditions
of the Direct Sum Decomposition Theorem. Then, we conclude that

U (35) ∼= h6i ⊕ h16i ⊕ h8i ∼= Z2 ⊕ Z3 ⊕ Z4 .

Note that since gcd(3, 4) = 1, then Z12 ∼= Z4 ⊕ Z3 , so the above two decompositions are equivalent.
In this latter decomposition, depicted visually in Figure 4.5, we could write

U (35) = {6^a 16^b 8^c | a = 0, 1; b = 0, 1, 2; c = 0, 1, 2, 3}. 4

Figure 4.5: A visualization of the decomposition of U (35)
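The decomposition of Example 4.3.13 is easy to confirm by direct computation. The sketch below (our own illustration, not from the text) checks the orders of h6i, h16i, and h8i and that the 24 products 6^a 16^b 8^c exhaust U (35).

```python
# Verify U(35) = <6> + <16> + <8>, with factors of orders 2, 3, 4.
from math import gcd

n = 35
U = [x for x in range(1, n) if gcd(x, n) == 1]
assert len(U) == 24                                  # phi(35) = 24

def cyclic(g):
    """Elements of the cyclic subgroup <g> of U(n)."""
    out, x = [], 1
    while True:
        x = (x * g) % n
        out.append(x)
        if x == 1:
            return out

N1, N2, N3 = cyclic(6), cyclic(16), cyclic(8)
assert (len(N1), len(N2), len(N3)) == (2, 3, 4)      # Z2, Z3, Z4

# The 2*3*4 = 24 products 6^a 16^b 8^c are distinct and cover all of U(35),
# as guaranteed by the Direct Sum Decomposition Theorem.
reps = {(6 ** a * 16 ** b * 8 ** c) % n
        for a in range(2) for b in range(3) for c in range(4)}
assert reps == set(U)
```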



Exercises for Section 4.3


1. Prove that Sn /An ∼= Z2 .
2. Prove that hr2 i E D4 , list the elements in D4 /hr2 i, and show that D4 /hr2 i ∼= Z2 ⊕ Z2 .
3. Consider the multiplicative group R× and the subgroup {1, −1}. Prove that R× /{1, −1} ∼= R>0 ,
where the latter set is given the group structure of multiplication.
4. Consider the group G = U (33) and consider the subgroup H = h4i. Since G is abelian, H E G. List
the elements of G/H and determine its isomorphism type (i.e., find to which group in the table of
Section A.2.1 it is isomorphic.)
5. Explicitly list the elements in the quotient group U (33)/h10i. Show that this quotient group is cyclic
and find an explicit generator.
6. Consider the quotient group construction described in Example 4.3.7. Fill out the Cayley table of
Q8 and shade boxes of this Cayley table with 4 different colors to mimic the visual illustration of the
quotient group process as shown in Example 4.3.6.
7. Consider the function π : R2 → R given by π(x, y) = 2x + 3y.
(a) Show that π is a homomorphism from (R2 , +) to (R, +) and describe the fibers geometrically.
(b) Determine Ker π.
(c) Interpret geometrically the addition operation in R2 / Ker π.
8. Let N be a normal subgroup of G. Prove that for all g ∈ G and all k ∈ Z, (gN )k = (g k )N in G/N .
9. Prove that the quotient group of a cyclic group by any subgroup is again a cyclic group. Deduce that
if d|n, then (Z/nZ)/hdi ∼= Z/dZ.
10. Consider the group (R, +) and the quotient group R/Z. Recall that the torsion subgroup of an abelian
group is the subgroup of elements of finite order. (See Exercise 3.5.18.)
(a) Prove that Tor(R/Z) = Q/Z.
(b) Prove that R/Z is isomorphic to the circle group, defined as

{ ( cos θ  − sin θ ; sin θ  cos θ ) ∈ GL2 (R) | θ ∈ R },

where the matrix is written row by row.

11. Prove that if G is generated by a subset {g1 , g2 , . . . , gn }, then the quotient group G/N is generated
by {ḡ1 , ḡ2 , . . . , ḡn }.
12. Consider the dihedral group Dn and let d be a divisor of n.
(a) Prove that hrd i E Dn .
(b) Show that Dn /hrd i ∼= Dd .
(c) Give a geometric interpretation of this last result. (What information is conflated when taking
the quotient group?)
13. Let G be a group and let ∼ be an equivalence relation on G that behaves well with respect to the
operation. Prove that (G/ ∼, ·) is a group.
14. Let N be a normal subgroup of a group G and write ḡ for the coset gN in the quotient group G/N .
(a) Show that for all g ∈ G, if the order of g is finite, then |ḡ| is the least positive integer k such
that g k ∈ N .
(b) Deduce that the element order |ḡ| is equal to |hgiN |/|N |.
15. Consider the group G, given in generators and relations as

G = hx, y | x4 = y 3 = 1, x−1 yx = y −1 i.

(a) Prove that G is a nonabelian group of order 12.


(b) Prove that hyi is a normal subgroup and that G/hyi ∼= Z4 .
(c) Prove that G is not isomorphic to A4 or D6 . [In Section 9.3, we will describe G as the semidirect
product Z3 o Z4 .]

16. Consider the group G given in generators and relations as

G = ⟨x, y | x^4 = y^5 = 1, x^{−1}yx = y^2⟩.

(a) Prove that G is a nonabelian group of order 20.


(b) Prove that ⟨y⟩ is a normal subgroup and that G/⟨y⟩ ≅ Z4.
(c) Prove that G is not isomorphic to D10 .
[Note: This group is called the Frobenius group of order 20 and is denoted by F20 .]
17. Let G be the group defined by

G = ⟨x, y | x^2 = y^8 = 1, yx = xy^5⟩.

(a) Show that ⟨y^4⟩ E G.
(b) Show that G/⟨y^4⟩ is a group of order 8.
(c) Find the isomorphism type of G/⟨y^4⟩ (i.e., find to which group in the table of Section A.2.1 it is
isomorphic). Explain.
[Note: This group is called the modular group of order 16.]
18. Consider the group SL2(F3), i.e., the special linear group of 2 × 2 matrices with determinant 1, with
entries in F3, the integers with arithmetic modulo 3.
(a) Show that this group is nonabelian with 24 elements.
(b) Prove that the center of SL2(F3) is the subgroup of two elements

$$Z(SL_2(\mathbb{F}_3)) = \left\{ \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} \right\}.$$

(c) The projective special linear group

PSL2 (F3 ) = SL2 (F3 )/Z(SL2 (F3 ))

has order 12. Determine, with proof, to which group in the table of Section A.2.1 it is isomorphic.
19. Consider the group GL2(F3), i.e., the general linear group of 2 × 2 invertible matrices with entries
in F3, the integers with arithmetic modulo 3. (By the result of Exercise 3.2.32, this group has 48 elements.) We
consider the group G = PGL2(F3), the projective general linear group of degree 2 over F3.
(a) Prove that |G| = 24 and show that G and S4 have the same number of elements of any given
order.
(b) Show explicitly that G ≅ S4. [Showing that they have the same number of elements of a given
order is evidence but not sufficient for a proof of the isomorphism.]
20. Find an example of a group G in which the center of G/Z(G) is not trivial.
21. Prove that if G/Z(G) is cyclic, then G is abelian. Give an example to show that G is not necessarily
abelian if we only assume that G/Z(G) is abelian.
22. Prove that if |G| = pq, where p and q are two primes, not necessarily distinct, then G is either abelian
or Z(G) = {1}. [Hint: Use Exercise 4.3.21.]
23. Let N be a normal subgroup of a finite group G and let g ∈ G. Prove that if gcd(|g|, |G/N |) = 1, then
g ∈ N.
24. Let G be a group. The commutator subgroup, denoted G′, is defined as the subgroup generated by all
products x^{−1}y^{−1}xy, for any x, y ∈ G. In other words,

G′ = ⟨x^{−1}y^{−1}xy | x, y ∈ G⟩.

(a) Prove that G′ E G.
(b) Without using part (a), prove that G′ is a characteristic subgroup of G. (See Exercise 4.2.20.)
(c) Prove that G/G′ is abelian.

25. Let A and B be two groups and let G = A ⊕ B. The subgroup A × {1} ≤ G is isomorphic to A. Prove
that A × {1} E G and that G/(A × {1}) ≅ B.
26. In Example 4.3.13, prove that the subgroups ⟨6⟩, ⟨8⟩, and ⟨16⟩ satisfy the conditions of the Direct
Sum Decomposition Theorem.
27. Use the direct sum decomposition to show that U(100) ≅ Z20 ⊕ Z2 ≅ Z5 ⊕ Z4 ⊕ Z2.
28. Let p be a prime number and let k be a positive integer. Prove that Z_{p^k} is not isomorphic to the
direct product of any other groups.
29. Prove that Q8 is not isomorphic to the direct product of any other groups.

4.4 Isomorphism Theorems
As we saw in the previous section, properties of quotient groups of a group G may imply relationships
between certain subgroups of G. Much more can be said, however. In this section, we discuss some
theorems, the four Isomorphism Theorems, that describe further structure within a group.

4.4.1 – First Isomorphism Theorem


When we introduced the concept of a homomorphism, we gave the intuitive explanation that a
homomorphism preserves the group structure. The first of the Isomorphism Theorems shows more
precisely in what sense a homomorphism carries the structure of one group into another.

Theorem 4.4.1 (First Isomorphism Theorem)

Let ϕ : G → H be a homomorphism between groups. Then Ker ϕ E G and G/Ker ϕ ≅ Im ϕ.

Proof. Proposition 4.2.7 established that Ker ϕ E G.


The elements of the quotient group G/Ker ϕ are left cosets of the form g(Ker ϕ). We define the
function

ϕ̄ : G/Ker ϕ −→ Im ϕ = ϕ(G)
g(Ker ϕ) ↦−→ ϕ(g).

For any g ∈ G, the element g is only one of many possible representatives of the coset g Ker ϕ.
Thus, in order to verify that this is even a function, we first need to check that ϕ̄(g(Ker ϕ)) gives the
same output for every representative of g(Ker ϕ). Suppose gh, with h ∈ Ker ϕ, is any element in the
coset g Ker ϕ. Then ϕ(gh) = ϕ(g)ϕ(h) = ϕ(g). Thus, the choice of representative has no effect on
the stated output ϕ(g). This simply means that ϕ̄ is well-defined as a function.
It is easy to check that ϕ̄ is a homomorphism. Furthermore, ϕ̄ is surjective since every
element ϕ(g) in ϕ(G) is obtained as the output ϕ̄(g Ker ϕ). To prove injectivity, let g1, g2 ∈ G. Then

ϕ̄(g1 Ker ϕ) = ϕ̄(g2 Ker ϕ) ⇐⇒ ϕ(g1) = ϕ(g2)
⇐⇒ ϕ(g2)^{−1}ϕ(g1) = 1 ⇐⇒ ϕ(g2^{−1}g1) = 1
⇐⇒ g2^{−1}g1 ∈ Ker ϕ ⇐⇒ g1 Ker ϕ = g2 Ker ϕ.

This proves injectivity of ϕ̄. Consequently, ϕ̄ is bijective and thus an isomorphism, and the theorem
follows. □

The First Isomorphism Theorem shows that the image of a homomorphism, as a subgroup of the
codomain, must already exist within the structure of the domain group, not as a subgroup but as
a quotient group. This theorem also shows how any homomorphism ϕ : G → H can be factored
into the surjective canonical projection π : G → G/Ker ϕ and an injective homomorphism
ϕ̄ : G/Ker ϕ → H so that ϕ = ϕ̄ ∘ π. We often depict this relationship by the following commutative
diagram.

          π
    G ---------> G/Ker ϕ
      \         /
     ϕ \       / ϕ̄
        v     v
           H

The First Isomorphism Theorem leads to many consequences about groups, some elementary
and some more profound. One implication is that if ϕ : G → H is an injective homomorphism, then
Ker ϕ = {1} and so G ≅ G/Ker ϕ ≅ ϕ(G). In this situation, we sometimes say that G is embedded
in H, or that ϕ is an embedding of G into H, because ϕ maps G onto an exact copy of itself as a
subgroup of H.
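As a quick numerical illustration (a sketch not in the text, using the hypothetical homomorphism ϕ : Z12 → Z4 with ϕ(x) = x mod 4), the following Python snippet checks that the cosets of Ker ϕ match up bijectively with the image, as the First Isomorphism Theorem predicts:

```python
# Sketch verifying G/Ker(phi) ≅ Im(phi) for phi: Z12 -> Z4, phi(x) = x mod 4.
n, m = 12, 4

def phi(x):
    return x % m          # a homomorphism Z12 -> Z4 since 4 divides 12

G = set(range(n))
kernel = {g for g in G if phi(g) == 0}                      # Ker phi = {0, 4, 8}
cosets = {frozenset((g + k) % n for k in kernel) for g in G}
image = {phi(g) for g in G}

assert kernel == {0, 4, 8}
assert len(cosets) == len(image) == 4     # |G/Ker phi| = |Im phi|

# The induced map phi_bar(g Ker phi) = phi(g) is constant on each coset,
# which is exactly the well-definedness checked in the proof above:
for coset in cosets:
    assert len({phi(g) for g in coset}) == 1
```

The final loop mirrors the well-definedness step of the proof: every representative of a coset yields the same output.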
As another example, suppose that G is a simple group. By definition, it contains no normal
subgroups besides {1} and itself. Hence, by the First Isomorphism Theorem, any homomorphism ϕ
from G is either injective (Ker ϕ = {1}) or trivial (ϕ(G) = {1}). Thus, any homomorphism from a
simple group is either an embedding or has trivial image.
Combining the First Isomorphism Theorem with Lagrange's Theorem, we can deduce the
following corollary, which is far from obvious.

Corollary 4.4.2
Let G and H be finite groups with gcd(|G|, |H|) = 1. Then the only homomorphism
ϕ : G → H is the trivial homomorphism, ϕ(g) = 1H .

Proof. Since ϕ(G) ≤ H, by Lagrange's Theorem |ϕ(G)| divides |H|. By the First Isomorphism
Theorem, |ϕ(G)| = |G|/|Ker ϕ|, so |ϕ(G)| divides |G|. Hence, |ϕ(G)| divides
gcd(|G|, |H|) = 1, so |ϕ(G)| = 1. The only subgroup of H with exactly one element is {1_H}. □
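For cyclic groups the corollary can be checked by brute force. In the sketch below (an illustration not in the text, restricted to the cyclic case), a homomorphism Zm → Zn is determined by k = ϕ(1), which must satisfy mk ≡ 0 (mod n); counting such k recovers the corollary when gcd(m, n) = 1:

```python
from math import gcd

# Sketch: homomorphisms Zm -> Zn correspond to k in Zn with m*k ≡ 0 (mod n).
def hom_images(m, n):
    return [k for k in range(n) if (m * k) % n == 0]

# The number of homomorphisms Zm -> Zn equals gcd(m, n):
assert len(hom_images(6, 35)) == gcd(6, 35) == 1   # coprime orders: only trivial
assert hom_images(6, 35) == [0]
assert len(hom_images(6, 8)) == gcd(6, 8) == 2     # non-coprime: extra maps exist
```

When the orders are coprime, only k = 0 survives, exactly as Corollary 4.4.2 asserts.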

The First Isomorphism Theorem leads to many other, more subtle results in group theory. The
following Normalizer-Centralizer Theorem is an immediate consequence but is important in its own
right. In future sections, we will see how this theorem imposes constraints on the internal
structure of a group, leading to consequences for the classification of groups.

Theorem 4.4.3 (Normalizer-Centralizer Theorem)


Let H be a subgroup of a group G. Then NG (H)/CG (H) is isomorphic to a subgroup of
Aut(H).

Proof. By Proposition 4.2.15, the function Ψ : NG(H) → Aut(H) defined by Ψ(g) = ψg, where
ψg : H → H with ψg(h) = ghg^{−1}, is a homomorphism. The image subgroup Ψ(NG(H)) ≤ Aut(H)
could be strictly contained in Aut(H).
Now the kernel of Ψ is precisely

Ker Ψ = {g ∈ NG(H) | ghg^{−1} = h for all h ∈ H} = CG(H).

By the First Isomorphism Theorem, we deduce that NG(H)/CG(H) is isomorphic to Ψ(NG(H)),
which is a subgroup of Aut(H). □
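The theorem can be checked concretely for a small case. The sketch below (not from the text; it takes G = S3 and H = ⟨(0 1 2)⟩, written as permutations in one-line notation) computes the normalizer and centralizer directly and compares |NG(H)/CG(H)| with |Aut(Z3)| = 2:

```python
from itertools import permutations

def compose(p, q):                      # (p ∘ q)(x) = p(q(x))
    return tuple(p[q[x]] for x in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, v in enumerate(p):
        inv[v] = i
    return tuple(inv)

G = set(permutations(range(3)))          # the symmetric group S3
r = (1, 2, 0)                            # the 3-cycle (0 1 2)
H = {(0, 1, 2), r, compose(r, r)}        # H = <r>, cyclic of order 3

def conj(g, h):                          # g h g^{-1}
    return compose(compose(g, h), inverse(g))

N = {g for g in G if {conj(g, h) for h in H} == H}     # normalizer N_G(H)
C = {g for g in G if all(conj(g, h) == h for h in H)}  # centralizer C_G(H)

assert len(N) == 6 and len(C) == 3
# |N_G(H)/C_G(H)| = 2 = |Aut(Z3)|, consistent with Theorem 4.4.3:
assert len(N) // len(C) == 2
```

Here H is normal, so NG(H) is all of S3, while only the rotations centralize H; the quotient has order 2, matching Aut(Z3).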

As a general hint to the reader, if someone ever asks for a proof that G/N ≅ H, where G and
H are groups with N E G, then there must exist a surjective homomorphism ϕ : G → H such that
Ker ϕ = N. Hence, when attempting to prove such results, the First Isomorphism Theorem offers
the strategy of looking for an appropriate homomorphism.

4.4.2 – Second Isomorphism Theorem

Theorem 4.4.4 (The Second Isomorphism Theorem)

Let G be a group and let A and B be subgroups such that A ≤ NG(B). Then AB is a
subgroup of G, B E AB, A ∩ B E A, and

AB/B ≅ A/(A ∩ B).

Proof. Since A ≤ NG(B), the subgroup A normalizes B, and AB, which is also a subgroup of G,
normalizes B. Thus, B E AB.
Define the function φ : A → AB/B by φ(a) = aB. This is a homomorphism precisely because
the group operation in AB/B is well-defined:

φ(a1 a2 ) = (a1 a2 )B = (a1 B)(a2 B) = φ(a1 )φ(a2 ).

Clearly φ is surjective, so φ(A) = AB/B; it remains to determine the kernel. Now φ(a) = 1B
if and only if aB = 1B, if and only if a ∈ B. This means that Ker φ = A ∩ B. Thus, A ∩ B E A and,
by the First Isomorphism Theorem,

A/(A ∩ B) ≅ AB/B. □
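In an abelian group every subgroup is normal, so the theorem applies to any pair of subgroups. The following sketch (an example not in the text, taken in the additive group Z12 with A = ⟨2⟩ and B = ⟨3⟩) verifies the index equality |AB : B| = |A : A ∩ B| numerically:

```python
# Diamond Isomorphism Theorem inside Z12 with A = <2>, B = <3> (additive).
n = 12
A = {(2 * k) % n for k in range(n)}        # {0, 2, 4, 6, 8, 10}
B = {(3 * k) % n for k in range(n)}        # {0, 3, 6, 9}
AB = {(a + b) % n for a in A for b in B}   # A + B in additive notation

assert AB == set(range(12))                # A + B = Z12
assert A & B == {0, 6}                     # A ∩ B = <6>
# |AB : B| = |A : A ∩ B|, i.e. AB/B ≅ A/(A ∩ B):
assert len(AB) // len(B) == len(A) // len(A & B) == 3
```

Both quotients have order 3, and indeed both are cyclic of order 3.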

The Second Isomorphism Theorem is also called the Diamond Isomorphism Theorem because it
concerns the relative sides of particular “diamonds” inside the lattice structure of a group.

           AB
         //    \
        A       B
         \    //
         A ∩ B
           |
          {1}

In the above diagram, assuming A ≤ NG(B), the opposite /-sides not only have the same
index, |AB : B| = |A : A ∩ B|, but correspond to normal subgroups and satisfy AB/B ≅ A/(A ∩ B).
On the other hand, if B ≤ NG (A), then the opposite //-sides satisfy the same property. Finally,
in the special case that A and B are both normal subgroups of G, then the Second Isomorphism
Theorem applies to both pairs of opposite sides of the diamond.

[Figure 4.6: Symmetries of a cube — three cubes with vertices labeled 1–8, showing two reflections through planes and a 120° rotation about a maximal diagonal.]

4.4.3 – Third Isomorphism Theorem

Theorem 4.4.5 (The Third Isomorphism Theorem)

Let G be a group and let H and K be normal subgroups of G with H ≤ K. Then
K/H E G/H and (G/H)/(K/H) ≅ G/K.

Proof. Consider the mapping ϕ : G/H → G/K with ϕ(gH) = gK. We first need to show that it
is a well-defined function. Suppose that g1 H = g2 H. Then g2−1 g1 ∈ H. But since H ≤ K, then
g2−1 g1 ∈ H implies that g2−1 g1 ∈ K. Thus, g1 K = g2 K. Hence, ϕ is well-defined. By properties of
cosets, ϕ is also a surjective homomorphism. The kernel of ϕ is

Ker ϕ = {gH ∈ G/H | g ∈ K} = K/H.

Now by the First Isomorphism Theorem, we deduce that K/H E G/H and that

(G/H)/(K/H) ≅ G/K. □

Example 4.4.6. A simple example of the Third Isomorphism Theorem concerns subgroups of G =
Z. Let H = 48Z and K = 8Z. We have H ≤ K and, since G is abelian, both H and K are normal
subgroups. We have

G/H = Z/48Z, K/H = 8Z/48Z = ⟨8⟩, and G/K = Z/8Z,

where, in K/H, the generator 8 denotes the class of 8 in Z/48Z. The Third Isomorphism Theorem gives

(Z/48Z)/(8Z/48Z) ≅ Z/8Z. △
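Example 4.4.6 can be confirmed computationally. The sketch below (not from the text) builds the cosets of ⟨8⟩ in Z/48Z, counts them, and checks that the coset of 1 generates the quotient, so the quotient is cyclic of order 8:

```python
# Check: (Z/48Z)/<8> has 8 cosets and is generated by the coset 1 + <8>.
n = 48
K = {(8 * k) % n for k in range(n)}                  # <8> = {0, 8, ..., 40}
cosets = {frozenset((g + k) % n for k in K) for g in range(n)}

assert len(K) == 6
assert len(cosets) == 8                              # 48 / 6 = 8

one = frozenset((1 + k) % n for k in K)              # the coset 1 + <8>
powers = set()
current = frozenset(K)                               # the identity coset <8>
for _ in range(8):
    powers.add(current)
    # coset addition: add every representative of `one` to `current`
    current = frozenset((a + b) % n for a in current for b in one)
assert powers == cosets                              # 1 + <8> generates: cyclic
```

Since one coset generates all eight, the quotient is cyclic of order 8, matching Z/8Z.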

Example 4.4.7. Let G be the group of symmetries of R^3 that map the vertices of the cube to
themselves and preserve the cube structure. This group is similar to the dihedral group on the square
but more complicated, because the rotations in the plane correspond to rigid motions of the cube.
Furthermore, this group is strictly larger than the group of rigid motions of a cube, since it also
includes reflections through planes.
We can see that |G| = 48 by reasoning in the following way. Under a symmetry σ of the cube,
the vertex 1 may be mapped to any of the eight vertices. Then, under a cube symmetry, the
three edges that are incident with vertex 1 may be mapped in any way to the three edges that are
incident with σ(1). There are 3! = 6 possibilities for this mapping of incident edges. Once we know

where 1 goes and where its incident edges go, the rest of the mapping of the cube is completely
determined. Hence, |G| = 8 × 6 = 48.
By Exercise 3.7.36, the group of rigid motions R, which includes no reflections through planes,
is isomorphic to S4 as the permutation group on the 4 maximal diagonals. The group R of rigid
motions of a cube is only a subgroup of G. Since |G : R| = 2, we have R E G. Figure 4.6 illustrates three
symmetries of a cube: two reflections through a plane and one rotation about a maximal diagonal.
The reflections through planes are not rigid motions. The rotation about a maximal diagonal is
a rigid motion, but there are many other rigid motions.
Consider also the subgroup H of G generated by the rotations by multiples of 120◦ about the
maximal diagonals. It is not hard to see that all symmetries of the cube are generated by reflections
through planes. If f is a reflection through a plane and r is a rotation by 2π/3 through a maximal
diagonal L, then f rf −1 is the rotation through the maximal diagonal L0 = f (L), i.e., obtained by
reflecting L via f . From Theorem 4.2.8, we conclude that H E G. Note that if we view R as S4
via how it permutes the maximal diagonals of the cube, H is generated by 3-cycles and hence H
corresponds to the subgroup A4 in R.
This example gives a situation in which H ≤ R ≤ G and both H and R are normal subgroups of G.
We can interpret the quotient group G/R ≅ Z2 as carrying the information of whether a symmetry
is a rigid motion (orientation preserving) or a reflected rigid motion (orientation reversing). The
quotient group R/H ≅ Z2 carries information about whether a rigid motion is odd or even in
the identification of R with S4. Finally, G/H ≅ Z2 ⊕ Z2 contains information about both even
versus odd and orientation-preserving versus orientation-reversing. Intuitively speaking, the Third
Isomorphism Theorem, which states that

(G/H)/(R/H) ≅ G/R,

says that this information about orientation is contained without loss of structure in G/H. △

4.4.4 – Fourth Isomorphism Theorem

Theorem 4.4.8 (The Fourth Isomorphism Theorem)


Let G be a group with a normal subgroup N . Then there is a bijection between the
subgroups A of G which contain N and the set of subgroups of G/N . The bijection is given
by A ←→ Ā = A/N . Furthermore, this bijection is such that for all subgroups A, B with
N ≤ A, B ≤ G we have:
(1) A ≤ B if and only if A/N ≤ B/N .
(2) If A ≤ B then |B/N : A/N | = |B : A|.

(3) hA/N, B/N i = hA, Bi/N .


(4) A/N ∩ B/N = (A ∩ B)/N .
(5) A E G if and only if A/N E G/N .

Proof. (Parts (2) through (5) are left as exercises for the reader. See Exercise 4.4.13.)
For part (1), suppose first that A ≤ B. Then for all gN ∈ A/N, we have g ∈ A ⊆ B and
hence gN ∈ B/N. Conversely, suppose that A/N ≤ B/N. Let a ∈ A. Then by the hypothesis,
aN ∈ B/N. If aN = bN for some b ∈ B, then b^{−1}a ∈ N. But N ≤ B, so b^{−1}a = b′ for some
b′ ∈ B, and hence a = bb′ ∈ B. Thus, a ∈ B. Since a was arbitrary, A ≤ B. □

The Fourth Isomorphism Theorem is also called the Lattice Isomorphism Theorem because it
states that the lattice of a quotient group G/N can be found from the lattice of G by ignoring all
vertices and edges that are not above N in the lattice of G. Furthermore, part (2) indicates that if we

labeled each edge in the lattice of subgroups with the index between groups, then even these indices
are preserved when passing to the quotient group.
Example 4.4.9 (Quaternion Group). Consider the quaternion group Q8. Note that Z(Q8) =
⟨−1⟩ and hence this is a normal subgroup. The following lattice diagram of Q8 depicts with double
edges all parts of the diagram above ⟨−1⟩. Hence, according to the Fourth Isomorphism Theorem,
the lattice of Q8/⟨−1⟩ is the sublattice involving double edges.

             Q8
          //  ||  \\
        ⟨i⟩   ⟨j⟩   ⟨k⟩
          \\  ||  //
             ⟨−1⟩
              |
             {1}
△

The Fourth Isomorphism Theorem parallels how lattices interact with subgroups. The lattice of
a subgroup H of a group G can be found from the lattice of G by ignoring all vertices and edges
that are not below H in the lattice of G. If this similarity tempts someone to guess that a group
might be completely determined from knowing G/N and N, the reader should remain aware that
this is not the case. We need look no further than G = D3 with N = ⟨r⟩ as compared to Z6 = ⟨z⟩
with N = ⟨z^2⟩ for an example. In both cases G/N ≅ Z2 and N ≅ Z3.
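The cautionary example can be verified directly. In the sketch below (an illustration not in the text, representing D3 as the permutation group S3), both groups have a normal subgroup of order 3 with quotient of order 2, yet one group is abelian and the other is not:

```python
from itertools import permutations

def compose(p, q):                                # (p ∘ q)(x) = p(q(x))
    return tuple(p[q[x]] for x in range(3))

D3 = set(permutations(range(3)))                  # D3 ≅ S3, order 6
N1 = {(0, 1, 2), (1, 2, 0), (2, 0, 1)}            # rotations, ≅ Z3
Z6 = set(range(6))
N2 = {0, 2, 4}                                    # <2> ≤ Z6, ≅ Z3

# Both quotients have order 2, hence both are ≅ Z2:
assert len(D3) // len(N1) == len(Z6) // len(N2) == 2
# But D3 is nonabelian while Z6 is abelian, so D3 is not isomorphic to Z6:
assert any(compose(p, q) != compose(q, p) for p in D3 for q in D3)
assert all((a + b) % 6 == (b + a) % 6 for a in Z6 for b in Z6)
```

So the pair (N, G/N) does not determine G, exactly as the text warns.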

4.4.5 – Universal Quotient Group Property (Optional)


An important application of the Isomorphism Theorems is the following Universal Quotient Group
Property. As we will see, this theorem is important for understanding group presentations in more detail.

Theorem 4.4.10 (Universal Quotient Group Property)


Every finitely generated group is isomorphic to a quotient group of a free group.

Proof. Suppose that G can be generated by the elements g1 , g2 , . . . , gn ∈ G. Consider the free group
on n symbols F (x1 , x2 , . . . , xn ). Since the symbols x1 , x2 , . . . , xn have no relations among them, by
Theorem 3.8.8, there exists a unique homomorphism ϕ : F (x1 , x2 , . . . , xn ) → G such that ϕ(xi ) = gi
for all i = 1, 2, . . . , n.
The homomorphism ϕ is surjective, since Im ϕ = ⟨g1, g2, . . . , gn⟩ = G. Hence, by the First
Isomorphism Theorem,
G ≅ F(x1, x2, . . . , xn)/Ker ϕ, (4.8)
and the theorem follows. 

More can be said about Ker ϕ in the above theorem and its connection to a presentation of G.
Suppose that G can be presented by

G = ⟨g1, g2, . . . , gn | w1 = 1, w2 = 1, . . . , wm = 1⟩ (4.9)

where each wj is a word in the generators of G. (It is always possible to write the relations in a
presentation in this way. For example, the relation rs = sr−1 in a dihedral group can be written as
rsrs = 1.)
Let H be another group generated by elements {h1 , h2 , . . . , hn }. Suppose that there exists a
homomorphism ψ : G → H such that ψ(gi ) = hi . Since H is generated by {h1 , h2 , . . . , hn },

then ψ is surjective. Furthermore, the generators hi satisfy the relations given by ψ(wj ) = 1 for
j = 1, 2, . . . , m. By the First Isomorphism Theorem, H ≅ G/Ker ψ. If Ker ψ is trivial, then G ≅ H.
If Ker ψ is not trivial, then each element u ∈ Ker ψ, expressed as a word in the generators

u = u1^{α1} u2^{α2} · · · uℓ^{αℓ} with ui ∈ {g1, g2, . . . , gn},

leads to a relation ψ(u1)^{α1} ψ(u2)^{α2} · · · ψ(uℓ)^{αℓ} = 1 in the generators of H. By definition of a
presentation, this relation on the generators of H does not follow from the relations wj = 1 for j = 1, 2, . . . , m.
Consequently, we can now understand a group defined by the presentation (4.9) as the largest group
G generated by n elements satisfying the relations wj = 1 for j = 1, 2, . . . , m, where by “largest”
we mean that for any other group H generated by n elements, which also satisfy the corresponding
relations, there exists a surjective homomorphism G → H.
Return now to the proof of Theorem 4.4.10. Let w̃j be the equivalent word in F (x1 , x2 , . . . , xn )
where in each wj the symbol gi is replaced with xi . Because of the isomorphism in (4.8) and wj = 1
in G, we have w̃j ∈ Ker ϕ for all j. Hence,

⟨w̃1, w̃2, . . . , w̃m⟩ ≤ Ker ϕ.

However, Ker ϕ is a normal subgroup of F(x1, x2, . . . , xn) whereas, in general, ⟨w̃1, w̃2, . . . , w̃m⟩ is
not a normal subgroup of F(x1, x2, . . . , xn).
Now suppose that there exists a normal subgroup N E F (x1 , x2 , . . . , xn ) such that

⟨w̃1, w̃2, . . . , w̃m⟩ ≤ N ≤ Ker ϕ.

Then by the Third Isomorphism Theorem, the mapping

π : F(x1, x2, . . . , xn)/N −→ F(x1, x2, . . . , xn)/Ker ϕ ≅ G
xN ↦−→ x(Ker ϕ)

is a surjective homomorphism with kernel (Ker ϕ)/N. In F(x1, x2, . . . , xn)/N, the generators xiN
satisfy the same relations as the generators of G. Consequently, by our remark above, there also exists
a surjective homomorphism ψ : G → F(x1, x2, . . . , xn)/N. Since π and ψ are both surjective
and π ∘ ψ is the identity on G, both homomorphisms are bijections, and we deduce that G ≅
F(x1, x2, . . . , xn)/N, so N = Ker ϕ. We conclude that Ker ϕ is the smallest (by inclusion) normal
subgroup of F(x1, x2, . . . , xn) that contains ⟨w̃1, w̃2, . . . , w̃m⟩. We say that Ker ϕ is the normal
closure of ⟨w̃1, w̃2, . . . , w̃m⟩ in F(x1, x2, . . . , xn).
Consequently, a group given by a presentation is a quotient group of F(x1, x2, . . . , xn) by
this normal closure of the subgroup ⟨w̃1, w̃2, . . . , w̃m⟩ generated by the corresponding relations in
F(x1, x2, . . . , xn).

Exercises for Section 4.4


1. Show that the only homomorphism from D7 to Z3 is trivial.
2. Show that the only homomorphism from S4 to Z3 is trivial.
3. Let ϕ : G → H. Prove that |ϕ(G)| divides |G|.
4. Let F be Q, R, C, or Fp. Prove that GLn(F)/SLn(F) ≅ F^×.
5. Let p be an odd prime and let H be any group of odd order. Prove that the only homomorphism from
Dp to H is trivial.
6. Let G be a group and consider homomorphisms ϕ : G → Z2 .
(a) Prove that if ϕ is surjective, then exactly half of the elements of G map to the identity in Z2 .
(b) Show that there does not exist a surjective homomorphism from A5 to Z2 .
7. Let G1 and G2 be groups and let N1 and N2 be normal subgroups respectively in G1 and G2 . Prove
that N1 ⊕ N2 E G1 ⊕ G2 and that (G1 ⊕ G2)/(N1 ⊕ N2) ≅ (G1/N1) ⊕ (G2/N2).
8. Let N1 and N2 be normal subgroups of G. Prove that N1N2/(N1 ∩ N2) ≅ (N1N2/N1) ⊕ (N1N2/N2).

9. Let F = Q, R, C, or Fp where p is prime. Let T2 (F ) be the set of 2 × 2 upper triangular matrices with
a nonzero determinant. We consider T2 (F ) as a group with the operation of multiplication.
(a) Prove that $U_2(F) = \left\{ \begin{pmatrix} 1 & b \\ 0 & 1 \end{pmatrix} \;\middle|\; b \in F \right\}$ is a normal subgroup that is isomorphic to (F, +).
(b) Prove that the corresponding quotient group is T2(F)/U2(F) ≅ U(F) ⊕ U(F), where U(F) is
the group of elements with multiplicative inverses in F equipped with multiplication.
10. Let G be a group and let N E G such that |G| and |Aut(N)| are relatively prime. Prove that N ≤ Z(G).
[Hint: Use Proposition 4.2.15.]
11. Suppose that H and K are distinct subgroups of G, each of index 2. Prove that H ∩ K is a normal
subgroup of G and that G/(H ∩ K) ≅ Z2 ⊕ Z2.
12. Let p be a prime number and suppose that ordp(|G|) = a. Assume P ≤ G has order p^a and let N E G
with ordp(|N|) = b. Prove that |P ∩ N| = p^b and |PN/N| = p^{a−b}.
13. Prove all parts of the Fourth Isomorphism Theorem.
14. Let G be the group of isometries of R^n. Prove that the subgroup D of direct isometries is a normal
subgroup and that G/D ≅ Z2. [See Definitions 3.9.5 and 3.9.6.]

4.5 Fundamental Theorem of Finitely Generated Abelian Groups
Early on in the study of groups, we discussed classification theorems, which are theorems that
list all possible groups with some specific property. The First Isomorphism Theorem leads to the
Fundamental Theorem of Finitely Generated Abelian Groups (abbreviated FTFGAG), which,
among other things, provides a complete classification of all finite abelian groups. The proof begins
with a study of free abelian groups.

4.5.1 – Free Abelian Groups


The concept of a free abelian group is modeled after certain aspects of linear algebra in Rn . In this
section, we will write abelian groups additively so that nx corresponds to the usual n-fold addition as
defined by (3.3). For any finite set {x1 , x2 , . . . , xr } in an abelian group (G, +) we call any expression
of the form
c1 x1 + c2 x2 + · · · + cr xr ,
with ci ∈ Z, a linear combination of {x1 , x2 , . . . , xr }.

Definition 4.5.1
A subset X ⊆ G of an abelian group is called linearly independent if for every finite subset
{x1 , x2 , . . . , xr } ⊆ X, linear combinations satisfy

c1 x1 + c2 x2 + · · · + cr xr = 0 =⇒ c1 = c2 = · · · = cr = 0. (4.10)

A basis of an abelian group G is a linearly independent subset X that generates G.

The cyclic group Z has {1} as a basis. The direct sum Z ⊕ Z has {(1, 0), (0, 1)} as a basis
since every element (m, n) can be written as m(1, 0) + n(0, 1). However, {(3, 1), (1, 0)} is a basis
as well because (3, 1) − 3(1, 0) = (0, 1), so again ⟨(3, 1), (1, 0)⟩ = Z ⊕ Z, and the set is also linearly
independent. In contrast, Z/10Z does not have a basis since for all x ∈ Z/10Z, we have 10x = 0
with 10 ≠ 0 in Z. Note that a basis cannot contain the identity 0, since then 1 · 0 = 0 is a nontrivial
linear combination of the basis elements that gives 0.

Definition 4.5.2
An abelian group (G, +) is called a free abelian group if it has a basis.

In particular, Z, Z ⊕ Z, and more generally Z^r for a positive integer r are free abelian groups. A
free abelian group could have an infinite basis. For example, Z[x] is a free abelian group with basis
{1, x, x2 , . . .} because every polynomial in Z[x] is a (finite) linear combination of the powers of the
variable x. If a free abelian group has an infinite basis, every element must still be a finite linear
combination of basis elements.

Proposition 4.5.3
Let (G, +) be a free abelian group with a basis X. Every element g ∈ G can be expressed
uniquely as a linear combination of elements in X.

Proof. By definition, every element g ∈ G is a linear combination of elements in X. Suppose that

g = c1x1 + c2x2 + · · · + crxr and g = c′1x′1 + c′2x′2 + · · · + c′sx′s

are two linear combination expressions of the element g. By allowing some ci and some c′j to be 0,
and by taking the union {x1, x2, . . . , xr} ∪ {x′1, x′2, . . . , x′s}, we can assume that r = s and that
xi = x′i for i = 1, 2, . . . , r. Then subtracting these two expressions gives

0 = (c1 − c′1)x1 + (c2 − c′2)x2 + · · · + (cr − c′r)xr.

Since the set of elements {x1, x2, . . . , xr} is linearly independent, ci − c′i = 0, so ci = c′i for all i.
Hence, there is a unique expression for g as a linear combination of elements in X. □

Example 4.5.4. The group (Q, +) is not a free abelian group. Assume that Q does have a basis
X. First suppose that X contains at least two nonzero elements a/b and c/d. Then

(−bc)(a/b) + (da)(c/d) = −ac + ac = 0.

This contradicts the condition of linear independence. Now suppose that X contains only one element
a/b ≠ 0. Then X does not generate Q, because the element a/(2b) ∉ ⟨X⟩. We conclude by contradiction
that Q does not have a basis. △

Theorem 4.5.5
Let G be a nonzero free abelian group with a basis of r elements. Then G is isomorphic to
Z ⊕ Z ⊕ · · · ⊕ Z = Z^r.

Proof. Let X = {x1, x2, . . . , xr} be a finite basis of G. Consider the function ϕ : Z^r → G given by

ϕ(c1 , c2 , . . . , cr ) = c1 x1 + c2 x2 + · · · + cr xr .

This function satisfies

ϕ((c1 , c2 , . . . , cr ) + (d1 , d2 , . . . , dr )) = ϕ(c1 + d1 , c2 + d2 , . . . , cr + dr )


= (c1 + d1 )x1 + (c2 + d2 )x2 + . . . + (cr + dr )xr
= c1 x1 + c2 x2 + . . . + cr xr + d1 x1 + d2 x2 + . . . + dr xr
= ϕ(c1 , c2 , . . . , cr ) + ϕ(d1 , d2 , . . . , dr )

so it is a homomorphism. Since the basis generates G, the map ϕ is surjective. Furthermore, since

Ker ϕ = {(c1, c2, . . . , cr) ∈ Z^r | c1x1 + c2x2 + · · · + crxr = 0}
= {(0, 0, . . . , 0)},

the homomorphism is also injective. Thus, ϕ is an isomorphism. 

Theorem 4.5.5 leads immediately to the following important result.

Proposition 4.5.6
Let G be a finitely generated free abelian group. Then every basis of G has the same
number of elements.

Proof. Suppose that G has a basis with r elements. Then G is isomorphic to Z^r. The subgroup
2G = {g + g | g ∈ G} is isomorphic to (2Z)^r, so by a generalization of Exercise 4.4.7,

G/2G = (Z ⊕ Z ⊕ · · · ⊕ Z)/(2Z ⊕ 2Z ⊕ · · · ⊕ 2Z) ≅ (Z2)^r.

Thus, |G/2G| = 2^r. Assume that G also has a finite basis with s ≠ r elements. Then |G/2G| =
2^s ≠ 2^r, a contradiction.
We must also prove that G cannot have an infinite basis. Assume that G does have an infinite
basis X. Let x1, x2 ∈ X with x1 ≠ x2. If x̄1 = x̄2 in G/2G, then x1 − x2 ∈ 2G, so x1 − x2 is a finite linear
combination of elements in X with even coefficients. In particular, X is not a linearly independent
set, which contradicts that X is a basis. Thus, in the quotient group G/2G, the elements {x̄ | x ∈ X}
are all distinct and hence G/2G is an infinite group. This contradicts the fact that |G/2G| = 2^r.
Consequently, if G has a basis of r elements, then every other basis is finite and has r elements.
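The invariant used in this proof can be observed computationally. The sketch below (not from the text) takes G = Z^3, reduces every coordinate mod 2, and counts the resulting parity classes, recovering |G/2G| = 2^3 cosets:

```python
from itertools import product

# For G = Z^r, reducing coordinates mod 2 identifies G/2G with (Z2)^r.
def cosets_mod_2G(vectors):
    return {tuple(v % 2 for v in vec) for vec in vectors}

r = 3
sample = list(product(range(-4, 5), repeat=r))    # a finite window of Z^3
assert len(cosets_mod_2G(sample)) == 2 ** r       # 8 cosets of 2G in G
```

Since 2^r determines r, any two bases of the same free abelian group must have the same size.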

Definition 4.5.7
If G is a finitely generated free abelian group, then the common number r of elements in a
basis is called the rank. The rank is also called the Betti number of G and is denoted by β(G).

In our efforts to classify all finitely generated abelian groups, we introduced free abelian groups as a
stepping stone to the general classification theorem (Theorem 4.5.11). We are still faced with a few questions.
As remarked earlier, a free abelian group has no elements of finite order; however, our theorems to
this point do not establish the converse, namely whether an abelian group with no elements of finite order
is free. Furthermore, Theorem 4.5.5 does not tell us much about the possible subgroups of a free
abelian group, and this will become a key ingredient in what follows.

Lemma 4.5.8
Let X = {x1, x2, . . . , xr} be a basis of a free abelian group G. Let i and j be indices with
1 ≤ i, j ≤ r and i ≠ j, and let t ∈ Z. Then

X′ = {x1, . . . , xj−1, xj + txi, xj+1, . . . , xr}

is also a basis of G.

Proof. Since xj = (−t)xi + (xj + txi), we have xj ∈ ⟨X′⟩ and so X ⊆ ⟨X′⟩. Since ⟨X⟩ = G ≤ ⟨X′⟩,
then G = ⟨X′⟩, so X′ generates G.
Furthermore, suppose that

c1x1 + · · · + cj−1xj−1 + cj(xj + txi) + cj+1xj+1 + · · · + crxr = 0



for some c1, c2, . . . , cr ∈ Z. Then, since X is a basis, the coefficient of each xk must be 0, so

ck = 0 if k ≠ i, and ci = −tcj.

However, since i ≠ j, we have cj = 0, so ci = −t · 0 = 0. Thus, X′ is a linearly independent set.
Hence, X′ is another basis of G. □
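The first step of the proof, recovering xj from the new generating set, can be checked with a tiny computation (a sketch not in the text, worked in Z^2 with illustrative values of x1, x2, and t):

```python
# Lemma 4.5.8 in Z^2: from x1 and x2 + t*x1 one recovers
# x2 = (-t)*x1 + (x2 + t*x1), so the new set still generates.
x1, x2, t = (2, 1), (1, 1), 3
x2_new = (x2[0] + t * x1[0], x2[1] + t * x1[1])               # x2 + t*x1 = (7, 4)
recovered = (-t * x1[0] + x2_new[0], -t * x1[1] + x2_new[1])  # (-t)*x1 + x2_new
assert recovered == x2
```

This is the integer analogue of the replacement row operation mentioned in the remark that follows.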
The reader may notice the similarity between the new basis described in Lemma 4.5.8 and the
replacement row operation used in the Gauss-Jordan elimination algorithm from linear algebra.

Theorem 4.5.9
Let G be a nonzero free abelian group of finite rank s and let H ≤ G be a nontrivial
subgroup. Then H is a free abelian group of rank t ≤ s. There exists a basis {x1 , x2 , . . . , xs }
for G and positive integers n1 , n2 , . . . , nt , where ni divides ni+1 for all 1 ≤ i ≤ t − 1 such
that {n1 x1 , n2 x2 , . . . , nt xt } is a basis of H.

Proof. We prove the theorem by starting from a basis of G and repeatedly adjusting it using
Lemma 4.5.8.
By the well-ordering of the integers, there exists a minimum value n1 in the set
{c1 ∈ N∗ | c1 y1 + c2 y2 + · · · + cs ys ∈ H for any basis {y1 , y2 , . . . , ys } of G}.
We emphasize that, in the above set, we consider all possible bases {y1 , y2 , . . . , ys } of G. We write
the element that gives this minimum value as z1 = n1 y1 + c2 y2 + · · · + cs ys . By integer division,
for all i ≥ 2, we can write ci = n1 qi + ri with 0 ≤ ri < n1 . Set x1 = y1 + q2 y2 + · · · + qs ys . By
Lemma 4.5.8, {x1 , y2 , . . . , ys } is a basis of G. Furthermore,
z1 = n1 x1 + r2 y2 + · · · + rs ys .
However, since n1 is the least positive coefficient that occurs in any linear combination over any
basis of G and, since 0 ≤ ri < n1 , we have r2 = · · · = rs = 0. So, in fact z1 = n1 x1 .
If {n1 x1 } generates H then we are done and {n1 x1 } is a basis of H. If not, then there is a least
positive value of c2 for linear combinations
z2 = a1 x1 + c2 y2 + · · · + cs ys ∈ H
where y2 , . . . , ys ∈ G such that {x1 , y2 , . . . , ys } is a basis of G. Call this least positive integer n2 .
Note that we must have n1 |a1 , because otherwise, since n1 x1 ∈ H by subtracting a suitable multiple
of n1 x1 from a1 x1 + c2 y2 + · · · + cs ys we would obtain a linear combination of {x1 , y2 , . . . , ys } that
is in H and has a lesser positive coefficient for x1 , which would contradict the minimality of n1 .
Again, if we take the integer division of ci by n2 , ci = n2 qi + ri with 0 ≤ ri < n2 for all i ≥ 3, then
z2 = a1 x1 + n2 (y2 + q3 y3 + · · · + qs ys ) + r3 y3 + · · · + rs ys .
We denote x2 = y2 + q3 y3 + · · · + qs ys . Also, all the ri = 0 because if any ri > 0, it would contradict
the minimality of n2 . Thus, z2 = a1 x1 + n2 x2 ∈ H and also n2 x2 = z2 − (a1 /n1 )n1 x1 ∈ H. By
Lemma 4.5.8, {x1 , x2 , y3 , . . . , ys } is a basis of G. Furthermore, if we consider the integer division of
n2 by n1 , written as n2 = n1 q + r with 0 ≤ r < n1 , then
n1 x1 + n2 x2 = n1 (x1 + qx2 ) + rx2 ∈ H.
But then, since r < n1 , by the minimal positive condition on n1 , we must have r = 0. Hence, n1 | n2 .
If {n1 x1 , n2 x2 } generates H, then we are done: the set is linearly independent because, by
construction, {x1 , x2 , y3 , . . . , ys } is a basis of G, so {x1 , x2 } is linearly independent and {n1 x1 , n2 x2 }
is then a basis of H. The pattern continues and terminates only when it results in a basis
{x1 , . . . , xt , yt+1 , . . . , ys }
of G such that {n1 x1 , n2 x2 , . . . , nt xt } is a basis of H for some positive integers ni such that ni | ni+1
for 1 ≤ i ≤ t − 1. 
4.5. FUNDAMENTAL THEOREM OF FINITELY GENERATED ABELIAN GROUPS 197

This proof is not constructive since it does not provide a procedure to find the ni , which is
necessary to construct the x1 , x2 , and so on. We merely know the ni exist by the well-ordering of
integers. In some instances it is easy to find a basis of the subgroup as in the following example.
Example 4.5.10. Consider the free abelian group G = Z3 and the subgroup H = {(x, y, z) ∈
Z3 | x + 2y + 3z = 0}. If we consider the equation x + 2y + 3z = 0 in Q3 , then the Gauss-
Jordan elimination algorithm gives H as Span({(−2, 1, 0), (−3, 0, 1)}). Taking only integer
multiples of these two vectors does give all points (x, y, z) with y and z taking on every pair of
integers. Hence, {(−2, 1, 0), (−3, 0, 1)} is a basis of H and we see clearly that H is a free abelian
group of rank 2. 4
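To build confidence in such a basis, one can brute-force the check over a small box of integer vectors. This is our own verification sketch (the function names are not from the text):

```python
def in_H(v):
    """Membership in H = {(x, y, z) in Z^3 : x + 2y + 3z = 0}."""
    x, y, z = v
    return x + 2 * y + 3 * z == 0

# Every (x, y, z) in H satisfies x = -2y - 3z, so it equals
# y*(-2, 1, 0) + z*(-3, 0, 1): the two proposed basis vectors suffice.
def check_box(radius):
    for x in range(-radius, radius + 1):
        for y in range(-radius, radius + 1):
            for z in range(-radius, radius + 1):
                if in_H((x, y, z)):
                    combo = (-2 * y - 3 * z, y, z)
                    if combo != (x, y, z):
                        return False
    return True
```

Running `check_box(10)` returns `True`, matching the claim that every element of H is an integer combination of (−2, 1, 0) and (−3, 0, 1).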

4.5.2 – Invariant Factors Decomposition


The difficult work in the proof of Theorem 4.5.9 leads to the Fundamental Theorem for Finitely
Generated Abelian Groups.

Theorem 4.5.11 (FTFGAG)


Let G be a finitely generated abelian group. Then G can be written uniquely as

G ∼= Zr ⊕ Zd1 ⊕ Zd2 ⊕ · · · ⊕ Zdk (4.11)

for some nonnegative integers r, d1 , d2 , . . . , dk satisfying di ≥ 2 for all i and di+1 | di for
1 ≤ i ≤ k − 1.

Proof. Since G is finitely generated, there is a finite subset {g1 , g2 , . . . , gs } that generates G. Define
the function h : Zs → G by

h(n1 , n2 , . . . , ns ) = n1 g1 + n2 g2 + · · · + ns gs .

By the same reasoning as the proof of Theorem 4.5.5, h is a surjective homomorphism. Then Ker h
is a subgroup of Zs and by the First Isomorphism Theorem, since h is surjective, Zs /(Ker h) ∼ = G.
By Theorem 4.5.9, there exists a basis {x1 , x2 , . . . , xs } for Zs and positive integers n1 , n2 , . . . , nt ,
where ni divides ni+1 for all 1 ≤ i ≤ t − 1 such that {n1 x1 , n2 x2 , . . . , nt xt } is a basis of Ker h. Then

G ∼= (Z ⊕ Z ⊕ · · · ⊕ Z)/(n1 Z ⊕ · · · ⊕ nt Z ⊕ {0} ⊕ · · · ⊕ {0})
∼= Zn1 ⊕ Zn2 ⊕ · · · ⊕ Znt ⊕ Z ⊕ · · · ⊕ Z.

Now if ni = 1, then Zni ∼= {0}, the trivial group. Let k be the number of indices i such that ni > 1.
The theorem follows after setting di = nt+1−i for 1 ≤ i ≤ k. 

Definition 4.5.12
As with free abelian groups, the integer r is called the rank or the Betti number of G. It is sometimes
denoted by β(G). The integers d1 , d2 , . . . , dk are called the invariant factors of G and the
expression (4.11) is called the invariant factors decomposition of G.

It is interesting to note that the proof of Theorem 4.5.11 is not constructive in the sense that it
does not provide a method to find specific elements in G whose orders are the invariant factors of
G. The invariant factors exist by virtue of the well-ordering principle of the integers.
Theorem 4.5.11 applies to any finitely generated abelian group. However, applied to finite groups,
which are obviously finitely generated, it gives us an effective way to describe all abelian groups of
a given order n. If G is finite, the rank of G is 0. Then we must find all finite sequences of integers
d1 , d2 , . . . , dk such that
• di ≥ 2 for 1 ≤ i ≤ k;
198 CHAPTER 4. QUOTIENT GROUPS

• di+1 | di for 1 ≤ i ≤ k − 1;
• n = d1 d2 · · · dk .
The first two conditions are explicit in the above theorem. The last condition follows from the fact
that di = |xi |, where {x1 , x2 , . . . , xk } is a list of corresponding generators of G. Then every
element in G can be written uniquely as

g = α1 x1 + α2 x2 + · · · + αk xk

for 0 ≤ αi < di . Thus, the order of the group is n = d1 d2 · · · dk .


We list a few examples of abelian groups of a given order where we find the invariant factors
decomposition. (Note that in the notation Zd for a cyclic group, we assume the operation is
multiplication but that is irrelevant to the theorem.)
Example 4.5.13. All the abelian groups of order 16 are
Z16 , Z8 ⊕ Z2 , Z4 ⊕ Z4 , Z4 ⊕ Z2 ⊕ Z2 , Z2 ⊕ Z2 ⊕ Z2 ⊕ Z2 . 4

Example 4.5.14. All the abelian groups of order 24 are


Z24 , Z12 ⊕ Z2 , Z6 ⊕ Z2 ⊕ Z2 . 4

Example 4.5.15. All the abelian groups of order 360 are


Z360 , Z180 ⊕ Z2 , Z120 ⊕ Z3 , Z60 ⊕ Z6 , Z90 ⊕ Z2 ⊕ Z2 , Z30 ⊕ Z6 ⊕ Z2 . 4
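For small orders, the admissible invariant factor sequences can be enumerated mechanically. The following sketch (our own helper, not part of the text) lists all sequences (d1, . . . , dk) with each di ≥ 2, di+1 | di, and d1 d2 · · · dk = n:

```python
def invariant_factor_lists(n):
    """All tuples (d1, ..., dk) with each di >= 2, d_{i+1} dividing d_i,
    and product equal to n."""
    results = []

    def extend(remaining, prefix, max_d):
        if remaining == 1:
            results.append(tuple(prefix))
            return
        # The next factor d must divide the remaining product and also
        # divide the previous invariant factor (stored in max_d).
        for d in range(2, min(remaining, max_d) + 1):
            if remaining % d == 0 and max_d % d == 0:
                extend(remaining // d, prefix + [d], d)

    extend(n, [], n)
    return results
```

For instance, `invariant_factor_lists(360)` produces six sequences, matching the six groups of Example 4.5.15.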

4.5.3 – Elementary Divisors Decomposition


Though Theorem 4.5.11 gives a complete classification of finite abelian groups, finding all possible
sequences of invariant factors that multiply to a given order is not always easy, especially if the order
is high. There is an alternative characterization that is often easier to find, involving the so-called
elementary divisors. The added benefit of this alternative decomposition is that it leads to a formula
for the number of abelian groups of a given order.
To obtain the elementary divisors decomposition, we first recall Exercise 3.7.19, which states that
if gcd(m, n) = 1, then Zm ⊕ Zn ∼= Zmn . We need one more lemma.

Lemma 4.5.16
Let q be a prime number and let G be an abelian group of order q^m . Then G is isomorphic
to a unique group of the form

Zq^α1 ⊕ Zq^α2 ⊕ · · · ⊕ Zq^αk

such that α1 ≥ α2 ≥ · · · ≥ αk ≥ 1 and α1 + α2 + · · · + αk = m.

Proof. (This follows as a corollary to Theorem 4.5.11 so we leave the proof as an exercise for the
reader. See Exercise 4.5.13.) 

Definition 4.5.17
Let m be a positive integer. Any decreasing sequence of the form α1 ≥ α2 ≥ · · · ≥ αk ≥ 1
such that α1 + α2 + · · · + αk = m is called a partition of m. The partition function,
sometimes denoted by p(m), is the number of partitions of m. The partition function is
often extended to nonnegative integers by assigning p(0) = 1.
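The partition function is easy to compute by recursion on the largest allowed part. A short sketch (function names ours):

```python
def partitions(m, max_part=None):
    """All partitions of m as weakly decreasing tuples."""
    if max_part is None:
        max_part = m
    if m == 0:
        return [()]
    result = []
    # Choose the first (largest) part, then partition the remainder
    # with parts no larger than it.
    for first in range(min(m, max_part), 0, -1):
        for rest in partitions(m - first, first):
            result.append((first,) + rest)
    return result

def p(m):
    """The partition function, with p(0) = 1."""
    return len(partitions(m))
```

For example, `partitions(3)` yields (3), (2, 1), and (1, 1, 1), so p(3) = 3.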

According to Lemma 4.5.16, if G is an abelian group of order q^m for some prime q, then there
are p(m) possibilities for G, each corresponding to a partition of m.

Theorem 4.5.18
Let G be a finite abelian group of order n > 1 and let n = q1^β1 q2^β2 · · · qt^βt be the prime
factorization of n. Then G can be written in a unique way as

G ∼= A1 ⊕ A2 ⊕ · · · ⊕ At where |Ai | = qi^βi

and for each A ∈ {A1 , A2 , . . . , At } with |A| = q^m ,

A ∼= Zq^α1 ⊕ Zq^α2 ⊕ · · · ⊕ Zq^αl

with α1 ≥ α2 ≥ · · · ≥ αl ≥ 1 and α1 + α2 + · · · + αl = m.

Proof. According to Theorem 4.5.11,

G ∼= Zd1 ⊕ Zd2 ⊕ · · · ⊕ Zdk

where di+1 | di for i = 1, 2, . . . , k − 1 and n = d1 d2 · · · dk . Consider the k × t matrix of nonnegative
integers (αij ) defined such that

di = q1^αi1 q2^αi2 · · · qt^αit

is the prime factorization of di , where by αij = 0 we mean that qj is not a prime factor of di .
The condition di+1 | di implies that for each j, the exponents on qj satisfy αi+1,j ≤ αij . The
condition that n = d1 d2 · · · dk implies that for each j,

βj = α1j + α2j + · · · + αkj .

Note that gcd(qj^a , qj'^b ) = 1 for any nonnegative integers a and b if j ≠ j'. Therefore,

Zdi ∼= Zq1^αi1 ⊕ Zq2^αi2 ⊕ · · · ⊕ Zqt^αit .

The theorem follows from the fact that G1 ⊕ G2 ∼= G2 ⊕ G1 for two groups G1 and G2 . 

Definition 4.5.19
The prime powers qi^αi that arise in the expression of G described in the above theorem are called
the elementary divisors of G. The expression in Theorem 4.5.18 is the elementary divisors
decomposition.

We use the terminology “elementary divisors” because no cyclic group Zp^α , where p is a prime
number, is isomorphic to a direct sum of smaller cyclic groups.
As a first example, notice that since 16 is a prime power, then because of Lemma 4.5.16, the
list of groups given in Example 4.5.13 provides both the elementary divisor decompositions and the
invariant factor decompositions of all 5 abelian groups of order 16.
Example 4.5.20. Let n = 2160 = 2^4 · 3^3 · 5. We find all abelian groups of order 2160. We remark
that 3 has three partitions, namely

3,
2 + 1,
1 + 1 + 1,

while 4 has five partitions, namely


4,
3 + 1,
2 + 2,
2 + 1 + 1,
1 + 1 + 1 + 1.
Then A1 , which corresponds to the prime factor q1 = 2, has the following possibilities
Z16 , Z8 ⊕ Z2 , Z4 ⊕ Z4 , Z4 ⊕ Z2 ⊕ Z2 , or Z2 ⊕ Z2 ⊕ Z2 ⊕ Z2 .
The component A2 , which corresponds to the prime factor q2 = 3, has the following possibilities
Z27 , Z9 ⊕ Z3 , or Z3 ⊕ Z3 ⊕ Z3 .
The component A3 has only one possibility, namely A3 ∼ = Z5 .
We can now list all 15 abelian groups of order 2160. The following list shows the elementary
divisor decomposition on the left and the isomorphic invariant factor decomposition on the right.
Z16 ⊕ Z27 ⊕ Z5 ∼= Z2160
Z16 ⊕ Z9 ⊕ Z3 ⊕ Z5 ∼= Z720 ⊕ Z3
Z16 ⊕ Z3 ⊕ Z3 ⊕ Z3 ⊕ Z5 ∼= Z240 ⊕ Z3 ⊕ Z3

Z8 ⊕ Z2 ⊕ Z27 ⊕ Z5 ∼= Z1080 ⊕ Z2
Z8 ⊕ Z2 ⊕ Z9 ⊕ Z3 ⊕ Z5 ∼= Z360 ⊕ Z6
Z8 ⊕ Z2 ⊕ Z3 ⊕ Z3 ⊕ Z3 ⊕ Z5 ∼= Z120 ⊕ Z6 ⊕ Z3

Z4 ⊕ Z4 ⊕ Z27 ⊕ Z5 ∼= Z540 ⊕ Z4
Z4 ⊕ Z4 ⊕ Z9 ⊕ Z3 ⊕ Z5 ∼= Z180 ⊕ Z12
Z4 ⊕ Z4 ⊕ Z3 ⊕ Z3 ⊕ Z3 ⊕ Z5 ∼= Z60 ⊕ Z12 ⊕ Z3

Z4 ⊕ Z2 ⊕ Z2 ⊕ Z27 ⊕ Z5 ∼= Z540 ⊕ Z2 ⊕ Z2
Z4 ⊕ Z2 ⊕ Z2 ⊕ Z9 ⊕ Z3 ⊕ Z5 ∼= Z180 ⊕ Z6 ⊕ Z2
Z4 ⊕ Z2 ⊕ Z2 ⊕ Z3 ⊕ Z3 ⊕ Z3 ⊕ Z5 ∼= Z60 ⊕ Z6 ⊕ Z6

Z2 ⊕ Z2 ⊕ Z2 ⊕ Z2 ⊕ Z27 ⊕ Z5 ∼= Z270 ⊕ Z2 ⊕ Z2 ⊕ Z2
Z2 ⊕ Z2 ⊕ Z2 ⊕ Z2 ⊕ Z9 ⊕ Z3 ⊕ Z5 ∼= Z90 ⊕ Z6 ⊕ Z2 ⊕ Z2
Z2 ⊕ Z2 ⊕ Z2 ⊕ Z2 ⊕ Z3 ⊕ Z3 ⊕ Z3 ⊕ Z5 ∼= Z30 ⊕ Z6 ⊕ Z6 ⊕ Z2 4
When listing out each possible isomorphism type for a group of given order, each group listed
according to the invariant factors decomposition corresponds to a unique group in the list according
to elementary divisors. The proof of Theorem 4.5.18 describes how to go from the invariant factors
decomposition to the corresponding elementary divisors decomposition. To go in the opposite
direction, collect the highest prime powers corresponding to each prime to get Zd1 ; collect the second
highest prime powers corresponding to each prime to get Zd2 ; and so forth.
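That collection procedure can be written out directly. In this sketch (our own helper names), each elementary divisor is assumed to be a prime power, as Theorem 4.5.18 guarantees:

```python
from collections import defaultdict

def smallest_prime_factor(n):
    f = 2
    while f * f <= n:
        if n % f == 0:
            return f
        f += 1
    return n

def to_invariant_factors(elementary_divisors):
    """Group the prime powers by prime, then take the highest power of
    each prime for d1, the next highest for d2, and so forth."""
    exps = defaultdict(list)  # prime -> list of exponents
    for q in elementary_divisors:
        p = smallest_prime_factor(q)
        e = 0
        while q % p == 0:
            q //= p
            e += 1
        exps[p].append(e)
    for lst in exps.values():
        lst.sort(reverse=True)
    k = max(len(lst) for lst in exps.values())
    factors = []
    for i in range(k):
        d = 1
        for p, lst in exps.items():
            if i < len(lst):
                d *= p ** lst[i]
        factors.append(d)
    return factors
```

For instance, `to_invariant_factors([8, 2, 9, 3, 5])` yields `[360, 6]`, in agreement with the list in Example 4.5.20.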
We often refer to Theorems 4.5.11 and 4.5.18 collectively as the Fundamental Theorem of Finitely
Generated Abelian Groups. The two theorems simply provide alternative ways to uniquely describe
the torsion part of the group.
The power of the Fundamental Theorem of Finitely Generated Abelian Groups (FTFGAG), and
of classification theorems in general, is that in many applications of group theory, we encounter
abelian groups for which we naturally know the order. Then the FTFGAG gives us a list of possible
isomorphism types.

Example 4.5.21. Consider the group U (20) of units in modular arithmetic modulo 20. We know
that this multiplicative group is abelian and that it has order φ(20) = φ(4)φ(5) = 8. According to
FTFGAG, U (20) may be isomorphic to Z8 , Z4 ⊕ Z2 , or Z2 ⊕ Z2 ⊕ Z2 . To determine which of these
possibilities it may be, consider powers of some elements. First, let's consider 3:

3^1 = 3, 3^2 = 9, 3^3 = 7, 3^4 = 1.

This actually gives us enough information to determine the isomorphism type. The element 3 has
order 4 so U (20) ≇ Z2 ⊕ Z2 ⊕ Z2 . Furthermore, both 9 and 19 = −1 have order 2. In Z8 , there
is only one element of order 2. Hence, U (20) ≇ Z8 . By FTFGAG and elimination of possibilities,
U (20) ∼= Z4 ⊕ Z2 . 4
Example 4.5.22. Consider the group U (46). This is an abelian group of order φ(46) = φ(2)φ(23) =
22. Using the elementary divisors decomposition of FTFGAG, we see that the only abelian group
of order 22 is Z22 . Hence, U (46) is cyclic. 4
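Such reasoning about U(n) can be checked by brute force: compute the multiplicative order of every unit and compare the multiset of orders against each candidate decomposition. A sketch (names ours):

```python
from math import gcd

def unit_group_orders(n):
    """Sorted list of multiplicative orders of the elements of U(n)."""
    orders = []
    for a in range(1, n):
        if gcd(a, n) != 1:
            continue
        # Multiply by a until we return to 1.
        k, x = 1, a % n
        while x != 1:
            x = (x * a) % n
            k += 1
        orders.append(k)
    return sorted(orders)
```

U(20) has order profile [1, 2, 2, 2, 4, 4, 4, 4], matching Z4 ⊕ Z2 rather than Z8 (whose profile is [1, 2, 4, 4, 8, 8, 8, 8]); for U(46) an element of order 22 appears, confirming that the group is cyclic.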

Corollary 4.5.23
Let n > 1 be an integer and let n = q1^β1 q2^β2 · · · qt^βt be the prime factorization of n. There
are
p(β1 )p(β2 ) · · · p(βt )
abelian groups of order n, where p is the partition function.

Proof. (Left as an exercise for the reader. See Exercise 4.5.22.) 

4.5.4 – A Few Comments on Partitions of Integers (Optional)


Partitions of integers form a vast area of research and find applications in many areas of higher
mathematics. See [35, Chapter XIX] or [2] for an introduction to partitions of integers. (The texts
[27, 42] give the interested reader a glimpse of how partitions arise in other branches of mathematics
besides number theory.)
Consider the partition 5 + 2 + 2 + 1 of 10. The most common way of writing a partition is
to simply record it as a finite sequence α = (5, 2, 2, 1). It is also common to define a partition
without reference to a positive integer n as a finite sequence α = (α1 , α2 , . . . , αk ) whose terms
satisfy α1 ≥ α2 ≥ · · · ≥ αk ≥ 1. We then use the notation
|α| := α1 + α2 + · · · + αk
and call this the content of α. The integer k is the length (or the number of parts) of α and is
written ℓ(α) = k. In the terminology of partitions, each summand αi of a partition is called a part
of α.
A partition is often represented visually by its Young diagram in which each part αi is represented
by αi boxes that are left aligned, descending on the page as i increases. For example, the Young
diagram for (5, 2, 2, 1) is:

□ □ □ □ □
□ □
□ □
□

The main diagonal of a Young diagram consists of the boxes descending regularly starting with
the top left corner of the diagram. For (5, 2, 2, 1), the main diagonal only has two boxes.



The conjugate of α is the partition α′ obtained by reflecting the Young diagram of α through its
main diagonal. Algebraically, the values of the conjugate partition are

α′j = |{1 ≤ i ≤ k | αi ≥ j}| .

With α = (5, 2, 2, 1), the conjugate partition is α′ = (4, 3, 1, 1, 1). The intuition of the Young diagram
makes it clear that |α′| = |α|, so if α is a partition of n, then so is α′. In Young diagram notation,
reflecting the diagram of α = (5, 2, 2, 1) through the main diagonal gives the diagram of α′ = (4, 3, 1, 1, 1):

□ □ □ □ □            □ □ □ □
□ □          =⇒      □ □ □
□ □                  □
□                    □
                     □
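The algebraic formula for the conjugate translates directly into code; a small sketch (names ours):

```python
def conjugate(alpha):
    """Conjugate partition: the j-th part of alpha' counts the parts of
    alpha that are at least j."""
    if not alpha:
        return ()
    return tuple(sum(1 for a in alpha if a >= j)
                 for j in range(1, alpha[0] + 1))
```

Applying `conjugate` twice recovers the original partition, reflecting the fact that reflecting a Young diagram twice is the identity.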
We have already seen a place where partitions of integers arise naturally. Recall that the conju-
gacy classes in Sn correspond to all permutations with a given cycle type. Each cycle type corre-
sponds uniquely to a partition of n. For example, in S6 , the cycle type (a b c)(d e) corresponds to the
partition 3 + 2 + 1. More generally, each partition α with |α| = n corresponds to the conjugacy class
of permutations which, when written in cycle notation, have cycles of length α1 , α2 , . . . , αk .

Exercises for Section 4.5


1. List all the possible invariant factors for the following integers: (a) 45; (b) 480; (c) 900.
2. List all the possible elementary divisors for the following integers: (a) 45; (b) 480; (c) 900.
3. List all the nonisomorphic abelian groups of order 945 both in invariant factors form and elementary
divisors form. Show which decompositions correspond to which in the different forms.
4. List all the nonisomorphic abelian groups of order 864 both in invariant factors form and elementary
divisors form. Show which decompositions correspond to which in the different forms.
5. What is the smallest value of n such that there exist 5 abelian groups of order n? List out the specific
groups.
6. What is the smallest value of n such that there exist 4 abelian groups of order n? List out the specific
groups.
7. What is the smallest value of n such that there exist at least 13 abelian groups of order n? List out
the specific groups.
8. Suppose that an abelian group has order 100 and has at least 2 elements of order 2. What are the
possible groups that satisfy these conditions?
9. List all the abelian groups of order 72 that contain an element of order 6.
10. Find the isomorphism type (by invariant factors decomposition) of U (21).
11. Find the isomorphism type (by invariant factors decomposition) of U (27).
12. Suppose that G is an abelian group of order 176 such that the subgroup H = {g^2 | g ∈ G} has order
22. What are the possible groups G with this property?
13. Prove Lemma 4.5.16.
14. Let p be a prime number. For all abelian groups of order p3 , list how many elements there are of each
order.
15. Let p and q be distinct prime numbers. For all abelian groups of order p2 q 2 , list how many elements
there are of each order.
16. Find all integers n such that there exists a unique abelian group of order n.
17. Let α be a partition of an integer and denote by Gα the group

Gα = Z2^α1 ⊕ Z2^α2 ⊕ · · · ⊕ Z2^αℓ(α) .

(a) Prove that Gα contains 2^ℓ(α) − 1 = 2^α′1 − 1 elements of order 2.
(b) Prove that Gα contains 2^|α| − 2^(|α|−m) elements of order 2^α1 , where α1 appears m times in the
partition α. [Hint: Use Inclusion-Exclusion.]

18. Let G = ⟨x, y | x^12 = y^18 = 1, xy = yx⟩. Consider the subgroup H = ⟨x^4 y^−6 ⟩. Find the isomorphism
type of G/H.
19. Let G = (Z/12Z)2 and let H = {(a, b) | 2a + 3b = 0}. Determine the isomorphism type of G/H.
20. Let G = (Z/12Z)2 and let H = {(a, b) | a + 5b = 0}. Determine the isomorphism type of G/H.
21. Prove Cauchy’s Theorem for abelian groups. In other words, let G be an abelian group and prove
that if p is a prime number dividing |G| then G contains an element of order p.
22. Prove Corollary 4.5.23.
23. Let p be a prime number. Prove that Aut(Zp ⊕ Zp ) ∼= GL2 (Fp ).
24. Let G be an abelian group. Prove that Aut(G) is abelian if and only if G is cyclic.
25. What is the first integer n such that there exist two distinct partitions α and β of n such that α0 = α
and β 0 = β?
26. Let p(n) be the partition function of n. Prove that as power series

∑_{n=0}^∞ p(n) x^n = ∏_{k=1}^∞ 1/(1 − x^k).

[Hint: 1/(1 − y) = 1 + y + y^2 + y^3 + · · · .]
27. Find the error in the proof of the following erroneous proposition.

Proposition: Let G be an abelian group and let A be any subgroup. Then G ∼= A ⊕ G/A.

Proof: Since G is abelian, A E G. Consider the function h : G ⊕ A → G defined by
h((g, a)) = ga. Obviously, this function is surjective. Furthermore, since G is abelian,

h((g1 , a1 )(g2 , a2 )) = h((g1 g2 , a1 a2 )) = g1 g2 a1 a2 = g1 a1 g2 a2 = h((g1 , a1 ))h((g2 , a2 )),

and so h is a homomorphism. By the First Isomorphism Theorem, (G ⊕ A)/ Ker h ∼= G. It
is easy to see that the kernel of h is the subgroup

Ker h = {(a, a^−1 ) | a ∈ A}.

This kernel is a subgroup of G ⊕ A that is isomorphic to A but embedded in a nontrivial
manner. Two cosets of Ker h are equal when

(g1 , a1 ) Ker h = (g2 , a2 ) Ker h ⇒ (g2^−1 g1 , a2^−1 a1 ) ∈ Ker h
⇒ g2^−1 g1 a2^−1 a1 = 1
⇒ g2^−1 g1 ∈ A
⇒ g2 A = g1 A.

This shows that the association ϕ : G/A ⊕ A → (G ⊕ A)/ Ker h defined by

ϕ(gA, a) = (g, a) Ker h

is a function. Furthermore, it is easy to check that it is a homomorphism and bijective. We
conclude that G ∼= (G ⊕ A)/ Ker h ∼= G/A ⊕ A and the proposition follows.

28. Prove that the set of functions from N to Z, denoted Fun(N, Z), equipped with function addition + is
not a free abelian group.
29. Consider the subgroup

H = {(x1 , x2 , x3 ) ∈ Z3 | 6 divides 2x1 + 3x2 + 4x3 }

of the free abelian group Z3 . Find a basis of H.



4.6
Projects
Project I. Fermat’s Theorem on Matrix Groups. Consider the family of groups G =
GLn (Fp ), where n is some positive integer with n ≥ 2 and p is a prime number. Lagrange’s
Theorem ensures that for all matrices A ∈ G, we have A^|G| = I. Is |G| the smallest integer k such
that A^k = I for all A ∈ G? If not, try to find

k(n, p) = min{k ∈ N∗ | A^k = I for all A ∈ GLn (Fp )}.

[Use a computer algebra system to assist with the calculations. For example, Maple has the
package LinearAlgebra[Modular] for working with matrices with coefficients in Z/nZ.]
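As an alternative starting point to a computer algebra system, here is a brute-force sketch of our own (only practical when p^(n^2) is small):

```python
from itertools import product
from math import lcm

def gln_exponent(n, p):
    """lcm of the orders of all invertible n x n matrices over F_p."""
    def matmul(A, B):
        return tuple(tuple(sum(A[i][t] * B[t][j] for t in range(n)) % p
                           for j in range(n)) for i in range(n))

    I = tuple(tuple(int(i == j) for j in range(n)) for i in range(n))
    k = 1
    for entries in product(range(p), repeat=n * n):
        A = tuple(tuple(entries[i * n + j] for j in range(n))
                  for i in range(n))
        # Compute the order of A by iterating powers until we reach I.
        X, m, seen = A, 1, set()
        while X != I:
            if X in seen:  # powers cycle without reaching I: A is singular
                m = 0
                break
            seen.add(X)
            X = matmul(X, A)
            m += 1
        if m:
            k = lcm(k, m)
    return k
```

For example, `gln_exponent(2, 2)` returns 6 = |GL2(F2)|, but `gln_exponent(2, 3)` returns 24 even though |GL2(F3)| = 48, so |G| is not always the minimal exponent.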
Project II. Orders of Elements in Abelian p-Groups. Let α be a partition of an integer of
length r and denote by G2,α the group

G2,α = Z2α1 ⊕ Z2α2 ⊕ · · · ⊕ Z2αr .

Try to find a formula for how many elements G2,α contains of order 2k for all integers k. Also
study the same question for similar groups of the form Gp,α , where p is prime.
Project III. Escher and Symmetry. Consider some of M. C. Escher’s artwork that depicts
interesting tessellations of the plane. For each of these, consider their corresponding wallpaper
group E.
(1) Describe a set of natural (simple) generators for E.
(2) Write down relations between the generators of E and thus give a presentation of E.
(3) Find some normal subgroups N of E, determine E/N , and explain geometrically why N
is normal and what information is contained in E/N .
(4) Describe some subgroups of E that are not normal and explain geometrically why they
are not normal.
(5) Find an Escher tessellation of the Poincaré disk and find generators and relations for that
group.
[For this project, you should feel free to look up Escher art and information about the Poincaré
disk but nothing else beyond that.]
Project IV. The Special Projective Group PSL2 (F5 ). In Exercise 4.2.27, we proved that A5
is simple and we know that it has order 60. In Example 4.3.11, we introduced the projective
linear groups. Intuitively speaking, PSLn (Fp ), where p is prime, “removes” from GLn (Fp )
normal subgroups that are obvious. A quick calculation (that you should do) shows that
PSL2 (F5 ) has order 60. For this project, consider the following questions. Is PSL2 (F5 ) simple?
What are conjugacy classes in PSL2 (F5 )? Is PSL2 (F5 ) isomorphic to A5 ? Can you generalize
any of these investigations to other n and other p?
[Results about projective linear groups are well-known. The value of the project consists not
in doing some Internet search and reporting on the work of other people but attempting to
discover/prove things yourself.]
Project V. Quotient Groups and Cayley Graphs. Let G be a group and let N be a subgroup.
Suppose that G has a presentation with a set of generators {g1 , g2 , . . . , gs }. Then we know
that {g 1 , g 2 , . . . , g s }, where g i = gi N is a generating set of G/N . Interpret geometrically how
the Cayley graph of G/N with generators {g 1 , g 2 , . . . , g s } is related to the Cayley graph of G
with generators {g1 , g2 , . . . , gs }. Illustrate this relationship with examples that can be realized
as polyhedra in R3 .
4.6. PROJECTS 205

Project VI. Embeddings of Sn . It is not hard to show (and you should do it) that for every
integer n, there is an embedding of Sn in GLn (Fp ) for all primes p. However, the fact that
GL2 (F2 ) ∼= S3 gives an example where Sn is embedded in (in this case actually isomorphic to)
a group GLk (Fp ), where k < n. Investigate for what k < n and what p it may be possible or
is impossible to embed Sn into GLk (Fp ). Illustrate some embeddings with specific examples.

Project VII. A Visualization of a Group. This project describes a way (useful or not is up
to you to decide) to visualize some properties of a group with two generators. Suppose that
G = ha, bi and that there may be relations between a and b. We use the first quadrant of the
xy-plane. Start with the identity 1 at the origin. Then, translating a point by (1, 0) corresponds
to operating on the left by a and translating a point by (0, 1) corresponds to operating on the
left by b. Study this visualization and its usefulness for groups (or subgroups) generated by
two elements. In particular, will every element arise as some point in the first quadrant? Can
we “see” subgroups or normal subgroups in this visualization? Are cyclic subgroups visible?
Are quotient groups visible?
Project VIII. Pascal’s Triangle for Groups. The entries of Pascal’s (usual) Triangle are
nonnegative integers. If we consider Pascal’s Triangle in modular arithmetic Z/nZ, then all
the operations in the triangle occur in the group (Z/nZ, +). We generalize Pascal’s Modular
Triangle to a Pascal’s Triangle for a group in the following way. Let a and b be elements in
a group G. Start with a on the first row. On the diagonal going down and to the left put
a's. On the diagonal going down and to the right put b's. Then, in rows below, similar to the
construction of Pascal's Triangle, we fill in the rows by operating the element above and to the left
with the element above and to the right. The following diagram shows the first few rows.

a
a    b
a    ab    b
a    a^2b    ab^2    b
a    a^3b    a^2bab^2    ab^3    b
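The construction above can be generated for any group given its operation; a starter sketch (names ours):

```python
def group_pascal_rows(op, a, b, num_rows):
    """Rows of the Pascal-style triangle: each row starts with a, ends
    with b, and each interior entry combines its two upper neighbors."""
    rows = [[a]]
    while len(rows) < num_rows:
        prev = rows[-1]
        rows.append([a]
                    + [op(prev[i], prev[i + 1])
                       for i in range(len(prev) - 1)]
                    + [b])
    return rows
```

With string concatenation as the operation, the fifth row reproduces a, a^3b, a^2bab^2, ab^3, b from the diagram; with addition in Z/3Z and a = 1, b = 2, color codes can then be assigned by residue.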

One way to visualize this Pascal’s Triangle for a group is to color code boxes based on the
group element. Write a program that draws the color-coded Pascal Triangle for a small group
and certain elements a and b up to any number of rows you specify. Then, explore any patterns
that emerge in Pascal’s Triangle for the group. (Try, for example, cyclic groups, a dihedral
group, or Z2 ⊕ Z2 .) Are there any patterns related to subgroup structure or quotient group
structure?
[This project was inspired by the article [7].]
5. Rings

Depending on the order of chapters covered, the reader will have encountered the following algebraic
structures so far: sets, posets, real vector spaces, groups, monoids, semigroups, and representations
of a group G. We now turn to the study of another algebraic structure, that of rings.
As with groups, rings possess a surprising number of properties and considerable internal
structure. Furthermore, they arise in many natural contexts within mathematics. However, rings possess
some properties and applications with no equivalents in group theory, while some issues that are
important in group theory have no equivalent or are no longer interesting in ring theory. For example,
in ring theory, it is common to define various classes of rings in order to expedite the statement of
important theorems. We will again follow the outline provided in the preface to approach algebraic
structures.
Section 5.1 defines rings, gives a few initial examples, and points out some novel issues to consider
in rings. As a place to gather many more examples, Section 5.2 explores a few common classes of
rings. Section 5.3 specifically considers the examples of matrix rings. Section 5.4 introduces the
concept of ring homomorphism as functions that preserve the ring structure.
Section 5.5 defines ideals of a ring, discusses convenient ways to describe ideals, and presents
operations on ideals. Section 5.6 discusses quotient rings and presents two important applications
of quotient rings: the Chinese Remainder Theorem and the isomorphism theorems. Section 5.7
explores the concepts of maximal and prime ideals, two generalizations to commutative rings of the
notion of primeness in the integers.

5.1
Introduction to Rings
The arithmetic on the integers carries more than just the structure of a group. The arithmetic of
integers involves both addition and multiplication. (Subtraction and division simply involve the
inverses of addition and multiplication, so we do not need to distinguish them.) Rings have two
operations that satisfy a certain set of axioms inspired by some properties of integers. However, the
ring axioms are loose enough to include many other algebraic contexts that possess both an addition
and a multiplication and that differ considerably from the integers.

5.1.1 – Rings Axioms and First Examples

Definition 5.1.1
A ring is a triple (R, +, ×) where R is a set and + and × are binary operations on R that
satisfy the following axioms:
(1) (R, +) is an abelian group.
(2) × is associative.
(3) × is distributive over +, i.e., for all a, b, c ∈ R:

(a) (a + b) × c = a × c + b × c (right-distributivity);
(b) a × (b + c) = a × b + a × c (left-distributivity).

208 CHAPTER 5. RINGS

As in group theory, we will often simply refer to the ring by R if there is no confusion about
what the operations might be. In an abstract ring, we will denote the additive identity by 0 and
refer to it as the “zero” of the ring. The additive inverse of a is denoted by −a. As with typical
algebra over the reals, we will often write the multiplication as ab instead of a × b. Furthermore, if
n ∈ N and a ∈ R, then n · a represents a added to itself n times,

n · a := a + a + · · · + a (n times).

We extend this notation to all integers by defining 0 · a = 0 and, if n > 0, then (−n) · a = −(n · a).

Definition 5.1.2
• A ring (R, +, ×) is said to have an identity, denoted by 1, if there is an element 1 ∈ R
such that 1 × a = a × 1 = a for all a ∈ R.
• A ring is said to be commutative if × is commutative.

Since a ring always possesses an additive identity, when one simply says “a ring with identity,”
one refers to the multiplicative identity, the existence of which is not required by the axioms.
It is possible for the multiplicative identity 1 of a ring to be equal to the additive identity 0.
However, according to part (1) in the following proposition, the ring would then consist of just one
element, namely 0. This case serves as an exception to many theorems we would like to state about
rings with identity. Consequently, since this is not a particularly interesting case, we will often refer
to “a ring with identity 1 ≠ 0” to denote a ring with identity but excluding the case in which 1 = 0.
We will soon introduce a number of elementary examples of rings. However, before we do so,
we prove the following proposition that holds for all rings. Many of these properties, as applied to
integers and real numbers, are rules that elementary school children learn early on.

Proposition 5.1.3
Let R be a ring.
(1) ∀a ∈ R, 0a = a0 = 0.
(2) ∀a, b ∈ R, (−a)b = a(−b) = −(ab).

(3) ∀a, b ∈ R, (−a)(−b) = ab.


(4) If R has an identity 1, then it is unique. Furthermore, −a = (−1)a.

Proof. For (1), note that 0 + 0 = 0 since it is the additive identity. By distributivity,

a0 = a(0 + 0) = a0 + a0.

Adding −(a0) to both sides of this equation, we deduce that 0 = a0. A similar reasoning holds for
0a = 0.
For (2), note that 0 = 0b = (a + (−a))b = ab + (−a)b. Adding −(ab) to both sides, we deduce
that (−a)b = −(ab). A similar reasoning holds for a(−b) = −(ab).
An application of (2) twice gives (−a)(−b) = −(a(−b)) = −(−(ab)) = ab.
Finally, for part (4), suppose that e1 and e2 satisfy the axioms of an identity. Then e1 = e1 e2
since e2 is an identity but e2 = e1 e2 since e1 is an identity. Thus, e1 = e1 e2 = e2 . Furthermore,
1a = a by definition so
0 = 0a = (1 + (−1))a = a + (−1)a.

Then adding −a to both sides of this equation produces the result. 


5.1. INTRODUCTION TO RINGS 209

In subsequent sections, we encounter many examples of rings and see a number of methods to
define new rings from old ones. In this first section, however, we just introduce a few basic examples
of rings.
Example 5.1.4. The triple (Z, +, ×) is a commutative ring. Note that (Z − {0}, ×) is not a group.
In fact, the only elements in Z that have multiplicative inverses are 1 and −1. 4

Example 5.1.5. The sets Q, R, and C with their usual operations of + and × form commutative
rings. In each of these, all nonzero elements have multiplicative inverses. 4

Example 5.1.6 (Modular Arithmetic). For every integer n ≥ 2, the context of modular
arithmetic, namely the triple (Z/nZ, +, ×), forms a ring. 4

In a ring R with an identity 1 ≠ 0, by distributivity n · a = (n · 1)a for all n ∈ N∗ and all a ∈ R.


The elements n · 1 are important elements in R. However, as the example of modular arithmetic
illustrates, n · 1 could be 0. This leads to a fundamental property of rings with identity 1 ≠ 0.

Definition 5.1.7
Let R be a ring with identity 1 ≠ 0. The characteristic of R is the smallest positive integer
n such that n · 1 = 0. If such a positive integer does not exist, the characteristic of R is
said to be 0. The characteristic of a ring is often denoted by char(R).

As elementary examples, the characteristic of Z/nZ is n and the characteristic of Z is 0.


Example 5.1.8 (Quaternions). After the discovery of complex numbers and the development of
the theory of complex analysis, mathematicians looked for other algebraic structures that extended
the complex numbers. In this sense, they sought vector spaces over R that possessed a multiplication
with nice properties, just as C has. In 1843, Hamilton defined the set of quaternions as

H = {a + bi + cj + dk | a, b, c, d ∈ R},

where elements add like vectors with {1, i, j, k} acting as a basis. He defined multiplication × on
H where arbitrary elements must satisfy distributivity and the elements 1, i, j, k multiply together
as they do in the quaternion group Q8 . The triple (H, +, ×) is a ring. (We have not checked
associativity but we leave this as an exercise for the reader. See Exercise 5.1.14.) It is obvious that
H is not commutative since ij = k, whereas ji = −k.
To illustrate a few simple calculations, consider for example the elements α = 2 − 3i + 2k and
β = 5 + 4i − 4j. We have

α + β = 7 + i − 4j + 2k,
αβ = (2 − 3i + 2k)(5 + 4i − 4j)
= 10 + 8i − 8j − 15i − 12i2 + 12ij + 10k + 8ki − 8kj
= 10 + 8i − 8j − 15i − 12(−1) + 12k + 10k + 8j − 8(−i)
= 22 + i + 22k,
βα = (5 + 4i − 4j)(2 − 3i + 2k)
= 10 − 15i + 10k + 8i − 12i2 + 8ik − 8j + 12ji − 8jk
= 10 − 15i + 10k + 8i − 12(−1) + 8(−j) − 8j + 12(−k) − 8i
= 22 − 15i − 16j − 2k.

It is vital that we do not change the order of the quaternion basis elements in any product, in
particular when applying distributivity. Hence, (4i)(2k) = 8ik = −8j, while (2k)(4i) = 8ki = 8j. 4
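The arithmetic in this example can be checked mechanically. Below is a small sketch (our own encoding, not from the text) that stores a quaternion a + bi + cj + dk as a 4-tuple and multiplies by distributivity and the basis rules; it reproduces the two products computed above.

```python
# Quaternions a + bi + cj + dk encoded as tuples (a, b, c, d).
# The product expands by distributivity and the basis rules
# i^2 = j^2 = k^2 = -1, ij = k = -ji, jk = i = -kj, ki = j = -ik.
def quat_mul(p, q):
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1*a2 - b1*b2 - c1*c2 - d1*d2,  # real part
        a1*b2 + b1*a2 + c1*d2 - d1*c2,  # coefficient of i
        a1*c2 - b1*d2 + c1*a2 + d1*b2,  # coefficient of j
        a1*d2 + b1*c2 - c1*b2 + d1*a2,  # coefficient of k
    )

alpha = (2, -3, 0, 2)   # 2 - 3i + 2k
beta = (5, 4, -4, 0)    # 5 + 4i - 4j
print(quat_mul(alpha, beta))  # (22, 1, 0, 22), i.e., 22 + i + 22k
print(quat_mul(beta, alpha))  # (22, -15, -16, -2)
```

Since the two printed products differ, the computation also confirms that H is not commutative.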

Example 5.1.9 (Ring of Functions). Let I be an interval of real numbers and let Fun(I, R) be
the set of all functions from I to R. Equipped with the usual addition and multiplication of functions,
Fun(I, R) is a commutative ring. The properties of a ring are inherited from R. In contrast, Fun(I, R)
is not a ring when equipped with addition and composition because the axiom of distributivity fails.
For example, consider the three functions f (x) = x + 1, g(x) = x2 , and h(x) = x3 . Then

(f ◦ (g + h))(x) = f (x2 + x3 ) = x3 + x2 + 1,
(f ◦ g)(x) + (f ◦ h)(x) = (x2 + 1) + (x3 + 1) = x3 + x2 + 2. 4
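The failure can also be confirmed numerically at a single point; this sketch evaluates both sides of the would-be distributive law at x = 1.

```python
# Compare f(g+h) with (f o g) + (f o h) for f(x) = x + 1, g(x) = x^2, h(x) = x^3.
f = lambda x: x + 1
g = lambda x: x ** 2
h = lambda x: x ** 3

x = 1.0
left = f(g(x) + h(x))       # x^3 + x^2 + 1, which is 3.0 at x = 1
right = f(g(x)) + f(h(x))   # x^3 + x^2 + 2, which is 4.0 at x = 1
print(left, right)          # 3.0 4.0
```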

Definition 5.1.10 (Direct Sum)


Let (R1 , +1 , ×1 ) and (R2 , +2 , ×2 ) be two rings. The direct sum of the two rings is the triple
(R1 × R2 , +, ×), where + and × are defined as

(a, b) + (c, d) = (a +1 c, b +2 d)
(a, b) × (c, d) = (a ×1 c, b ×2 d).

The direct sum of the rings is denoted by R1 ⊕ R2 .

5.1.2 – Units and Zero Divisors


Modular arithmetic presented examples of arithmetic with some properties that did not arise in
the arithmetic in Z or in Q. In particular, consider the examples of Z/5Z and Z/6Z introduced
in Example 2.2.6. There are qualitative differences between some of the arithmetic properties in
Z/5Z and in Z/6Z. These differences and more are common in ring theory. We introduce some key
terminology.

Definition 5.1.11
Let R be a ring.
• A nonzero element r ∈ R is called a zero divisor if there exists s ∈ R − {0} such that
rs = 0 or sr = 0.
• Assume R has an identity 1 ≠ 0. An element u ∈ R is called a unit if it has a
multiplicative inverse, i.e., ∃v ∈ R such that uv = vu = 1. The element v is often
denoted u−1 . The set of units in R is denoted by U (R).

Note that the identity 1 is itself a unit, but that the 0 element is not a zero divisor. This lack of
symmetry in the definitions may seem unappealing but this distinction turns out to be useful in all
theorems that discuss units and zero divisors.
The notation U (R) is reminiscent of the notation U (n) as the set of units in Z/nZ. In the ring
(Z/nZ, +, ×), every nonzero element is either a unit or a zero divisor. Proposition 2.2.9 established
that the units in Z/nZ are elements a such that gcd(a, n) = 1. Now if gcd(a, n) = d ≠ 1, then
k = n/d satisfies 0 < k < n and ak = (a/d)n, so in Z/nZ we have ak = 0 with k ≠ 0 and a is a zero divisor.
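As a concrete check of this dichotomy, the short sketch below (the helper name is ours) splits the nonzero classes of Z/12Z according to whether gcd(a, 12) = 1.

```python
from math import gcd

# In Z/nZ every nonzero class is a unit or a zero divisor:
# gcd(a, n) = 1 gives a unit; otherwise a * (n // gcd(a, n)) is a
# multiple of n while n // gcd(a, n) is a nonzero class.
def classify(n):
    units, zero_divisors = [], []
    for a in range(1, n):
        (units if gcd(a, n) == 1 else zero_divisors).append(a)
    return units, zero_divisors

units, zero_divisors = classify(12)
print(units)          # [1, 5, 7, 11]
print(zero_divisors)  # [2, 3, 4, 6, 8, 9, 10]
```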
In an arbitrary ring, it is not in general true that every nonzero element is either a unit or a zero
divisor. We need look no further than the integers. The units in the integers are U (Z) = {−1, 1},
and all the elements greater than 1 in absolute value are neither units nor zero divisors. On the
other hand, as the following proposition shows, no element can be both.

Proposition 5.1.12
Let R be a ring with identity 1 ≠ 0. The set of units and the set of zero divisors in a ring
are mutually exclusive.

Proof. Assume that a is both a unit and a zero divisor. Then there exists b ∈ R − {0} such that
ba = 0 or ab = 0. Assume without loss of generality that ba = 0. There also exists c ∈ R such that
ac = 1. Then
b = b(ac) = (ba)c = 0c = 0.

However, this is a contradiction since b ≠ 0. 

As mentioned above, the set of units contains the multiplicative identity. Furthermore, by
definition, every element in U (R) has a multiplicative inverse. This leads to the simple remark that
we phrase as a proposition.

Proposition 5.1.13
Let R be a ring with identity 1 ≠ 0. Then U (R) is a group under multiplication.

Proposition 5.1.14
Let R1 and R2 be rings each with an identity 1 ≠ 0. Then, R1 ⊕ R2 has an identity (1, 1)
and, as an isomorphism of groups,

U (R1 ⊕ R2 ) ≅ U (R1 ) ⊕ U (R2 ).

Proof. (Left as an exercise for the reader. See Exercise 5.1.20.) 

The following definitions refer to elements with specific properties related to their powers. Prop-
erties of such ring elements are studied in the exercises.

Definition 5.1.15
Let R be a ring. An element a ∈ R is called nilpotent if there exists a positive integer k
such that ak = 0. The subset of nilpotent elements in R is denoted by N (R).

Definition 5.1.16
Let R be a ring. An element a ∈ R is called idempotent if a2 = a.
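Both notions are easy to explore by brute force in modular arithmetic; the sketch below (hypothetical helper names) lists the nilpotent elements of Z/8Z and the idempotent elements of Z/12Z.

```python
# a is nilpotent in Z/nZ iff a^k is 0 (mod n) for some k >= 1; the
# exponent k = n always suffices, so one modular power decides it.
def nilpotents(n):
    return [a for a in range(n) if pow(a, n, n) == 0]

# a is idempotent iff a^2 is congruent to a (mod n).
def idempotents(n):
    return [a for a in range(n) if (a * a) % n == a]

print(nilpotents(8))    # [0, 2, 4, 6]
print(idempotents(12))  # [0, 1, 4, 9]
```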

5.1.3 – Integral Domains, Division Rings, Fields


It is common in ring theory to define classes of rings in which the elements possess certain properties.
Then it is convenient to state theorems for a particular class of rings. In fact, we have already defined
the class of commutative rings. However, many classes possess particular terminology evocative of
their properties. We illustrate this common habit by already introducing three important classes of
rings defined in reference to the existence of units and zero divisors.

Definition 5.1.17
A ring R is called an integral domain if it is commutative, contains an identity 1 ≠ 0, and
contains no zero divisors.

The terminology of integral domain evokes the fact that integral domains resemble the algebra
of the integers. However, we will encounter many other integral domains besides the integers.

Example 5.1.18. The ring Z ⊕ Z is not an integral domain because, in particular, (1, 0) · (0, 1) =
(0, 0) so (1, 0) and (0, 1) are zero divisors. 4

Proposition 5.1.19 (Cancellation Law)


Let R be an integral domain. Then R satisfies the cancellation law, namely that for all
a, b, c ∈ R with a ≠ 0,
ab = ac =⇒ b = c.

Proof. Adding −(ac) to both sides, we have


ab = ac ⇒ ab − ac = 0 ⇒ a(b − c) = 0.
Since a is not a zero divisor, we must have b − c = 0. Thus, b = c. 
Note that this proposition does not require a to be a unit. In fact, the cancellation
law applies in any ring whenever a is not a zero divisor.

Definition 5.1.20
A ring R with identity 1 ≠ 0 is called a division ring if every nonzero element in R is a
unit.

Example 5.1.21. The ring of quaternions H is a division ring. A simple calculation gives
(a + bi + cj + dk)(a − bi − cj − dk)
= a2 − abi − acj − adk + abi − b2 (−1) − bck − bd(−j)
+ acj − bc(−k) − c2 (−1) − cdi + adk − bdj − cd(−i) − d2 (−1)
= a2 + b2 + c2 + d2 .
By changing the signs on b, c, and d, we get the same result with the product in reverse order. For
all quaternions α = a + bi + cj + dk ≠ 0, the sum of squares a2 + b2 + c2 + d2 ≠ 0 and so the inverse
of α is
α−1 = (a − bi − cj − dk)/(a2 + b2 + c2 + d2 ).
The quaternions H are an example of a noncommutative division ring. Because of the importance
of the above calculation, if α = a + bi + cj + dk ∈ H we define the notation ᾱ = a − bi − cj − dk and
we call
N (α) := αᾱ = a2 + b2 + c2 + d2
the norm of α. 4

Definition 5.1.22
A commutative division ring is called a field .

A field is a set F that possesses an addition + and a multiplication ×, in which (F, +) is an
abelian group, (F − {0}, ×) is an abelian group, and in which × is distributive over +. In previous
levels of algebra, students encounter the fields of Q, R, and C. However, in the context of modular
arithmetic, we have encountered other finite fields. When p is a prime number, Z/pZ is a field
containing p elements. We denote it by Fp to indicate the implied field structure.
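For a small prime this can be checked exhaustively; the sketch below finds a multiplicative inverse for every nonzero class of Z/7Z.

```python
# Every nonzero class modulo the prime p = 7 has an inverse, so F_7 is a field.
p = 7
inverses = {a: next(b for b in range(1, p) if (a * b) % p == 1)
            for a in range(1, p)}
print(inverses)  # {1: 1, 2: 4, 3: 5, 4: 2, 5: 3, 6: 6}
```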

5.1.4 – Subrings
As in every algebraic structure, we define the concept of substructure.

Definition 5.1.23
Let (R, +, ×) be a ring. A subset S is called a subring of R if (S, +) is a subgroup of (R, +)
and if S is closed under ×. If R is a ring with an identity 1R and if S is a subring with an
identity 1S = 1R , then S is called a unital subring of R.

Using the One-Step Subgroup Criterion from group theory, in order to prove that S is a subring
of a ring R, we simply need to prove that S is closed under subtraction with the usual definition of
subtraction
a − b := a + (−b)
and closed under multiplication.
As an example, if R = Z, then for any integer n, consider the subset of multiples nZ. We know
that the subset nZ is a subgroup with +. Furthermore, for all na, nb ∈ nZ we have (na)(nb) =
n(nab) ∈ nZ, so nZ is closed under multiplication. Hence, nZ is a subring of Z.
This first example illustrates that the definition of a subring makes no assumption that if R
contains an identity 1 ≠ 0 then a subring S does also. In contrast, it is possible that a subring S
contains an identity 1S that is different from the identity 1R . For example, consider the subring
S = {0, 2, 4} of the ring R = Z/6Z. Obviously, 1R = 1 but it is easy to check that the identity of S
is 1S = 4. With the above terminology, S is a subring of R but not a unital subring of R.
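The claims about S = {0, 2, 4} amount to a handful of operations mod 6, which the sketch below verifies directly.

```python
# S = {0, 2, 4} inside Z/6Z: closed under + and *, with 4 acting as identity.
S = {0, 2, 4}
assert all((a + b) % 6 in S for a in S for b in S)  # closed under addition
assert all((a * b) % 6 in S for a in S for b in S)  # closed under multiplication
assert all((4 * a) % 6 == a for a in S)             # 4 is an identity for S
print("S is a subring of Z/6Z with identity 4")
```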
Example 5.1.24 (Ring of Continuous Functions). Consider the set C 0 ([a, b], R) of continuous
real-valued functions from the interval [a, b]. This is a subset of the ring of functions Fun([a, b], R)
from the interval [a, b] to R. The reader should recall that some theorems, usually introduced in a
first calculus course and proven in an analysis course, establish that subtraction and × are binary
operations on C 0 ([a, b], R). In particular, the proof that the product of two continuous functions is
continuous is not trivial. Consequently, C 0 ([a, b], R) is a subring of Fun([a, b], R). 4

Some properties of a ring are preserved in subrings. For example, any subring of a commutative
ring is again a commutative ring. Furthermore, if R is an integral domain, then any subring of R
that contains 1 is also an integral domain. On the other hand, a subring of a field need not be a
field even if it contains the identity. The integers Z as a subring in Q gives an example of this.
As an abstract example of subrings we discuss the center of a ring.

Definition 5.1.25
Let R be a ring. The center of R, denoted C(R), consists of all elements that commute
with every other element under multiplication. In other words,

C(R) = {z ∈ R | zr = rz for all r ∈ R}.

Proposition 5.1.26
Let R be a ring. The center C(R) is a subring of R.

Proof. Let z1 , z2 ∈ C(R) and let r ∈ R. Then

r(z1 + (−z2 )) = rz1 + r(−z2 ) = rz1 + (−(rz2 )) = z1 r + (−(z2 r)) = z1 r + (−(z2 )r) = (z1 + (−z2 ))r.

Hence, z1 + (−z2 ) ∈ C(R) and thus (C(R), +) is a subgroup of (R, +). Furthermore,

r(z1 z2 ) = (rz1 )z2 = (z1 r)z2 = z1 (rz2 ) = z1 (z2 r) = (z1 z2 )r,

so z1 z2 ∈ C(R). This proves that C(R) is a subring. 

Exercises for Section 5.1

In Exercises 5.1.1 through 5.1.8, decide whether the given set R along with the stated addition and multipli-
cation form a ring. If it is, prove it and decide whether it is commutative and whether it has an identity. If
it is not, decide which axioms fail. You should always check that the symbol is in fact a binary operation on
the given set.
1. Let R = R>0 , with the addition x ⊕ y = xy, and the multiplication x ⊗ y = x^y .
2. Let R = Q, with addition x ⊕ y = x + y + 1, and multiplication x ⊗ y = x + y + xy. [See
Exercise 3.2.2.]
3. Let R = R3 with addition as vector addition and multiplication as the cross product.
4. Let S be any set and consider R = P(S) with △ as the addition operation and ∩ as the multiplication.
5. Let S be any set and consider R = P(S) with ∪ as the addition operation and ∩ as the multiplication.
6. Let S be any set and consider R = P(S) with △ as the addition operation (defined as A△B = (A − B) ∪ (B − A))
and ∪ as the multiplication.
7. Let R be the set of finite unions of bounded intervals in R (possibly empty or singletons {a}). Let
A, B ∈ R. Define the symmetric difference A△B on R as the addition and the convex hull of A ∪ B
as the multiplication. (We define the convex hull of a subset S of R as the smallest bounded interval
containing S.)
8. Let R = Z × Z and define (a, b) + (c, d) = (a + c, b + d) and define also (a, b) × (c, d) = (ad − bc, bd).
9. Let R be a ring, let r, s ∈ R, and let m, n ∈ Z. Prove the following formulas with the · notation.

(a) m · (r + s) = (m · r) + (m · s)
(b) (m + n) · r = (m · r) + (n · r)

10. Let R be a ring, let r, s ∈ R, and let m, n ∈ Z. Prove the following formulas with the · notation.

(a) m · (rs) = r(m · s) = (m · r)s


(b) (mn) · r = m · (n · r)

11. Prove that in C 0 ([a, b], R) the composition operation ◦ is right-distributive over +.
12. Let I be an interval of real numbers. Prove that the zero divisors in Fun(I, R) are nonzero functions
f (x) such that there exists x0 ∈ I such that f (x0 ) = 0. Prove that all the elements in Fun(I, R) are
either 0, a zero divisor, or a unit.
13. Prove (carefully) that the nonzero elements in (C 0 ([a, b], R), +, ×) that are neither zero divisors nor
units are functions for which there exists an x0 ∈ [a, b] and an ε > 0 such that f (x0 ) = 0 and for
which f (x) ≠ 0 for all x such that 0 < |x − x0 | < ε.
14. Prove that multiplication in H is associative.
15. Let α = 1 + 2i + 3j + 4k and β = 2 − 3i + k in H. Calculate the following operations: (a) α + β; (b)
αβ; (c) βα; (d) αβ −1 ; (e) β 2 .
16. Let α, β ∈ H be arbitrary. Decide whether any of the operations αβ −1 , β −1 α, βα−1 , or α−1 β are
equal.
17. Let R = {a+bi+cj +dk ∈ H | a, b, c, d ∈ Z}. Prove that R is a subring of H and prove that U (R) = Q8 ,
the quaternion group.
18. Fix an integer n ≥ 2. Let R(n) be the set of symbols a + ib where a, b ∈ Z/nZ. Define + and × on
R(n) like addition and multiplication in C.

(a) Prove that R(n) is a ring.


(b) Set n = 6. Identify all the zero divisors in R(6).

19. Define Hom(V, W ) as the set of linear transformations from a real vector space V to another real
vector space W . Prove that Hom(V, V ), equipped with + and ◦ (composition), is a ring.
20. Let R1 and R2 be rings with nonzero identity elements. Prove that U (R1 ⊕ R2 ) ≅ U (R1 ) ⊕ U (R2 ).
Prove the equivalent result for a finite number of rings R1 , R2 , . . . , Rn .
21. Prove that the characteristic char(R) of an integral domain R is either 0 or a prime number. [Hint:
By contradiction.]
22. Consider the ring Z ⊕ Z and consider the subset R = {(x, y) | x − y = 0}. Prove that R is a subring.
Decide if R is an integral domain.
23. Prove that a finite integral domain is a field.
24. Let R1 and R2 be rings. Prove that R1 ⊕ R2 is an integral domain if and only if R1 is an integral
domain and R2 = {0} or vice versa.

25. (Binomial Formula) Let R be a ring and suppose that x and y commute in R. Prove that for all
positive integers n,

(x + y)^n = ∑_{i=0}^{n} C(n, i) x^{n−i} y^i ,

where C(n, i) denotes the binomial coefficient.

26. Let R be a ring and suppose that x and y commute in R. Prove that for all positive integers n,

x^n − y^n = (x − y)(x^{n−1} + x^{n−2} y + x^{n−3} y^2 + · · · + y^{n−1} ).

27. Prove that if a ∈ R is idempotent, then a^n = a for all positive integers n.


28. In the ring M2 (Z),
(a) find two nilpotent elements;
(b) find two idempotent elements that are not the identity.
29. Prove that if R contains an identity, then all idempotent elements that are not the identity are zero
divisors.
30. Prove that if A ∈ Mn (R) is nilpotent then all of its eigenvalues are 0. [Remark: The converse is also
true but follows from the Jordan canonical form of a matrix. See Section 10.9.]
31. Let R be a commutative ring.
(a) Prove that the set of nilpotent elements, N (R), is closed under addition. [Hint: Binomial
formula. See Exercise 5.1.25.]
(b) Prove that N (R) is closed under multiplication.
(c) Conclude that N (R) is a subring of R.
32. Let R be a commutative ring with an identity 1 6= 0. Let x ∈ N (R). Prove that 1 − x is a unit.
33. Let R = Z/81Z. Determine the elements in N (R). In particular, determine the cardinality of N (R).
34. Let R = Z/700Z. Determine the elements in N (R). In particular, determine the cardinality of N (R).
35. A Boolean ring is a ring R in which r^2 = r for all r ∈ R.
(a) Prove that the characteristic of a Boolean ring with an identity is 2.
(b) Prove that every Boolean ring is commutative.
36. Let R be a ring and suppose that a and b are two elements such that a3 = b3 and a2 b = b2 a. Can
a2 + b2 be a unit? [This exercise appeared in modified form as Problem A-2 on the 1991 Putnam
Mathematics Competition.]
37. Let R be a ring such that x3 = x for all x ∈ R. Prove that R is commutative.
38. Prove that {n/k | n, k ∈ Z with k odd} is a subring of Q.


39. Prove that {a + bi | a, b ∈ Z} is a subring of C.


40. Let R be any ring and let n be a positive integer. Prove that {n · r | r ∈ R} is a subring of R.
41. Determine with proof, which of the following subsets are subrings of Z ⊕ Z.
(a) {(a, b) ∈ Z ⊕ Z | 2a + b = 0}
(b) {(a, b) ∈ Z ⊕ Z | a = b}
(c) {(a, b) ∈ Z ⊕ Z | a + b is even }
(d) {(a, b) ∈ Z ⊕ Z | ab = 0}
42. Determine with proof, which of the following subsets are subrings of Q.
(a) Fractions, which when written in reduced form, have an odd denominator.
(b) Fractions, which when written in reduced form, have an even denominator.
(c) Fractions of the form k/2^n , where k is odd and n ∈ Z.
(d) Fractions, which when written in reduced form, are n/2^k .
43. Prove that the set of periodic real-valued functions of period p is a subring of Fun(R, R). [Note:
Functions that are periodic with period p satisfy f (x + p) = f (x) for all x. Such functions may be
periodic with a lower period or even constant.]

44. Let C n ([a, b], R) be the set of real-valued functions on [a, b] whose first n derivatives exist and are
continuous. Prove that C n+1 ([a, b], R) is a proper subring of C n ([a, b], R).
45. Let R be any ring and let {Si }i∈I be a collection of subrings (not necessarily finite or countable).
Prove that the intersection
∩_{i∈I} Si
is a subring of R.
46. Let R be a ring and let R1 and R2 be subrings. Prove by a counterexample that R1 ∪ R2 is in general
not a subring.
47. Let R be a ring and let a be a fixed element of R. Define C(a) = {r ∈ R | ra = ar}. Prove that C(a)
is a subring of R.
48. Let R be a ring and let a be a fixed element of R.
(a) Prove that the set {x ∈ R | ax = 0} is a subring of R.
(b) With R = Z/100Z, and a = 5, find the subring defined in part (a).

5.2 – Rings Generated by Elements
Following the general outline presented in the preface, this section first introduces a particular
method to efficiently describe certain types of subrings. Motivated by the notation, we introduce
two important families of rings that build new rings from old ones.

5.2.1 – Generated Subrings


Let A be a commutative ring. Let R be a subring of A and let S be a subset of A. The notation
R[S] denotes the smallest (by inclusion) subring of A that contains both R and S. Obviously, if
S ⊂ R, the ring R[S] = R so the notation is uninteresting. However, if elements of S are not in R,
then R is a proper subring of R[S].
Example 5.2.1. Consider the ring Z[1/2] as a subring of Q. Since Z[1/2] is closed under multiplication,
1/4 = (1/2) × (1/2) ∈ Z[1/2] and more generally 1/2^n ∈ Z[1/2] for all nonnegative integers n. Also
because the subring is closed under multiplication, for all integers k and n, the fraction k/2^n is an
element in Z[1/2]. It is not hard to show that the set

{ k/2^n | k, n ∈ Z }

is a subring of Q. Hence, this set is precisely the ring Z[1/2]. 4

It is not uncommon for the ring A to be implied by the elements in the set S. The following two
examples illustrate this habit of notation.
Example 5.2.2 (Gaussian Integers). Consider the ring Z[i]. It is understood that i is the imag-
inary number that satisfies i2 = −1. This notation assumes that the superset ring A is the ring
C. The ring Z[i] contains all the integers and, since it is closed under multiplication, it contains all
integer multiples of i. Since Z[i] is closed under addition, it must contain the subset

{a + bi ∈ C | a, b ∈ Z}.

However, this subset is closed under subtraction and under multiplication with

(a + bi)(c + di) = (ac − bd) + (ad + bc)i.


Figure 5.1: Gaussian integers Z[i]

Hence, this subset is the smallest subring in C containing both Z and the element i and so it is
precisely Z[i]. In the usual manner of depicting a complex number a + bi as a point in the plane,
the subring Z[i] consists of the points with integer coordinates. (See Figure 5.1.)
The ring Z[i] is called the ring of Gaussian integers and is important in elementary number
theory.
In C, the multiplicative inverse of a nonzero element is
(a + bi)^{−1} = (a − bi)/(a^2 + b^2 ).
The group of units U (Z[i]) consists of elements a + bi ∈ Z[i] such that
a/(a^2 + b^2 ), b/(a^2 + b^2 ) ∈ Z.
If |a| ≥ 2, then a^2 > |a|, in which case a^2 + b^2 > |a| and hence a^2 + b^2 could not divide a. A
symmetric result holds for b. Consequently, if a + bi ∈ U (Z[i]), then |a| ≤ 1 and |b| ≤ 1. However,
if |a| = 1 and |b| = 1, then a^2 + b^2 = 2, while a = ±1 and so a/(a^2 + b^2 ) ∉ Z. Thus, we see that the
only units in Z[i] have |a| = 1 and b = 0 or a = 0 and |b| = 1. Hence,

U (Z[i]) = {1, −1, i, −i}.

This group of units is isomorphic to Z4 . 4
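This determination of U (Z[i]) can be double-checked by brute force. The sketch below (a hypothetical helper, not from the text) searches a small box of Gaussian integers for elements whose product with some other element is 1, using (a + bi)(c + di) = (ac − bd) + (ad + bc)i.

```python
# Search |a|, |b|, |c|, |d| <= 3 for Gaussian integers a + bi with
# (a + bi)(c + di) = 1; any pair found certifies that a + bi is a unit.
def units_in_box(bound=3):
    rng = range(-bound, bound + 1)
    units = set()
    for a in rng:
        for b in rng:
            if any((a*c - b*d, a*d + b*c) == (1, 0) for c in rng for d in rng):
                units.add((a, b))
    return units

print(units_in_box())  # the set {(1, 0), (-1, 0), (0, 1), (0, -1)}
```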



Example 5.2.3. Consider as another example the ring Z[√2]. Obviously, Z is a subring of R and
√2 ∈ R so Z[√2] is the smallest subring of R containing the subring of integers and √2.
Following a similar process as in Example 5.2.2, it is easy to find that
Z[√2] = {a + b√2 ∈ R | a, b ∈ Z}.
We comment that all the powers of √2 must be in Z[√2]: the even powers (√2)^{2k} = 2^k are powers of 2,
while for all odd integers 2k + 1, the powers (√2)^{2k+1} = 2^k √2.
The group of units U (Z[√2]) is more complicated than U (Z[i]) and we leave this investigation
as a project. (See Project I.)
One way to depict the elements of Z[√2] is to represent the number a + b√2 in the plane as the
point
a~ı + b~u, where ~ı = (1, 0) and ~u = 2(cos(π/4), sin(π/4)).
The actual real number a + b√2 is obtained by projecting the vector onto the real line. (See
Figure 5.2.) 4
Figure 5.2: A representation of Z[√2]

5.2.2 – Polynomial Rings


The notation we introduced from subrings generated by elements motivates another important
construction for rings, that of polynomial rings. If R is any commutative ring, the notation R[x] denotes
the set of polynomials in the variable x and with coefficients in R. More precisely, polynomials are
finite expressions of the form
a(x) = am xm + · · · + a1 x + a0
with m ∈ N, the coefficients ai ∈ R for all i = 0, 1, . . . , m, and am 6= 0.
A summand ai xi of the polynomial is called a term. We call two terms like terms if in each,
the variable x has the same degree. The element ai is called the coefficient of the term ai xi . The
integer m, i.e., the highest power of x appearing in a nonzero term of a nonzero polynomial, is called
the degree of the polynomial, and is denoted deg a(x). (The concept of degree of a polynomial is
undefined for the zero polynomial.) If m = deg a(x), the term am xm is called the leading term of
a(x) and is denoted by LT(a(x)). The ring element am is called the leading coefficient of a(x) and
is denoted by LC(a(x)). If R is a ring with identity 1 ≠ 0, a polynomial a(x) ∈ R[x] is called monic
if LC(a(x)) = 1.
Polynomials can be added and multiplied in the usual way, except that all the operations on the
coefficients must occur in the ring of coefficients R. More explicitly, suppose that
a(x) = am xm + · · · + a1 x + a0 and b(x) = bn xn + · · · + b1 x + b0 .
For addition, if m 6= n, we can make some initial coefficients 0 and assume n = m. Then
a(x) + b(x) = (an xn + an−1 xn−1 + · · · + a1 x + a0 ) + (bn xn + bn−1 xn−1 + · · · + b1 x + b0 )
= (an + bn )xn + (an−1 + bn−1 )xn−1 + · · · + (a1 + b1 )x + (a0 + b0 ).
For multiplication, one distributes the terms of a(x) with the terms of b(x), multiplies terms and
variable powers as appropriate, and then gathers like terms. We can write succinctly

a(x)b(x) = ∑_{k=0}^{m+n} ( ∑_{i+j=k} a_i b_j ) x^k . (5.1)

The following proposition is the main point of this subsection. The result may feel intuitively
obvious but we provide the proof to illustrate that the details are not so obvious.

Proposition 5.2.4
Let R be a commutative ring. Then with the operations of addition and multiplication
defined as above, R[x] is a commutative ring that contains R as a subring.

Proof. Let a(x), b(x), c(x) be polynomials in R[x]. Let n = max{deg a, deg b, deg c}. If k > deg p(x)
for any polynomial, we will assume pk = 0. Then
a(x) + (b(x) + c(x)) = a(x) + ((bn + cn )xn + · · · + (b1 + c1 )x + (b0 + c0 ))
= (an + (bn + cn )) xn + · · · + (a1 + (b1 + c1 )) x + (a0 + (b0 + c0 ))
= ((an + bn ) + cn ) xn + · · · + ((a1 + b1 ) + c1 ) x + ((a0 + b0 ) + c0 )
= ((an + bn )xn + · · · + (a1 + b1 )x + (a0 + b0 )) + c(x)
= (a(x) + b(x)) + c(x).
So + is associative on R[x].
The additive identity is the 0 polynomial. The additive inverse of a polynomial a(x) is −an xn −
· · · − a1 x − a0 . The addition is commutative so (R[x], +) is an abelian group.
To show that polynomial multiplication is associative, we use (5.1). If deg a(x) = m, deg b(x) = n,
and deg c(x) = ℓ, then

(a(x)b(x)) c(x) = ( ∑_{q=0}^{m+n} ( ∑_{i+j=q} a_i b_j ) x^q ) c(x)
= ∑_{h=0}^{m+n+ℓ} ( ∑_{q+k=h} ( ∑_{i+j=q} a_i b_j ) c_k ) x^h
= ∑_{h=0}^{m+n+ℓ} ( ∑_{i+j+k=h} a_i b_j c_k ) x^h
= ∑_{h=0}^{m+n+ℓ} ( ∑_{i+r=h} a_i ( ∑_{j+k=r} b_j c_k ) ) x^h
= a(x) ( ∑_{r=0}^{n+ℓ} ( ∑_{j+k=r} b_j c_k ) x^r )
= a(x) (b(x)c(x)) .
Since R is commutative, we have

a(x)b(x) = ∑_{k=0}^{m+n} ( ∑_{i+j=k} a_i b_j ) x^k = ∑_{k=0}^{m+n} ( ∑_{i+j=k} b_j a_i ) x^k = b(x)a(x).

Thus, the multiplication in R[x] is commutative.


Finally, to prove distributivity, by virtue of commutativity in R[x] we only need to prove
left-distributivity. Assume without loss of generality that deg b(x) = deg c(x) = n. Then

a(x) (b(x) + c(x)) = a(x) ((b_n + c_n )x^n + · · · + (b_1 + c_1 )x + (b_0 + c_0 ))
= ∑_{k=0}^{m+n} ( ∑_{i+j=k} a_i (b_j + c_j ) ) x^k
= ∑_{k=0}^{m+n} ( ∑_{i+j=k} (a_i b_j + a_i c_j ) ) x^k
= ∑_{k=0}^{m+n} ( ∑_{i+j=k} a_i b_j ) x^k + ∑_{k=0}^{m+n} ( ∑_{i+j=k} a_i c_j ) x^k
= a(x)b(x) + a(x)c(x).

This proves all the axioms of a commutative ring.
The subset of R[x] consisting of polynomials of degree 0, along with the 0 polynomial, adds and
multiplies just as in the ring R, so R is naturally a subring. 

In elementary algebra, we regularly work in the context of Z[x], Q[x], or R[x], which are poly-
nomial rings with integer, rational, or real coefficients respectively. However, consider the following
example where the ring of coefficients is a finite ring.
Example 5.2.5. Consider the polynomial ring S = (Z/3Z)[x]. As examples of operations in this
ring, let p(x) = x2 + 2x + 1 and q(x) = x2 + x + 1 be two polynomials in S. (For brevity, we omit
the bar and write 2 instead of 2̄.) We calculate the addition and multiplication:

p(x) + q(x) = (x2 + 2x + 1) + (x2 + x + 1) = 2x2 + 0x + 2 = 2x2 + 2,


p(x)q(x) = (x2 + 2x + 1)(x2 + x + 1)
= x4 + x3 + x2 + 2x3 + 2x2 + 2x + x2 + x + 1
= x4 + (1 + 2)x3 + (1 + 2 + 1)x2 + (2 + 1)x + 1 = x4 + x2 + 1. 4
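These computations follow formula (5.1) with every coefficient operation taken mod 3. The sketch below (our convention: coefficient lists [a0, a1, . . .] in increasing degree, not the book's notation) reproduces both results.

```python
# Polynomial arithmetic over Z/3Z on coefficient lists [a0, a1, ..., am].
def poly_add(p, q, n=3):
    m = max(len(p), len(q))
    p, q = p + [0] * (m - len(p)), q + [0] * (m - len(q))
    return [(a + b) % n for a, b in zip(p, q)]

def poly_mul(p, q, n=3):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):           # distribute and gather like terms
            out[i + j] = (out[i + j] + a * b) % n
    return out

p = [1, 2, 1]  # x^2 + 2x + 1
q = [1, 1, 1]  # x^2 + x + 1
print(poly_add(p, q))  # [2, 0, 2]        that is, 2x^2 + 2
print(poly_mul(p, q))  # [1, 0, 1, 0, 1]  that is, x^4 + x^2 + 1
```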

Polynomial rings are important families of rings and find applications in countless areas. We will
use them for many examples of properties of rings and we will study properties of polynomial rings
at length.

Proposition 5.2.6
Let R be an integral domain.
(1) Let a(x), b(x) be nonzero polynomials in R[x]. Then deg a(x)b(x) = deg a(x) +
deg b(x).
(2) The units of R[x] are the units of R. In other words, U (R[x]) = U (R).
(3) R[x] is an integral domain.

Proof. If deg a(x) = m and deg b(x) = n, then a(x)b(x) has no terms of degree higher than m + n.
However, since am ≠ 0 and bn ≠ 0, the product contains the term am bn xm+n as long as am bn ≠ 0.
Since R contains no zero divisors, am bn ≠ 0. Hence, deg a(x)b(x) = m + n = deg a(x) + deg b(x).
The multiplicative identity in R[x] is the degree 0 polynomial 1. Suppose a(x) ∈ U (R[x]) and let
b(x) ∈ R[x] with a(x)b(x) = 1. Then deg(a(x)b(x)) = 0. Since the degree of a polynomial is a
nonnegative integer, part (1) implies that deg a(x) = 0. Hence, a(x) ∈ R and part (2) follows.
Since R is an integral domain it contains an identity 1 ≠ 0. This is also the multiplicative
identity for R[x]. Let a(x), b(x) be nonzero polynomials. Let am xm be the leading term of a(x)
and let bn xn be the leading term of b(x). Then am bn xm+n is the leading term of their product.
Since the coefficients am and bn are nonzero and R contains no zero divisors, am bn ≠ 0 and hence
a(x)b(x) ≠ 0. Thus, R[x] is an integral domain. 

Proposition 5.2.7
Let R be a commutative ring. A polynomial a(x) ∈ R[x] is a zero divisor if and only if
∃r ∈ R − {0} such that r(a(x)) = 0.

Proof. (⇐=). This direction is obvious, since r can be viewed as a polynomial (of degree 0).
(=⇒). Suppose that a(x) is a zero divisor in R[x]. This means that there exists a polynomial
b(x) ∈ R[x] such that a(x)b(x) = 0. We write

a(x) = am xm + · · · + a1 x + a0 ,
b(x) = bn xn + · · · + b1 x + b0 .

We will show that r = b_0^{m+1} satisfies ra(x) = 0. More precisely, we show by (strong) induction
that a_i b_0^{i+1} = 0 for all 0 ≤ i ≤ m.
The term of degree 0 in the product a(x)b(x) has the coefficient a_0 b_0 . We must have a_0 b_0 = 0
since a(x)b(x) = 0. This gives the basis step of our proof by induction. Now suppose that
a_i b_0^{i+1} = 0 for all 0 ≤ i ≤ k. The term of degree k + 1 in a(x)b(x) = 0 is

0 = a_{k+1} b_0 + a_k b_1 + · · · + a_1 b_k + a_0 b_{k+1} = ∑_{i=0}^{k+1} a_i b_{k+1−i} .

Multiplying this equation by b_0^{k+1} , we have

0 = a_{k+1} b_0^{k+2} + a_k b_0^{k+1} b_1 + · · · + a_1 b_0^{k+1} b_k + a_0 b_0^{k+1} b_{k+1} .

Using the induction hypothesis, we get 0 = a_{k+1} b_0^{k+2} because all the other terms are 0. By
induction, a_i b_0^{i+1} = 0 for all coefficients a_i in the polynomial a(x). Since deg a(x) = m, we have
b_0^{m+1} a(x) = 0.

Having described the construction of a polynomial ring in one variable, the construction extends
naturally to polynomial rings in more than one variable. If R is a commutative ring, then R[x] is
another commutative ring and R[x][y] is then a polynomial ring in the two variables x and y. We
typically write R[x, y] instead of R[x][y] for the polynomial ring with coefficients in R and with the
two variables x and y. More generally, if x1 , x2 , . . . , xn are symbols for variables, then we inductively
define the polynomial ring with coefficients in R in these variables by

R[x1 , x2 , . . . , xn ] = R[x1 , x2 , . . . , xn−1 ][xn ].

Because of this inductive definition, it is not particularly appropriate to use the terminology of
“multivariable polynomial ring” because R[x1 , x2 , . . . , xn ] can be viewed as a polynomial ring in one
variable but with coefficients in R[x1 , x2 , . . . , xn−1 ].
Polynomial rings offer many examples of properties of rings, in particular commutative rings. In
upcoming sections, we will study properties of rings F [x], where F is a field, or other polynomial
rings R[x] where R has specific properties. Polynomial rings F [x1 , x2 , . . . , xn ], where F is a field, are
more challenging to study than F [x]. Chapter 12 studies such rings, and more generally, Noetherian
rings, in detail.

5.2.3 – Group Rings


We now consider group rings as a final example of a class of rings described by some generating
elements.
Let G be a group and list the elements out as G = {g1 , g2 , . . . , gn }. Let R be a commutative
ring. We define the set R[G] as the set of formal sums

a1 g1 + a2 g2 + · · · + an gn

where ai ∈ R. Note that if g1 is the group identity, we usually write a1 g1 as just a term a1 . As with
polynomials, we call any summand ai gi a term of the formal sum.
Addition of formal sums is done component-wise:
$$\sum_{i=1}^{n} a_i g_i + \sum_{i=1}^{n} b_i g_i = \sum_{i=1}^{n} (a_i + b_i) g_i.$$

We define the multiplication · of formal sums by distributing · over + and then rewriting terms as

(ai gi ) · (bj gj ) = (ai bj )(gi gj ) = (ai bj )gk

where the product ai bj occurs in R and the operation gi gj = gk corresponds to the group operation
in G. Then, just as with polynomials, one gathers like terms.
We illustrate the operations defined on formal sums with a few examples.
222 CHAPTER 5. RINGS

Example 5.2.8. Let $G = D_5$ and consider the set of formal sums $\mathbb{Z}[D_5]$. Let
$$\alpha = r^2 + 2r^3 - s \quad\text{and}\quad \beta = -r^2 + 7sr.$$
Then $\alpha + \beta = 2r^3 - s + 7sr$ and
$$\begin{aligned}
\alpha\beta &= (r^2 + 2r^3 - s)(-r^2 + 7sr) \\
&= -r^4 + 7r^2 sr - 2r^5 + 14r^3 sr + sr^2 - 7ssr \\
&= -r^4 + 7sr^4 - 2 + 14sr^3 + sr^2 - 7r \\
&= -2 - 7r - r^4 + sr^2 + 14sr^3 + 7sr^4. \qquad\triangle
\end{aligned}$$
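Computations like the one in Example 5.2.8 can be checked mechanically. The sketch below (our own encoding) writes each element of $D_5$ as $s^e r^i$ with $e \in \{0, 1\}$ and $i$ taken mod 5, multiplies group elements using the dihedral relation $r^i s = s r^{-i}$, and represents an element of $\mathbb{Z}[D_5]$ as a dict from group elements to integer coefficients.

```python
from collections import defaultdict

def d5_mul(g, h):
    """Multiply s^e1 r^i1 by s^e2 r^i2 in D5, using r^i s = s r^(-i).
    Elements are pairs (i, e) standing for s^e r^i."""
    (i1, e1), (i2, e2) = g, h
    # s^e1 r^i1 s^e2 r^i2 = s^(e1+e2) r^((-1)^e2 * i1 + i2)
    return (((-1) ** e2 * i1 + i2) % 5, (e1 + e2) % 2)

def gr_mul(alpha, beta):
    """Multiply formal sums in Z[D5], stored as {group element: coefficient}."""
    out = defaultdict(int)
    for g, a in alpha.items():
        for h, b in beta.items():
            out[d5_mul(g, h)] += a * b
    return {g: c for g, c in out.items() if c != 0}

# alpha = r^2 + 2r^3 - s,  beta = -r^2 + 7sr   (s = (0, 1), sr = (1, 1))
alpha = {(2, 0): 1, (3, 0): 2, (0, 1): -1}
beta = {(2, 0): -1, (1, 1): 7}

product = gr_mul(alpha, beta)
# Expect -2 - 7r - r^4 + s r^2 + 14 s r^3 + 7 s r^4, as in Example 5.2.8.
expected = {(0, 0): -2, (1, 0): -7, (4, 0): -1, (2, 1): 1, (3, 1): 14, (4, 1): 7}
print(product == expected)   # True
```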
It is not uncommon to use the group itself as the indexing set for the coefficients of the terms. Hence, we often denote a generic group ring element as
$$\alpha = \sum_{g \in G} a_g g.$$

In the proof of the following proposition, establishing associativity is the most challenging part.
We have used the above notation combined with iterated sums. The notation (x, y) : xy = g stands
for all pairs x, y ∈ G such that xy = g.

Proposition 5.2.9
Let R be a commutative ring and let G be a finite group. The set R[G], equipped with
addition and multiplication as defined above, is a ring and is called the group ring of R
and G. Furthermore, R is a subring of R[G].

Proof. If $|G| = n$, then the group $(R[G], +)$ is isomorphic as a group to the direct sum of $(R, +)$ with itself $n$ times. Hence, $(R[G], +)$ is an abelian group. We need to prove that multiplication is associative and that multiplication is distributive over the addition.
Let $\alpha = \sum_{g \in G} a_g g$, $\beta = \sum_{g \in G} b_g g$, and $\gamma = \sum_{g \in G} c_g g$ be three elements in $R[G]$. Then
$$\begin{aligned}
(\alpha\beta)\gamma &= \Bigg(\sum_{g \in G} \bigg(\sum_{(x,y):\, xy = g} a_x b_y\bigg) g\Bigg)\gamma
= \sum_{h \in G} \Bigg(\sum_{(g,z):\, gz = h} \bigg(\sum_{(x,y):\, xy = g} a_x b_y\bigg) c_z\Bigg) h \\
&= \sum_{h \in G} \bigg(\sum_{(x,y,z):\, (xy)z = h} a_x b_y c_z\bigg) h
= \sum_{h \in G} \bigg(\sum_{(x,y,z):\, x(yz) = h} a_x b_y c_z\bigg) h \\
&= \sum_{h \in G} \Bigg(\sum_{(x,g'):\, xg' = h} a_x \bigg(\sum_{(y,z):\, yz = g'} b_y c_z\bigg)\Bigg) h
= \alpha \Bigg(\sum_{g' \in G} \bigg(\sum_{(y,z):\, yz = g'} b_y c_z\bigg) g'\Bigg) \\
&= \alpha(\beta\gamma).
\end{aligned}$$
This proves associativity of multiplication. Also
$$\begin{aligned}
\alpha(\beta + \gamma) &= \sum_{g \in G} \bigg(\sum_{(x,y):\, xy = g} a_x (b_y + c_y)\bigg) g
= \sum_{g \in G} \bigg(\sum_{(x,y):\, xy = g} (a_x b_y + a_x c_y)\bigg) g \\
&= \sum_{g \in G} \bigg(\sum_{(x,y):\, xy = g} a_x b_y + \sum_{(x,y):\, xy = g} a_x c_y\bigg) g \\
&= \sum_{g \in G} \bigg(\sum_{(x,y):\, xy = g} a_x b_y\bigg) g + \sum_{g \in G} \bigg(\sum_{(x,y):\, xy = g} a_x c_y\bigg) g \\
&= \alpha\beta + \alpha\gamma.
\end{aligned}$$
5.2. RINGS GENERATED BY ELEMENTS 223

This establishes left-distributivity. Right-distributivity is similar and establishes that R[G] is a ring.
The subset {r · 1 ∈ R[G] | r ∈ R} is a subring that is equal to R. 

Note that even if R is commutative, R[G] is not necessarily commutative. Most of the examples
of rings introduced so far in the text have been commutative rings. The construction of group rings
gives a wealth of examples of noncommutative rings.

Example 5.2.10. Consider the group ring $(\mathbb{Z}/3\mathbb{Z})[S_3]$. The elements in $(\mathbb{Z}/3\mathbb{Z})[S_3]$ are formal sums
$$\alpha = a_1 + a_{(12)}(1\,2) + a_{(13)}(1\,3) + a_{(23)}(2\,3) + a_{(123)}(1\,2\,3) + a_{(132)}(1\,3\,2),$$
where each $a_\sigma \in \mathbb{Z}/3\mathbb{Z}$. Since there are 3 options for each coefficient, there are $3^6 = 729$ elements in this group ring. As a simple illustration of some properties of elements, we point out that $(\mathbb{Z}/3\mathbb{Z})[S_3]$ contains zero divisors. For example,
$$(1 + 2(1\,2))(1 + (1\,2)) = 1 + (1\,2) + 2(1\,2) + 2(1\,2)^2 = 1 + 0(1\,2) + 2 = 0.$$
The ring has the identity element 1, which really is $1 \cdot 1$, and it contains units that are not the identity since $1(1\,2) \cdot 1(1\,2) = 1$ as an operation in the group ring. △

Proposition 5.2.11
Let $R$ be a commutative ring with an identity $1 \neq 0$ and let $G$ be a group. Then the element $1 \cdot 1_G$ is the identity in $R[G]$ and $G$ is a subgroup of $U(R[G])$.

Proof. (Left as an exercise. See Exercise 5.2.17.) 

Proposition 5.2.11 along with Proposition 5.2.9 together show that the group ring R[G] is a ring
that includes the ring R as a subring and the group G as a subgroup of U (R[G]). This observation
shows in what sense R[G] is a ring generated by R and G.

Proposition 5.2.12
Let G be a finite group with |G| > 1 and R a commutative ring with more than one element.
Then R[G] always has a zero divisor.

Proof. Let $r \in R - \{0\}$ and suppose that the element $g \in G$ has order $m > 1$. Then
$$\begin{aligned}
(r - rg)(r + rg + rg^2 + \cdots + rg^{m-1})
&= r^2 + r^2 g + r^2 g^2 + \cdots + r^2 g^{m-1} - (r^2 g + r^2 g^2 + r^2 g^3 + \cdots + r^2 g^m) \\
&= r^2 - r^2 g^m = r^2 - r^2 = 0.
\end{aligned}$$

Since $r \neq 0$ and the group elements $1, g, \ldots, g^{m-1}$ are distinct, both factors above are nonzero. Thus, $r - rg$ is a zero divisor. □
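The telescoping product in this proof is easy to check computationally. The sketch below (our own setup: $G = \mathbb{Z}/4\mathbb{Z}$ written additively, $R = \mathbb{Z}/6\mathbb{Z}$, and $r = 2$) verifies that $(r - rg)(r + rg + rg^2 + rg^3) = 0$ in $R[G]$.

```python
from collections import defaultdict

M, NR = 4, 6   # G = Z/4Z (written additively), R = Z/6Z

def gr_mul(alpha, beta):
    """Multiply in R[G]: coefficients mod NR, group elements added mod M."""
    out = defaultdict(int)
    for g, a in alpha.items():
        for h, b in beta.items():
            out[(g + h) % M] = (out[(g + h) % M] + a * b) % NR
    return {g: c for g, c in out.items() if c != 0}

r, g = 2, 1                              # r in R - {0}; g has order m = 4 in G
u = {0: r, g: (-r) % NR}                 # r - rg
v = {(k * g) % M: r for k in range(M)}   # r + rg + rg^2 + rg^3

print(gr_mul(u, v))   # {} : the zero element, so r - rg is a zero divisor
```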

Among the examples of rings we have encountered so far, group rings are likely the most abstract.
They do not, in general, correspond to certain number sets, modular arithmetic, polynomials, func-
tions, matrices, or any other mathematical object naturally encountered so far. However, as with
any other mathematical object, one develops an intuition for it as one uses it and finds applications.
In Section 5.4.3, we will introduce a more general construction that simultaneously subsumes
polynomial rings and group rings. This leads to a definition for a group ring R[G] with an arbitrary
ring R and a group G that are not necessarily finite.

Exercises for Section 5.2


1. Prove that Z[i] is an integral domain.
2. Prove that $\mathbb{Z}[\sqrt{2}, \sqrt{5}]$ consists of the following subring of $\mathbb{R}$:
$$\{a + b\sqrt{2} + c\sqrt{5} + d\sqrt{10} \in \mathbb{R} \mid a, b, c, d \in \mathbb{Z}\}.$$
Write out the multiplication between $a + b\sqrt{2} + c\sqrt{5} + d\sqrt{10}$ and $a' + b'\sqrt{2} + c'\sqrt{5} + d'\sqrt{10}$ and collect like terms.
3. Let $r_1, r_2, \ldots, r_n \in \mathbb{Q}$. Prove that $\mathbb{Z}[r_1, r_2, \ldots, r_n] = \mathbb{Z}\left[\frac{1}{m}\right]$ for some integer $m$.
4. Let $p$ be a prime number. Consider the subset $R$ of $\mathbb{Q}$ of fractions that, when written in reduced form, have a denominator not divisible by $p$. Prove that $R$ is a subring of $\mathbb{Q}$. Prove that $R$ cannot be written as $\mathbb{Z}[S]$ for any finite set $S \subseteq \mathbb{Q}$.

5. Prove that for all primes $p$, the ring $\mathbb{Q}[\sqrt{p}]$ is a field.

6. Consider the ring $\mathbb{Q}[\sqrt[3]{2}]$ as a subring of $\mathbb{R}$.
(a) Prove that it consists of elements of the form $a + b\sqrt[3]{2} + c(\sqrt[3]{2})^2$ with $a, b, c \in \mathbb{Q}$.
(b) Prove that every element of the form $a + b\sqrt[3]{2}$ with $(a, b) \neq (0, 0)$ is a unit. [Hint: Exercise 5.1.26.]
7. Consider the ring $(\mathbb{Z}/2\mathbb{Z})[x]$. Let $\alpha(x) = x^3 + x + 1$ and $\beta(x) = x^2 + 1$. Calculate: (a) $\alpha(x) + \beta(x)$; (b) $\alpha(x)\beta(x)$; (c) $\alpha(x)^2$.
8. Consider the ring $(\mathbb{Z}/6\mathbb{Z})[x]$. Let $\alpha(x) = 2x^3 + 3x + 1$ and $\beta(x) = 2x^2 + 5$. Calculate: (a) $\alpha(x) + \beta(x)$; (b) $\alpha(x)\beta(x)$; (c) $\alpha(x)^3$.
9. For all $n$, calculate $(2x + 3)^n$ in $(\mathbb{Z}/6\mathbb{Z})[x]$. Repeat the same question but in $(\mathbb{Z}/12\mathbb{Z})[x]$.
10. Suppose that $R$ is a ring with identity $1 \neq 0$ of characteristic $n$. Prove that $R[x]$ is also of characteristic $n$.
11. Let $p$ be a prime number. Prove that for all $a \in \mathbb{Z}/p\mathbb{Z}$, the following identity holds in the ring $(\mathbb{Z}/p\mathbb{Z})[x]$:
$$(x + a)^p = x^p + a.$$
Prove that $n$ is prime if and only if $(x + a)^n = x^n + a$ in $(\mathbb{Z}/n\mathbb{Z})[x]$.
12. Let G = S3 and let R = Z/3Z. In the group ring Z/3Z[S3 ] consider α = 1 + (1 2) + 2(1 3) + (2 3) and
β = 2 + 2(1 3) + (1 2 3). Calculate: (a) α + 2β; (b) αβ; and (c) βα.
13. Show that the element (1 2) + (1 3) + (2 3) is in the center of the group ring Z/3Z[S3 ].
14. In the ring $\mathbb{Z}[Q_8]$, find $(i + j)^n$ for all positive integers $n$. [Hint: Note that in $\mathbb{Z}[Q_8]$, the element $(-1)k$ is not the same as $(-k)$. In $Q_8$, $k$ and $-k$ are distinct group elements, so they are not integer multiples of each other.]
15. In the subring $\{a + bi + cj + dk \in \mathbb{H} \mid a, b, c, d \in \mathbb{Z}\}$ of $\mathbb{H}$, find $(i + j)^n$ for all positive integers $n$. [Compare to the previous exercise.]
16. Let R be a commutative ring and G a group. Prove that R[G] is commutative if and only if G is
abelian.
17. Let R be a commutative ring with an identity 1 6= 0. Show that there is an embedding of G in
U (R[G]). Find an example of a ring R and a group G in which G is a strict subgroup of U (R[G]).
18. Let R be a commutative ring and let G be a group. Prove that α is in the center of R[G] if and only
if gα = αg for all g ∈ G.
19. Let $R$ be a commutative ring. We denote by $R[[x]]$ the set of formal power series
$$\sum_{n=0}^{\infty} a_n x^n$$
with coefficients in $R$. In $R[[x]]$, we do not worry about issues of convergence. Addition of power series is performed term by term, and for multiplication
$$\left(\sum_{n=0}^{\infty} a_n x^n\right) \left(\sum_{n=0}^{\infty} b_n x^n\right) = \sum_{n=0}^{\infty} c_n x^n \quad\text{where}\quad c_n = \sum_{k=0}^{n} a_k b_{n-k} = \sum_{i+j=n} a_i b_j.$$

(a) Prove that $R[[x]]$ with the addition and the multiplication defined above is a commutative ring.
(b) Suppose that $R$ has an identity $1 \neq 0$. Prove that $1 - x$ is a unit.
(c) Prove that a power series $\sum_{n=0}^{\infty} a_n x^n$ is a unit if and only if $a_0$ is a unit.

20. Consider the power series ring $\mathbb{Q}[[x]]$. (See Exercise 5.2.19.)
(a) Suppose that $c_0$ is a nonzero square element. Prove that there exists a power series $\sum_{n=0}^{\infty} a_n x^n$ such that
$$\left(\sum_{n=0}^{\infty} a_n x^n\right)^2 = \sum_{n=0}^{\infty} c_n x^n.$$
(b) Find a recurrence relation for the terms $a_n$ such that
$$\left(\sum_{n=0}^{\infty} a_n x^n\right)^2 = 1 + x.$$

21. Let $R$ be a ring and let $X$ be a set. Prove that the set $\operatorname{Fun}(X, R)$ of functions from $X$ to $R$ is a ring with the addition and multiplication of functions defined by
$$(f_1 + f_2)(x) \overset{\text{def}}{=} f_1(x) + f_2(x), \qquad (f_1 f_2)(x) \overset{\text{def}}{=} f_1(x) f_2(x).$$

22. Let $R$ be a ring and let $X$ be a set. The support of a function $f \in \operatorname{Fun}(X, R)$ is the subset
$$\operatorname{Supp}(f) = \{x \in X \mid f(x) \neq 0\}.$$
Consider the subset $\operatorname{Fun}_{fs}(X, R)$ of functions in $\operatorname{Fun}(X, R)$ that are of finite support, i.e., that are 0 except on a finite subset of $X$. Prove that $\operatorname{Fun}_{fs}(X, R)$ is a subring of $\operatorname{Fun}(X, R)$ as defined in Exercise 5.2.21.

5.3
Matrix Rings
This section introduces an important family of examples of noncommutative rings, that of matrix
rings.

5.3.1 – Matrix Rings


Let R be an arbitrary ring. We define Mn (R) as the set of n × n matrices with entries from the ring
R. As in linear algebra, we typically denote elements of Mn (R) with a capital letter, say A, and we
denote the entries of the matrix with the corresponding lowercase letters aij ∈ R with the index i
indicating the row and the index j indicating the column.
Let A = (aij ) and B = (bij ) be two elements in Mn (R). The sum A + B is defined as the n × n
matrix whose (i, j)th entry is
$$a_{ij} + b_{ij}.$$
Inspired by the usual matrix product as defined in linear algebra, the product $AB$ is defined as the $n \times n$ matrix whose $(i,j)$th entry is
$$\sum_{k=1}^{n} a_{ik} b_{kj}. \tag{5.2}$$

Since R need not be commutative, the order given in (5.2) is important.


The addition on $M_n(R)$ has the same properties as the direct sum group $(R^{n^2}, +)$. Hence, $(M_n(R), +)$ is an abelian group.

In linear algebra courses, students usually encounter matrices as representing linear transforma-
tions with respect to certain bases. The product of two matrices is defined as the matrix representing
the composition of the linear transformations. Since the composition of functions is always asso-
ciative (see Proposition 1.1.15), it follows that the product of matrices over the real (or complex)
numbers is associative. In order to prove that the multiplication in Mn (R) is associative, we can
only use (5.2) as the definition.
Let $R$ be any ring. Let $A = (a_{ij})$, $B = (b_{ij})$, and $C = (c_{ij})$ be matrices in $M_n(R)$. Then the $(i,j)$th entry of $(AB)C$ is
$$\sum_{\ell=1}^{n} \left(\sum_{k=1}^{n} a_{ik} b_{k\ell}\right) c_{\ell j} = \sum_{\ell=1}^{n} \sum_{k=1}^{n} a_{ik} b_{k\ell} c_{\ell j} = \sum_{k=1}^{n} \sum_{\ell=1}^{n} a_{ik} b_{k\ell} c_{\ell j} = \sum_{k=1}^{n} a_{ik} \left(\sum_{\ell=1}^{n} b_{k\ell} c_{\ell j}\right).$$
This is the $(i,j)$th entry of $A(BC)$. Hence, $(AB)C = A(BC)$ and matrix multiplication is associative.
The $(i,j)$th entry of $A(B + C)$ is
$$\sum_{k=1}^{n} a_{ik}(b_{kj} + c_{kj}) = \sum_{k=1}^{n} (a_{ik} b_{kj} + a_{ik} c_{kj}) = \sum_{k=1}^{n} a_{ik} b_{kj} + \sum_{k=1}^{n} a_{ik} c_{kj}.$$

This is the (i, j)th entry of AB +AC so A(B +C) = AB +AC. This shows the matrix multiplication
is left-distributive over addition. Right-distributivity is proved in a similar way. We have proven
the key theorem of this section.

Proposition 5.3.1
The set Mn (R) equipped with the operations of matrix addition and matrix multiplication
is a ring.

In a first linear algebra course, students encounter matrices with real or complex coefficients. As one observes already with $M_n(\mathbb{R})$, the multiplication in $M_n(R)$ is not commutative even if $R$ is. With a little creativity, we can think of all manner of matrix rings. Consider, for example,
$$M_2(\mathbb{Z}/2\mathbb{Z}); \quad M_n(\mathbb{Z}[x]); \quad M_n(\mathbb{Z}); \quad M_n(C^0([0,1], \mathbb{R})); \quad\text{or}\quad M_n(\mathbb{H}).$$
Example 5.3.2. As an example of a matrix product in $M_n(R)$ where $R$ is not commutative, consider the following product in $M_2(\mathbb{H})$:
$$\begin{pmatrix} i & 1 + 2j \\ i - k & 3k \end{pmatrix}
\begin{pmatrix} i + j & k \\ 2 + i & 2i - j \end{pmatrix}
= \begin{pmatrix} i(i+j) + (1+2j)(2+i) & ik + (1+2j)(2i-j) \\ (i-k)(i+j) + 3k(2+i) & (i-k)k + 3k(2i-j) \end{pmatrix}
= \begin{pmatrix} 1 + i + 4j - k & 2 + 2i - 2j - 4k \\ -1 + i + 2j + 7k & 1 + 3i + 5j \end{pmatrix}. \qquad\triangle$$
Rings of square matrices Mn (R) naturally contain many subrings. We mention a few here but
leave the proofs as exercises. Given any ring R, the following are subrings of Mn (R):
• Mn (S), where S is a subring of R;
• the set of upper triangular matrices;
• the set of lower triangular matrices;
• the set of diagonal matrices.
In Section 10.2, we will review general vector spaces over any field. However, we already point out
that most of the algorithms introduced in linear algebra—Gauss-Jordan elimination, various matrix
factorizations, and so on—can be applied without any modification to Mn (F ), where F is a field.
However, when R is a general ring, because nonzero elements might not be invertible and elements
might not commute, some algorithms are no longer guaranteed to work and some definitions no
longer make sense.

5.3.2 – Matrix Inverses


Let $R$ be a ring with an identity $1 \neq 0$. Denote by $I_n$ the $n \times n$ matrix $(a_{ij})$ with $a_{ii} = 1$ for all $i = 1, 2, \ldots, n$ and $a_{ij} = 0$ for $i \neq j$. Just as with matrices with real entries, the matrix $I_n$ is the multiplicative identity matrix in $M_n(R)$.
Invertible matrices form an important topic in linear algebra. In ring theory language, invertible
matrices are the units in Mn (R), where R is a ring with an identity. In particular, if F is a field,
then the group of units U (Mn (F )) is the general linear group GLn (F ) introduced in Example 3.2.11,
properties of which are studied in group theory. This inspires the following more general definition.

Definition 5.3.3
Let R be a ring. We denote by GLn (R) the group of units U (Mn (R)) and call it the general
linear group of index n on the ring R.

This definition gives meaning to groups such as GLn (Z/kZ) or GLn (Z) but also general linear
groups over noncommutative rings, such as GLn (H).

5.3.3 – Determinants
If a ring $R$ is commutative, it is possible to define the determinant and recover some of the properties of determinants we encounter in linear algebra. The propositions are well known, but the proofs given in linear algebra sometimes rely on the ring of coefficients being a field. For completeness, we give proofs in the context of arbitrary commutative rings.
Note, throughout this discussion on determinants, we assume that the ring R is commutative.

Definition 5.3.4
If $R$ is a commutative ring, then we define the determinant as the function $\det : M_n(R) \to R$ given on a matrix $A = (a_{ij}) \in M_n(R)$ by
$$\det A = \sum_{\sigma \in S_n} (\operatorname{sign} \sigma)\, a_{1\sigma(1)} a_{2\sigma(2)} \cdots a_{n\sigma(n)}. \tag{5.3}$$

Example 5.3.5. Let $R = \mathbb{Z}/6\mathbb{Z}$ and consider the matrix
$$A = \begin{pmatrix} 2 & 3 & 5 \\ 1 & 0 & 3 \\ 4 & 2 & 1 \end{pmatrix}.$$

Then the determinant of A is

det A = 2 × 0 × 1 + 3 × 3 × 4 + 5 × 1 × 2 − 3 × 1 × 1 − 5 × 0 × 4 − 2 × 3 × 2
= 0 + 0 + 4 − 3 − 0 − 0 = 1.

In the above calculation, the products correspond (in order) to the following permutations: $1$, $(1\,2\,3)$, $(1\,3\,2)$, $(1\,2)$, $(1\,3)$, and $(2\,3)$. △
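The Leibniz formula translates directly into code. The sketch below (our own) enumerates $S_n$ with itertools.permutations, computes signs by counting inversions, and reproduces $\det A = 1$ for the matrix of Example 5.3.5 over $\mathbb{Z}/6\mathbb{Z}$.

```python
from itertools import permutations

def sign(sigma):
    """Sign of a permutation given as a tuple (sigma(0), ..., sigma(n-1))."""
    inv = sum(1 for i in range(len(sigma)) for j in range(i + 1, len(sigma))
              if sigma[i] > sigma[j])
    return -1 if inv % 2 else 1

def det_leibniz(A, n_mod):
    """Leibniz formula (5.3) for a matrix over Z/(n_mod)Z."""
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        term = sign(sigma)
        for i in range(n):
            term *= A[i][sigma[i]]
        total += term
    return total % n_mod

A = [[2, 3, 5],
     [1, 0, 3],
     [4, 2, 1]]
print(det_leibniz(A, 6))   # 1, as computed in Example 5.3.5
```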

Definition 5.3.4 is called the Leibniz formula for the determinant. Many courses on linear algebra
first introduce the determinant via the Laplace expansion. As we will see shortly, the two defini-
tions are equivalent. The Leibniz definition for the determinant leads immediately to the following
important properties of determinants.

Proposition 5.3.6
Let τ ∈ Sn be a permutation. If A0 is the matrix obtained from A by permuting the rows
(respectively the columns) of A according to the permutation τ , then

det(A0 ) = sign(τ ) det(A).

Proof. We prove the proposition first for permutations of the rows. By the Leibniz definition,
$$\det A' = \sum_{\sigma \in S_n} (\operatorname{sign} \sigma)\, a_{\tau(1)\sigma(1)} a_{\tau(2)\sigma(2)} \cdots a_{\tau(n)\sigma(n)}.$$
Since $R$ is commutative, we can permute the terms in each product so that the row indices are in increasing order. This amounts to reordering the product according to the permutation $\tau^{-1}$. Hence
$$a_{\tau(1)\sigma(1)} a_{\tau(2)\sigma(2)} \cdots a_{\tau(n)\sigma(n)} = a_{1\sigma(\tau^{-1}(1))} a_{2\sigma(\tau^{-1}(2))} \cdots a_{n\sigma(\tau^{-1}(n))}.$$
For any fixed $\tau$, as $\sigma$ runs through all permutations, so does $\sigma\tau^{-1}$. Hence, we have
$$\begin{aligned}
\det A' &= \sum_{\sigma \in S_n} (\operatorname{sign}(\sigma\tau^{-1}))(\operatorname{sign}(\tau))\, a_{1\sigma(\tau^{-1}(1))} a_{2\sigma(\tau^{-1}(2))} \cdots a_{n\sigma(\tau^{-1}(n))} \\
&= (\operatorname{sign} \tau) \sum_{\sigma' \in S_n} (\operatorname{sign} \sigma')\, a_{1\sigma'(1)} a_{2\sigma'(2)} \cdots a_{n\sigma'(n)} = (\operatorname{sign} \tau)(\det A).
\end{aligned}$$
If $A'$ is obtained from $A$ by permuting the columns of $A$ according to the permutation $\tau$, then
$$\det A' = \sum_{\sigma \in S_n} (\operatorname{sign} \sigma)\, a_{1\sigma(\tau(1))} a_{2\sigma(\tau(2))} \cdots a_{n\sigma(\tau(n))}.$$
By a similar reasoning as for the rows, it again follows that $\det A' = (\operatorname{sign} \tau)(\det A)$. □

Proposition 5.3.7
For all $A \in M_n(R)$, if $A^{\top}$ denotes the transpose of $A$, then $\det(A^{\top}) = \det(A)$.

Proof. By definition,
$$\det(A^{\top}) = \sum_{\sigma \in S_n} (\operatorname{sign} \sigma)\, a_{\sigma(1)1} a_{\sigma(2)2} \cdots a_{\sigma(n)n}.$$
We recall that $\operatorname{sign}(\sigma^{-1}) = \operatorname{sign} \sigma$. Since $R$ is commutative, by permuting the coefficients in each product so that the row index is listed in sequential order, we have
$$\begin{aligned}
\det(A^{\top}) &= \sum_{\sigma \in S_n} (\operatorname{sign} \sigma)\, a_{1\sigma^{-1}(1)} a_{2\sigma^{-1}(2)} \cdots a_{n\sigma^{-1}(n)} \\
&= \sum_{\sigma \in S_n} (\operatorname{sign} \sigma^{-1})\, a_{1\sigma^{-1}(1)} a_{2\sigma^{-1}(2)} \cdots a_{n\sigma^{-1}(n)}.
\end{aligned}$$
However, the inverse function on group elements is a bijection $S_n \to S_n$, so as $\sigma$ runs through all the permutations in $S_n$, the inverses $\sigma^{-1}$ also run through all the permutations. Hence,
$$\det(A^{\top}) = \sum_{\sigma' \in S_n} (\operatorname{sign} \sigma')\, a_{1\sigma'(1)} a_{2\sigma'(2)} \cdots a_{n\sigma'(n)} = \det(A). \qquad\square$$

Note that neither Proposition 5.3.6 nor 5.3.7 would hold if R were not commutative. Commu-
tativity is also required for the following theorem. Though the Leibniz formula could be used for a
definition of the determinant for a matrix with coefficients in a noncommutative ring, many if not
most of the usual properties we expect for determinants would not hold. This is why we typically
only consider the determinant function on matrix rings over a commutative ring of coefficients.

Theorem 5.3.8
Let $R$ be a commutative ring, $n$ a positive integer, and let $A \in M_n(R)$. Denote by $A_{ij}$ the submatrix of $A$ obtained by deleting the $i$th row and the $j$th column of $A$. For each fixed $i$,
$$\det A = \sum_{j=1}^{n} (-1)^{i+j} a_{ij} \det(A_{ij}) \tag{5.4}$$
and for each fixed $j$,
$$\det A = \sum_{i=1}^{n} (-1)^{i+j} a_{ij} \det(A_{ij}). \tag{5.5}$$
Formula (5.4) is called the Laplace expansion about row $i$ and (5.5) is called the Laplace expansion about column $j$.

Proof. Fix an integer $i$ with $1 \leq i \leq n$. Break the sum in (5.3) by factoring out each matrix entry with a row index of $i$. Then (5.3) becomes
$$\det A = \sum_{j=1}^{n} a_{ij} \Bigg( \sum_{\substack{\sigma \in S_n \\ \sigma(i) = j}} (\operatorname{sign} \sigma)\, a_{1\sigma(1)} a_{2\sigma(2)} \cdots \widehat{a_{i\sigma(i)}} \cdots a_{n\sigma(n)} \Bigg),$$
where the hat indicates that the factor $a_{ij} = a_{i\sigma(i)}$ has been removed from the product. In the product inside the nested summation, all terms with row index $i$ and with column index $j$ have been removed. Consequently, the inside summation resembles the Leibniz formula (5.3) of the submatrix $A_{ij}$, though we do not know if the sign of the permutation $\sigma$ corresponds to that required by (5.3).
Let $\sigma \in S_n$ with $\sigma(i) = j$. Then the permutation
$$\sigma_{ij} = (j\;\, j{+}1\;\, \cdots\;\, n)^{-1}\, \sigma\, (i\;\, i{+}1\;\, \cdots\;\, n)$$
leaves $n$ fixed but has the same number of inversions as $\sigma$ does if we remove $i$ from the domain $\{1, 2, \ldots, n\}$ of $\sigma$ and remove $j$ from the codomain. Since the sign of an $m$-cycle is $(-1)^{m-1}$, then
$$\operatorname{sign} \sigma_{ij} = (-1)^{n-i} (\operatorname{sign} \sigma)(-1)^{n-j} = (-1)^{2n-i-j} (\operatorname{sign} \sigma) = (-1)^{i+j} (\operatorname{sign} \sigma).$$
Thus, we also have $\operatorname{sign} \sigma = (-1)^{i+j} \operatorname{sign} \sigma_{ij}$ and so
$$\det A = \sum_{j=1}^{n} a_{ij} (-1)^{i+j} \Bigg( \sum_{\substack{\sigma \in S_n \\ \sigma(i) = j}} (\operatorname{sign} \sigma_{ij})\, a_{1\sigma(1)} a_{2\sigma(2)} \cdots \widehat{a_{i\sigma(i)}} \cdots a_{n\sigma(n)} \Bigg) = \sum_{j=1}^{n} a_{ij} (-1)^{i+j} \det(A_{ij}).$$
This proves (5.4).
Laplace expansion about column $j$ in (5.5) follows immediately from Proposition 5.3.7. □

Another property that follows readily from the Leibniz formula is that the determinant is linear
by row and, by virtue of Proposition 5.3.7, linear by column as well. (See Exercise 5.3.11.) This
property inspires us to consider other functions F : Mn (R) → R that are linear in every row and
every column.

Proposition 5.3.9
A function $F : M_n(R) \to R$ is linear in every row and in every column if and only if there exists a function $f : S_n \to R$ such that
$$F(A) = \sum_{\sigma \in S_n} f(\sigma)\, a_{1\sigma(1)} a_{2\sigma(2)} \cdots a_{n\sigma(n)}. \tag{5.6}$$

Proof. Suppose first that $F$ is linear in each row. By properties of linear transformations, if $F$ is linear in row 1, then
$$F(A) = \sum_{j_1=1}^{n} c_{j_1} a_{1j_1}$$
for some elements $c_{j_1}$ whose value may depend on the other rows. Furthermore, by picking appropriate values for the first row of $A$, we see that the functions $c_{j_1}$ must be linear in all the other rows. Since each $c_{j_1}$ is linear in row 2, then
$$F(A) = \sum_{j_1=1}^{n} \left(\sum_{j_2=1}^{n} d_{j_1, j_2} a_{2j_2}\right) a_{1j_1} = \sum_{1 \leq j_1,\, j_2 \leq n} d_{j_1, j_2}\, a_{1j_1} a_{2j_2},$$
where the $d_{j_1, j_2}$ are ring elements that depend on the elements in rows 3 through $n$. Continuing until row $n$, we deduce that if $F$ is linear in every row, then there exists a function $f : \{1, 2, \ldots, n\}^n \to R$ such that
$$F(A) = \sum_{1 \leq j_1,\, j_2,\, \ldots,\, j_n \leq n} f(j_1, j_2, \ldots, j_n)\, a_{1j_1} a_{2j_2} \cdots a_{nj_n}.$$
Now if $F$ is also linear in each column, then $f(j_1, j_2, \ldots, j_n)$ must be 0 any time two of the indices $j_1, j_2, \ldots, j_n$ are equal, because otherwise $F(A)$ would contain a quadratic term in one of the entries of the matrix. This proves that there exists a function $f : S_n \to R$ that satisfies (5.6).
Conversely, regardless of the function $f : S_n \to R$, the function $F$ defined as in (5.6) is linear in every row and column. □

Proposition 5.3.10
Suppose that F : Mn (R) → R is a function that is linear in every row, is linear in every col-
umn, and satisfies the alternating property that if A0 is obtained from the matrix A by per-
muting the rows (or columns) according to the permutation τ , then F (A0 ) = (sign τ )F (A).
Then there exists a constant c ∈ R such that F (A) = c det(A).

Proof. According to Proposition 5.3.9, there exists a function $f : S_n \to R$ such that (5.6) holds. Call $c = f(1)$. According to (5.6), $F(I) = f(1) = c$. Consider the permutation matrix $E_\sigma$, for $\sigma \in S_n$, whose entries are
$$e_{ij} = \begin{cases} 1 & \text{if } j = \sigma(i) \\ 0 & \text{otherwise.} \end{cases}$$
Then by the alternating property $f(\sigma) = F(E_\sigma) = (\operatorname{sign} \sigma) F(I) = c(\operatorname{sign} \sigma)$. Thus,
$$F(A) = \sum_{\sigma \in S_n} c(\operatorname{sign} \sigma)\, a_{1\sigma(1)} a_{2\sigma(2)} \cdots a_{n\sigma(n)} = c \det A. \qquad\square$$

The property described in Proposition 5.3.10 characterizes determinants. Indeed, the determinant is the unique function $M_n(R) \to R$ that is linear in the rows, linear in the columns, satisfies the alternating condition, and is 1 on the identity matrix. This characterization of the determinant leads to the following important theorem about determinants.

Proposition 5.3.11
Let R be a commutative ring. Then for any matrices A, B ∈ Mn (R),

det(AB) = (det A)(det B).

Proof. Given the matrix $B$, consider the function $F : M_n(R) \to R$ defined by $F(A) = \det(AB)$. For a fixed $i$, suppose that we can write the $i$th row of the matrix $A$ as $a_{ij} = r a'_{ij} + s a''_{ij}$ with $1 \leq j \leq n$. We denote by $A'$ the matrix $A$ but with the $i$th row replaced with the row $(a'_{ij})_{j=1}^n$ and denote by $A''$ the matrix $A$ but with the $i$th row replaced with the row $(a''_{ij})_{j=1}^n$. Denote $C = AB$, $C' = A'B$, and $C'' = A''B$. Then the $i$th row of $C$ can be written as
$$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj} = \sum_{k=1}^{n} (r a'_{ik} b_{kj} + s a''_{ik} b_{kj}) = r \left(\sum_{k=1}^{n} a'_{ik} b_{kj}\right) + s \left(\sum_{k=1}^{n} a''_{ik} b_{kj}\right) = r c'_{ij} + s c''_{ij}.$$
Since the determinant is linear in each row, then $\det C = r(\det C') + s(\det C'')$. Thus, $F(A) = r F(A') + s F(A'')$. Hence, $F$ is linear in each row. By a similar reasoning, $F$ is linear in each column.
We leave it as an exercise (Exercise 5.3.19) to prove that a function $F : M_n(R) \to R$ that is linear in each row and linear in each column satisfies the alternating property (described in Proposition 5.3.10) if and only if $F(A) = 0$ for every matrix $A$ that has a repeated row or a repeated column. By the definition of matrix multiplication, if $A$ has two repeated rows, then $AB$ also has two repeated rows, and hence $F(A) = \det(AB) = 0$. Hence, $F$ satisfies the alternating property.
Consequently, by Proposition 5.3.10, $F(A) = c \det(A)$ with $c = F(I) = \det(B)$. Thus, $\det(AB) = \det(A) \det(B)$. □

Finally, if the ring $R$ has an identity $1 \neq 0$, the determinant gives a characterization of invertible matrices.

Proposition 5.3.12
Let $R$ be a commutative ring with an identity $1 \neq 0$. A matrix $A \in M_n(R)$ is a unit if and only if $\det A \in U(R)$. Furthermore, the $(i,j)$th entry of the inverse matrix $A^{-1}$ is
$$(\det A)^{-1} (-1)^{i+j} \det(A_{ji}).$$

Proof. (Left as an exercise for the reader. See Exercise 5.3.17.) 
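Although the proof is left as an exercise, the adjugate formula itself can be tested on the matrix of Example 5.3.5, whose determinant 1 is a unit in $\mathbb{Z}/6\mathbb{Z}$. The sketch below (our own; the minors are evaluated with a small Leibniz determinant) builds $A^{-1}$ entrywise as $(\det A)^{-1}(-1)^{i+j}\det(A_{ji})$ and checks that $A A^{-1}$ is the identity.

```python
from itertools import permutations

def det_mod(A, m):
    """Leibniz determinant of a square matrix over Z/mZ."""
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        inv = sum(1 for i in range(n) for j in range(i + 1, n)
                  if sigma[i] > sigma[j])
        term = -1 if inv % 2 else 1
        for i in range(n):
            term *= A[i][sigma[i]]
        total += term
    return total % m

def minor(A, i, j):
    """Submatrix A_ij: delete row i and column j (0-indexed)."""
    return [[A[r][c] for c in range(len(A)) if c != j]
            for r in range(len(A)) if r != i]

def inverse_mod(A, m):
    """A^{-1} over Z/mZ via the adjugate, assuming det A is a unit mod m."""
    n = len(A)
    d = det_mod(A, m)
    d_inv = next(u for u in range(m) if (u * d) % m == 1)
    return [[(d_inv * (-1) ** (i + j) * det_mod(minor(A, j, i), m)) % m
             for j in range(n)] for i in range(n)]

A = [[2, 3, 5], [1, 0, 3], [4, 2, 1]]
B = inverse_mod(A, 6)
AB = [[sum(A[i][k] * B[k][j] for k in range(3)) % 6 for j in range(3)]
      for i in range(3)]
print(AB)   # the identity matrix [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```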

Proposition 5.3.11 generalizes the definition in Example 3.7.13 that discussed general linear
groups over fields. Proposition 5.3.11 is equivalent to saying that the determinant function det :
GLn (R) → U (R) is a group homomorphism. We define the kernel of the homomorphism as the
special linear group
SLn (R) = {A ∈ Mn (R) | det A = 1}.

Exercises for Section 5.3


1. In $M_2(\mathbb{Z}/4\mathbb{Z})$, consider the matrices
$$A = \begin{pmatrix} 1 & 2 \\ 2 & 3 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}, \quad C = \begin{pmatrix} 2 & 2 \\ 3 & 1 \end{pmatrix}.$$

Perform the following calculations, if they are defined: (a) $A + BC$; (b) $ABC$; (c) $B^n$ for all $n \in \mathbb{N}$; (d) $C^{-1}$; (e) $A^{-1}B$.

2. Repeat Exercise 5.3.1 but with the ring of coefficients in $\mathbb{Z}/5\mathbb{Z}$.


3. Find the inverse of the matrix in Example 5.3.5.
4. Consider the following matrices in $M_2(\mathbb{H})$:
$$A = \begin{pmatrix} 1 & i \\ j & k \end{pmatrix}, \quad B = \begin{pmatrix} i + j & k \\ i & j \end{pmatrix}.$$
Calculate: (a) $A + B$; (b) $AB$; (c) $B^3$.


5. Let S be a subring of R. Prove that Mn (S) is a subring of Mn (R).
6. Prove that the subset of upper triangular matrices in Mn (R) is a subring.
7. Prove that the subset of diagonal matrices in Mn (R) is a subring.
8. Consider the ring $M_n(\mathbb{Z})$. Consider the subset $S$ of upper triangular matrices $A = (a_{ij})$ in which $2^{j-i}$ divides $a_{ij}$ for all indices $(i,j)$ with $j \geq i$. Prove that $S$ is a subring of $M_n(\mathbb{Z})$.
9. (Multivariable calculus required) Recall the Hessian matrix of a real-valued function f (x, y) defined
on an open set D ⊆ R2 . In what algebraic structure does the Hessian matrix of f exist?
10. Suppose that $R$ is a ring with identity $1 \neq 0$. Prove that the center $Z(M_n(R)) = \{a I_n \mid a \in Z(R)\}$, where $I_n$ is the $n \times n$ identity matrix. Give an example where this result fails if $R$ does not have an identity.
11. Let $R$ be a commutative ring. Prove that the determinant is "linear by row." In other words, prove that
$$\det \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ r a_{i1} + s a'_{i1} & r a_{i2} + s a'_{i2} & \cdots & r a_{in} + s a'_{in} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}
= r \det \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{i1} & a_{i2} & \cdots & a_{in} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}
+ s \det \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a'_{i1} & a'_{i2} & \cdots & a'_{in} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}.$$

12. Let $R$ be a commutative ring with an identity $1 \neq 0$. Let $\sigma \in S_n$ and define the matrix $E_\sigma$ as the $n \times n$ matrix with entries $(e_{ij})$ such that
$$e_{ij} = \begin{cases} 1 & \text{if } j = \sigma(i) \\ 0 & \text{otherwise.} \end{cases}$$
Prove that $\det E_\sigma = \operatorname{sign} \sigma$.


13. In the ring $M_2(\mathbb{Q}[x])$, prove that the following matrix is invertible and find the inverse:
$$\begin{pmatrix} x + 1 & x - 2 \\ x + 6 & x + 3 \end{pmatrix}.$$

14. Let $A, B \in M_n(R)$, where $R$ is a ring.
(a) Prove that $(AB)^{\top} = B^{\top} A^{\top}$ if $R$ is commutative.
(b) Prove that this identity does not necessarily hold if $R$ is not commutative.
15. Recall that the trace Tr : Mn (R) → R of a matrix is the sum of its diagonal elements. Let R be a
commutative ring. Prove that Tr(AB) = Tr(BA) for all matrices A, B ∈ Mn (R).
16. Let $R$ be any ring. Suppose that $A \in M_n(R)$ is strictly upper triangular (upper triangular but with zeros on the diagonal). Prove that $A$ is nilpotent. Prove that if $B \in GL_n(R)$, then $BAB^{-1}$ is also nilpotent.

17. Prove Proposition 5.3.12.


18. Find the number of units in M2 (Z/4Z).
19. Prove that a function F : Mn (R) → R that is linear in each row and linear in each column satisfies
the alternating property (described in Proposition 5.3.10) if and only if F (A) = 0 for every matrix A
that has a repeated row or a repeated column.
20. Let F : Mn (R) → R be a function that is linear in each row and linear in each column of an input matrix
A. Prove that F (A> ) = F (A) for all A ∈ Mn (R) if and only if the function f in Proposition 5.3.9
satisfies f (σ −1 ) = f (σ) for all σ ∈ Sn .

5.4
Ring Homomorphisms
In our study of groups, we emphasized that we do not typically concern ourselves with arbitrary functions between objects with a given algebraic structure, but only with functions that preserve the structure. In the context of rings, such functions are called ring homomorphisms.

5.4.1 – Ring Homomorphisms

Definition 5.4.1
Let R and S be two rings. A ring homomorphism is a function ϕ : R → S satisfying

• ∀a, b ∈ R, ϕ(a + b) = ϕ(a) + ϕ(b);


• ∀a, b ∈ R, ϕ(ab) = ϕ(a)ϕ(b).
A ring homomorphism that is also bijective is called a ring isomorphism. If there exists a
ring isomorphism between two rings R and S, we write R ∼ = S.

Example 5.4.2. The function $\varphi : \mathbb{Z} \to \mathbb{Z}/n\mathbb{Z}$ defined simply by $\varphi(a) = \bar{a}$ is a ring homomorphism. This statement is simply a rephrasing of the definition of the usual operations in modular arithmetic, namely that for all $a, b \in \mathbb{Z}$,
$$\overline{a + b} = \bar{a} + \bar{b} \quad\text{and}\quad \overline{ab} = \bar{a}\,\bar{b}. \qquad\triangle$$

Example 5.4.3. Let S be a subring of R. The inclusion function i : S → R, simply defined by


i(a) = a is a ring homomorphism. 4

Example 5.4.4. Let $R$ be a commutative ring. Fix an element $r \in R$. We define the evaluation map $\operatorname{ev}_r : R[x] \to R$ by
$$\operatorname{ev}_r(a_n x^n + \cdots + a_1 x + a_0) = a_n r^n + \cdots + a_1 r + a_0.$$
The function $\operatorname{ev}_r$ evaluates the polynomial at $r$. This function is a ring homomorphism. △
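The homomorphism property of $\operatorname{ev}_r$ is easy to spot-check numerically. The sketch below (our own sample polynomials) evaluates coefficient lists by Horner's rule and confirms that evaluation respects both addition and multiplication.

```python
def ev(p, r):
    """Evaluate a polynomial (coefficient list, index = degree) at r by Horner's rule."""
    acc = 0
    for c in reversed(p):
        acc = acc * r + c
    return acc

def poly_mul(p, q):
    """Multiply two integer polynomials given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

p = [1, 0, 2]        # 1 + 2x^2
q = [-3, 1]          # -3 + x
psum = [-2, 1, 2]    # p + q = -2 + x + 2x^2
r = 5

# ev_r is a ring homomorphism: it respects both multiplication and addition.
print(ev(poly_mul(p, q), r) == ev(p, r) * ev(q, r))   # True
print(ev(psum, r) == ev(p, r) + ev(q, r))             # True
```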

Example 5.4.5. Let $R$ be any ring. Let $U_2(R)$ be the ring of upper triangular $2 \times 2$ matrices over $R$. Consider the function $\varphi : U_2(R) \to R \oplus R$ defined by
$$\varphi \begin{pmatrix} a & b \\ 0 & d \end{pmatrix} = (a, d).$$
Then
$$\varphi \left( \begin{pmatrix} a_1 & b_1 \\ 0 & d_1 \end{pmatrix} + \begin{pmatrix} a_2 & b_2 \\ 0 & d_2 \end{pmatrix} \right) = \varphi \begin{pmatrix} a_1 + a_2 & b_1 + b_2 \\ 0 & d_1 + d_2 \end{pmatrix} = (a_1 + a_2, d_1 + d_2) = \varphi \begin{pmatrix} a_1 & b_1 \\ 0 & d_1 \end{pmatrix} + \varphi \begin{pmatrix} a_2 & b_2 \\ 0 & d_2 \end{pmatrix}.$$
Furthermore,
$$\varphi \left( \begin{pmatrix} a_1 & b_1 \\ 0 & d_1 \end{pmatrix} \begin{pmatrix} a_2 & b_2 \\ 0 & d_2 \end{pmatrix} \right) = \varphi \begin{pmatrix} a_1 a_2 & a_1 b_2 + b_1 d_2 \\ 0 & d_1 d_2 \end{pmatrix} = (a_1 a_2, d_1 d_2) = \varphi \begin{pmatrix} a_1 & b_1 \\ 0 & d_1 \end{pmatrix} \varphi \begin{pmatrix} a_2 & b_2 \\ 0 & d_2 \end{pmatrix}.$$
The above two identities show that $\varphi$ is a ring homomorphism. △

Example 5.4.6. Consider the operator $D : C^1([a,b], \mathbb{R}) \to C^0([a,b], \mathbb{R})$ defined by taking the derivative $D(f) = f'$. This is not a ring homomorphism. It is true that $D(f + g) = D(f) + D(g)$, but the product rule is $D(fg) = D(f)g + fD(g)$, which in general is not equal to $D(f)D(g)$. △

Example 5.4.7 (Reduction Homomorphism). Let $n$ be an integer $n \geq 2$. Define the function $\pi : \mathbb{Z}[x] \to (\mathbb{Z}/n\mathbb{Z})[x]$ by
$$\pi(a_m x^m + \cdots + a_1 x + a_0) = \overline{a_m} x^m + \cdots + \overline{a_1} x + \overline{a_0}.$$
It is not hard to show that $\pi$ is a homomorphism. It is called the reduction homomorphism.
The reduction homomorphism leads to an important way to tell if an integer polynomial has no roots. The key result is that for all $a \in \mathbb{Z}$ the following diagram of ring homomorphisms is commutative:

                 ev_a
       Z[x] -----------→   Z
        |                  |
      π |                  | π
        ↓        ev_a      ↓
    (Z/nZ)[x] ---------→ Z/nZ

which means that $\pi \circ \operatorname{ev}_a = \operatorname{ev}_a \circ \pi$. Consequently, if $q(x) \in \mathbb{Z}[x]$ is a polynomial such that $\pi(q(x))$ has no roots as a polynomial in $(\mathbb{Z}/n\mathbb{Z})[x]$, then $q(x)$ cannot have a root in $\mathbb{Z}$. But to check that $\pi(q(x))$ has no roots in $(\mathbb{Z}/n\mathbb{Z})[x]$ we simply need to test all the congruence classes in $\mathbb{Z}/n\mathbb{Z}$. △
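This root-screening technique is a few lines of code. In the sketch below (the polynomial $q(x) = x^3 + x + 1$ is our own choice, not the text's), we test all residues modulo $n$; since $q$ has no root mod 2, the commutative diagram shows it has no integer root.

```python
def has_root_mod(coeffs, n):
    """True if the integer polynomial (coefficient list, index = degree)
    has a root in Z/nZ."""
    def value(a):
        return sum(c * pow(a, k, n) for k, c in enumerate(coeffs)) % n
    return any(value(a) == 0 for a in range(n))

q = [1, 1, 0, 1]   # q(x) = x^3 + x + 1

# q(0) = 1 and q(1) = 3 are both nonzero mod 2, so q has no root in Z.
print(has_root_mod(q, 2))   # False
```

When the test returns True for every small modulus tried, it is inconclusive; the method only ever certifies the absence of integer roots.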

Example 5.4.8. Consider the subset


  
a −b
R= ∈ M2 (R) a, b ∈ R
b a

in M2 (R). Consider the function ϕ : C → M2 (R) defined by


 
a −b
ϕ(a + bi) = .
b a

We easily see that


 
a + c −(b + d)
ϕ((a + bi) + (c + di)) = ϕ((a + c) + (b + d)i) =
b+d a+c
   
a −b c −d
= +
b a d c
= ϕ(a + bi) + ϕ(c + di).
5.4. RING HOMOMORPHISMS 235

Now the multiplication in C is (a + bi)(c + di) = (ac − bd) + (ad + bc)i while

    \begin{pmatrix} a & -b \\ b & a \end{pmatrix} \begin{pmatrix} c & -d \\ d & c \end{pmatrix} = \begin{pmatrix} ac − bd & −ad − bc \\ ad + bc & ac − bd \end{pmatrix}.

These show that

    ϕ((a + bi)(c + di)) = \begin{pmatrix} a & -b \\ b & a \end{pmatrix} \begin{pmatrix} c & -d \\ d & c \end{pmatrix} = ϕ(a + bi)ϕ(c + di).

We have proven that ϕ is a homomorphism. We also notice that R = Im ϕ, so R is a subring of
M2 (R). The homomorphism ϕ is surjective onto R but Ker ϕ = {0}, so ϕ is also injective. Hence, ϕ
is an isomorphism between C and R. 4
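The isomorphism can be illustrated with Python's built-in complex numbers (the helpers below are ours, not the text's): ϕ turns complex arithmetic into matrix arithmetic.

```python
# Illustration (not part of the proof): phi(a + bi) = [[a, -b], [b, a]]
# turns complex addition and multiplication into matrix operations.
def phi(z):
    a, b = z.real, z.imag
    return [[a, -b], [b, a]]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

z, w = 2 + 3j, -1 + 4j
assert phi(z * w) == mat_mul(phi(z), phi(w))
assert phi(z + w) == [[phi(z)[i][j] + phi(w)[i][j] for j in range(2)]
                      for i in range(2)]
```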

Since a ring homomorphism ϕ : R → S is a group homomorphism between (R, +) and (S, +),
then by Proposition 3.7.9,

• ϕ(0R ) = 0S ;

• ϕ(−a) = −ϕ(a) for all a ∈ R;

• ϕ(n · a) = n · ϕ(a) for all a ∈ R and n ∈ Z.

It is also true that for all a ∈ R and all n ∈ N∗ , ϕ(an ) = ϕ(a)n . However, it is not necessarily true
that ϕ(1R ) = 1S , even if R and S both have identities. For example, if R and S are any rings, the
function ϕ : R → S such that ϕ(r) = 0 is a homomorphism. As a nontrivial example, consider the
function f : Z → Z/6Z defined by f (a) = 3̄ā. It is not hard to check that f is a ring homomorphism
but that the image is Im f = {0̄, 3̄}.
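This nontrivial example is quick to verify by exhaustion in Python (the function name `f` follows the text; the checking code is ours):

```python
# f : Z -> Z/6Z, f(a) = 3a mod 6, is a ring homomorphism whose image
# {0, 3} does not contain 1 mod 6 -- so f(1) is not the identity of Z/6Z.
def f(a):
    return (3 * a) % 6

for a in range(-20, 20):
    for b in range(-20, 20):
        assert f(a + b) == (f(a) + f(b)) % 6
        # multiplicativity works because 9 is congruent to 3 mod 6
        assert f(a * b) == (f(a) * f(b)) % 6

assert {f(a) for a in range(6)} == {0, 3}
assert f(1) != 1
```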
Following terminology introduced for groups (Definition 3.7.26), we call a homomorphism of a
ring R into itself an endomorphism on R and an isomorphism of a ring onto itself an automorphism
on R.

5.4.2 – Kernels and Images

Definition 5.4.9
Let ϕ : R → S be a ring homomorphism. The kernel of ϕ, denoted Ker ϕ, is the set of
elements of R that get mapped to 0, namely

Ker ϕ = {r ∈ R | ϕ(r) = 0}.

The image of ϕ, denoted Im ϕ, is the range of the function, namely

Im ϕ = {s ∈ S | ∃r ∈ R, ϕ(r) = s}.

Proposition 5.4.10
Let ϕ : R → S be a ring homomorphism.

(1) The image of ϕ is a subring of S.


(2) The kernel of ϕ is a subring of R. In fact, a stronger condition holds: if a ∈ Ker ϕ,
then for all r ∈ R, we also have ra ∈ Ker ϕ and ar ∈ Ker ϕ.

Proof. For part (1), let x, y ∈ Im ϕ. Then there exist a, b ∈ R such that x = ϕ(a) and y = ϕ(b).
Hence, x − y = ϕ(a) − ϕ(b) = ϕ(a − b) ∈ Im ϕ. Furthermore, xy = ϕ(a)ϕ(b) = ϕ(ab) ∈ Im ϕ. Since
Im ϕ is closed under subtraction and multiplication, it is a subring of S.
236 CHAPTER 5. RINGS

For part (2), let x, y ∈ Ker ϕ and let r ∈ R. Then ϕ(x) = ϕ(y) = 0. Consequently, ϕ(x − y) =
ϕ(x) − ϕ(y) = 0 − 0 = 0 so x − y ∈ Ker ϕ and Ker ϕ is closed under subtraction. Furthermore,
ϕ(rx) = ϕ(r)ϕ(x) = ϕ(r)0 = 0 and also ϕ(xr) = ϕ(x)ϕ(r) = 0ϕ(r) = 0 so Ker ϕ is closed under
multiplication within Ker ϕ but also closed under multiplication by any element in R. 
Example 5.4.11. Consider Example 5.4.4. Let s ∈ R be any element. Then evr (x + (s − r)) = s
so evr is surjective and hence the image of evr is all of R. The kernel Ker evr , however, is precisely
the polynomials that have r as a root. 4

5.4.3 – Convolution Rings (Optional)


We conclude this section with a discussion about a general class of rings that encompasses many
examples we have discussed above.
The reader may have noticed that the multiplications in polynomial rings and in group rings
look similar. However, the powers of x, namely {1, x, x2 , . . .}, equipped with multiplication do not
form a group. Therefore, a polynomial ring R[x] is not in general a group ring. Nonetheless, it is
possible to describe both constructions in a consistent way.
Let (R, +, ×) be a ring and let (S, ·) be a semigroup (a set equipped with an associative binary
operation). Suppose that a subring F of Fun(S, R) is such that for any pair of functions f1 , f2 ∈ F,
the summation

    (f1 ∗ f2 )(s) = \sum_{s_1 · s_2 = s} f1 (s1 )f2 (s2 ),                  (5.7)

involves only a finite number of terms for all s ∈ S. We call this condition the convolution condition
and call the operation on functions the convolution product between f1 and f2 .

Proposition 5.4.12
Let R be a ring, let (S, ·) be a semigroup, and let F be a subring of Fun(S, R) that satisfies
the convolution condition. Then (F, +, ∗) is a ring.

Proof. Since (F, +) is an abelian group by virtue of (F, +, ×) being a ring, we only need to check
associativity of ∗ and the distributivity of ∗ over +.
Let α, β, γ ∈ F. Then for all s ∈ S, we have

    (α ∗ (β ∗ γ)) (s) = \sum_{s_1 · s_2 = s} α(s1 ) \left( \sum_{t_1 · t_2 = s_2} β(t1 )γ(t2 ) \right) = \sum_{s_1 ·(t_1 · t_2 ) = s} α(s1 )β(t1 )γ(t2 )
                      = \sum_{(s_1 · t_1 )· t_2 = s} α(s1 )β(t1 )γ(t2 ) = \sum_{q · t_2 = s} \left( \sum_{s_1 · t_1 = q} α(s1 )β(t1 ) \right) γ(t2 )
                      = ((α ∗ β) ∗ γ) (s).
Hence, α ∗ (β ∗ γ) = (α ∗ β) ∗ γ.
Also for all s ∈ S,

    (α ∗ (β + γ)) (s) = \sum_{s_1 · s_2 = s} α(s1 )(β(s2 ) + γ(s2 ))
                      = \sum_{s_1 · s_2 = s} (α(s1 )β(s2 ) + α(s1 )γ(s2 ))
                      = \sum_{s_1 · s_2 = s} α(s1 )β(s2 ) + \sum_{s_1 · s_2 = s} α(s1 )γ(s2 )
                      = (α ∗ β)(s) + (α ∗ γ)(s).
Hence, α ∗ (β + γ) = α ∗ β + α ∗ γ. This proves left-distributivity of ∗ over +. The proof for
right-distributivity is similar and follows from right-distributivity of × over + in R. 
5.4. RING HOMOMORPHISMS 237

Definition 5.4.13
We call the ring (F, +, ∗) a convolution ring from (S, ·) to R.

There are a few common situations in which a subring of Fun(S, R) satisfies the convolution
condition. If the semigroup S is finite, then the condition is satisfied trivially. The semigroup (N, +)
is such that for all n ∈ N there is only a finite number of pairs (a, b) ∈ N × N such that a + b = n.
Hence, for any ring R, any subring of Fun(N, R) satisfies the convolution condition.
As a third general example, we consider functions of finite support. For any function f ∈
Fun(S, R), we define the support as

Supp(f ) = {s ∈ S | f (s) ≠ 0}.

In Exercise 5.2.22 we proved that the set Funf s (S, R) of functions from S to R of finite support, i.e.,
functions f ∈ Fun(S, R) such that Supp(f ) is a finite set, is a subring of Fun(S, R). Furthermore,
Funf s (S, R) satisfies the convolution condition because all the terms in the summation (5.7) are 0
except possibly for pairs (s1 , s2 ) ∈ Supp(f1 ) × Supp(f2 ), which is a finite set.
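The convolution product (5.7) on functions of finite support can be sketched in Python, with a function stored as a dictionary mapping semigroup elements to its nonzero values and the semigroup operation passed in as a function. This is our own illustration of the definition, not the book's notation:

```python
# Convolution product for finite-support functions f, g : S -> R,
# represented as dicts {s: r}. The sum over s1 . s2 = s is finite
# because only the supports of f and g contribute.
def convolve(f, g, op):
    h = {}
    for s1, r1 in f.items():
        for s2, r2 in g.items():
            s = op(s1, s2)
            h[s] = h.get(s, 0) + r1 * r2
    return {s: r for s, r in h.items() if r != 0}

def add(f, g):
    h = dict(f)
    for s, r in g.items():
        h[s] = h.get(s, 0) + r
    return {s: r for s, r in h.items() if r != 0}

op = lambda a, b: a + b          # the semigroup (N, +)
a = {0: 2, 1: -1}
b = {1: 3, 2: 5}
c = {0: 1, 3: 4}

# Associativity and left-distributivity, as in Proposition 5.4.12:
assert convolve(a, convolve(b, c, op), op) == convolve(convolve(a, b, op), c, op)
assert convolve(a, add(b, c), op) == add(convolve(a, b, op), convolve(a, c, op))
```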
We now give some specific examples of convolution rings that we have already encountered.
Example 5.4.14 (Polynomial Rings). Let R be a commutative ring and consider the semigroup
(N, +). Note that (N, +) is isomorphic as a monoid to ({1, x, x2 , . . .}, ×). Coefficients of polynomials
are 0 except for a finite number of terms, so we consider the function ψ : R[x] → Funf s (N, R)
where ψ(a(x)) is the function that to each integer n associates the coefficient an . The function ψ is
obviously injective. Furthermore, given any nonzero f ∈ Funf s (N, R), if n = max{i | f (i) ≠ 0}, then

ψ(f (n)xn + f (n − 1)xn−1 + · · · + f (1)x + f (0)) = f.

Hence, ψ is a bijection.
Suppose that ψ(a(x)) = f so f (i) = ai and that ψ(b(x)) = g so g(i) = bi . Then

ψ(a(x) + b(x)) = (i ↦ ai + bi ) = f + g = ψ(a(x)) + ψ(b(x)).

According to (5.1),

    ψ(a(x)b(x)) = \left( k ↦ \sum_{i+j=k} a_i b_j \right) = \left( k ↦ \sum_{i+j=k} f(i)g(j) \right) = f ∗ g = ψ(a(x)) ∗ ψ(b(x)).

Hence, we have shown that R[x] is ring isomorphic to the convolution ring (Funf s (N, R), +, ∗). 4
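In code, this isomorphism amounts to the familiar fact that polynomial multiplication convolves coefficient sequences over (N, +). A small Python sketch (the coefficient-list representation is our own choice):

```python
# Multiplying polynomials over Z by convolving coefficient lists
# (constant term first): the coefficient of x^k in the product is
# the sum of a_i * b_j over all i + j = k.
def poly_mul(a, b):
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj      # the convolution sum over i + j = k
    return c

# (1 + 2x)(3 + x + x^2) = 3 + 7x + 3x^2 + 2x^3
assert poly_mul([1, 2], [3, 1, 1]) == [3, 7, 3, 2]
```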

Example 5.4.15 (Group Rings). Let G be a finite group and let R be a commutative ring. If

α = a1 g1 + a2 g2 + · · · + an gn
β = b1 g1 + b2 g2 + · · · + bn gn

are two elements in R[G] with αβ = c1 g1 + c2 g2 + · · · + cn gn , then

    c_k = \sum_{(i,j) : g_i g_j = g_k} a_i b_j .

This is precisely the convolution product defined in (5.7). Consequently, as rings, R[G] is isomorphic
to the convolution ring (Fun(G, R), +, ∗). 4

The above example shows that (Fun(G, R), +, ∗) gives precisely the group ring that we defined
in Section 5.2.3 for a commutative ring R and a finite group G. However, this construction can be
generalized to infinite groups and noncommutative rings. For any group G and any ring R, we call
the group ring R[G] the convolution ring (Funf s (G, R), +, ∗).

Exercises for Section 5.4


1. Prove that there are only two ring homomorphisms f : Z → Z.
2. Find all positive integers n and k such that the function Z → Z/nZ defined by f (a) = ka is a
homomorphism.
3. Prove that the “projection” function π1 : R ⊕ R → R given by π1 (a, b) = a is a ring homomorphism
and determine its kernel.
4. Prove that the function f : Z ⊕ Z → Z given by f (m, n) = m − n is not a homomorphism.
5. Let D be an integer not divisible by a square integer (i.e., square-free). Prove that the function
   f : Z[√D] → M2 (Z) defined by

       f (a + b√D) = \begin{pmatrix} a & b \\ Db & a \end{pmatrix}

   is an injective ring homomorphism. Deduce that Z[√D] is ring-isomorphic to Im f .
6. Let R be a commutative ring and let a ∈ R be a fixed element. Consider the function fa : R[x, y] →
R[x] defined by fa (p(x, y)) = p(x, a). Prove that fa is a ring homomorphism.
7. Let R and S be commutative rings and let ϕ : R → S be a homomorphism. Prove that the function
ψ : R[x] → S[x] defined by

ψ(an xn + · · · + a1 x + a0 ) = ϕ(an )xn + · · · + ϕ(a1 )x + ϕ(a0 )

is a homomorphism.
8. Frobenius homomorphism. Let R be a commutative ring of prime characteristic p.
(a) Prove that p divides the binomial coefficient \binom{p}{k} for all integers 1 ≤ k ≤ p − 1.


(b) Prove that the function f : R → R given by f (x) = xp is a homomorphism. In other words,
prove that
(a + b)p = ap + bp and (ab)p = ap bp .
9. Given any set S, the triple (P(S), △, ∩) has the structure of a ring. Let S′ be any subset of S. Show
   that ϕ(A) = A ∩ S′ is a ring homomorphism from P(S) to P(S′).
10. Prove that (5Z, +) is group-isomorphic to (7Z, +) but that (5Z, +, ×) is not ring-isomorphic to
(7Z, +, ×).
11. Let U be a set. Show that (Fun(U, Z/2Z), +, ×) is isomorphic to (P(U ), △, ∩).
12. Show that (Z ⊕ Z)[x] is not isomorphic to Z[x] ⊕ Z[x].
13. Consider the ring (R, +, ×), where R = Z/2Z × Z/2Z as a set, where + is the component-wise addition
but the multiplication is done according to the following table:

× (0, 0) (1, 0) (0, 1) (1, 1)


(0, 0) (0, 0) (0, 0) (0, 0) (0, 0)
(1, 0) (0, 0) (1, 1) (1, 0) (0, 1)
(0, 1) (0, 0) (1, 0) (0, 1) (1, 1)
(1, 1) (0, 0) (0, 1) (1, 1) (1, 0)

Prove that (R, +, ×) is a ring. Also prove that (R, +, ×) is not isomorphic to Z/2Z ⊕ Z/2Z.
14. Prove that Mm (Mn (R)) is isomorphic to Mmn (R).
15. Show that the function ϕ : H → M2 (C) defined by

        ϕ(a + bi + cj + dk) = \begin{pmatrix} a + bi & −c − di \\ c − di & a − bi \end{pmatrix}

    is an injective homomorphism. Conclude that H is ring-isomorphic to Im ϕ.


16. Use the reduction homomorphism with n = 5 to show that the polynomial x4 + 3x2 + 4x + 1 has no
roots in Z.
17. Use the reduction homomorphism with n = 3 to show that the polynomial x5 − x + 2 has no roots in
Z.

18. Let A be a matrix in M2 (R). Define the function ϕ : R[x] → M2 (R) defined by

ϕ(an xn + an−1 xn−1 + · · · + a1 x + a0 ) = an An + an−1 An−1 + · · · + a1 A + a0 I.

This looks like plugging A into the polynomial except that the constant term becomes the diagonal
matrix a0 I.
 
    (a) Only for this part, take A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}. Calculate ϕ(x² + x + 1) and ϕ(3x³ − 2x).
(b) Show that ϕ is a ring homomorphism and that Im ϕ is a commutative subring of M2 (R).
(c) For any A ∈ M2 (R), show that the characteristic polynomial fA (x) of A is in Ker ϕ.
19. Suppose that R is a ring with identity 1 ≠ 0. Let ϕ : R → S be a nontrivial ring homomorphism (i.e.,
ϕ is not identically 0).
(a) Suppose that ϕ(1) is not the identity element in S (in particular, if S does not contain an identity
element). Prove that ϕ(1) is idempotent.
(b) Prove that whether or not S has an identity, Im ϕ has an identity, namely 1Im ϕ = ϕ(1).
(c) Suppose that S contains an identity 1S and that ϕ(1) ≠ 1S . Prove that ϕ(1) is a zero divisor.
(d) Deduce that if S is an integral domain, then ϕ(1) = 1S .
(e) Suppose that R and S are integral domains. Prove that a nontrivial ring homomorphism ϕ :
R → S induces a group homomorphism ϕ× : (U (R), ×) → (U (S), ×).
20. Show that the function f : Z/21Z → Z/21Z defined by f (ā) = 7̄ā is an endomorphism on Z/21Z.
21. Let R be a commutative ring and let R[x, y] be the polynomial ring on two variables x and y. Con-
sider the function f : R[x, y] → R[x, y] such that f (p(x, y)) = p(y, x). Prove that f is a nontrivial
automorphism on R[x, y].
22. Denote by End(R) the set of endomorphisms on R (homomorphisms from R to itself).
(a) Show that End(R) is closed under the operation of function composition.
(b) Show that End(R) is closed neither under function addition nor under function multiplication.
23. Augmentation map. Let R be a commutative ring with identity 1 ≠ 0 and let G be a finite group.
Define the function ψ : R[G] → R by

ψ(a1 g1 + a2 g2 + · · · + an gn ) = a1 + a2 + · · · + an .

Prove that ψ is a ring homomorphism. This function is called the augmentation map of the group
ring R[G].
24. The augmentation map for a group ring generalizes in the following way. Let R be a commutative
ring with identity 1 ≠ 0 and let G be a finite group. Let f : G → U (R) be a group homomorphism.
Prove that the function ψ : R[G] → R defined by

ψ(a1 g1 + a2 g2 + · · · + an gn ) = a1 f (g1 ) + a2 f (g2 ) + · · · + an f (gn ),

where ai f (gi ) involves a product in the ring R, is a ring homomorphism.


25. Let R be a commutative ring and let G be a group. Prove that if ϕ : G → G is a group automorphism,
then the function Φ : R[G] → R[G] defined by
!
X X
Φ ag g = ag ϕ(g)
g∈G g∈G

is an automorphism of R[G].
26. Let R be a ring with identity 1 ≠ 0 and let (S, ·) be a monoid (semigroup with an identity e). Prove
    that the identity for a convolution ring (F, +, ∗) of functions from S to R is the function i : S → R
    defined by

        i(s) = \begin{cases} 1 & \text{if } s = e \\ 0 & \text{if } s ≠ e. \end{cases}

27. Let R be a ring and consider the monoid S = ({1, −1}, ×). Prove that as a set Fun(S, R) is in bijection
    with R × R but that the convolution ring (Fun(S, R), +, ∗) is not isomorphic to R ⊕ R.

28. Prove that the ring of formal power series (see Exercise 5.2.19) over a commutative ring R is a convolution
ring.
29. Let R be a commutative ring and consider the semigroup Z. Prove that

F = {f ∈ Fun(Z, R) | ∃N ∈ Z such that f (n) = 0 for all n < N }

is a subring of Fun(Z, R) that satisfies the convolution condition. [Compare to Exercise 5.4.28. If we
view Z as isomorphic to the semigroup ({xn | n ∈ Z}, ×), then the convolution ring (F, +, ∗) is called
the ring of formal Laurent series over R and is denoted by R((x)).]

5.5
Ideals
Ideals are an important class of subrings in a ring. We first encounter their importance in the next
section in reference to the construction of quotient rings, where ideals play the role in ring theory
that normal subgroups play in group theory. However, ideals possess many important properties
independent of their role in creating quotient rings.
The concept of an ideal first arose in number theory in the context of studying properties of
integer extensions, i.e., certain subrings of C that contain Z. Gaussian integers and rings like Z[∛2]
are some examples. In integer extensions, numbers do not always have certain desired divisibility
properties but ideals do. (This is a result of Dedekind’s Theorem from algebraic number theory,
a topic beyond the scope of this book but easily accessible with the preparation that this book
provides.) This motivated the term “ideal.” It also turns out that ideals play a pivotal role in
algebraic geometry, a branch of mathematics where the tools of abstract algebra are brought to bear
on the study of geometry. (See Section 12.8.)

5.5.1 – Ideals
Recall that given any subset S in a ring R, if r ∈ R, then the notation rS denotes the set rS =
{rs | s ∈ S} and the notation Sr denotes the set Sr = {sr | s ∈ S}.

Definition 5.5.1
Let R be a ring and let I be any subset of R.

(1) A subset I ⊆ R is called a left ideal (resp. right ideal ) of R if I is a subring of R


and if rI ⊆ I (resp. Ir ⊆ I). In other words, I is closed under left (resp. right)
multiplication by elements of R.
(2) A subset I that is both a right ideal and a left ideal is called an ideal or a two-sided
ideal.

A ring always contains at least two ideals, the subset {0} and itself. The ideal {0} is called the
trivial ideal. Any ideal I ⊊ R is called a proper ideal.
In light of the One-Step Subgroup Criterion, the definition of an ideal can be restated to say that
an ideal of a ring R is a nonempty subset I ⊆ R that is closed under subtraction and closed under
multiplication by any element in R (i.e., ra and ar are in I for all a ∈ I and r ∈ R).
If R is commutative, then left and right ideals are equivalent and hence are also two-sided ideals.
Hence, the distinction between left and right ideals only occurs in noncommutative rings.
Though this does not illustrate the full scope of possible properties for ideals, it is important to
keep as a baseline reference what ideals are in Z.

Example 5.5.2 (Ideals in Z). We claim that all ideals in Z are of the form nZ, where n is a
nonnegative integer. The subset {0} = 0Z is an ideal. Otherwise, let I be an ideal in Z and let n be
the least positive integer in I. This exists by virtue of the Well-Ordering Principle of Z.
Now let m be any integer in I. Integer division of m by n gives m = nq + r for some integer q
and some remainder 0 ≤ r < n. However, since I is closed under multiplication by any element in
the ring, then nq ∈ I and since I is closed under subtraction, r = m − nq ∈ I. Since n is the least
positive element in I and since r is nonnegative with r < n, then r = 0. We conclude that m = qn
and so every element in I is a multiple of n. Consequently, I = nZ. 4
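Example 5.5.2 can be explored computationally: in a finitely generated ideal (a₁, . . . , aₖ) of Z, the least positive element is gcd(a₁, . . . , aₖ), so the ideal is nZ for that gcd. A brute-force Python sketch (the search bound on coefficients is an arbitrary choice of ours):

```python
# Search small Z-linear combinations of the generators for the least
# positive element; by Example 5.5.2 it generates the whole ideal.
from math import gcd
from itertools import product

def least_positive_combination(gens, bound=10):
    """Smallest positive Z-linear combination with coefficients in [-bound, bound]."""
    best = None
    for coeffs in product(range(-bound, bound + 1), repeat=len(gens)):
        v = sum(c * g for c, g in zip(coeffs, gens))
        if v > 0 and (best is None or v < best):
            best = v
    return best

gens = [12, 45]
n = least_positive_combination(gens)
assert n == gcd(12, 45) == 3    # so (12, 45) = 3Z
```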

The style of proof in the above example (namely referring to an element that is minimal in some
way) is not uncommon in ring theory. However, it will not apply in all rings. Indeed, the notion
of minimality derives from the partial order ≤ on Z. In contrast, many rings do not possess such a
partial order.
We now give an example where left and right ideals are not necessarily equal.
Example 5.5.3. Let R = M2 (Z) and consider the subset

    I = \left\{ \begin{pmatrix} a & a \\ c & c \end{pmatrix} \;\middle|\; a, c ∈ Z \right\}.

Let A, B ∈ I and let C ∈ R. Then

    A − B = \begin{pmatrix} a_{11} & a_{11} \\ a_{21} & a_{21} \end{pmatrix} − \begin{pmatrix} b_{11} & b_{11} \\ b_{21} & b_{21} \end{pmatrix} = \begin{pmatrix} a_{11} − b_{11} & a_{11} − b_{11} \\ a_{21} − b_{21} & a_{21} − b_{21} \end{pmatrix}

so A − B ∈ I. Furthermore,

    CA = \begin{pmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{pmatrix} \begin{pmatrix} a_{11} & a_{11} \\ a_{21} & a_{21} \end{pmatrix} = \begin{pmatrix} c_{11} a_{11} + c_{12} a_{21} & c_{11} a_{11} + c_{12} a_{21} \\ c_{21} a_{11} + c_{22} a_{21} & c_{21} a_{11} + c_{22} a_{21} \end{pmatrix}

so again CA ∈ I. So far, we have shown that I is a left ideal. However,

    AC = \begin{pmatrix} a_{11} & a_{11} \\ a_{21} & a_{21} \end{pmatrix} \begin{pmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{pmatrix} = \begin{pmatrix} a_{11} (c_{11} + c_{21}) & a_{11} (c_{12} + c_{22}) \\ a_{21} (c_{11} + c_{21}) & a_{21} (c_{12} + c_{22}) \end{pmatrix},

which in general is not in I. Hence, I is not a right ideal. 4
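The asymmetry in this example is easy to witness with concrete matrices. A Python sketch (the matrix helpers and the particular choices of A and C are ours):

```python
# Example 5.5.3 concretely: I = {2x2 matrices with equal columns}
# absorbs multiplication on the left but not on the right.
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def in_I(M):
    # membership in I means the two columns of M agree
    return M[0][0] == M[0][1] and M[1][0] == M[1][1]

A = [[1, 1], [2, 2]]             # an element of I
C = [[0, 1], [3, 5]]             # an arbitrary element of M2(Z)

assert in_I(mat_mul(C, A))       # CA lands back in I ...
assert not in_I(mat_mul(A, C))   # ... but AC does not
```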

Example 5.5.4. We consider a few ideals and nonideals in the polynomial ring R = R[x].
Let I1 be the set of polynomials whose nonzero term of lowest degree has degree 3 or greater.
Let a(x), b(x) be two polynomials. If ai = bi = 0 for i = 0, 1, 2, then (ai − bi ) = 0 for i = 0, 1, 2 so
a(x), b(x) ∈ I1 implies that a(x) − b(x) ∈ I1 . Now suppose that a(x) ∈ I1 and b(x) is arbitrary, then
the first few coefficients of the terms in a(x)b(x) are

· · · + (a2 b0 + a1 b1 + a0 b2 )x2 + (a0 b1 + a1 b0 )x + a0 b0 .

Since ai = 0 for i = 0, 1, 2, then the terms shown above are 0 and hence a(x)b(x) ∈ I1 . Hence, I1 is
a right ideal and hence is an ideal since R[x] is commutative.
Let I2 be the set of polynomials p(x) such that p(2) = 0 and p′(2) = 0. Let a(x), b(x) ∈ I2 . Then
a(2) − b(2) = 0 and

    \frac{d}{dx}(a(x) − b(x)) \Big|_{x=2} = a′(2) − b′(2) = 0.

Thus, I2 is closed under subtraction. Now let a(x) ∈ I2 and let p(x) be an arbitrary real polynomial.
Then a(2)p(2) = 0 · p(2) = 0 and

    \frac{d}{dx}(a(x)p(x)) \Big|_{x=2} = a′(2)p(2) + a(2)p′(2) = 0 · p(2) + 0 · p′(2) = 0.

Hence, a(x)p(x) ∈ I2 and we conclude that I2 is an ideal. The ideal I2 corresponds to polynomials
that have a double root at 2. Figure 5.3 shows the graphs of just a few such polynomials.

Figure 5.3: Some polynomials in the ideal {p(x) ∈ R[x] | p(2) = p′(2) = 0}

In contrast, let S1 be the subset of polynomials whose degree is 4 or less. It is true that S1 is
closed under subtraction but it is not closed under multiplication (e.g., x × x⁴ = x⁵ ∉ S1 ) so it is
not a subring and hence is not an ideal.
As another nonexample, let S2 be the subset of polynomials whose terms of odd degree are 0.
S2 is closed under subtraction and under multiplication and hence S2 is a subring. However, with
x ∈ R[x] and x² ∈ S2 , the product x × x² = x³ ∉ S2 so S2 is not an ideal. 4
Proposition 5.4.10 already provided an important class of ideals but we restate the result here.

Proposition 5.5.5
Let ϕ : R → S be a ring homomorphism. Then Ker ϕ is an ideal of R.

Example 5.5.6. Let R be a commutative ring with a 1 ≠ 0 and let G be a finite group with
G = {g1 , g2 , . . . , gn }. Consider the group ring R[G] and consider also the subset
I = {a1 g1 + a2 g2 + · · · + an gn | a1 + a2 + · · · + an = 0}.
This subset I is an ideal by virtue of the fact that it is the kernel of the augmentation map defined
in Exercise 5.4.23. 4

5.5.2 – Ideals Generated by Subsets


There exists a convenient way to describe ideals in a ring using generating subsets.

Definition 5.5.7
Let A be a subset of a ring R.
(1) By the notation (A) we denote the smallest (by inclusion) ideal in R that contains
the set A. We say that the ideal (A) is generated by A.

(2) An ideal that can be generated by a single element set is called a principal ideal.
(3) An ideal that is generated by a finite set A is called a finitely generated ideal.

The student should note that when we refer to the subset (A) of R it is by definition an ideal
and hence there is no need to prove that it is.
The above notation, however, is not explicit since it does not directly offer a means of determining
all elements in (A). There is an explicit way to create an ideal from a subset A ⊂ R. In order to
describe it, we define the following sets.

Definition 5.5.8
Given a ring R and a subset A we can create some ideals from A.

(1) RA denotes the subset of finite left R-linear combinations of elements in A, i.e.,
RA = {r1 a1 + r2 a2 + · · · + rn an | ri ∈ R and ai ∈ A}.
(2) AR denotes the subset of finite right R-linear combinations of elements in A, i.e.,
AR = {a1 r1 + a2 r2 + · · · + an rn | ri ∈ R and ai ∈ A}.
(3) RAR denotes the subset of finite dual R-linear combinations of elements in A, i.e.,
RAR = {r1 a1 s1 + r2 a2 s2 + · · · + rn an sn | ri , si ∈ R and ai ∈ A}.

Proposition 5.5.9
Let R be a ring and let A be any subset.
(1) RA is a left ideal, AR is a right ideal, and RAR is a two-sided ideal.

(2) If R is a ring with an identity 1 ≠ 0, then (A) = RAR.


(3) If R is commutative with an identity 1 ≠ 0, then RA = AR = RAR = (A).

Proof. For part (1), let r1 a1 + r2 a2 + · · · + rm am and r′1 a′1 + r′2 a′2 + · · · + r′n a′n be two elements of
RA. Then their difference

    r1 a1 + r2 a2 + · · · + rm am − (r′1 a′1 + r′2 a′2 + · · · + r′n a′n )

is another finite linear combination of elements in RA. Furthermore, given any s ∈ R,

    s(r1 a1 + r2 a2 + · · · + rm am ) = (sr1 )a1 + (sr2 )a2 + · · · + (srm )am

is also a linear combination in RA. Hence, RA is a left ideal.


The proof that AR is a right ideal is similar.
Let r1 a1 s1 + r2 a2 s2 + · · · + rm am sm and r′1 a′1 s′1 + r′2 a′2 s′2 + · · · + r′n a′n s′n be two elements in RAR.
Then their difference is

    r1 a1 s1 + r2 a2 s2 + · · · + rm am sm + (−r′1 )a′1 s′1 + (−r′2 )a′2 s′2 + · · · + (−r′n )a′n s′n ,

which is again an element in RAR. If t is any element in R, then

    t(r1 a1 s1 + r2 a2 s2 + · · · + rm am sm ) = (tr1 )a1 s1 + (tr2 )a2 s2 + · · · + (trm )am sm

and

    (r1 a1 s1 + r2 a2 s2 + · · · + rm am sm )t = r1 a1 (s1 t) + r2 a2 (s2 t) + · · · + rm am (sm t),

which are both elements of RAR. Hence, RAR is an ideal (two-sided).


For part (2), suppose that R has an identity 1 ≠ 0. By taking (ri , si ) as 1 or 0 as appropriate,
we deduce that A ⊆ RAR. Hence, RAR is an ideal that contains A. By definition, (A) ⊆ RAR.
However, by definition of ideals, r1 a1 s1 + r2 a2 s2 + · · · + rm am sm is an element of any ideal that also
contains A. Thus, RAR ⊆ (A). Hence, (A) = RAR.
Finally, for part (3) suppose that R is commutative with an identity 1 ≠ 0. By part (2) we
already have RAR = (A). Commutativity gives RA = AR. Also by commutativity,

r1 a1 s1 + r2 a2 s2 + · · · + rm am sm = r1 s1 a1 + r2 s2 a2 + · · · + rm sm am ,

so RAR ⊆ RA. However, since R has an identity, by setting si = 1 for i = 1, 2, . . . , m, we obtain all
elements in RA as elements in RAR. Hence, RA ⊆ RAR and thus we have RA = AR = RAR. 

Example 5.5.10. Consider the ideal I = (m, n) in Z. Since Z is commutative,

I = {sm + tn | s, t ∈ Z}.

By Proposition 2.1.12, I = (d) where d = gcd(m, n). Using this result, an induction argument shows
that every finitely generated ideal in Z can be generated by a single element. However, in arbitrary
rings there do exist ideals that are not finitely generated. The above induction argument would not
be sufficient to conclude that all ideals in Z are generated by a single element. On the other hand,
Example 5.5.2 gave a different argument that did prove that all ideals in Z are principal. 4

Example 5.5.11. Consider the ideal I = (5, x2 − x − 2) in Z[x]. This ideal consists of polynomial
linear combinations

5p(x) + (x2 − x − 2)q(x) with p(x), q(x) ∈ Z[x]. (5.8)

As in Example 5.5.10, just because the ideal is expressed using two generators does not mean that
two generators are necessary. Assume that I = (a(x)) for some polynomial a(x). Then since 5 ∈ I,
we have a(x)r(x) = 5 for some r(x) ∈ Z[x]. Hence, by degree considerations, deg a(x) = 0 and thus
a(x) must be ±1 or ±5. Now x² − x − 2 has coefficients that are not multiples of 5, so since
x² − x − 2 ∈ I, we cannot have I = (5). Hence, if I is a principal ideal, then I = (1) = Z[x].
However, we notice that every polynomial r(x) in the form of (5.8) has the property that r(2) is a
multiple of 5. This is not the case for all polynomials in Z[x]. Hence, I ≠ Z[x]. The assumption
that I is principal leads to a contradiction, so I does require two generators. 4
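The key observation, that every element of I takes a value divisible by 5 at x = 2, can be spot-checked in Python (random polynomials; the representation and helpers are ours):

```python
# Every element 5p(x) + (x^2 - x - 2)q(x) is divisible by 5 at x = 2,
# because 2^2 - 2 - 2 = 0. Polynomials are coefficient lists, constant first.
import random

def poly_eval(coeffs, x):
    return sum(c * x**i for i, c in enumerate(coeffs))

def random_poly():
    return [random.randint(-5, 5) for _ in range(random.randint(1, 5))]

for _ in range(500):
    p, q = random_poly(), random_poly()
    value_at_2 = 5 * poly_eval(p, 2) + (2**2 - 2 - 2) * poly_eval(q, 2)
    assert value_at_2 % 5 == 0

# The constant polynomial 1 has value 1 at x = 2, which is not a multiple
# of 5, so 1 is not in the ideal: (5, x^2 - x - 2) is proper in Z[x].
```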

The following examples illustrate how the conditions in the various parts of Proposition 5.5.9 are
required for the result to hold.

Example 5.5.12. Let R be the ring 2Z and consider the subset A = {4}. The ideal (4) = 4Z
whereas RA = AR = 8Z and RAR = 16Z. 4

Example 5.5.13. Consider the ring R = M2 (Z), which is a ring with an identity 1 ≠ 0 but a
noncommutative ring. Let a be the matrix

    a = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}.

The notation Ra means R{a} and consists of just left multiples ba of a. (The linear combinations
collapse to just one multiple of a.) Also aR consists of just right multiples of a. However, by
properties of ranks of matrices, since rank a = 1, the rank of any multiple of a (left or right)
can be at most 1. Hence, Ra and aR are strict subsets of R. From Proposition 5.5.9(2), we
know that RaR = (a). However, it can be shown (see Exercise 5.5.9) that (a) = M2 (Z) so that
Ra ≠ (a) = RaR and aR ≠ (a) = RaR. 4

Proposition 5.5.14
Let R be a ring with an identity 1 6= 0. An ideal I is equal to R if and only if I contains a
unit.

Proof. Suppose that I = R. Then I contains 1, which is a unit. Conversely, suppose that I
contains a unit u. Suppose that v is the inverse of u. Now let r ∈ R be any element. Then
r = r(vu) = (rv)u ∈ I. Thus, R ⊆ I and hence R = I. 

Proposition 5.5.15
Let R be a commutative ring with an identity 1 ≠ 0. R is a field if and only if its only
ideals are (0) and (1).

Proof. (Left as an exercise. See Exercise 5.5.23.) 

5.5.3 – Principal Ideal Domains


Example 5.5.2 established an important property of the integers, namely that every ideal is principal.
Rings with this property benefit from many associated nice properties. For this reason, we give a
name to this particular class of rings.

Definition 5.5.16
A principal ideal domain is an integral domain R in which every ideal is principal. We
often abbreviate the name and call such a ring a PID.

We will encounter properties of PIDs in the exercises and in subsequent sections.

5.5.4 – Operations on Ideals

Definition 5.5.17
Let I, J be ideals in a ring R. We define the following operations on ideals:
(1) The sum, I + J = {a + b | a ∈ I and b ∈ J}.
(2) The product, IJ consists of finite sums of elements ai bi with ai ∈ I and bi ∈ J. Thus,

IJ = {a1 b1 + a2 b2 + · · · + an bn | ai ∈ I, bi ∈ J}.

(3) The power I k is the iterated product operation on I. We also define I 0 = R.

Proposition 5.5.18
Let I, J be ideals in a ring R. Then I + J, IJ, and I ∩ J are ideals of R. Furthermore, the
ideals satisfy the following containment relations:

    IJ ⊆ I ∩ J ⊆ I ⊆ I + J    and    IJ ⊆ I ∩ J ⊆ J ⊆ I + J.

Proof. Let a1 b1 + a2 b2 + · · · + am bm and a′1 b′1 + a′2 b′2 + · · · + a′n b′n be two elements in IJ. Then
their difference

    a1 b1 + a2 b2 + · · · + am bm + (−a′1 )b′1 + (−a′2 )b′2 + · · · + (−a′n )b′n

is also a combination in IJ. Furthermore, for all t ∈ R,

    t(a1 b1 + a2 b2 + · · · + am bm ) = (ta1 )b1 + (ta2 )b2 + · · · + (tam )bm

and

    (a1 b1 + a2 b2 + · · · + am bm )t = a1 (b1 t) + a2 (b2 t) + · · · + am (bm t).

But for all i = 1, 2, . . . , m we have tai ∈ I since I is an ideal and bi t ∈ J since J is an ideal. Hence,
multiplying any element in IJ by any element in R produces an element in IJ. Thus, IJ is an ideal
of R.
(We leave the proof that I + J and that I ∩ J are ideals as an exercise. See Exercise 5.5.25.)
Some containments are obvious, namely, I ∩ J ⊆ I and I ∩ J ⊆ J. Since 0 is an element of
every ideal, I ⊆ I + J and J ⊆ I + J. Finally, let a ∈ I and b ∈ J. Then ab ∈ I because I is an
ideal and a ∈ I but also ab ∈ J because J is an ideal and b ∈ J. Thus, in a linear combination
a1 b1 + a2 b2 + · · · + an bn every product ai bi ∈ I ∩ J and hence the full linear combination is in I ∩ J.
We conclude that IJ ⊆ I ∩ J. 

Example 5.5.19. Let R = Z and let I = 12Z and J = 45Z. The ideal operations listed in
Definition 5.5.17 are

IJ = 540Z, I ∩ J = 180Z, and I + J = 3Z.

We observe that IJ = (12 × 45), I ∩ J = (lcm(12, 45)), and I + J = (gcd(12, 45)). Consequently, the
operations on ideals directly generalize the notions of product, least common multiple, and greatest
common divisor. 4
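These three computations can be confirmed with a few lines of Python (the `lcm` helper is ours; `math.lcm` is available as a built-in only in Python 3.9+):

```python
# Example 5.5.19: for I = mZ and J = nZ, the ideal operations reduce to
# IJ = (m*n), I ∩ J = (lcm(m, n)), and I + J = (gcd(m, n)).
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

m, n = 12, 45
assert m * n == 540           # IJ = 540Z
assert lcm(m, n) == 180       # I ∩ J = 180Z
assert gcd(m, n) == 3         # I + J = 3Z
```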

Since ideals generalize the notion of least common multiple and greatest common divisor, we
should have an equivalent concept for relatively prime. Recall that a, b ∈ Z are called relatively
prime if gcd(a, b) = 1. In ring theory, the generalized notion is similar.

Definition 5.5.20
Let R be a ring with an identity 1 ≠ 0. Two ideals I and J of R are called comaximal if
I + J = R.

Note that this definition only applies when a ring has an identity.
For example, (2) and (3) are comaximal in Z. As another example, consider the ideals (x²) and
(x + 1) in Z[x]. We have (x²) + (x + 1) = (x², x + 1). But (1 − x)(x + 1) + x² = 1 is in (x², x + 1)
and hence, (x², x + 1) = Z[x]. Thus, (x²) and (x + 1) are comaximal ideals.
It is easy to give generating sets of I + J and IJ from generating sets of I and J. The proofs of
these claims are left as exercises but we mention them here because of their importance. Suppose
that I and J are generated by certain finite sets of elements, say I = (a1 , a2 , . . . , am ) and J =
(b1 , b2 , . . . , bn ). Then I + J = (a1 , a2 , . . . , am , b1 , b2 , . . . , bn ) and the set IJ is generated by the set
{ai bj | i = 1, 2, . . . , m, j = 1, 2, . . . , n}.
We conclude the section by mentioning two other operations on ideals, which, in the context of
commutative rings, produce other ideals. The exercises explore some questions and examples related
to these operations.

Definition 5.5.21

Let I be an ideal in a commutative ring R. The radical ideal of I, denoted by √I, is

    √I = {r ∈ R | rⁿ ∈ I for some n ∈ N∗ }.

Definition 5.5.22
Let I and J be ideals in a commutative ring R. The fraction ideal of I by J, denoted
(I : J), is the subset
(I : J) = {r ∈ R | rJ ⊆ I}.

Exercises for Section 5.5


1. Prove that Z is not an ideal of Q. Find all the ideals of Q.
2. List all the ideals of Z/12Z.
3. Let R = M2 (Z). Let I be the set of matrices whose entries are multiples of 10. Show that I is an
ideal of R.
4. Modify Example 5.5.3 to find an ideal in M2 (Z) that is a right ideal but not a left ideal. Prove your
claim.
5. Let R = Z[x]. Which of the following subsets are ideals of R? Are any subrings of R but not ideals?
(a) The set of polynomials whose coefficients are even.
(b) The set of polynomials such that every other coefficient (starting at 0) is even.
(c) The set of polynomials such that the 0’th coefficient is even.
(d) The set of polynomials with odd coefficients.
(e) The set of polynomials whose terms all have even degrees.
6. Let S = {p(x) ∈ R[x] | p′(3) = 0}. Prove that S is a subring of R[x] but not an ideal.
7. Let S be a set and consider the ring R = (P(S), △, ∩). Prove that if S′ ⊂ S, then the subring R′ = (P(S′), △, ∩) is an ideal.
8. Let I1 , I2 , . . . , In be n ideals in a ring R. Consider the matrix ring Mn (R).
(a) Prove that the set of matrices in Mn (R) where all the elements of the k’th column are elements
of Ik is a left ideal of Mn (R).
(b) Prove that the set of matrices in Mn (R) where all the elements of the k’th row are elements of
Ik is a right ideal of Mn (R).
9. Let R = M_2(Z) and let A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}.
(a) Determine all the matrices in RA and all the matrices in AR.
(b) Obtain the matrices

\begin{pmatrix} a & 0 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & b \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ c & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 0 & d \end{pmatrix}

as elements in RAR.
(c) Deduce that RAR = (A) = M_2(Z).
10. Let R = M_3(R) and let

A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}.

Determine explicitly RA, AR, RAR, and (A).
11. Let R be an integral domain. Prove that (a) = (b) in R if and only if a = bu for some unit u ∈ U (R).
12. Let R be a ring. Prove that the subring in M2 (R) of upper triangular matrices is not an ideal of
M2 (R).
13. Let k be a positive integer. Prove that Mn (kZ) is an ideal in Mn (Z).
14. Prove that if I is an ideal in a ring R and S is a subring of R, then I ∩ S is an ideal of S.
15. Let R be a commutative ring and let I be an ideal in R.
(a) Prove that I[x] = {an xn + · · · + a1 x + a0 | ai ∈ I for all i} is an ideal of R[x].
(b) Let a ∈ R be fixed. Prove that {p(x) ∈ R[x] | p(a) ∈ I} is an ideal of R[x].
(c) Let k be a nonnegative integer. Prove that {an xn + · · · + a1 x + a0 | ai ∈ I for 0 ≤ i ≤ k} is an
ideal of R[x].
(d) Prove that {an xn + · · · + a1 x + a0 | a2 ∈ I} is not an ideal of R[x].
16. Let R be a commutative ring. Prove that (a) ⊆ (b) if and only if there exists r ∈ R such that a = rb.
17. Show that the ideal (2, x) in Z[x] is not a principal ideal. Conclude that Z[x] is not a PID.
18. Consider the ideal I = (13x + 16y, 11x + 13y) in the ring Z[x, y].
(a) Prove that I = (x − 2y, 3x + y). [Hint: By mutual inclusion.]
(b) Prove that (7x, 7y) ⊆ I but prove that this inclusion is strict.
19. Prove that Q[x, y] is not a PID.
20. In the ring R[x, y], let I = (ax + by − c, dx + ey − f ) where a, b, c, d, e, f ∈ R.
(a) Prove that if the lines ax + by = c and dx + ey = f intersect in a single point (r, s), then I = (x − r, y − s).
(b) Prove that if the lines ax + by = c and dx + ey = f are parallel, then I = R[x, y].
(c) Prove that if the lines ax + by = c and dx + ey = f are the same, then I = (ax + by − c).
21. Consider the ideal I = (x, y) in the ring R[x, y]. Find a generating set for I k and show that I k requires
a minimum of k + 1 generators.
22. Let ϕ : R → S be a ring homomorphism.
(a) Show that if J is an ideal in S, then ϕ−1 (J) is an ideal in R.
(b) Show that if I is an ideal in R, then ϕ(I) is not necessarily an ideal in S.
(c) Show that if ϕ is surjective and I is an ideal of R, then ϕ(I) is an ideal of S.
23. Suppose that R is a commutative ring with 1 ≠ 0. Prove that R is a field if and only if its only ideals are (0) and (1).
24. Suppose that R is a commutative ring with 1 ≠ 0. Prove that a principal ideal (a) = R if and only if a is a unit. [Exercise 5.5.9 gives an example of a noncommutative ring where this result does not hold.]
25. Let I and J be ideals of a ring R. Prove that: (a) I + J is an ideal of R and (b) I ∩ J is an ideal of R.
26. Let C be an arbitrary (not necessarily finite) collection of ideals of a ring R. Prove that

⋂_{I∈C} I

is an ideal of R.
27. Suppose that I and J are ideals in a ring R that are generated by certain finite sets of elements, say
I = (a1 , a2 , . . . , am ) and J = (b1 , b2 , . . . , bn ).
(a) Prove that I + J = (a1 , a2 , . . . , am , b1 , b2 , . . . , bn ).
(b) Prove that IJ is generated by the set of mn elements, {ai bj | i = 1, 2, . . . , m, j = 1, 2, . . . , n}.
28. Let I and J be ideals of a ring R. Prove that I ∪ J is not necessarily an ideal of R.
29. Let I, J, K be ideals of a ring R. Prove that:
(a) I(J + K) = IJ + IK;
(b) I(JK) = (IJ)K.
30. Let I_1 ⊆ I_2 ⊆ · · · ⊆ I_k ⊆ · · · be a chain (in the partial order of inclusion) of ideals in a ring R. Prove that

⋃_{k=1}^{∞} I_k

is an ideal in R.
31. Let R be a commutative ring and let I be an ideal of R. Prove that the subset

{a_n x^n + · · · + a_1 x + a_0 ∈ R[x] | a_k ∈ I^k}

is a subring of R[x] but not necessarily an ideal.
32. Let Un (R) be the ring of upper triangular n × n matrices with coefficients in a ring R. Let I be an
ideal in R. Prove that
S = {(aij ) ∈ Un (R) | aij ∈ I j−i }
is a subring of Un (R) but not an ideal.
33. Let R be a commutative ring.
(a) Prove that if I and J are comaximal ideals, then IJ = I ∩ J.
(b) Prove that if I_1, I_2, . . . , I_n is a finite collection of pairwise comaximal ideals, then

I_1 I_2 · · · I_n = I_1 ∩ I_2 ∩ · · · ∩ I_n.
34. Let R be a commutative ring and let I be an ideal in R. Prove that the radical defined in Defini-
tion 5.5.21 is an ideal.
35. Let R = Z. Calculate the following radical ideals: (a) √(72); (b) √(105); (c) √(243).
36. Let I be an ideal in a commutative ring R. Prove that the radical of √I is again √I.
37. Let R be a commutative ring. Show that the set N_R of nilpotent elements is equal to the ideal √(0). [For this reason, the subring of nilpotent elements in a commutative ring is often called the nilradical of R.]
38. In the ring Z, prove the following fraction ideal equalities.
(a) ((2) : (0)) = Z
(b) ((24) : (4)) = (6)
(c) ((17) : (15)) = (17)
5.6 Quotient Rings
The introduction to Chapter 4 motivated the construction of quotient groups with modular arith-
metic. However, modular arithmetic has a ring structure and (Z/nZ, +, ×) is an example of a
quotient ring. This section parallels the discussion for quotient groups and introduces the quotient
object construction in the algebraic structure of rings.

5.6.1 – Quotient Rings
The concept of a quotient ring combines the process of taking a quotient set in such a way that the
ring operations induce a ring structure on the quotient set.
Let (R, +, ×) be a ring and let ∼ be an equivalence relation on R that behaves well with respect
to both + and ×. We recall from Section 4.3.1 that this means that for all r1 , r2 , s1 , s2 ∈ R,

r1 ∼ r2 and s1 ∼ s2 =⇒ (r1 + s1 ) ∼ (r2 + s2 ) and (r1 × s1 ) ∼ (r2 × s2 ).

According to Proposition 4.3.2, since ∼ behaves well with respect to +, the equivalence classes of ∼ are the cosets of a normal subgroup (A, +) of (R, +). Since (R, +) is an abelian group, every subgroup A ≤ R is normal. However, not all subgroups (A, +) of the additive group (R, +) are such that the equivalence classes associated to the cosets of A behave well with respect to ×.
Proposition 5.6.1
Let (R, +, ×) be a ring. An equivalence relation ∼ on R behaves well with respect to + and × if and only if the equivalence classes of ∼ are the cosets of an ideal I.
Proof. We already know that ∼ behaves well with respect to + if and only if the equivalence classes
of ∼ are the cosets of some subgroup I of (R, +). Consequently, r1 ∼ r2 if and only if r2 − r1 ∈ I.
250 CHAPTER 5. RINGS

Suppose that ∼ behaves well with respect to + and ×. Let r1 ∼ r2 so that r2 − r1 = a ∈ I and
let s1 ∈ R be arbitrary. Since s1 ∼ s1 , then r1 s1 ∼ r2 s1 so

r2 s1 − r1 s1 = (r2 − r1 )s1 = as1 ∈ I.

Thus, I is closed under multiplication on the right by any element of R. Similarly, let s1 ∼ s2 so that
s2 − s1 = a0 ∈ I and let r1 ∈ R be arbitrary. Since r1 ∼ r1 , then r1 s1 ∼ r1 s2 so

r1 s2 − r1 s1 = r1 (s2 − s1 ) = r1 a0 ∈ I.

Thus, I is closed under multiplication on the left by any element of R. We have shown that I must be
an ideal of R.
Conversely, suppose that ∼ is the equivalence relation on R whose equivalence classes are the
cosets r + I of some ideal I in R. We already know that ∼ behaves well with respect to +. Let
r1 , r2 , s1 , s2 ∈ R with r1 + I = r2 + I and s1 + I = s2 + I. Then r2 − r1 = a ∈ I and s2 − s1 = a0 ∈ I.
So,
r2 s2 − r1 s1 = (r1 + a)(s1 + a0 ) − r1 s1 = r1 a0 + as1 + aa0 .
By the properties of ideals, since a, a0 ∈ I, then r1 a0 , as1 , aa0 ∈ I and hence r1 a0 + as1 + aa0 ∈ I.
Thus, ∼ behaves well with respect to ×. □

Proposition 5.6.2
The cosets in the additive quotient R/I form a ring with + and × defined by

(a + I) + (b + I) := (a + b) + I and (a + I) × (b + I) := (ab) + I. (5.9)

Proof. From the quotient group construction in group theory, we already know that (R/I, +) is an
abelian group.
The main part of the proof is to check that (a + I) × (b + I) := (ab) + I is well-defined. Suppose that r_1 + I = r_2 + I and s_1 + I = s_2 + I. Since × in R behaves well with respect to the equivalence relation whose classes are the additive cosets of I, Proposition 5.6.1 gives (r_1 s_1) + I = (r_2 s_2) + I.
Now that we know that (a + I) × (b + I) := (ab) + I is well-defined, using the right-hand side of this expression it follows easily that in R/I the operation × is associative and distributes over +. □

Proposition 5.6.1 establishes not only that cosets of an ideal I in a ring R determine an equivalence
relation that behaves well with respect to the operations of R, but more importantly that this
is the only type of equivalence relation on R that behaves well with respect to the operations.
Proposition 5.6.2 now justifies the central definition of this section.

Definition 5.6.3
Let R be a ring and let I be a (two-sided) ideal in R. The ring R/I with addition and multiplication defined in Proposition 5.6.2 is the quotient ring of R with respect to I.

We point out one minor technicality in the notation. As subsets of R, the set (a + I) × (b + I)
is not necessarily equal to the subset (ab) + I in R. As subsets of R, we can only conclude that
(a + I) × (b + I) ⊆ (ab) + I and this inclusion is often proper. For example, let R = Z, let I = 11Z
and consider the cosets 2 + 11Z and 6 + 11Z. In the quotient ring

(2 + 11Z) × (6 + 11Z) = 12 + 11Z = 1 + 11Z.

As subsets in Z,
(2 + 11Z)(6 + 11Z) = {(2 + 11k)(6 + 11`) | k, ` ∈ Z}.
But (2 + 11k)(6 + 11`) = 12 + 22` + 66k + 121k`. Though 23 ∈ 1 + 11Z, we can show that there
exist no integers k, ` ∈ Z such that 23 = 12 + 22` + 66k + 121k`. Assume that there did. Then
11 = 22` + 66k + 121k` so 1 = 2` + 6k + 11k`. We deduce that 1 − 2` = k(6 + 11`) so 6 + 11` divides
2` − 1. If ` > 0, then 0 < 2` − 1 < 6 + 11` so 6 + 11` does not divide 2` − 1. Obviously, if ` = 0,
then 6 + 11` does not divide 2` − 1. If ` < 0, then
7 < 9(−`) =⇒ 7 + 2(−`) < 11(−`) =⇒ 1 + 2(−`) < −6 + 11(−`) =⇒ |1 − 2`| < |6 + 11`|
so again it is not possible for 6 + 11` to divide 2` − 1. Hence, this contradicts our assumption that
23 ∈ (2 + 11Z)(6 + 11Z).
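The nonexistence of such k and ℓ can also be sanity-checked numerically. This brute-force search (an illustration only; the window is chosen large enough that the factors exceed 23 in absolute value well outside it) finds no solution:

```python
# Search for integers k, l with (2 + 11k)(6 + 11l) == 23 in a generous window.
found = [(k, l)
         for k in range(-50, 51)
         for l in range(-50, 51)
         if (2 + 11 * k) * (6 + 11 * l) == 23]
# an empty list supports the divisibility argument given in the text
```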
Partly because of this technicality, it is even more common in ring theory to borrow the notation
from modular arithmetic and denote a coset r + I in R/I by r. As always, with this notation, the
ideal I must be clear from context. Also inspired by modular arithmetic, we may say that r and
s are congruent modulo I whenever r + I = s + I. Note that the operations in (5.9) are such that
the correspondence r 7→ r̄ from R to R/I is a homomorphism.

Definition 5.6.4
Given a ring R and an ideal I ⊆ R, the homomorphism π : R → R/I defined by π(r) =
r̄ = r + I is called the natural projection of R onto R/I.

Example 5.6.5. The first and most fundamental example comes from the integers R = Z. The
subring I = nZ is an ideal. The quotient ring is R/I = Z/nZ, the usual ring of modular arithmetic.
The reader can now see that our notation for modular arithmetic in fact comes from our notation for quotient rings. ♦
Example 5.6.6. Example 5.5.4 presented two ideals I1 and I2 in R[x]. We propose to describe the
corresponding quotient rings.
Let I1 be the ideal of polynomials whose nonzero term of lowest degree has degree 3 or greater.
Using the generating subset notation, this ideal can be written as I1 = (x^3). In R[x]/I1, we have \overline{a(x)} = \overline{b(x)} if and only if b(x) − a(x) ∈ I1, so b(x) − a(x) = x^3 p(x) for some polynomial p(x). Thus, \overline{a(x)} = \overline{b(x)} if and only if a_0 = b_0, a_1 = b_1, and a_2 = b_2. Hence, for every polynomial a(x) ∈ R[x], we have \overline{a(x)} = \overline{b(x)} for a unique polynomial b(x) of degree at most 2, namely b(x) = a_2 x^2 + a_1 x + a_0.
Because addition and multiplication behave well with respect to the quotient ring process, we can write any polynomial in R[x]/I1 as

\overline{a_2 x^2 + a_1 x + a_0} = \bar{a}_2 \bar{x}^2 + \bar{a}_1 \bar{x} + \bar{a}_0.

Now for any two real numbers, \bar{a}\bar{b} = \overline{ab}. However, the element \bar{x} in the quotient ring has the particular property that \bar{x}^3 = \overline{x^3} = \bar{0}. In particular, any power of \bar{x} greater than 2 gives \bar{0}. As a sample calculation,

(\bar{x}^2 + 3\bar{x} + 7)(2\bar{x}^2 − \bar{x} + 3) = 3\bar{x}^2 − 3\bar{x}^2 + 9\bar{x} + 14\bar{x}^2 − 7\bar{x} + 21 = 14\bar{x}^2 + 2\bar{x} + 21.
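The sample calculation can be checked mechanically. This sketch (our own illustration; polynomials are coefficient lists from degree 0 upward) multiplies in R[x]/(x^3) by discarding every term of degree 3 or more:

```python
def mult_mod_x3(p, q):
    """Multiply two classes of R[x]/(x^3); p, q are coefficient lists [a0, a1, a2]."""
    out = [0, 0, 0]
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j < 3:        # x^3 and higher powers vanish in the quotient
                out[i + j] += a * b
    return out

# (7 + 3x + x^2)(3 - x + 2x^2) in R[x]/(x^3):
product = mult_mod_x3([7, 3, 1], [3, -1, 2])
```

The result [21, 2, 14] encodes 14x̄^2 + 2x̄ + 21, matching the sample calculation.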
Now consider the ideal I2 defined as the set of polynomials p(x) such that p(2) = 0 and p′(2) = 0. In Section 6.3.3 we will offer a characterization of I2 with generators. However, we can understand R[x]/I2 without it. Every polynomial p(x) ∈ R[x] satisfies

\overline{p(x)} = \overline{p(2) + p′(2)(x − 2)}.

Hence, every polynomial is congruent modulo I2 to the 0 polynomial or to a unique polynomial of degree 1 or less. Though we could write such polynomials as a + bx, it is more convenient to write the polynomials in R[x]/I2 as a + b(x − 2). Addition in R[x]/I2 is performed component-wise and the multiplication is

(a + b(x − 2))(c + d(x − 2)) = ac + (ad + bc)(x − 2) + bd(x − 2)^2
= ac + (ad + bc)(x − 2), (5.10)
where we replaced the product polynomial p(x) of (possibly) degree 2 with the polynomial p(2) + p′(2)(x − 2).

[Figure 5.4: Some polynomials in the coset 2 − (1/2)x + I2.]
Since \overline{p(x)} × \overline{q(x)} = \overline{p(x)q(x)}, the product in (5.10) shows that

\overline{p(x)q(x)} = \overline{p(2)q(2) + (p′(2)q(2) + p(2)q′(2))(x − 2)},

which recovers the product rule for derivatives.
Interestingly enough, the congruence class \overline{a + bx} with respect to I2 has a clear interpretation: it represents all polynomials whose tangent line at x = 2 is y = a + bx. Figure 5.4 illustrates a few polynomials in \overline{2 − (1/2)x}. All the polynomials shown are congruent to each other modulo I2 and thus are equal in R[x]/I2. The addition (resp. the multiplication) of \overline{a + bx} and \overline{c + dx} in R[x]/I2 corresponds to the tangent line at x = 2 of the sum (resp. the product) of any two polynomials with tangent lines y = a + bx and y = c + dx at x = 2. ♦
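The classes modulo I2 can be modeled concretely: a class is determined by the pair (p(2), p′(2)), and equation (5.10) says these pairs multiply like "dual numbers." A sketch (our own illustration, not from the text):

```python
def add_pair(u, v):
    """Addition of classes modulo I2, encoded as pairs (p(2), p'(2))."""
    return (u[0] + v[0], u[1] + v[1])

def mul_pair(u, v):
    """Multiplication of classes: the bd(x-2)^2 term lies in I2 and vanishes."""
    a, b = u
    c, d = v
    return (a * c, a * d + b * c)

# p(x) = x^2 gives the pair (4, 4); q(x) = x^3 gives (8, 12).
# Their product x^5 has value 2^5 = 32 and derivative 5 * 2^4 = 80 at x = 2:
prod = mul_pair((4, 4), (8, 12))
```

Here prod equals (32, 80), which is exactly ((pq)(2), (pq)′(2)) for p(x)q(x) = x^5, the product rule in action.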

The ideal I1 in Example 5.6.6 illustrates a common situation with quotient rings in polynomial
rings. In the exercises, the reader is guided to prove the following results (see Exercise 5.6.9). Let R be a commutative ring with an identity 1 ≠ 0. Suppose that a(x) = a_n x^n + · · · + a_1 x + a_0 ∈ R[x] with a_n ∈ U(R). In the quotient ring R[x]/(a(x)), for every polynomial p(x) there exists a unique q(x) ∈ R[x] with deg q(x) < n such that \overline{p(x)} = \overline{q(x)}. Furthermore, in the quotient ring R[x]/(a(x)), the element \bar{x} satisfies

\bar{x}^n = −a_n^{−1}(a_{n−1} \bar{x}^{n−1} + · · · + a_1 \bar{x} + a_0).

Repeated application of this identity governs the multiplication operation in R[x]/(a(x)).
Example 5.6.7. Recall Example 5.5.11, which discussed the ideal I = (5, x^2 − x − 2) in Z[x]. We propose to describe Z[x]/I. Since 5 ∈ I, any element a_0 ∈ Z satisfies

\bar{a}_0 = \bar{r}

in Z[x]/I, where r is the remainder of a_0 when divided by 5. So

Z[x]/I ≅ (Z/5Z)[x]/(x^2 − x − 2) ≅ (Z/5Z)[x]/(x^2 + 4x + 3).

Then every element in (Z/5Z)[x]/(x^2 + 4x + 3) can be written as \overline{a + bx}, where a, b ∈ Z/5Z. It is also not uncommon to write the elements as a + b\bar{x}, where \bar{x} satisfies \bar{x}^2 = \bar{x} + 2. ♦
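The arithmetic of this 25-element ring is easy to mechanize. A hedged sketch (our own; a class a + bx̄ is stored as the coefficient pair (a, b) with entries in Z/5Z) that reduces products using x̄^2 = x̄ + 2:

```python
def mul(u, v):
    """Multiply a + b*xbar and c + d*xbar in (Z/5Z)[x]/(x^2 + 4x + 3)."""
    a, b = u
    c, d = v
    # (a + b x)(c + d x) = ac + (ad + bc) x + bd x^2, and x^2 reduces to x + 2
    const = (a * c + 2 * b * d) % 5
    lin = (a * d + b * c + b * d) % 5
    return (const, lin)

# xbar * xbar reduces to 2 + xbar:
square = mul((0, 1), (0, 1))
```

Here square equals (2, 1), encoding x̄^2 = 2 + x̄.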
The construction of quotient rings allows us to define many rings whose elements seem more and more removed from anything familiar. However, from an algebraic perspective, their ontology is no less strange than that of any other ring encountered “naturally.”
When mathematicians first explored complex numbers, they dubbed the elements bi, where b ∈ R,
as imaginary because they considered such numbers so far removed from reality. With the formalism
of quotient rings, complex numbers arise naturally as the quotient ring R[x]/(x^2 + 1). Indeed, every element in R[x]/(x^2 + 1) can be written uniquely as a + b\bar{x} with a, b ∈ R and, since \overline{x^2 + 1} = \bar{0}, the element \bar{x} satisfies \bar{x}^2 = −\bar{1}. Hence, \bar{x} has exactly the algebraic properties of the unit imaginary number i.
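This identification can be made concrete: writing a + bx̄ as the pair (a, b), the reduction x̄^2 = −1 reproduces the multiplication rule of C. A brief illustration (ours, not from the text):

```python
def mul_c(u, v):
    """Multiply a + b*xbar and c + d*xbar in R[x]/(x^2 + 1); bd * xbar^2 = -bd."""
    a, b = u
    c, d = v
    return (a * c - b * d, a * d + b * c)

# agrees with Python's built-in complex arithmetic:
z = complex(1, 2) * complex(3, -4)
w = mul_c((1, 2), (3, -4))
```

Both computations give 11 + 2i, i.e., w == (z.real, z.imag).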

5.6.2 – Isomorphism Theorems
In the study of groups, the isomorphism theorems established internal structure within groups and
between groups related by homomorphisms. In a ring (R, +, ×), the pair (R, +) is an abelian group
so the group-theoretic isomorphism theorems apply to any subgroup. As it turns out, if I is an ideal,
the group isomorphisms are in fact isomorphisms of rings.

Theorem 5.6.8 (First Isomorphism Theorem for Rings)
If ϕ : R → S is a homomorphism of rings, then Ker ϕ is an ideal of R and R/ Ker ϕ is isomorphic to ϕ(R) as rings. Furthermore, if R is a ring and I is an ideal, then the natural projection π (Definition 5.6.4) onto R/I is a surjective homomorphism with Ker π = I.

Proof. We already saw that the kernel Ker ϕ is an ideal. The proof of the First Isomorphism
Theorem of groups shows that the association Φ : R/(Ker ϕ) → S given by Φ(r + (Ker ϕ)) = ϕ(r) is
a well-defined function that is injective and that satisfies the homomorphism criteria for addition.
To establish this theorem, we only need to prove that Φ satisfies the homomorphism criterion for multiplication. Let \bar{r}_1, \bar{r}_2 ∈ R/(Ker ϕ). Then

Φ(\bar{r}_1 \bar{r}_2) = Φ(\overline{r_1 r_2}) = ϕ(r_1 r_2) = ϕ(r_1)ϕ(r_2) = Φ(\bar{r}_1)Φ(\bar{r}_2).

Thus, Φ is an injective ring homomorphism and hence establishes a ring isomorphism between R/(Ker ϕ) and the subring Im ϕ in S. □

Example 5.6.9. For any ring R, the subset {0} is an ideal of R and R/{0} is isomorphic to R. To see this using the First Isomorphism Theorem, we use the identity homomorphism i : R → R. It is obviously surjective and its kernel is Ker i = {0}. Hence, {0} is an ideal with R/{0} ≅ R. ♦

Example 5.6.10 (Augmentation Map Kernel). Let R be a commutative ring and let G be a finite group with G = {g_1, g_2, . . . , g_n}. Let

I = { a_1 g_1 + a_2 g_2 + · · · + a_n g_n ∈ R[G] | a_1 + a_2 + · · · + a_n = 0 }.

We saw that this is an ideal by virtue of being the kernel of the augmentation map. Since the augmentation map is surjective, by the First Isomorphism Theorem, R[G]/I ≅ R. ♦

An important application of the First Isomorphism Theorem involves the so-called reduction
homomorphism in polynomial rings. Let R be a commutative ring and let I be an ideal in R. The
reduction homomorphism ϕ : R[x] → (R/I)[x] is defined by

ϕ(am xm + · · · + a1 x + a0 ) = ām xm + · · · + ā1 x + ā0 ,

where ā is the coset of a in R/I. Because taking cosets into R/I behaves well with respect to + and
×, it is easy to show that ϕ is in fact a homomorphism as claimed. The kernel of ϕ is Ker ϕ = I[x].
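The homomorphism property of the reduction map can be spot-checked numerically. A minimal sketch (our own; polynomials as coefficient lists from degree 0 up) verifying that reducing after multiplying agrees with multiplying after reducing:

```python
def reduce_mod(p, n):
    """Apply the reduction homomorphism Z[x] -> (Z/nZ)[x] coefficientwise."""
    return [c % n for c in p]

def poly_mul(p, q):
    """Ordinary polynomial multiplication of coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

p, q, n = [3, 7], [5, 1, 4], 6
lhs = reduce_mod(poly_mul(p, q), n)
rhs = reduce_mod(poly_mul(reduce_mod(p, n), reduce_mod(q, n)), n)
# lhs == rhs, as the homomorphism property requires
```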
The First Isomorphism Theorem leads to the following result.
Proposition 5.6.11
Let R be a ring and let I be an ideal. Then the subring I[x] of R[x] is an ideal and

R[x]/I[x] ≅ (R/I)[x].

Listed below are the Second, Third, and Fourth isomorphism theorems for rings. We list them
without proof and request the reader to prove them in the exercises. The proofs are very similar to
the proofs for the corresponding group isomorphism theorems.

Theorem 5.6.12 (Second Isomorphism Theorem)
Let R be any ring, let A be a subring, and let B be an ideal of R. Then A + B is a subring of R, A ∩ B is an ideal of A, and

(A + B)/B ≅ A/(A ∩ B).

Theorem 5.6.13 (Third Isomorphism Theorem)
Let R be any ring and let I ⊂ J be ideals of R. Then J/I is an ideal of R/I and

(R/I)/(J/I) ≅ R/J.

Theorem 5.6.14 (Fourth Isomorphism Theorem)
Let R be a ring and let I be an ideal. The correspondence A ↔ A/I is an inclusion
preserving bijection between the set of subrings of R/I and the set of subrings of R that
contain I. Furthermore, a subring A of R is an ideal if and only if A/I is an ideal of R/I.

Example 5.6.15. A simple application of the Third Isomorphism Theorem for rings arises in modular arithmetic. Let R = Z and I = (12) = 12Z. Now J = (4) = 4Z is also an ideal of Z. The Third Isomorphism Theorem for this situation is

(Z/12Z)/(4Z/12Z) ≅ Z/4Z. ♦

5.6.3 – Chinese Remainder Theorem
The Chinese Remainder Theorem is a result in modular arithmetic that generalizes to quotient rings.
In its form applied to modular arithmetic, it first appears circa the 4th century in the work of the
Chinese mathematician Sun Tzu (not to be confused with the military general of the same name,
known for The Art of War ).
We first present the Chinese Remainder Theorem in its form applied to modular arithmetic.

Theorem 5.6.16
Let n_1, n_2, . . . , n_k be integers greater than 1 that are pairwise relatively prime, i.e., gcd(n_i, n_j) = 1 for i ≠ j. For any integers a_1, a_2, . . . , a_k ∈ Z, the system of congruences

x ≡ a_1 (mod n_1)
x ≡ a_2 (mod n_2)
. . .
x ≡ a_k (mod n_k)

has a unique solution x modulo n = n_1 n_2 · · · n_k.
Proof. For each i, set n_i′ = n/n_i. Now n_i and n_i′ are relatively prime, so there exist integers s_i and t_i such that

s_i n_i + t_i n_i′ = 1.

Then for all i, we have t_i n_i′ ≡ 1 (mod n_i), so a_i t_i n_i′ ≡ a_i (mod n_i). Consider the integer given by

x = a_1 t_1 n_1′ + a_2 t_2 n_2′ + · · · + a_k t_k n_k′.

Then, since n_j′ ≡ 0 (mod n_i) if i ≠ j, we have x ≡ a_i (mod n_i) for all i. Hence, x satisfies all of the congruence relations.
If another integer y satisfies all of the congruence conditions, then x − y is congruent to 0 modulo each n_i, and hence n | (x − y), establishing uniqueness of the solution modulo n. □

Example 5.6.17. Find an x such that x ≡ 3 (mod 8), x ≡ 2 (mod 5), and x ≡ 7 (mod 13). The theorem confirms that there exists a unique solution for x modulo 520. The proof provides a method to find this solution. We have

n_1 = 8, n_2 = 5, n_3 = 13; and n_1′ = 65, n_2′ = 104, n_3′ = 40.

We need to calculate t_i as the inverse of n_i′ modulo n_i. Though we could use group-theoretic methods or the Extended Euclidean Algorithm to do this, in this example the integers n_i are small enough that trial and error suffices. We find

t_1 = 1, t_2 = 4, t_3 = 1.

Thus, according to the above proof, a solution to the system of congruence equations is

x = a_1 t_1 n_1′ + a_2 t_2 n_2′ + a_3 t_3 n_3′ = 3 × 1 × 65 + 2 × 4 × 104 + 7 × 1 × 40 = 1307.

Hence, the solution to the system is x ≡ 267 (mod 520). ♦
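The construction in the proof translates directly into a short program. A sketch (our own; it uses Python's pow(a, -1, m) to obtain the modular inverses t_i):

```python
def crt(residues, moduli):
    """Solve x = a_i (mod n_i) for pairwise coprime moduli, following the proof above."""
    n = 1
    for m in moduli:
        n *= m
    x = 0
    for a, m in zip(residues, moduli):
        n_prime = n // m              # n_i' = n / n_i
        t = pow(n_prime, -1, m)       # t_i with t_i * n_i' = 1 (mod n_i)
        x += a * t * n_prime
    return x % n

# the system of Example 5.6.17:
solution = crt([3, 2, 7], [8, 5, 13])
```

Here solution equals 267, matching the answer found by hand.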

To generalize the above theorem to rings, a congruence is tantamount to an equality in a quotient ring, and the notion of relatively prime corresponds to comaximality of ideals.

Theorem 5.6.18 (Chinese Remainder Theorem)
Let R be a commutative ring with an identity 1 ≠ 0 and let A_1, A_2, . . . , A_k be ideals in R. The map

R → R/A_1 ⊕ R/A_2 ⊕ · · · ⊕ R/A_k
r ↦ (r + A_1, r + A_2, . . . , r + A_k)

is a ring homomorphism with kernel A_1 ∩ A_2 ∩ · · · ∩ A_k. If the ideals are pairwise comaximal, then this map is surjective, A_1 A_2 · · · A_k = A_1 ∩ A_2 ∩ · · · ∩ A_k, and

R/(A_1 A_2 · · · A_k) ≅ (R/A_1) ⊕ (R/A_2) ⊕ · · · ⊕ (R/A_k)

as a ring isomorphism.

Proof. We prove the result first for k = 2 and then extend by induction.
Let A1 and A2 be two comaximal ideals and consider ϕ : R → R/A1 ⊕ R/A2 defined as in the
statement of the theorem. Since A1 and A2 are comaximal, there exist a1 ∈ A1 and a2 ∈ A2 such
that a1 + a2 = 1. Let r, s ∈ R be arbitrary and let x = ra2 + sa1 . Then

ϕ(x) = (x + A_1, x + A_2) = (ra_2 + A_1, sa_1 + A_2) = (r − ra_1 + A_1, s − sa_2 + A_2) = (r + A_1, s + A_2).
Hence, ϕ is surjective. The kernel of ϕ is {r ∈ R | r ∈ A_1 and r ∈ A_2} = A_1 ∩ A_2. Since A_1 and A_2 are comaximal, by the result of Exercise 5.5.33, A_1 ∩ A_2 = A_1 A_2. By the First Isomorphism Theorem (for rings),

R/(Ker ϕ) = R/(A_1 A_2) ≅ (R/A_1) ⊕ (R/A_2).
We have proved the theorem for any pair of comaximal ideals.
Now suppose that the theorem holds for some integer k and that A_1, A_2, . . . , A_{k+1} are pairwise comaximal. Since A_{k+1} is comaximal with A_i for 1 ≤ i ≤ k, for each such i there exist a_i ∈ A_i and b_i ∈ A_{k+1} such that a_i + b_i = 1. Then

1 = 1^k = (a_1 + b_1)(a_2 + b_2) · · · (a_k + b_k) = a_1 a_2 · · · a_k + b, where b ∈ A_{k+1}.

Thus, the ideals A_1 A_2 · · · A_k and A_{k+1} are comaximal. Since the theorem holds for k = 2,

R/(A_1 A_2 · · · A_{k+1}) ≅ (R/(A_1 A_2 · · · A_k)) ⊕ (R/A_{k+1}).

By the induction hypothesis, we deduce that

R/(A_1 A_2 · · · A_{k+1}) ≅ (R/A_1) ⊕ (R/A_2) ⊕ · · · ⊕ (R/A_{k+1}).

By induction, the theorem holds for all integers k ≥ 2. □

This theorem leads to the decomposition for the group of units in modular arithmetic.

Corollary 5.6.19
Let n be a positive integer with prime factorization n = p_1^{α_1} p_2^{α_2} · · · p_k^{α_k}. Then

Z/nZ ≅ (Z/p_1^{α_1} Z) ⊕ (Z/p_2^{α_2} Z) ⊕ · · · ⊕ (Z/p_k^{α_k} Z).

By Proposition 5.1.14, we have an isomorphism of groups

U(n) ≅ U(p_1^{α_1}) ⊕ U(p_2^{α_2}) ⊕ · · · ⊕ U(p_k^{α_k}).
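A finite spot-check of the corollary (our own illustration) for n = 12 = 2^2 · 3: the map r ↦ (r mod 4, r mod 3) hits all 12 pairs, and it matches unit groups with unit groups:

```python
from math import gcd

# the CRT map Z/12Z -> Z/4Z + Z/3Z is a bijection: all 12 pairs occur
pairs = {(r % 4, r % 3) for r in range(12)}

# units on each side: U(12) on the left, U(4) x U(3) on the right
units_12 = {r for r in range(12) if gcd(r, 12) == 1}
units_pairs = {(a, b) for a in range(4) for b in range(3)
               if gcd(a, 4) == 1 and gcd(b, 3) == 1}
# len(pairs) == 12 and len(units_12) == len(units_pairs) == 4
```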

It is still a number theory problem to determine U (pα ) for various primes p and powers α.

Exercises for Section 5.6
1. Refer to Exercise 5.5.5. For the subsets that were ideals, describe the elements and operations in the corresponding quotient ring.
2. Prove that the quotient ring Z[x]/(5x − 1) is isomorphic to the subring Z[1/5] = { n/5^k | n ∈ Z, k ∈ N } of Q.
3. Let [a, b] be an interval in R and consider the ring C^2([a, b], R) of real-valued functions over [a, b] whose zeroth, first, and second derivatives are continuous. The operations are addition and multiplication of functions. Let

I = {f(x) ∈ C^2([a, b], R) | f(−1) = f′(−1) = f″(−1) = 0}.

Show that I is an ideal and describe the quotient ring C^2([a, b], R)/I.
4. Consider the ideal I = (x + 1, x^2 + 1) in the ring R[x]. Prove that 2 ∈ (x + 1, x^2 + 1). Deduce that I = R[x], so that R[x]/I ≅ {0}, the trivial ring.
5. Consider the ideal I = (x + 1, x^2 + 1) in the ring Z[x]. Prove that 2 ∈ (x + 1, x^2 + 1). Prove that I = (x + 1, 2). Prove that Z[x]/I ≅ Z/2Z.
6. Consider the quotient ring R[x]/(x3 + x − 2).
(a) In this quotient ring, calculate and simplify as much as possible the sum and product of
x2 + 7x − 1 and 2x2 − x + 5.
(b) Prove that this quotient ring is not an integral domain.
7. Consider the quotient ring Z[x]/(2x3 + 5x − 1).
[Figure 5.5: A visualization of Z[i]/(2 + i).]

(a) In this quotient ring, calculate and simplify as much as possible the sum and product of
4x2 − 5x + 2 and 2x + 7.
(b) Show that x is a unit.
8. Let f(x) = x^2 + 2 in F_5[x].
(a) Prove that the quotient ring F_5[x]/(f(x)) has 25 elements.
(b) Prove that F_5[x]/(f(x)) contains no zero divisors.
(c) Deduce that F_5[x]/(f(x)) is a field. [Hint: See Exercise 5.1.23.]

9. Let R be an integral domain and let a(x) ∈ R[x] with deg a(x) = n > 0 and a_n ∈ U(R).
(a) Prove that, in the quotient ring R[x]/(a(x)), the element \bar{x} satisfies \bar{x}^n = −a_n^{−1}(a_{n−1} \bar{x}^{n−1} + · · · + a_1 \bar{x} + a_0).
(b) Prove that for every polynomial p(x), there exists a polynomial q(x) that is either 0 or has deg q(x) < n such that \overline{p(x)} = \overline{q(x)} in R[x]/(a(x)). [Hint: Use induction on the degree of p(x).]
(c) Prove that the polynomial q(x) described above is unique.
10. Consider the ring Z[i] and the ideal I = (2 + i). We study the quotient ring Z[i]/(2 + i).
(a) Prove that −1 + 2i, −2 − i, and 1 − 2i are in the ideal (2 + i).
(b) Prove that every element a + bi ∈ Z[i] is congruent modulo I to at least one element inside or on the boundary of the square with vertices 0, 2 + i, 1 + 3i, and −1 + 2i. Prove also that the vertices of the square are congruent to each other modulo I. [See Figure 5.5.]
(c) Using division in C, show that none of the five elements 0, i, 2i, 1 + i, 1 + 2i are congruent to each other and conclude that Z[i]/(2 + i) = {\bar{0}, \bar{i}, \overline{2i}, \overline{1 + i}, \overline{1 + 2i}}.
(d) Write down the multiplication table of Z[i]/(2 + i) and deduce that this quotient ring is a field.
(e) Find an explicit isomorphism between Z[i]/(2 + i) and F_5.

11. Some rings with eight elements.

(a) Consider the (quotient) ring R1 = (Z/2Z)[x]/(x3 + 1). List all 8 elements of this ring and
determine whether (and show how) they are units, zero divisors, or neither.
(b) Repeat the same question with the ring R2 = (Z/2Z)[x]/(x3 + x + 1).
(c) Consider also the ring R3 = Z/8Z of modular arithmetic modulo 8. The rings R1 , R2 , and R3
all have 8 elements. Show that none of them are isomorphic to each other.

12. Show that (Z/7Z)[x]/(x^2 − 3) is a field.
13. Consider the ring Z[x] and define the ideals I_p = (px − 1), where p is a prime.
(a) Prove that Z[x]/(I_2 I_3 I_5 · · · I_p) is isomorphic to

{ n/(2^{α_2} 3^{α_3} · · · p^{α_p}) ∈ Q | n ∈ Z and α_i ∈ N }.

(b) Prove that there exists no ideal I of Z[x] such that Z[x]/I ≅ Q.
14. Prove that every element in the quotient ring R[x, y]/(y − x^2) can be written uniquely as a(y) + x b(y), where a(y) and b(y) are polynomials in y and where x satisfies the relation x^2 = y.
15. Let R be a commutative ring and let N (R) be the nilradical of R. Prove that N (R/N (R)) = 0.
16. Let R be a ring and let I and J be ideals in R.
(a) Prove that the function f : R/I → R/(I + J) defined by f (r + I) = r + (I + J) is a well-defined
function.
(b) Show that f is a homomorphism.
(c) Show that Ker f ≅ J/(I ∩ J).
17. Let R be a ring and let I be an ideal of R. Prove that M_n(I) is an ideal of M_n(R) and show that M_n(R)/M_n(I) ≅ M_n(R/I). [Hint: First Isomorphism Theorem.]
18. Let Un (R) be the set of upper triangular n × n matrices with coefficients in R. Prove that the subset
I = {A ∈ Un (R) | aii = 0 for all i = 1, 2, . . . , n}
is an ideal in Un (R) and determine Un (R)/I.
19. Prove the Second Isomorphism Theorem for rings. (See Theorem 5.6.12.)
20. Prove the Third Isomorphism Theorem for rings. (See Theorem 5.6.13.)
21. Prove the Fourth Isomorphism Theorem for rings. (See Theorem 5.6.14.)
22. Let R be a ring and let e be an idempotent element in the center C(R). Observe that (e) = Re and (1 − e) = R(1 − e), and prove that R ≅ Re ⊕ R(1 − e).
23. Consider the group ring Z[S3]. Show that the set I of elements

α = a_1 + a_2 (1 2) + a_3 (1 3) + a_4 (2 3) + a_5 (1 2 3) + a_6 (1 3 2)

satisfying

a_1 + a_2 + a_3 + a_4 + a_5 + a_6 = 0
a_1 − a_2 − a_3 − a_4 + a_5 + a_6 = 0

is an ideal and show that Z[S3]/I ≅ Z ⊕ Z. Prove that every element in Z[S3]/I can be written uniquely as \overline{a_1 1 + a_2 (1 2)}, with a_1, a_2 ∈ Z.
24. Let R be a PID and let I be an ideal in R. Prove that every ideal of the quotient ring R/I is principal.
25. Consider the subset I = {(2m, 3n) | m, n ∈ Z} in the ring Z ⊕ Z. Prove that I is an ideal in Z ⊕ Z and that (Z ⊕ Z)/I ≅ (Z/2Z) ⊕ (Z/3Z).
26. Let R and S be rings with identity 1 ≠ 0. Prove that every ideal of R ⊕ S is of the form I ⊕ J for
some ideals I ⊆ R and J ⊆ S.
27. Solve the following system of congruences in Z:

x ≡ 3 (mod 5)
x ≡ 7 (mod 11).

28. Solve the following system of congruences in Z:

x ≡ 4 (mod 10)
x ≡ 17 (mod 21).

29. Solve the following system of congruences in Z:

x ≡ 2 (mod 3)
x ≡ 3 (mod 5)
x ≡ 9 (mod 13).
5.7 Maximal Ideals and Prime Ideals
We now turn to two different classes of ideals in rings. Though it will not be obvious at first, they
both attempt to generalize the notion and properties of prime numbers in Z.
Recall that in elementary number theory there are two equivalent definitions of prime numbers.

(1) An integer p > 1 is a prime number if and only if p is only divisible by 1 and itself. (The
definition of prime.)

(2) An integer p > 1 is a prime number if and only if whenever p|ab, then p|a or p|b. (Euclid’s
Lemma, Proposition 2.1.21.)

In ring theory in general, these two notions are no longer equivalent. The connection between
properties of ideals and integer arithmetic comes from the fact that a|b if and only if b ∈ (a), if and
only if (b) ⊆ (a).

5.7.1 – Maximal Ideals

Definition 5.7.1
An ideal I in a ring R is called maximal if I ≠ R and if the only ideals J such that
I ⊆ J ⊆ R are J = I or J = R.

An arbitrary ring need not have maximal ideals. In many specific rings, it is obvious or at least very simple to prove the existence of maximal ideals. The following general theorem, however, establishes the existence of maximal ideals under very mild hypotheses. The proof relies on Zorn's Lemma, which is equivalent to the Axiom of Choice.

Theorem 5.7.2 (Krull’s Theorem)


In a ring with an identity, every proper ideal is contained in a maximal ideal.

Proof. Let R be a ring with an identity and let I be any proper ideal. Let S be the set of all proper ideals that contain I. Then S is a nonempty set (since it contains I), which is partially ordered by inclusion. Let C be any chain of ideals in S. We show that C has an upper bound. Define the set

J = ⋃_{A ∈ C} A.

We prove that J is an ideal of R. Let a, b ∈ J. Then a ∈ A1 and b ∈ A2, where A1, A2 ∈ C. Since C is a chain, either A1 ⊆ A2 or A2 ⊆ A1; assume the former without loss of generality. Then both a and b are in the ideal A2. Thus, b − a ∈ A2 ⊆ J, so J is an additive subgroup of R by the one-step subgroup test. Now let r ∈ R and let a ∈ J. Then a is an element of some A ∈ C. Since A is an ideal, we have ra ∈ A ⊆ J and ar ∈ A ⊆ J. Thus, we have shown that J is an ideal.
Furthermore, since all ideals A ∈ C are proper, none of them contains the identity. Thus, J does not contain the identity, and hence J is a proper ideal. In particular, J ∈ S. Thus, every chain of elements in S has an upper bound. By Zorn's Lemma, we conclude that S has a maximal element. A maximal element of S is a maximal ideal that contains I. □
In the integers, suppose that (n) is a maximal ideal in Z. Then any ideal I = (m) satisfying (n) ⊆ (m) ⊆ Z is either (m) = (n) or (m) = Z. Expressing this in terms of divisibility, we deduce that m|n implies m = ±1 or m = ±n. Supposing that m is positive, then m = 1 or m = n. This corresponds to the definition of a prime number, listed above in Criterion (1). Hence, the maximal ideals in Z correspond to the ideals (p) where p is a prime number.
The next proposition offers a criterion for when an ideal is maximal.

Proposition 5.7.3
Let R be a commutative ring. Then an ideal M is maximal if and only if R/M is a field.
Proof. By the Lattice Isomorphism Theorem for rings, there is a bijective correspondence between the ideals of R/M and the ideals of R that contain M. But M is a maximal ideal if and only if the only ideals of R that contain M are M and R itself, which holds if and only if R/M contains precisely two ideals, (0) and (1). By Proposition 5.5.15, this is equivalent to R/M being a field. □
The above proposition offers a strategy to prove that an ideal I is maximal: calculate the quotient ring R/I, prove that R/I is a field, and then invoke the proposition. For the integers, we had previously seen that all ideals are of the form (n) with n nonnegative and that Z/nZ is a field if and only if n is a prime number. Proposition 5.7.3 thus shows that (n) is maximal if and only if n is a prime number, which recovers prime numbers as the motivation for the notion of a maximal ideal.
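This criterion is easy to test numerically for small n (a sketch of our own, not part of the text): every nonzero class of Z/nZ is a unit precisely when n is prime.

```python
from math import gcd

def is_field_mod(n):
    """Z/nZ is a field iff every nonzero class a is a unit, i.e. gcd(a, n) = 1."""
    return n > 1 and all(gcd(a, n) == 1 for a in range(1, n))

def is_prime(n):
    """Trial division primality test, sufficient for small n."""
    return n > 1 and all(n % d != 0 for d in range(2, int(n**0.5) + 1))
```

Tabulating `is_field_mod` against `is_prime` for a range of n confirms the agreement predicted by Proposition 5.7.3.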
Example 5.7.4. As a nonobvious example of Proposition 5.7.3, consider the ring of Gaussian integers Z[i] and the principal ideal (2 + i). Exercise 5.6.10 shows that Z[i]/(2 + i) is isomorphic to F5, which is a field, so (2 + i) is a maximal ideal. △
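The isomorphism of Example 5.7.4 can be made concrete: in Z[i]/(2 + i), the relation 2 + i ≡ 0 forces i ≡ −2 ≡ 3, so the map a + bi ↦ (a + 3b) mod 5 is a candidate ring homomorphism onto F5 with (2 + i) in its kernel. A sketch (the function name and explicit formula are our own, chosen for illustration):

```python
def phi(a, b):
    """Image of the Gaussian integer a + b*i in Z/5Z under i |-> 3."""
    return (a + 3 * b) % 5

# phi kills the generator 2 + i, and i^2 = -1 is respected,
# since 3^2 = 9 = 4 = -1 (mod 5).
```

One can also check by brute force that phi is multiplicative with respect to the Gaussian product (a + bi)(c + di) = (ac − bd) + (ad + bc)i.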
5.7.2 – Prime Ideals
In the language of ring theory, the second criterion for primeness in the integers (Criterion 2) can be restated to say that p ≥ 2 is a prime number if and only if, as ideals, (ab) ⊆ (p) implies that (a) ⊆ (p) or (b) ⊆ (p). This motivates the following definition.

Definition 5.7.5
Let R be any ring. An ideal P ≠ R is called a prime ideal if whenever two ideals A and B satisfy AB ⊆ P, then A ⊆ P or B ⊆ P.

Many applications that involve prime ideals occur in the context of commutative rings. In
commutative rings, Definition 5.7.5 has an equivalent statement.

Proposition 5.7.6
Let R be a commutative ring with an identity 1 ≠ 0. An ideal P ≠ R is prime if and only if ab ∈ P implies a ∈ P or b ∈ P.

Proof. Suppose that the ideal P is prime and let a, b ∈ R. Since R is commutative,

(ab) = {rab | r ∈ R} and (a)(b) = {r1r2ab | r1 ∈ R, r2 ∈ R}.

Obviously (a)(b) ⊆ (ab), but since R has an identity 1 ≠ 0, by setting r1 = 1 or r2 = 1 as needed, we see that (ab) ⊆ (a)(b). Hence, (ab) = (a)(b).
Now suppose that ab ∈ P. Then (ab) = (a)(b) ⊆ P, and since P is a prime ideal, (a) ⊆ P or (b) ⊆ P. Hence, a ∈ P or b ∈ P.
Conversely, suppose that P satisfies the property that whenever ab ∈ P, then a ∈ P or b ∈ P. Let A and B be two arbitrary ideals such that AB ⊆ P, and assume that A ⊈ P and B ⊈ P. Then there exist a0 ∈ A − P and b0 ∈ B − P. But a0b0 ∈ AB ⊆ P, so by hypothesis a0 ∈ P or b0 ∈ P, a contradiction. We must conclude that A ⊆ P or B ⊆ P. Hence, P is a prime ideal. □
Prime ideals possess an equivalent characterization via quotient rings, similar to Proposition 5.7.3 for maximal ideals.

Proposition 5.7.7
An ideal P in a commutative ring R with an identity 1 ≠ 0 is prime if and only if R/P is an integral domain.
Proof. Suppose that P is a prime ideal. By definition, if a, b ∈ R − P, then ab ∈ R − P. Passing to the quotient ring, we see that if the images of a and b are nonzero in R/P, then the image of ab is nonzero in R/P. Thus, R/P has no zero divisors. The quotient ring R/P contains an identity 1 and inherits commutativity from R. Hence, R/P is an integral domain.
Conversely, suppose that R/P is an integral domain, and let a, b ∈ R − P, so that the images of a and b in R/P are nonzero. Since R/P is an integral domain, the image of ab is nonzero, so ab ∉ P. We have shown that a ∉ P and b ∉ P implies that ab ∉ P. The contrapositive of this is that ab ∈ P implies a ∈ P or b ∈ P. This proves that P is a prime ideal. □
Corollary 5.7.8
In a commutative ring with an identity 1 ≠ 0, every maximal ideal is a prime ideal.

Proof. Let R be a commutative ring with an identity 1 ≠ 0 and let M be a maximal ideal. Then by Proposition 5.7.3, R/M is a field. Every field is an integral domain, so by Proposition 5.7.7, M is prime. □
By Criterion (2) for primality in the integers, using either the definition or Proposition 5.7.7, we see that an ideal (n) ⊆ Z is prime if and only if n is a prime number or n = 0. The zero ideal (0) is the only ideal in Z that is prime but not maximal. However, as the following example illustrates, this close relationship between prime and maximal ideals does not hold for arbitrary rings.
Example 5.7.9. Let R = Z[x] and consider the ideal I1 = (x² − 2). The quotient ring R/I1 ∼= Z[√2] is an integral domain but is not a field, so I1 is a prime ideal that is not maximal. △
Proposition 5.7.10
Let R be a PID. Then every nonzero prime ideal P is a maximal ideal.
Proof. Let P = (p) be a nonzero prime ideal in R and let I = (m) be an ideal containing P. Since p ∈ (m), there exists r ∈ R such that mr = p. But then mr ∈ P, so either m ∈ P or r ∈ P. If m ∈ P, then (m) ⊆ P, which implies that (m) = P, since we already knew that P ⊆ (m). Now suppose that r ∈ P, so that there exists some s ∈ R such that r = sp. Then from mr = p we deduce that msp = p. Since R is an integral domain and p ≠ 0, the cancellation law holds, so ms = 1. Thus, m is a unit and (m) = R. Hence, we have proved that either I = P or I = R, and we conclude that P is maximal. □
Example 5.7.11. In this example, we revisit Z[x] and show a few different ideals that are prime, maximal, or neither. Consider the following chain of ideals:

(x³ + x² − 2x − 2) ⊆ (x² − 2) ⊆ (x² − 2, x² + 13) ⊆ (x² − 2, 5).
(It is not yet obvious that this is a chain, but we shall see that it is.) The ideal (x³ + x² − 2x − 2) is not prime because (x − 1)(x² − 2) = x³ + x² − 2x − 2 is in this ideal whereas neither x − 1 nor x² − 2 is in the ideal, since nonzero polynomials in (x³ + x² − 2x − 2) have degree 3 or higher.
The ideal (x² − 2) is prime but not maximal because Z[x]/(x² − 2) ∼= Z[√2] is an integral domain but is not a field.
The ideal I3 = (x² − 2, x² + 13) is actually equal to (x² − 2, 15). We see that 15 ∈ I3 because (x² + 13) − (x² − 2) = 15. Conversely, x² + 13 ∈ (x² − 2, 15), so I3 = (x² − 2, 15). This is not a prime ideal because neither 3 nor 5 is in I3, whereas 15 = 3 · 5 ∈ I3.
Finally, the ideal (x² − 2, 5) is maximal. We can see this because

Z[x]/(x² − 2, 5) ∼= (Z/5Z)[x]/(x² − 2) ∼= (Z/5Z)[x]/(x² + 3).

A constant a ≠ 0 in this quotient is a unit and hence not a zero divisor, and if a + bx with b ≠ 0 were a zero divisor in (Z/5Z)[x]/(x² + 3), then b⁻¹a + x would be a zero divisor as well. Then there would exist x + c and x + d with c, d ∈ Z/5Z such that (x + c)(x + d) = x² + 3 in (Z/5Z)[x]. Then c and d would need to satisfy c + d = 0 and cd = 3. Hence, d = −c and −c² = 3. Checking the five cases, we find that −c² = 3 has no solutions in Z/5Z. Hence, (Z/5Z)[x]/(x² + 3) has no zero divisors. This quotient ring is commutative and has an identity, so (Z/5Z)[x]/(x² + 3) is an integral domain. Since it is finite, it is a field, and we conclude that (x² − 2, 5) is a maximal ideal. △
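The case check at the end of Example 5.7.11 is easy to automate: x² + 3 has no root in Z/5Z, so it admits no factorization into linear factors there. A brute-force sketch (an illustration of our own; the helper name is hypothetical):

```python
def roots_mod(coeffs, p):
    """Roots in Z/pZ of the polynomial a0 + a1*x + ... given by its coefficient list."""
    return [x for x in range(p)
            if sum(c * x**k for k, c in enumerate(coeffs)) % p == 0]

# x^2 + 3 has no roots modulo 5, matching the five-case check in the example.
```

For contrast, x² − 2 does factor modulo 7, since 3² ≡ 2 (mod 7).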
Example 5.7.12. Let R = C⁰([0, 1], ℝ) be the ring of continuous real-valued functions on [0, 1] and let a be a number in [0, 1]. Let Ma be the set of all functions f such that f(a) = 0. Note that Ma = Ker(eva), so Ma is an ideal. Also, by the surjectivity of eva and the First Isomorphism Theorem, we have R/Ma ∼= ℝ. Since ℝ is a field, by Proposition 5.7.3, Ma is a maximal ideal.
Consider now the ideal

I = {f ∈ C⁰([0, 1], ℝ) | f(0) = f(1) = 0}.

This ideal is not prime because, for example, the functions g, h : [0, 1] → ℝ given by g(x) = x and h(x) = 1 − x satisfy g, h ∉ I but gh ∈ I. Consequently, it is not maximal either. △
Prime ideals and maximal ideals possess many interesting properties. Properties of prime ideals in a commutative ring are a central theme in commutative algebra, the branch of algebra that studies the properties of commutative rings in depth. The exercises for this section not only ask the reader to determine whether given ideals are prime or maximal but also investigate many of these properties. The reader is encouraged to at least skim the statements of the exercises to acquire some intuition about the properties of prime ideals.
Exercises for Section 5.7
1. Show that the ideal {(2m, 3n) | m, n ∈ Z} is not a prime ideal.
2. Let R be a commutative ring. Consider the ideal (x) in the polynomial ring R[x].
(a) Prove that (x) is a prime ideal if and only if R is an integral domain.
(b) Prove that (x) is a maximal ideal if and only if R is a field.
3. Let R be a ring with an identity 1 ≠ 0. Prove that if the set of nonunits is an ideal M , then M is
the unique maximal ideal in R. Conversely, prove that if R is a commutative ring that contains a
unique maximal ideal M , then R − M is the set of all units. [Note: A commutative ring with a unique
maximal ideal is called a local ring.]
4. Consider the subset R ⊂ Q of fractions a/b for which, in reduced form, 19 does not divide b.
(a) Show that R is a subring of Q.
(b) Determine the set of units in R.
(c) Use Exercise 5.7.3 to conclude that the set of nonunits in R is the unique maximal ideal in R.
5. Consider the ring Un(R) of upper triangular n × n matrices with real coefficients. Fix an integer k with 1 ≤ k ≤ n. Prove that the set

Ik = {A ∈ Un(R) | a_kk = 0}

is a maximal ideal in Un(R). [Hint: Use the First Isomorphism Theorem.]
6. Consider the ring Un(Z) of upper triangular n × n matrices with integer coefficients. Fix an integer k with 1 ≤ k ≤ n. Prove that the set

Ik = {A ∈ Un(Z) | a_kk = 0}

is a prime ideal in Un(Z) that is not maximal. Explain how this differs from the previous exercise.
7. Let R = C⁰(ℝ, ℝ) and consider the set of functions of compact support,

I = {f ∈ C⁰(ℝ, ℝ) | Supp(f) is closed and bounded}.

[Recall that a subset S of ℝ is bounded if there exists some c such that S ⊆ [−c, c].]
(a) Prove that I is an ideal.
(b) Prove that any maximal ideal that contains I is not equal to any of the ideals Ma described in
Example 5.7.12.
8. This exercise asks the reader to prove the following modification of Proposition 5.7.3. Let R be any
ring. An ideal M is maximal if and only if the quotient R/M is a simple ring. [A simple ring is a ring
that contains no ideals except the 0 ideal and the whole ring.]
9. In noncommutative rings, we call a left ideal a maximal left ideal (and similarly for right ideals) if it
is maximal in the poset of proper left ideals ordered by inclusion. Prove that maximal left ideals (and
maximal right ideals) exist under the same conditions as for Krull’s Theorem.
10. Prove that the set of matrices {A ∈ M2 (R) | a11 = a21 = 0} is a maximal left ideal of M2 (R). (See
Exercise 5.7.9.) Find a maximal right ideal in M2 (R).
11. Find all the prime ideals in Z ⊕ Z.
12. Prove that (y − x2 ) is a prime ideal in R[x, y]. Prove also that it is not maximal.
13. Prove that (y, x2 ) is not prime in R[x, y].
14. Prove that the principal ideal (x2 + y 2 ) is prime in R[x, y] but not in C[x, y].
15. Show that in C¹(ℝ, ℝ) the ideal I = {f | f(2) = f′(2) = 0} is not a prime ideal.
16. Show by example that the intersection of two prime ideals is in general not another prime ideal.
17. Let R be any ring.
(a) Show that the intersection of two prime ideals P1 and P2 is prime if and only if P1 ⊆ P2 or
P2 ⊆ P1 .
(b) Conclude that the intersection of two distinct maximal ideals is never a prime ideal.
18. Let R be a commutative ring. Prove that the nilradical N (R) is a subset of every prime ideal.
19. Show that a commutative ring with an identity 1 6= 0 is an integral domain if and only if {0} is a
prime ideal.
20. Let ϕ : R → S be a ring homomorphism and let Q be a prime ideal in S. Prove that ϕ−1 (Q) is a
prime ideal in R.
21. Let ϕ : R → S be a ring homomorphism and let P be a prime ideal in R. Prove that the ideal
generated by ϕ(P ) is not necessarily a prime ideal in S.
22. Let R be a ring and let S be a subset of R. Prove that there exists an ideal that is maximal (by
inclusion) with respect to the property that it “avoids” (does not contain) the set S.
23. Prove that the nilradical of a commutative ring is equal to the intersection of all the prime ideals of
that ring. [Hint: Use Exercises 5.7.18 and 5.7.22.]

24. Let R be a commutative ring and let I be an ideal. Prove that the radical √I is equal to the intersection of all prime ideals that contain I. [Hint: Use the isomorphism theorems and Exercise 5.7.23.]
25. Consider the polynomial ring R[x1, x2, x3, . . .] with real coefficients but a countably infinite number of variables. Define I1 = (x1), I2 = (x1, x2), and Ik = (x1, x2, . . . , xk) for each positive integer k. Prove that Ik is a prime ideal for all positive integers k and prove that

I1 ⊊ I2 ⊊ · · · ⊊ Ik ⊊ · · ·
5.8 Projects
Project I. Roots of x2 + 1 in Z/nZ. Consider the ring of modular arithmetic R = Z/nZ.
The goal of this project is to find the number of solutions to the equation x2 + 1 = 0 in R.
Determine the number of solutions for a large number of different values of n. Try to make
(and if possible prove) a conjecture about the number of roots when n is prime, when n is a
power of 2, when n is the product of two primes, and when n is general.
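A brute-force count is an easy way to start gathering data for this project (a sketch of our own, not part of the text):

```python
def count_roots(n):
    """Number of solutions to x^2 + 1 = 0 in Z/nZ."""
    return sum((x * x + 1) % n == 0 for x in range(n))

# e.g. count_roots(5) returns 2 and count_roots(3) returns 0; tabulating these
# counts against the factorization of n is a starting point for the conjectures.
```

Comparing the counts for prime n, prime-power n, and composite n suggests the multiplicative pattern the project asks about.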

Project II. The Three-Sphere Group. Recall (see Appendix A.1) that the set {z ∈ C | |z| = 1} is a subgroup of (U(C), ×). Furthermore, this set is isomorphic to the circle S¹, where we locate points via an angle and the group operation corresponds to addition of angles. Consider now the subset of the quaternions

S = {α ∈ H | N(α) = 1},

where N(α) is the quaternion norm introduced in Example 5.1.21. This set S is a subgroup of (U(H), ×). Study this group. Show that geometrically we can view this set as the three-dimensional unit sphere S³ in R⁴. Study the group in comparison to (S¹, +). Are there subgroups, normal subgroups, etc., in (S, ×)? If there are normal subgroups, identify the corresponding quotient groups. Are there nontrivial homomorphisms with this group as their domain? Offer your own investigations or generalizations about this group.

Project III. Inverting Matrices of Quaternions. Consider the ring M2(H). Attempt to find a criterion for when a matrix is invertible. Can you find a formula for the inverse of an invertible matrix in M2(H)? Does your criterion extend to Mn(H) for n ≥ 3?

Project IV. Matrices of Quaternions. Study the ring M2 (H). Decide if you can find nilpotent
elements, zero divisors, ideals, etc. Discuss solving systems of two equations in two variables
but with variables and coefficients taken from H.

Project V. Commutative Subrings in Mn×n(F). Let F be a field and fix a matrix A ∈ Mn×n(F). Define the function ϕ : F[x] → Mn×n(F) by

ϕ(an xⁿ + an−1 xⁿ⁻¹ + · · · + a1 x + a0) = an Aⁿ + an−1 Aⁿ⁻¹ + · · · + a1 A + a0 I.
First show that the function ϕ is a ring homomorphism and conclude that Im ϕ is a commutative subring of Mn×n(F). The matrices in the image of ϕ commute with A, but this may not account for all matrices that commute with A. Setting adA : Mn×n(F) → Mn×n(F) to be the linear transformation such that adA(X) = AX − XA, then Ker(adA) is the subset of Mn×n(F) of matrices that commute with A. What can be said about the relationship between Ker adA and Im ϕ? Are they always equal for any A? If not, what conditions on A make them equal? The set Ker adA is a priori just a subspace of Mn×n(F); is it a subring or even an ideal of Mn×n(F)? (Consider examples with 2 × 2 matrices.)

Project VI. Subset Polynomial Ring. If S is a set, then the power set P(S) has the structure of a ring when equipped with the symmetric difference △ for addition and ∩ for multiplication. Consider the polynomial ring R = P(S)[x]. What are some properties of this ring (units, zero divisors, what the ideals may be, what the maximal ideals are, whether it is an integral domain, whether it is a PID, etc.)? Discuss the same question for a quotient ring of R. For example, you could take S = {1, 2, 3, 4, 5} and study properties of

R = P(S)[x]/({1, 3, 4}x³ + {1, 2}x + {2, 3, 5}),

but the choice of quotient ring is up to you.
Project VII. Application of FTFGAG. Consider the group of units in quotient rings of the
form Fp [x]/(n(x)), where p is a prime and n(x) is some polynomial in Fp [x]. Since the quotient
ring will be finite and abelian, the group of units will be finite and abelian, and hence is subject
to the Fundamental Theorem of Finitely Generated Abelian Groups. Try a few examples and
find out as much as you can about such groups. With examples, can you determine the
isomorphism type of U (F5 [x]/(x2 + x + 1)) or U (F2 [x]/(x3 + x + 1))?

Project VIII. Quotient Rings and Calculus. Revisit Example 5.6.6 and in particular how
a product in a quotient ring recovers the product rule of differentiation. Generalize this
observation to higher derivatives or to other function rings such as C n ([a, b], R) or to even
more general function rings. Are there other rules of differentiation that emerge from doing
other operations in an appropriate quotient ring?

Project IX. Convolution Rings. The construction of convolution rings is a general process with many natural examples subsumed under it. Explore properties of convolution rings of your own construction. Consider using commutative or noncommutative rings, and finite or infinite semigroups. Explore ring properties such as zero divisors, commutativity, units, ideals, and so on.
6. Divisibility in Commutative Rings

Chapter 2 introduced a few basic properties of the integers that were essential for many of the
earlier topics. Section 2.1 emphasized first the well-ordering of the integers and then properties
following from the notion of divisibility. The well-ordering of the integers was a property of the total
(discrete) order ≤ on Z and implied the principle of mathematical induction on Z. The partial order
of divisibility on the integers led to the concept of primes, greatest common divisor, least common
multiple, modular arithmetic, and many other notions.
Arbitrary rings do not naturally possess a total order that leads to well-ordering or induction.
However, divisibility is an important notion in rings, in particular commutative rings. This chapter
develops the notion of divisibility for rings, with a view of generalizing many of the divisibility
properties of the integers to as general a context as possible. The presentation makes regular
reference to the topics introduced in Section 2.1.
Section 6.1 defines divisibility in rings and discusses how and in what context arbitrary rings
possess similar notions to divisibility, primes, and greatest common divisor. In Section 6.2, we
construct methods to force ring elements to be divisible by other elements, thereby creating rings
of fractions, modeled from the construction of the rational numbers from the integers. Section 6.3
discusses a generalization of the integer division algorithm, while Section 6.4 discusses a general context in which something akin to unique prime factorization occurs.
In Section 6.5, the first application section of the chapter, general concepts of divisibility are
brought to bear on polynomial rings and in particular polynomial rings with coefficients in a field.
As a technical application, Section 6.6 introduces the RSA protocol for public key cryptography.
Finally, Section 6.7 offers a brief introduction to algebraic number theory, a branch of mathematics
that studies questions of interest in classical number theory but in extensions of the integers.

6.1 Divisibility in Commutative Rings

6.1.1 – Divisors and Multiples
Definition 6.1.1
Let R be a commutative ring. We say that a nonzero element a divides an element b, and write a|b, if there exists r ∈ R such that b = ar. The element a is called a divisor of b and b is called a multiple of a.
The reader may (and should) wonder why this definition is phrased in the context of commutative
rings and not arbitrary rings. If a ring R is not commutative, then given elements a, b ∈ R it could
be possible that there exists r ∈ R such that b = ar but that there does not exist s ∈ R such that
b = sa. Consequently, we would need to introduce the notions of right-divisibility if ∃r ∈ R, b = ar
and left-divisibility if ∃r ∈ R, b = ra. In the theory of noncommutative rings, one must take care to
distinguish these relations and develop appropriate theorems. This section restricts the attention to
commutative rings, but the exercises investigate some properties that apply to arbitrary rings.
When we say that an integer a is divisible by an integer b, there is an implicit assumption that the k ∈ Z such that a = bk is unique. The uniqueness follows immediately from Definition 2.1.5 for divisibility over the integers. Consider now an arbitrary commutative ring R and suppose that a = bk and a = bk′ in R. Then 0 = b(k − k′). One way that 0 = b(k − k′) could hold is if b = 0, which would imply that a = 0. This is why Definition 6.1.1 required the divisor to be nonzero. However, if b is a zero divisor, then there exist distinct k and k′ such that b(k − k′) = 0. Hence, uniqueness of the factor k does not hold in rings with zero divisors.
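A concrete instance of this failure of uniqueness (our own illustration): in Z/6Z, the zero divisor 2 satisfies 2 = 2 · 1 = 2 · 4, so the cofactor of 2 dividing 2 is not unique.

```python
def cofactors(b, a, n):
    """All k in Z/nZ with a = b*k (mod n): the possible 'quotients' of a by b."""
    return [k for k in range(n) if (b * k) % n == a % n]

# In Z/6Z the zero divisor 2 gives two cofactors for 2 = 2*k; in Z/5Z,
# which has no zero divisors, the cofactor is unique.
```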
In Section 2.1.2, we pointed out that (N∗, |) is a partially ordered set. This remark inspires us to investigate whether, and in what sense, divisibility is close to being a partial order on a ring.

Proposition 6.1.2
Let R be a commutative ring. Divisibility is a transitive relation on R. Furthermore, divisibility is a reflexive relation on R if and only if R has an identity.
Proof. Suppose that a, b, c ∈ R with a and b nonzero, and that a|b and b|c. Then there exist r, s ∈ R with b = ar and c = bs. By associativity, c = a(rs), so a|c.
For reflexivity, note that a|a for all a ∈ R if and only if there exists r ∈ R such that a = ar. This is precisely the definition of a multiplicative identity. □
A discussion about antisymmetry is more involved, but we try to follow the reasoning in Proposition 2.1.6(4) that led to the result that | is antisymmetric on N∗. Suppose that a and b are elements in a commutative ring R such that a|b and b|a. Then there exist r, s ∈ R with b = ar and a = bs. Hence, a = ars. Without more information about the ring, there is not much else to be said. If the ring R has an identity 1 ≠ 0, then a1 = ars, which implies that a(1 − rs) = 0. If the ring R has zero divisors, then the cancellation law does not necessarily hold and it does not follow that rs = 1. However, if R contains no zero divisors, then a|b and b|a is equivalent to a and b differing by multiplication by a unit. We observe that in any ring with an identity, −1 is a unit, so divisibility is never antisymmetric.
The above discussion shows that for the usual notion of divisibility, it is preferable that the commutative ring have an identity 1 ≠ 0 and not include any zero divisors. These are precisely the properties of an integral domain. Consequently, from now on, unless we explicitly say otherwise, we will discuss the notion of divisibility only in the context of integral domains.

Definition 6.1.3
Let R be an integral domain. Two elements a and b are called associates if there exists a unit u such that a = bu.
It is not hard to prove that the relation of being associates is an equivalence relation on R. (See Exercise 6.1.6.) In this text, we will consistently use the relation symbol

a ≃ b

on an integral domain to mean that a and b are associates. Notice that with this relation symbol, r ≃ 1 if and only if r is in the group of units U(R).
We can now see in what sense divisibility is a partial order.

Proposition 6.1.4
Setting [a] | [b] in the quotient set (R − {0})/≃ if and only if a|b in R defines a relation on (R − {0})/≃. Furthermore, | is a partial order on (R − {0})/≃.
Proof. We must first verify that setting [a] | [b] is well-defined. Let a and a′ be associates and let b and b′ be associates, so that a = a′u and b = b′v for some u, v ∈ U(R). Then a = br is equivalent to a′ = b′(vru⁻¹), and a′ = b′r′ is equivalent to a = b(v⁻¹r′u). Hence, the choice of representatives from [a] and [b] is irrelevant, and our definition of divisibility on the set of associate classes is well-defined.
From Proposition 6.1.2, we already know that | is transitive and reflexive on (R − {0})/≃. Furthermore, we saw that a|b and b|a in R if and only if a and b are associates. Hence, if [a] | [b] and [b] | [a] in (R − {0})/≃, then [a] = [b]. Thus, | is antisymmetric on (R − {0})/≃. □
Figure 6.1: The elements and associates in Z[(−1 + i√3)/2]
Example 6.1.5. Consider the polynomial equation x³ − 1 = 0. By the identity given in Exercise 5.1.26, this equation is equivalent to

(x − 1)(x² + x + 1) = 0.

The roots of x² + x + 1 = 0 are ω = (−1 + i√3)/2 and ω̄ = (−1 − i√3)/2.
Consider the ring Z[ω]. As a subring of C that includes the identity, it is an integral domain. It is not hard to show that in C,

(a + bω)⁻¹ = (a + bω̄)/(a² − ab + b²).

In order for a + bω to be a unit in Z[ω], it is not hard to show that we need a² − ab + b² = ±1. This has six solutions, namely (a, b) = (±1, 0), (0, ±1), ±(1, 1), or in other words

1, (1 + i√3)/2, (−1 + i√3)/2, −1, (−1 − i√3)/2, (1 − i√3)/2.

In polar coordinates, the units are

cos(kπ/3) + i sin(kπ/3) for k = 0, 1, . . . , 5.

These correspond to the 6 distinct powers ζ^k with ζ = (1 + i√3)/2.
In Figure 6.1, the dots represent the elements in Z[ω]. Each element is an associate of a unique element in the sector defined in polar coordinates by r > 0 and 0 ≤ θ < π/3. So according to Proposition 6.1.4, divisibility is a partial order on the elements of Z[ω] in this sector. △
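The six units found in this example can be confirmed by an exhaustive search over integer pairs (a, b) (our own illustration, not part of the text):

```python
def eisenstein_units(bound=20):
    """Pairs (a, b) with a + b*omega a unit in Z[omega], i.e. a^2 - a*b + b^2 = 1.
    The form a^2 - a*b + b^2 is positive definite, so a small search bound suffices."""
    return sorted((a, b) for a in range(-bound, bound + 1)
                  for b in range(-bound, bound + 1)
                  if a * a - a * b + b * b == 1)
```

The search returns exactly the six pairs (±1, 0), (0, ±1), ±(1, 1) listed above.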
6.1.2 – Norms on Rings

In the study of rings, a comparison to the integers or other familiar rings is often fruitful. We commented above that most rings do not possess a natural total order that behaves well with respect to the operations of the ring. However, for some integral domains, the concept of a norm allows us to leverage properties of the integers to deduce results about the ring.
Definition 6.1.6
• Let R be an integral domain. Any function N : R → N with N(0) = 0 is called a norm on R.

• A norm is called positive if N(a) > 0 for all a ∈ R − {0}.

• A norm is called multiplicative if N(ab) = N(a)N(b) for all a, b ∈ R.

An important class of rings that have norms are the rings of the form Z[√D], where D is a square-free integer (i.e., it is not divisible by a square integer greater than 1). Consider the function N : Z[√D] → N defined by

N(a + b√D) = |a² − Db²|.    (6.1)
This is obviously a norm. Furthermore, since D is not a perfect square, √D is not a rational number, so there exist no pairs (a, b) ∈ Z² with (a, b) ≠ (0, 0) such that a² − Db² = 0. Consequently, the norm N is a positive norm. We now show that N is a multiplicative norm. Let α = a + b√D and β = c + d√D. Then

N(α)N(β) = |a² − Db²| |c² − Dd²| = |a²c² − Da²d² − Db²c² + D²b²d²|.

Furthermore, αβ = (ac + Dbd) + (ad + bc)√D, so

N(αβ) = |a²c² + 2Dabcd + D²b²d² − D(a²d² + 2abcd + b²c²)|
       = |a²c² − Da²d² − Db²c² + D²b²d²|.

Consequently, for all α, β ∈ Z[√D], N(αβ) = N(α)N(β).
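The computation above is easy to double-check numerically (a sketch of our own; `mult` implements the product formula for Z[√D] used in the derivation):

```python
def norm(a, b, D):
    """N(a + b*sqrt(D)) = |a^2 - D*b^2|, as in (6.1)."""
    return abs(a * a - D * b * b)

def mult(a, b, c, d, D):
    """(a + b*sqrt(D)) * (c + d*sqrt(D)) = (ac + D*bd) + (ad + bc)*sqrt(D)."""
    return (a * c + D * b * d, a * d + b * c)
```

Looping over small coefficients and several values of D confirms N(αβ) = N(α)N(β) in every case.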
A multiplicative norm on a ring is particularly useful when discussing divisibility because if a divides b in R, then N(a) divides N(b) in N. If in addition the norm N is positive, then N(1) = N(1²) = N(1)N(1), which implies that N(1) = 1, since N(1) ≠ 0. Then any unit u ∈ U(R) satisfies N(u) = 1 because if uv = 1, then N(u)N(v) = 1, and the only element of N with a multiplicative inverse is 1.
With the norm N in (6.1) defined on Z[√D], a stronger result holds.
Proposition 6.1.7
The element α = a + b√D ∈ Z[√D] is a unit if and only if N(α) = 1.

Proof. We already know that if α is a unit, then N(α) = 1. Conversely, if N(α) = 1, then a² − Db² = ±1, and so

α⁻¹ = (a − b√D)/(a² − Db²) ∈ Z[√D]. □
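The proof is constructive, and the inverse can be written down directly (a sketch of our own; the sign bookkeeping handles both cases a² − Db² = ±1):

```python
def unit_inverse(a, b, D):
    """Inverse of a + b*sqrt(D) in Z[sqrt(D)] when N(a + b*sqrt(D)) = 1,
    namely (a - b*sqrt(D)) / (a^2 - D*b^2), per Proposition 6.1.7."""
    n = a * a - D * b * b
    assert n in (1, -1), "not a unit: |a^2 - D*b^2| != 1"
    return (a * n, -b * n)

# For example, 3 + sqrt(10) has norm |9 - 10| = 1 and inverse -3 + sqrt(10).
```

Multiplying the element by the returned pair with the product formula for Z[√D] gives 1 in every case.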

6.1.3 – Irreducible and Prime Elements

The introduction to Section 5.7 remarked that the usual two equivalent criteria for primeness in Z are not necessarily equivalent in arbitrary rings. From those two criteria, Section 5.7 developed the notions of maximal ideals and prime ideals.
The reader may wonder why we generalized the criteria of primeness in the context of ideals as opposed to ring elements. Primality for ideals is more general. The comments made at the beginning of this section show that the relation of divisibility has properties similar to those of the integers only when the ring is an integral domain. On the other hand, the relation of containment between the ideals in a ring is a partial order that is defined on arbitrary rings.
Though divisibility is a partial order on an integral domain in the sense of Proposition 6.1.4, the
two criteria for primeness in Z are still not necessarily equivalent in integral domains. Hence, we
need two separate definitions.

Definition 6.1.8
Let R be an integral domain.

(1) Suppose that r is nonzero and not a unit. Then r is called irreducible if whenever r = ab, either a or b is a unit. Otherwise, r is said to be reducible.

(2) A nonzero element p that is not a unit is called prime if p | ab implies that p | a or p | b.

Note that p is a prime element if and only if the principal ideal (p) is a prime ideal. However,
keep in mind that not all prime ideals are necessarily principal.

Proposition 6.1.9
In an integral domain, a prime element is always irreducible.
Proof. Suppose that p is prime, so that (p) is a nonzero prime ideal, and suppose that p = ab. Then ab ∈ (p), so a or b is in (p). Without loss of generality, suppose that a ∈ (p), so that a = cp for some c. Then p = pcb, and since p ≠ 0 in an integral domain, cancellation gives 1 = cb. Thus, b is a unit in R, and p is irreducible. □

We show by way of example that the converse is not true.


Example 6.1.10. We mention at the outset that this example does not use the simplest possible situation to illustrate the point. In the ring Z[√10], consider the four elements

α = 6 + 2√10,  β = −15 + 5√10,  γ = 60 + 19√10,  δ = −60 + 19√10.
It is easy to calculate that αβ = 10 = γδ. It is also easy to calculate that using the norm N in (6.1),
we have
N (α) = 4, N (β) = 25, N (γ) = 10, N (δ) = 10.

However, we claim that there are no elements in Z[√10] of norm either 2 or 5. Assume there exists an element a + b√10 of norm 2. Then a² − 10b² = ±2. Modulo 10, this gives a² ≡ ±2 (mod 10). The squares modulo 10 are 0, 1, 4, 9, 6, 5. Hence, we arrive at a contradiction and so we conclude there exists no element of norm 2. Assume now that there exists an element a + b√10 of norm 5. Then a² − 10b² = ±5. This implies that 5|a² and thus that 5|a. Writing a = 5c leads to the equation 5c² − 2b² = ±1. Modulo 5, this equation is 3b² ≡ ±1 (mod 5), which is equivalent to b² ≡ ±2 (mod 5). The squares in modular arithmetic modulo 5 are 0, 1, 4. Therefore, the assumption leads to a contradiction and thus there exists no element of norm 5.
Suppose that α = ab. Then N (ab) = 4 so the pair (N (a), N (b)) is (1, 4), (2, 2), or (4, 1). Since
no element exists of norm 2, either N (a) = 1 or N (b) = 1. By Proposition 6.1.7, either a or b is
a unit. Hence, α is irreducible. By similar reasoning, we can establish that β, γ, and δ are all irreducible elements. However, none of them are prime elements.
Notice that α divides 10 = γδ. But N (α) = 4 divides neither N (γ) = 10 nor N (δ) = 10, so α divides neither γ nor δ. Hence, α is an element that is irreducible but not prime. By similar reasoning, we can show that β, γ, and δ are not prime elements either.
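The arithmetic behind this example can be checked computationally. The sketch below is our own illustration (not from the text); it encodes a + b√10 as the pair (a, b) and multiplies by the rule (a + b√10)(c + d√10) = (ac + 10bd) + (ad + bc)√10:

```python
# Illustrative check of Example 6.1.10 in Z[sqrt(10)]; elements are pairs (a, b).

def mul(x, y):
    (a, b), (c, d) = x, y
    return (a * c + 10 * b * d, a * d + b * c)

def norm(x):
    a, b = x
    return abs(a * a - 10 * b * b)

alpha, beta = (6, 2), (-15, 5)
gamma, delta = (60, 19), (-60, 19)

print(mul(alpha, beta), mul(gamma, delta))             # both equal (10, 0), i.e. the element 10
print([norm(z) for z in (alpha, beta, gamma, delta)])  # [4, 25, 10, 10]

# No element has norm 2: a^2 = ±2 (mod 10) is impossible,
# since the squares modulo 10 are only {0, 1, 4, 5, 6, 9}.
print(sorted({(a * a) % 10 for a in range(10)}))       # [0, 1, 4, 5, 6, 9]
```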
We point out that we have made the above example more complicated than necessary to illustrate a point. Note that 10 = 2 · 5 = √10 · √10, that 3 + √10 is a unit with (3 + √10)⁻¹ = −3 + √10, and that 19 + 6√10 is a unit with (19 + 6√10)⁻¹ = 19 − 6√10. We chose α, β, γ, δ by

α = 2(3 + √10),  β = 5(−3 + √10),  γ = √10(19 + 6√10),  δ = √10(19 − 6√10).

Hence, α, β, γ, and δ are respectively associates of the much simpler numbers 2, 5, √10, and √10. ♦
272 CHAPTER 6. DIVISIBILITY IN COMMUTATIVE RINGS

6.1.4 – Greatest Common Divisors


The definition provided in Section 2.1.3 for the greatest common divisor between two integers applies
verbatim for greatest common divisors in an integral domain.

Definition 6.1.11
Let R be an integral domain. If a, b ∈ R with (a, b) ≠ (0, 0), a greatest common divisor of
a and b is an element d ∈ R such that:
• d|a and d|b (d is a common divisor);
• if d′|a and d′|b (d′ is another common divisor), then d′|d.

Proposition 6.1.12
If two elements a and b in an integral domain R have a greatest common divisor d, then
any other greatest common divisor d′ is an associate of d.

Proof. If both d and d′ are greatest common divisors of a and b, then d|d′ and d′|d. Hence, there exist elements u, v ∈ R such that d′ = ud and d = vd′. Then d′ = uvd′ and since R is an integral domain, 1 = uv, which implies that u and v are units. Thus, d ≃ d′. □
The hypothesis of Proposition 6.1.12 was careful to say, “if a and b have a greatest common
divisor.” In contrast to what happens in the ring of integers, two elements in arbitrary integral
domains need not possess a greatest common divisor.

Example 6.1.13. The ring Z[√10] again offers an example of the nonexistence of greatest common divisors. Consider the elements a = 12 and b = 24 + 6√10. It is easy to see that 6 is a common divisor of a and b, but 8 + 2√10 is one also because

12 = (4 − √10)(8 + 2√10) and 24 + 6√10 = 3(8 + 2√10).

Since neither N (6) = 36 nor N (8 + 2√10) = 24 divides the other, neither 6 nor 8 + 2√10 divides the other. Hence, neither of them is a greatest common divisor of 12 and 24 + 6√10.
Assume there exists a greatest common divisor d of a and b. Then d is a multiple of 6 and of 8 + 2√10, while it is a divisor of 12 and 24 + 6√10. Hence, N (6) = 36 divides N (d), which in turn divides N (12) = 144. In Example 6.1.10, we showed that Z[√10] has no elements with norm 2, so N (d)/N (6) = 1 or 4. If N (d) = N (6), then d is an associate of 6, which leads to a contradiction since 6 is not a multiple of 8 + 2√10. If N (d) = 144, then d is an associate of 12, which is a contradiction since 12 is not a divisor of 24 + 6√10. Consequently, 12 and 24 + 6√10 do not have a greatest common divisor. ♦
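The divisibility claims of this example can likewise be verified mechanically. In this sketch (our own illustration, not from the text), an element a + b√10 of Z[√10] is the pair (a, b):

```python
# Illustrative check of Example 6.1.13 in Z[sqrt(10)]; elements are pairs (a, b).

def mul(x, y):
    (a, b), (c, d) = x, y
    return (a * c + 10 * b * d, a * d + b * c)

def norm(x):
    a, b = x
    return abs(a * a - 10 * b * b)

# Both 6 and 8 + 2*sqrt(10) divide a = 12 and b = 24 + 6*sqrt(10):
print(mul((4, -1), (8, 2)))   # (12, 0): 12 = (4 - sqrt(10))(8 + 2*sqrt(10))
print(mul((3, 0), (8, 2)))    # (24, 6): 24 + 6*sqrt(10) = 3(8 + 2*sqrt(10))

# Their norms 36 and 24 do not divide one another, so neither common
# divisor divides the other, and neither can be a greatest common divisor.
print(norm((6, 0)), norm((8, 2)))   # 36 24
```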
We conclude the section with two definitions that adapt some more terminology of elementary
number theory to integral domains. Results related to these concepts are left in the exercises.

Definition 6.1.14
Two elements a and b in an integral domain are said to be relatively prime if the only
common divisors are units.

Definition 6.1.15
Let R be an integral domain. If a, b ∈ R with (a, b) ≠ (0, 0), a least common multiple is an
element m ∈ R such that:

• a|m and b|m (m is a common multiple);


• if a|m′ and b|m′, then m|m′.

Proposition 2.1.16 for least common multiples in Z carries over with only minor adjustments to
any integral domain.

Proposition 6.1.16
Let R be an integral domain. Two nonzero elements a and b possess a greatest common
divisor if and only if they possess a least common multiple.

Proof. (Left as an exercise for the reader. See Exercise 6.1.22.) □

The reader may have noticed with some dissatisfaction that this section did not show how to
determine if certain elements in an integral domain are irreducible or prime, or how to find a greatest
common divisor of two elements if they exist. Such questions are much more difficult in arbitrary
integral domains than they are in Z. As subsequent sections attempt to generalize the results of
elementary number theory to commutative rings, we will introduce subclasses of integral domains
in which such questions are tractable. In the meantime, we give the following definition.

Definition 6.1.17
An integral domain in which every two nonzero elements have a greatest common divisor
is called a gcd-domain.

Exercises for Section 6.1


1. List all the divisors of (6, 14) in Z ⊕ Z.
2. Prove that if a|b and a|c in a commutative ring, then a|(b + c).
3. Let R be a noncommutative ring. Prove that the relation of left-divisibility on R is a transitive relation.
4. Prove that a subring of an integral domain that contains the identity is again an integral domain.
5. Let R be a commutative ring with 1 ≠ 0. Prove or disprove that if R/I is an integral domain, then
R is an integral domain (and I is prime).
6. Let R be an integral domain. Prove that the relation a ≃ b, defined by a is an associate of b, is an equivalence relation.

7. Let D be a square-free integer and consider Z[√D] along with the norm defined in (6.1). Prove that if N (α) is a prime number, then α is irreducible in Z[√D].
8. Use Exercise 6.1.7 to find 5 irreducible elements in Z[i] that are not associates to each other.

9. Use Exercise 6.1.7 to find 5 irreducible elements in Z[√3] that are not associates to each other.

10. Find all the divisors of 21 in Z[√−3].
11. Consider the ring Z[i].
(a) Prove that if a + bi and c + di are such that (a + bi)(c + di) ∈ Z, then c + di is ±(a − bi).
(b) Prove that for all pairs (a, b) ∈ Z2 , the expression a2 + b2 is never congruent to 3 modulo 4.
(c) Prove that 2 is not irreducible.
(d) Deduce that for a prime number p, the element p = p + 0i is irreducible in Z[i] if p ≡ 3 (mod 4).
12. Prove that Z[∛2] is an integral domain. Prove that if a, b ∈ Z are such that a³ − 2b³ = ±1 then a + b∛2 is a unit.
13. Consider the ideal I = (3, 2 + √−5) in the ring Z[√−5].
(a) Prove that 1 ∉ I so you can conclude that I is a proper ideal of the ring.
(b) Use the norm defined in (6.1) to show that the only elements α ∈ Z[√−5] such that α | 3 and α | (2 + √−5) are units.
(c) Deduce that I is not a principal ideal.
14. Let ϕ : R → S be an injective homomorphism between integral domains.
(a) Prove that if s ∈ Im ϕ is an irreducible element in S, then r = ϕ−1 (s) is an irreducible element
in R.

(b) Prove by a counterexample that if r is an irreducible element in R, then ϕ(r) is not necessarily
irreducible in S.
15. Prove that there are no elements α ∈ Z[√10] with N (α) = 3. Conclude that the elements 7 + 2√10 and 3 are irreducible elements in Z[√10].
16. Let R be an integral domain. Prove that, if it exists, a least common multiple of a, b ∈ R is a generator
for the (unique) largest principal ideal contained in (a) ∩ (b). Conclude that in a PID, if (a) ∩ (b) = (m),
then m is a least common multiple of a and b.
17. Prove that in a PID, every irreducible element is prime.
18. Let R be an integral domain and let a, b ∈ R. Prove that if a and b have a greatest common divisor d with a = dk and b = dℓ, then k and ℓ are relatively prime.

19. Prove that Proposition 2.1.14 does not hold when we replace Z with Z[√10].
20. Let p1 , p2 , q1 , q2 be irreducible elements in an integral domain R such that none are associates to any
of the others and p1 p2 = q1 q2 . Prove that p1 q1 q2 and p1 p2 q1 do not have a greatest common divisor.
21. Prove that if a least common multiple m of a and b exists in an integral domain, then (m) is the
largest principal ideal contained in (a) ∩ (b).
22. Prove Proposition 6.1.16. [Hint: For one direction of the if and only if statement, see Proposi-
tion 2.1.16.]

In Exercises 6.1.23 through 6.1.28 the ring R is a gcd-domain. Furthermore, for two elements a, b ∈ R, we define gcd(a, b) as a greatest common divisor, well-defined up to multiplication by a unit, and we also define lcm(a, b) as a least common multiple, well-defined up to multiplication by a unit.
23. Prove that gcd(a, b) lcm(a, b) ≃ ab for all nonzero a, b ∈ R.
24. Prove that gcd(a, gcd(b, c)) ≃ gcd(gcd(a, b), c) and also that lcm(a, lcm(b, c)) ≃ lcm(lcm(a, b), c) for all nonzero a, b, c ∈ R.
25. Prove that gcd(ac, bc) ≃ gcd(a, b)c for all nonzero a, b, c ∈ R. Prove also that lcm(ac, bc) ≃ lcm(a, b)c.
26. Prove that gcd(a, b) ≃ 1 and gcd(a, c) ≃ 1 if and only if gcd(a, bc) ≃ 1.
27. Prove that if gcd(a, b) ≃ 1 and a|bc, then a|c.
28. Prove that if gcd(a, b) ≃ 1, a|c, and b|c, then ab|c.
29. A Bézout domain is an integral domain in which the sum of any two principal ideals is again a principal
ideal. Prove that a Bézout domain is a gcd-domain.

6.2
Rings of Fractions
One way to deal with questions of divisibility in a ring is to force certain elements to be units. We
already encountered this process in the definition of the rational numbers in reference to the integers.
Most people first encounter fractions so early in their education that a precise construction was not
appropriate at that time. We give one here.
A fraction r = a/b consists of a pair of integers (a, b) ∈ Z × Z∗. However, some fractions are considered equivalent. For example, since we use a fraction to represent a ratio, we must have

(an, bn) ∼ (a, b) for all n ∈ Z∗.

This is not yet a good definition for a relation since it does not give a criterion for when two arbitrary pairs are in relation. A complete expression of the equivalence relation ∼ is

(a, b) ∼ (c, d) ⇐⇒ ad = bc. (6.2)



One can see that (6.2) follows from the requirement above via the simple calculation

(a, b) ∼ (ac, bc) ∼ (ac, ad) ∼ (c, d).

Criterion (6.2) is the cross-multiplication method to identify equal fractions.


The set Q of rational numbers is defined as the set of equivalence classes of pairs in Z × Z∗ under the equivalence relation ∼ given in (6.2). We write a/b for the equivalence class of the pair (a, b). Without belaboring the reasons for the arithmetic operations, the operations on fractions are

a/b + c/d = (ad + bc)/(bd)  and  (a/b) × (c/d) = (ac)/(bd).  (6.3)

6.2.1 – Issues with Dividing by Zero


Before we attempt to generalize the construction of Q from Z as much as possible to arbitrary
commutative rings, we discuss a few issues with dividing by 0 or even defining a division by 0.
We begin with two important comments as to why we do not divide by 0 or even force a division by 0 in fractions of integers. Suppose that we attempted to make sense of fractions a/b with (a, b) ∈ Z × Z. We would have an equality of fractions

a/0 = c/d

if and only if ad = 0. This leads to two cases.
If a = 0, then we would have a situation in which 0/0 = c/d for all c, d ∈ Z. This would be undesirable (or at least uninteresting) since then every fraction would be equivalent to the fraction 0/0, and thus every fraction would be equivalent to every other fraction and the set of fractions would consist of the single element 0/0. Consequently, we do not bother trying to construct fractions 0/0.
If d = 0 in the product ad = 0, then we would have a/0 = b/0 for all a, b ∈ Z∗. Allowing or forcing
this division is not completely uninteresting. In fact, this is reminiscent of real projective space
discussed in Example 1.3.9. We could define an equivalence relation ∼ on Z2 − {(0, 0)} by

(a, b) ∼ (c, d) ⇐⇒ ad = bc.

Then the set of equivalence classes consists of all fractions along with one more element, the equiva-
lence class of (1, 0), which contains every pair (a, 0). This is sometimes called the integral projective
line and instead of writing the equivalence class of (a, b) with the fraction notation, they are written
as (a : b). We can also think of this as the set of all integer ratios. Even if this set may be interesting
for certain applications, it does not carry a ring structure with the usual addition and multiplication
because multiplication is not defined for (1 : 0) × (0 : 1).
Consequently, any construction of fractions that allows a 0 in the denominator either reduces the
ring down to the trivial ring of one element or does not produce a ring structure. Despite this, it is
possible to define rings of fractions in which the denominators are zero divisors.

6.2.2 – Rings of Fractions


Let R be a commutative ring and let D be any nonempty subset of R that does not contain 0 and is closed under multiplication. D may contain zero divisors, but since 0 ∉ D, it cannot contain both a and b when ab = 0. The set D will be the set of denominators.
We define a relation ∼ on R × D such that (a, d) ∼ (au, du) for any u ∈ D. As mentioned in
the introduction to this section, this does not directly define the relation between any two pairs in
R × D. Given any two pairs (a, d1 ) and (b, d2 ) we have

(a, d1 ) ∼ (ad2 , d1 d2 ) and (b, d2 ) ∼ (bd1 , d1 d2 ).

Because the cancellation law does not apply in rings with zero divisors, it is possible for ad2 u = bd1 u
for some u without ad2 = bd1 . So the relation ∼ on R × D can be defined symmetrically by

(a, d1 ) ∼ (b, d2 ) ⇐⇒ (ad2 − bd1 )u = 0 for some u ∈ D. (6.4)



Proposition 6.2.1
The relation ∼ given in (6.4) is an equivalence relation on R × D.

Proof. For all (a, d) ∈ R × D, ad − ad = 0 so there does exist a u ∈ D (any u in D) such that
(ad − ad)u = 0. Hence, ∼ is reflexive.
Suppose that (a, d1 ) ∼ (b, d2 ). Then (ad2 − bd1 )u = 0 for some u ∈ D. Then

−(ad2 − bd1 )u = (bd1 − ad2 )u = 0

so (b, d2 ) ∼ (a, d1 ), which shows that ∼ is symmetric.


Now suppose that (a, d1 ) ∼ (b, d2 ) and (b, d2 ) ∼ (c, d3 ). Then for some u, v ∈ D,

(ad2 − bd1 )u = 0 and (bd3 − cd2 )v = 0.

Hence, multiplying the first equation by vd3 and the second by ud1 , we get (ad2 d3 − bd1 d3 )uv = 0
and (bd3 d1 − cd2 d1 )uv = 0. Adding these, the terms ±bd1 d3 uv cancel and we deduce that

0 = (ad2 d3 − cd2 d1 )uv = (ad3 − cd1 )d2 uv.

Since d2 uv ∈ D, we deduce that (a, d1 ) ∼ (c, d3 ). □

Definition 6.2.2
Let R be a commutative ring. Let D be any nonempty subset of R that does not contain
0 and is closed under multiplication. The set of fractions of R with denominators in D
is the set of ∼-equivalence classes on R × D. We denote this set by D−1 R and write the equivalence class of (a, d) as a/d.

Theorem 6.2.3
Let R be a commutative ring and D a nonempty multiplicatively closed set that does not
contain 0. The operations defined in (6.3) are well-defined on D−1 R (i.e., are independent
of choice of representative for a given equivalence class). Furthermore, these operations
give D−1 R the structure of a commutative ring with 1 ≠ 0.

Proof. Suppose that a/d1 = b/d2 and r/d3 = s/d4 are elements in D−1 R with (ad2 − bd1 )u = 0 and (rd4 − sd3 )v = 0 for some u, v ∈ D. Adding the fractions gives
a/d1 + r/d3 = (ad3 + rd1 )/(d1 d3 )  and  b/d2 + s/d4 = (bd4 + sd2 )/(d2 d4 ).
To see that these two fractions are equal, note that

((ad3 + rd1 )d2 d4 − (bd4 + sd2 )d1 d3 )uv = (ad2 − bd1 )u d3 d4 v + (rd4 − sd3 )v d1 d2 u = 0 · d3 d4 v + 0 · d1 d2 u = 0.

Multiplying the two fractions gives


(a/d1 ) × (r/d3 ) = ar/(d1 d3 )  and  (b/d2 ) × (s/d4 ) = bs/(d2 d4 ).
To see that these two fractions are equal, note that

(ard2 d4 − bsd1 d3 )uv = (ad2 u)(rd4 v) − (bd1 u)(sd3 v) = (bd1 u)(sd3 v) − (bd1 u)(sd3 v) = 0.

That addition and multiplication are commutative on D−1 R follows from the commutativity of
addition and multiplication on R. The addition of fractions is associative with
a/d1 + b/d2 + c/d3 = (ad2 d3 + bd1 d3 + cd1 d2 )/(d1 d2 d3 )
regardless of which + is performed first. Similarly, the multiplication of fractions is associative.
For any d ∈ D, an element of the form 0/d is the additive identity in D−1 R because

a/d′ + 0/d = (ad + 0d′)/(d′d) = ad/(d′d) = a/d′.

The additive inverse to a/d is just −a/d. Furthermore, any element of the form d/d satisfies

(a/d′) × (d/d) = ad/(d′d) = a/d′

for all a/d′ ∈ D−1 R, so d/d is a multiplicative identity.
The only remaining axiom, distributivity of × over +, is left as an exercise for the reader. (See Exercise 6.2.1.) □

Definition 6.2.4
We call D−1 R, equipped with + and × as given in (6.3), the ring of fractions of R with
denominators in D.
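For a small finite ring, the construction of Definition 6.2.2 can be carried out exhaustively. The following sketch is our own illustration (not from the text): it builds D−1 R for R = Z/6Z and D = {3} (note 3 · 3 = 9 = 3 in R, so D is multiplicatively closed and omits 0) by grouping pairs under the relation (6.4):

```python
# Illustrative sketch (not from the text): the ring of fractions D^{-1}R
# for the finite ring R = Z/6Z with denominator set D = {3}.
from itertools import product

R = range(6)          # elements of Z/6Z
D = [3]

def related(p, q):
    # Relation (6.4): (a, d1) ~ (b, d2) iff (a*d2 - b*d1)*u = 0 for some u in D.
    (a, d1), (b, d2) = p, q
    return any((a * d2 - b * d1) * u % 6 == 0 for u in D)

pairs = list(product(R, D))
classes = []
for p in pairs:
    for cls in classes:
        if related(p, cls[0]):   # ~ is an equivalence relation (Prop. 6.2.1)
            cls.append(p)
            break
    else:
        classes.append([p])

print(len(classes))   # 2
```

Here the six pairs collapse to only two classes, so D−1 R is a two-element ring: forcing the zero divisor 3 to become invertible collapses much of R.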

We emphasize that the construction given in Theorem 6.2.3 directly generalizes the construction
of Q from Z where R = Z and D = Z∗ . It is not hard to show that if we had taken D = Z>0 , we
would get a ring of fractions that is isomorphic to Q.
There is always a natural homomorphism ϕ : R → D−1 R given by ϕ(r) = rd/d for some d ∈ D. By the equivalence of fractions, rd/d = rd′/d′ for any d, d′ ∈ D, so the choice of d ∈ D is irrelevant for the definition of ϕ. In the case of Z and Q, the homomorphism is ϕ(n) = n/1. However, in the general situation, we must define ϕ as above because D does not necessarily contain 1.
Though the natural homomorphism Z → Q is injective, the function ϕ is not necessarily injective in the case of arbitrary commutative rings. For r, s ∈ R, ϕ(r) = ϕ(s) gives rd/d = sd′/d′ for some d, d′ ∈ D, which implies that

(rdd′ − sd′d)u = 0 ⇐⇒ (r − s)dd′u = 0

for some u ∈ D. If D contains no zero divisors, then ϕ(r) = ϕ(s) implies that r − s = 0, so r = s and hence ϕ is injective. Conversely, if D does contain a zero divisor d with bd = 0 and b ≠ 0, then

ϕ(b) = bd/d = 0/d = ϕ(0)

and ϕ is not injective. This discussion establishes the following lemma.

Lemma 6.2.5
The function ϕ : R → D−1 R defined by ϕ(r) = rd/d for some d ∈ D is injective if and only
if the multiplicatively closed subset D contains no zero divisors.

Note that when R is an integral domain, the condition in Lemma 6.2.5 is always satisfied.

Proposition 6.2.6
Let R be a commutative ring and D a multiplicatively closed subset that does not contain
0 or any zero divisors. Then D−1 R contains a subring isomorphic to R. Furthermore, in
this embedding of R in D−1 R, every element of D is a unit in D−1 R.

Proof. By Lemma 6.2.5, the function ϕ is injective so by the First Isomorphism Theorem, R is isomorphic to Im ϕ. Let d be any element in D. We can view d in D−1 R as the element d²/d. But D−1 R contains the element d/d² and it is easy to see that (d²/d) × (d/d²) = d³/d³ = 1 in D−1 R. □

Example 6.2.7. Let R = Z and let D = {1, a, a², a³, . . .} for some positive integer a. Then D−1 R consists of all the fractions

n/aᵏ with n ∈ Z and k ∈ N.

This is a ring that is also a subring of Q. Recall that we denote this ring by Z[1/a]. ♦

Example 6.2.8. Let R = Z[x] and let D = {(1 + x)ⁿ}n≥0 . The ring D−1 R consists of all rational expressions of the form

p(x)/(1 + x)ⁿ

where p(x) ∈ Z[x] and n ∈ N. The units in this ring are all elements of the form ±(1 + x)ᵏ for k ∈ Z. The ring has no zero divisors. ♦

Examples 6.2.7 and 6.2.8 are particular examples of a general construction. If R is a commutative ring and a is an element that is not nilpotent, then R[1/a] is the ring of fractions D−1 R where D = {a, a², a³, . . .}.
If R is an integral domain then the set D = R − {0} is multiplicatively closed and does not
contain 0. In D−1 R, every nonzero element of R becomes a unit, so D−1 R is a field.

Definition 6.2.9
If R is an integral domain and D = R − {0}, then D−1 R is called the field of fractions of R.

Since the natural homomorphism ϕ : R → D−1 R is an injection when R is an integral domain,


we regularly view R as a subring of its field of fractions F .
The motivating example for this section is the construction of the rational numbers. Note that
the set of rational numbers Q is the field of fractions of the integral domain Z.
Example 6.2.10. Consider the integral domain Z[√2]. The field of fractions F of Z[√2] consists of expressions of the form

(a + b√2)/(c + d√2) with a, b, c, d ∈ Z.

It is clear that F is a subring of R. Notice that with integers a, b, c, d ∈ Z,

(ad + bc√2)/(bd) = ad/(bd) + bc√2/(bd) = a/b + (c/d)√2.

Hence, Q[√2] ⊆ F . On the other hand, by equivalence of fractions,

(a + b√2)/(c + d√2) = (a + b√2)(c − d√2)/(c² − 2d²) = ((ac − 2bd) + (bc − ad)√2)/(c² − 2d²).

Thus, F ⊆ Q[√2], so in fact F = Q[√2]. ♦
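The rationalization identity used in this example can be spot-checked numerically. The following is our own illustrative sketch (not from the text), using floating-point arithmetic for a sample choice of a, b, c, d:

```python
# Illustrative numeric check of the rationalization step in Example 6.2.10:
# (a + b*sqrt(2))/(c + d*sqrt(2))
#     = ((a*c - 2*b*d) + (b*c - a*d)*sqrt(2)) / (c^2 - 2*d^2)
from math import isclose, sqrt

a, b, c, d = 3, -1, 2, 5                    # any integers with c + d*sqrt(2) != 0
lhs = (a + b * sqrt(2)) / (c + d * sqrt(2))
denom = c * c - 2 * d * d                    # nonzero since sqrt(2) is irrational
rhs = ((a * c - 2 * b * d) + (b * c - a * d) * sqrt(2)) / denom
print(isclose(lhs, rhs))   # True
```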

Example 6.2.11 (Rational Expressions). Let R be an integral domain. By Proposition 5.2.6,


R[x] is an integral domain as well. The field of fractions of R[x] is the set of rational expressions,

{ p(x)/q(x) | p(x) ∈ R[x] and q(x) ∈ R[x] − {0} }.

This field of rational expressions over R is usually denoted by R(x), in contrast to R[x]. ♦

Viewing an integral domain R as a subring of its field of fractions F , for a, b ∈ R with a ≠ 0, the element a divides b if and only if b/a ∈ R.
Though some texts only discuss rings of fractions in the context of integral domains, the defi-
nitions provided in this section only require R to be commutative. The following example presents
a ring of fractions in which the denominator allows for zero divisors. Note, however, that since D
must be multiplicatively closed and not contain 0, then D cannot contain nilpotent elements.

Example 6.2.12. Let R be the quotient ring R = Z[x]/(x² − 1). (For simplicity, we omit the overline in the a + bx notation.) We see that x − 1 and x + 1 are zero divisors. Consider the multiplicatively closed set D = {(1 + x)ᵏ | 0 ≤ k}. In R, we have x² − 1 = 0, so x² = 1. Then

(1 + x)² = x² + 2x + 1 = 1 + 2x + 1 = 2(x + 1).

Consequently, (1 + x)ᵏ = 2ᵏ⁻¹(x + 1) for k ≥ 1. Then the ring of fractions D−1 R consists of expressions of the form (a + bx)/1 or (a + bx)/(2ⁿ(1 + x)) for n ≥ 0.
In D−1 R, we have the unexpected equality of fractions

(x − 1)/(x + 1)ⁿ = (x² − 1)/(x + 1)ⁿ⁺¹ = 0/(x + 1)ⁿ⁺¹ = 0/1.

Then, for any fraction in D−1 R, we have

(a + bx)/(x + 1)ᵏ = (a + b + b(x − 1))/(x + 1)ᵏ = (a + b)/(x + 1)ᵏ + b(x − 1)/(x + 1)ᵏ = (a + b)/(x + 1)ᵏ.

Now consider the function f : Z[1/2] → D−1 R given by f (a/2ⁿ) = a/(x + 1)ⁿ for all n ∈ N. Then

f (a/2ᵐ + b/2ⁿ) = f ((2ⁿa + 2ᵐb)/2ᵐ⁺ⁿ) = (2ⁿa + 2ᵐb)/(x + 1)ᵐ⁺ⁿ = (2ⁿa(x + 1) + 2ᵐb(x + 1))/(x + 1)ᵐ⁺ⁿ⁺¹
= 2ⁿa(x + 1)/(x + 1)ᵐ⁺ⁿ⁺¹ + 2ᵐb(x + 1)/(x + 1)ᵐ⁺ⁿ⁺¹ = a(x + 1)ⁿ⁺¹/(x + 1)ᵐ⁺ⁿ⁺¹ + b(x + 1)ᵐ⁺¹/(x + 1)ᵐ⁺ⁿ⁺¹
= a/(x + 1)ᵐ + b/(x + 1)ⁿ = f (a/2ᵐ) + f (b/2ⁿ).

Furthermore,

f ((a/2ᵐ)(b/2ⁿ)) = f (ab/2ᵐ⁺ⁿ) = ab/(x + 1)ᵐ⁺ⁿ = (a/(x + 1)ᵐ)(b/(x + 1)ⁿ) = f (a/2ᵐ)f (b/2ⁿ),

so f is a ring homomorphism. Since (a + bx)/(x + 1)ᵏ = (a + b)/(x + 1)ᵏ, f is surjective. However, the kernel of f consists of all a/2ᵏ such that a/(x + 1)ᵏ = 0/1 in D−1 R, which means a = 0. Thus, Ker f = {0} so f is injective and thus an isomorphism. We have shown that D−1 R is isomorphic to Z[1/2].
Even though R contains zero divisors, the ring of fractions D−1 R ≅ Z[1/2] is an integral domain. By Proposition 5.1.12, zero divisors are not units in a ring. However, the ring of fractions construction forced the zero divisor (x + 1) to become a unit. In the process, the element x − 1 became 0 under the usual function ϕ : R → D−1 R. ♦
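The key identity (1 + x)ᵏ = 2ᵏ⁻¹(x + 1) driving this example is easy to confirm by machine. In this sketch (our own illustration, not from the text), an element a + bx of R = Z[x]/(x² − 1) is stored as the pair (a, b):

```python
# Illustrative check for Example 6.2.12: in R = Z[x]/(x^2 - 1) we have x^2 = 1,
# elements a + b*x are pairs (a, b), and (1 + x)^k = 2^(k-1) * (x + 1) for k >= 1.

def mul(p, q):
    (a, b), (c, d) = p, q
    return (a * c + b * d, a * d + b * c)   # (a + bx)(c + dx), using x^2 = 1

def power(p, k):
    result = (1, 0)                          # the element 1
    for _ in range(k):
        result = mul(result, p)
    return result

for k in range(1, 8):
    # 2^(k-1) * (x + 1) is the pair (2^(k-1), 2^(k-1))
    assert power((1, 1), k) == (2 ** (k - 1), 2 ** (k - 1))
print("verified for k = 1..7")
```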

The above example illustrates that if D contains a zero divisor a with ab = 0 for some element b, then in the usual homomorphism ϕ : R → D−1 R, ϕ(a) is a unit and ϕ(b) = 0.
The last example of the section is an important case of a ring of fractions.

Example 6.2.13 (Localization). Let R be a commutative ring and let P be a prime ideal. The subset D = R − P is multiplicatively closed. Indeed, since P is a prime ideal, ab ∈ P implies a ∈ P or b ∈ P . Taking the contrapositive of this implication (being careful to use De Morgan's Law) gives: a ∉ P and b ∉ P implies ab ∉ P . This means precisely that a ∈ D and b ∈ D implies ab ∈ D, which means that D is multiplicatively closed.
The ring of fractions created from D−1 R, where D = R − P and P is a prime ideal in R, is called
the localization of R by P and is denoted RP .
In the exercises, the reader is guided to prove that RP contains a unique maximal ideal, which
means that RP is a local ring. (See Exercise 6.2.14.)
As an explicit example, the ring

Z(19) = { a/b | a, b ∈ Z with 19 ∤ b }

is the localization of the integers Z by the prime ideal (19). ♦
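Membership in a localization such as Z(19) is a simple divisibility test on a reduced denominator. The following sketch is our own illustration (not from the text), using Python's Fraction, which always reduces to lowest terms:

```python
# Illustrative sketch (not from the text): membership in the localization
# Z_(19) = { a/b : 19 does not divide b }, tested on fractions in lowest terms.
from fractions import Fraction

def in_Z_19(x: Fraction) -> bool:
    # Fraction normalizes to lowest terms, so checking 19 | denominator is unambiguous.
    return x.denominator % 19 != 0

print(in_Z_19(Fraction(7, 3)))     # True: 3 is not divisible by 19
print(in_Z_19(Fraction(5, 38)))    # False: 38 = 2 * 19
print(in_Z_19(Fraction(19, 19)))   # True: reduces to 1/1
```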

Exercises for Section 6.2


1. Prove that × is distributive over + in any ring of fractions D−1 R.
2. Let D = {2ᵃ3ᵇ | a, b ∈ N} as a subset of Z. Prove that D−1 Z is isomorphic to Z[1/6] even though D ≠ {1, 6, 6², . . .}.
3. Let D = {2ᵃ3ᵇ5ᶜ | a, b, c ∈ N} as a subset of Z. Prove that D−1 Z is isomorphic to Z[1/30].
4. Let D be the subset in Z of all positive integers that are products of powers of primes of the form
4k + 1. Prove that D is multiplicatively closed. Prove also that D−1 Z is neither isomorphic to Z[1/n] for any integer n nor isomorphic to Z(p) for a prime number p.


5. Let R = Z/12Z and let D = {3, 9}. Determine the number of elements in the ring of fractions D−1 R
and exhibit a unique representative for all the fractions in D−1 R.
6. Let R = Z/100Z and let D = {2ᵃ | a ≥ 1}. Determine the number of elements in the ring of fractions D−1 R and describe a unique representative for all the fractions in D−1 R.

7. Prove that if D contains only units, then D−1 R is isomorphic to R.


8. Let R be an integral domain and let F be its field of fractions. Prove that as rings of rational
expressions F (x) = R(x).
9. Let R be a commutative ring and let a ∈ R − {0} be an element that is not nilpotent, i.e., aᵐ ≠ 0 for all m ∈ N. Set

D1 = {a, a², a³, . . .} and D2 = {aᵏ, aᵏ⁺¹, aᵏ⁺², . . .},

where k is any positive integer. Prove that D1⁻¹R is isomorphic to D2⁻¹R. [Hint: Map r/aᵐ to raᵏ⁻¹/aᵐ⁺ᵏ⁻¹.]
10. Let R = Z[x]/(x² − (m + n)x + mn) for some integers m ≠ n. Let D = {(x − n)ᵏ | k ∈ N}. Prove that D−1 R is isomorphic to Z[1/(m − n)].
11. Let D be a multiplicatively closed subset of a commutative ring R that does not contain 0.
Suppose that a is a zero divisor in D with ab = 0 for some element b ∈ R. If ϕ : R → D−1 R is the
standard mapping of a ring into the ring of fractions D−1 R, show that ϕ(a) is a unit and ϕ(b) = 0.
12. Prove that the ideals of D−1 R are in bijection with the ideals of R that do not intersect D.
13. Let ϕ : R → D−1 R be the standard homomorphism of a commutative ring into a ring of fractions.
Prove that if I is a principal ideal in D−1 R, then ϕ−1 (I) is also a principal ideal.
14. Let R be a commutative ring and let P be a prime ideal. (a) Prove that the set of nonunits in RP is
the ideal PP . (b) Deduce that RP has a unique maximal ideal.
15. Consider the ring R = R[x, y] and the prime ideal P = (x, y). Prove that the elements in the localization RP are rational expressions of the form

r(x, y)/(1 + xp(x, y) + yq(x, y)) for p(x, y), q(x, y), r(x, y) ∈ R[x, y].
16. Let F [[x]] be the ring of formal power series with coefficients in a field F . We denote the field of fractions of F [[x]] by F ((x)). Prove that the elements of F ((x)) can be written as

∑k≥−N aₖxᵏ for some integer N .

[Such series are called formal Laurent series and F ((x)) is called the field of formal Laurent series over F . See also Exercise 5.4.29.]

6.3
Euclidean Domains
As mentioned in its introduction, this chapter gathers together topics related to divisibility. In the
previous section, we discussed rings of fractions, a construction that forces certain elements to be
units. In particular, if R is an integral domain, a nonzero element a divides b if and only if, in the
field of fractions, ab is in the subring R. However, much of the theory of divisibility of integers does
not rely on the ability to take fractions of any nonzero elements.
This section introduces Euclidean domains, rings in which it is possible to perform something
akin to the integer division algorithm.

6.3.1 – Definition

Definition 6.3.1
Let R be an integral domain. A Euclidean function on R is any function d : R − {0} → N
such that
(1) For all a, b ∈ R with a ≠ 0, there exist q, r ∈ R such that

b = aq + r with r = 0 or d(r) < d(a). (6.5)

(2) For all nonzero a, b ∈ R, d(b) ≤ d(ab).


R is called a Euclidean domain if it has a Euclidean function d.

We call any expression of the form (6.5) a Euclidean division of b by a. It is not uncommon to
call q a quotient and r a remainder in the Euclidean division.
The Integer Division Theorem (Theorem 2.1.7) states that for a, b ∈ Z with a ≠ 0 there exist unique q, r such that b = aq + r and 0 ≤ r < |a|. The above definition does not require uniqueness of q and r, simply existence. The Integer Division Theorem establishes that the integers are a Euclidean domain with d(x) = |x|. However, according to Definition 6.3.1, for any given a and b, either b = aq exactly, or else there are two possibilities: b = aq + r with 0 < r < |a|, or b = aq′ + r′ with −|a| < r′ < 0. In these latter two possibilities, r′ = r − |a| and q′ = q + sign(a).
Example 6.3.2 (Gaussian Integers). This example shows that the ring of Gaussian integers Z[i] is a Euclidean domain with Euclidean function d(z) = |z|².
First note that for all nonzero α, β ∈ Z[i],

d(αβ) = |αβ|² = |α|²|β|² = d(α)d(β).

Since d(α) ≥ 1 for all nonzero α, then d(β) ≤ d(αβ).
Let α, β ∈ Z[i] with α ≠ 0. The ring Z[i] is a subring of the field C. When we divide β by α as complex numbers, the result is of the form r + si where r, s ∈ Q. Let p and q be the closest integers to r and s respectively, so that |p − r| ≤ 1/2 and |q − s| ≤ 1/2. Then

β = (p + qi)α + ρ

where ρ ∈ Z[i]. Let θ = β/α − (p + qi) as an element in Q[i]. Then αθ = ρ and also d(θ) ≤ 1/4 + 1/4 = 1/2. Hence, d(ρ) = d(αθ) = d(θ)d(α) ≤ (1/2)d(α). In particular, d(ρ) < d(α).
This establishes that d(z) = |z|² is a Euclidean function and that Z[i] is a Euclidean domain.
We illustrate this Gaussian integer division with two explicit examples. Let β = 19 − 23i and α = 5 + 3i. In Q[i], we have

β/α = 13/17 − (86/17)i.

The closest integers to the real and imaginary parts of this ratio are p = 1 and q = −5. Then

ρ = β − α(p + qi) = 19 − 23i − (5 + 3i)(1 − 5i) = −1 − i

and we observe that d(−1 − i) = 2 < d(5 + 3i) = 34.


As another example, let γ = 23 + 24i and keep α = 5 + 3i. In Q[i], we have
γ/α = 11/2 + (3/2)i.
In this case, we have two options for p and two options for q. All four options give us the following
four possible Euclidean divisions:

23 + 24i = (5 + 3i)(5 + i) + (1 + 4i),
23 + 24i = (5 + 3i)(5 + 2i) + (4 − i),
23 + 24i = (5 + 3i)(6 + i) + (−4 + i),
23 + 24i = (5 + 3i)(6 + 2i) + (−1 − 4i).

In the four possibilities, the different remainders are the four associates of 1 + 4i, each with Euclidean function value d(ρ) = 17 < d(α) = 34. ♦
The Euclidean function in a Euclidean domain R offers some connection between R and the ring
of integers. This connection, as loose as it is, is enough to establish some similar properties between
R and Z. The following proposition is one such example.
Proposition 6.3.3
Let I be an ideal in a Euclidean domain R. Then I = (a) where a is an element of minimum
Euclidean function value in the ideal. In particular, every Euclidean domain is a principal
ideal domain.
Proof. Let d be the Euclidean function of R and consider S = {d(c) | c ∈ I}. By the well-ordering
principle, since S is a subset of N, it contains a least element n. Let a ∈ I be an element such that
d(a) = n.
Clearly (a) ⊆ I since a ∈ I. Now let b be any element of I and consider the Euclidean division
of b by a. We have b = aq + r where r = 0 or d(r) < d(a). Since a, b ∈ I, then r = b − aq ∈ I so
by the minimality of d(a) in S it is not possible for d(r) < d(a). Hence, r = 0, which implies that
b = aq and hence b ∈ (a). This shows that I ⊆ (a) and thus I = (a). 
6.3.2 – Euclidean Algorithm
Euclidean domains derive their name from the ability to perform the Euclidean algorithm between
two elements a, b ∈ R. The Euclidean algorithm involves performing successive Euclidean divisions
in the following way. Set b = r0 and a = r1 . Then

b = q1 a + r2 ,   where r2 = 0 or d(r2 ) < d(a)
  ...
ri−1 = qi ri + ri+1 ,   where ri+1 = 0 or d(ri+1 ) < d(ri )   (6.6)
  ...
rn−1 = qn rn .
This process must terminate because d(r1 ) = d(a) is finite and the sequence d(ri ) is a strictly decreasing sequence of nonnegative integers. It is possible that rn−2 = qn−1 rn−1 + rn with d(rn ) = 0 but rn ≠ 0; in this case, the axioms of a Euclidean domain force rn+1 = 0.
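The remainder chain in (6.6) can be sketched generically, parameterized by a Euclidean division function so that the same loop serves Z, Z[i], or F[x] (the names are ours):

```python
def euclidean_algorithm(b, a, divide):
    """Runs the successive divisions of (6.6) and returns the list of
    remainders [r0, r1, r2, ...], ending with 0.

    `divide(x, y)` must implement Euclidean division: x = q*y + r."""
    remainders = [b, a]
    while remainders[-1] != 0:
        _, r = divide(remainders[-2], remainders[-1])
        remainders.append(r)
    return remainders

# In Z, the built-in divmod serves as the division function:
rs = euclidean_algorithm(544, 119, divmod)
```

Here rs ends with the last nonzero remainder 17 followed by 0, and indeed 17 is a greatest common divisor of 544 and 119.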
6.3. EUCLIDEAN DOMAINS 283
Unlike the Euclidean Algorithm on the integers, there is no uniqueness condition on the quotients and remainders involved in the Euclidean divisions in (6.6). As with integers, the Euclidean Algorithm leads to the following important theorem.
Theorem 6.3.4
Let R be a Euclidean domain and let a and b be nonzero elements of R. Let r be the last
nonzero remainder in the Euclidean algorithm. Then r is a greatest common divisor of a
and b. Furthermore, r can be written as r = ax + by where x, y ∈ R.
Proof. Let r = rn be the final nonzero remainder in the Euclidean Algorithm. Then from the final
step r|rn−1 . Suppose that r|rn−i and r|rn−(i+1) . Then since
rn−(i+2) = qn−(i+1) rn−(i+1) + rn−i ,
by Exercise 6.1.2, r divides rn−(i+2) . By induction, r divides rn−i for all i = 0, 1, . . . , n. Thus, r
divides both a and b.
Now let s be any common divisor of a and b. Repeating an induction argument but starting at the beginning of the Euclidean Algorithm, it is easy to see that s divides rk for all k = 0, 1, . . . , n. In particular, s divides r. This proves that r is a greatest common divisor of a and b.
Again using an induction argument, starting from the beginning of the Euclidean Algorithm, it
is easy to see that rk ∈ (a, b), the ideal generated by a and b. Hence, r ∈ (a, b) and thus r = ax + by
for some x, y ∈ R. 
Example 2.1.13 illustrated the use of the Extended Euclidean Algorithm in Z to find x, y ∈ Z
such that r = ax + by. The process described there carries over identically to Euclidean domains.
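The back-substitution from Example 2.1.13 can be sketched recursively for Z (names ours): each return step rewrites the greatest common divisor as a linear combination one level up.

```python
def extended_euclid(a, b):
    """Returns (g, x, y) with g = gcd(a, b) and g = a*x + b*y."""
    if b == 0:
        return a, 1, 0
    g, x, y = extended_euclid(b, a % b)
    # g = b*x + (a % b)*y and a % b = a - (a // b)*b,
    # so g = a*y + b*(x - (a // b)*y)
    return g, y, x - (a // b) * y
```

For instance, extended_euclid(544, 119) returns a triple (17, x, y) with 544x + 119y = 17.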
6.3.3 – Polynomial Rings over a Field
Another important class of Euclidean domains consists of the polynomial rings over a field F . The following theorem establishes not only that F [x] is a Euclidean domain with the degree deg : F [x] − {0} → N as the Euclidean function, but also that the quotient and remainder of the Euclidean division are unique.
Theorem 6.3.5
Let F be a field. Then for all a(x), b(x) ∈ F [x] with a(x) ≠ 0, there exist unique polynomials q(x) and r(x) in F [x] such that

b(x) = a(x)q(x) + r(x) with r(x) = 0 or deg r(x) < deg a(x). (6.7)
Proof. First note that if b(x) = 0 then q(x) = r(x) = 0 and we are done. We need to show that this is the unique solution to (6.7). If q(x) ≠ 0, then a(x)q(x) ≠ 0. Since r(x) = 0 or deg r(x) < deg a(x) ≤ deg(a(x)q(x)), adding r(x) to a(x)q(x) cannot cancel the leading term of a(x)q(x). Hence, a(x)q(x) + r(x) ≠ 0. This is a contradiction, so we must have q(x) = 0 and thus r(x) = 0 also. From now on, assume that b(x) ≠ 0.
Suppose that deg b(x) < deg a(x). Obviously, the choice q(x) = 0 and r(x) = b(x) satisfies (6.7), but we need to show that this solution is unique. Since deg q(x)a(x) = deg q(x) + deg a(x) ≥ deg a(x), and since we impose the condition that r(x) = 0 or deg r(x) < deg a(x), for any polynomials q(x) and r(x) with q(x) ≠ 0 we have deg(q(x)a(x) + r(x)) ≥ deg a(x) > deg b(x). Hence, if deg b(x) < deg a(x), then we must have q(x) = 0 and b(x) = r(x).
Suppose now that deg b(x) ≥ deg a(x). Consider polynomials of the form b(x) − a(x)q(x). For distinct polynomials q1 (x) ≠ q2 (x) in F [x], we have

(b(x) − a(x)q1 (x)) − (b(x) − a(x)q2 (x)) = a(x)(q2 (x) − q1 (x)).

Hence, any two distinct polynomials of the form b(x) − a(x)q(x) differ by a polynomial of degree at least deg a(x).
If a(x) | b(x), then there exists d(x) with b(x) = a(x)d(x), so q(x) = d(x) and r(x) = 0 is a solution to (6.7). We need to show that it is unique. If we could write b(x) = a(x)q(x) + r(x) in some other way, with q(x) ≠ d(x), then

r(x) = (b(x) − a(x)q(x)) − (b(x) − a(x)d(x)) = a(x)(d(x) − q(x))

would be a nonzero polynomial of degree at least deg a(x). This contradicts the conditions in (6.7). Hence, when a(x) | b(x), the solution for (6.7) is unique.
Now assume that deg b(x) ≥ deg a(x) and a(x) does not divide b(x). Consider the set of nonnegative integers {deg(b(x) − q(x)a(x)) | q(x) ∈ F [x]}. By the Well-Ordering Principle, this set has a least element. Let r0 (x) = b(x) − a(x)q0 (x) be a polynomial of the form b(x) − q(x)a(x) of least degree. Suppose that deg r0 (x) ≥ deg a(x). Then r0 (x) − (LT(r0 (x))/LT(a(x)))a(x) has degree lower than r0 (x) because the subtraction cancels the leading term of r0 (x). Thus,

r2 (x) = b(x) − (q0 (x) + LT(r0 (x))/LT(a(x))) a(x)

has degree strictly lower than r0 (x), which contradicts the condition that r0 (x) has minimal degree among polynomials of the form b(x) − a(x)q(x). Hence, we can conclude that the polynomial r0 (x) has degree strictly lower than a(x).
Suppose that r1 (x) and r2 (x) are two polynomials of minimal degree of the form b(x) − a(x)q(x), and write ri (x) = b(x) − a(x)qi (x). By the preceding paragraph, this minimal degree is strictly less than deg a(x). If r1 (x) ≠ r2 (x), then q1 (x) ≠ q2 (x), and we showed above that r1 (x) − r2 (x) = a(x)(q2 (x) − q1 (x)) would then have degree at least deg a(x); but deg(r1 (x) − r2 (x)) < deg a(x), a contradiction. Hence r1 (x) = r2 (x), and since F [x] is an integral domain, q1 (x) = q2 (x) as well.
Thus, we have shown that there is a unique polynomial r0 (x) of the form b(x) − a(x)q(x) of least degree, and the corresponding q0 (x) is unique also. Since any two distinct polynomials of the form b(x) − a(x)q(x) differ by a polynomial of degree at least deg a(x), this polynomial r0 (x) is in fact the unique polynomial of the form b(x) − a(x)q(x) with degree strictly less than deg a(x). The theorem follows. 
Corollary 6.3.6
For any field F , the polynomial ring F [x] is a Euclidean domain with deg as the Euclidean
function.
Proof. The only thing that Theorem 6.3.5 did not establish is that for any two nonzero polynomials
a(x), b(x) ∈ F [x], deg b(x) ≤ deg(a(x)b(x)). However, this follows from the fact that the degree of a
nonzero polynomial is nonnegative and that in F [x],
deg(a(x)b(x)) = deg a(x) + deg b(x). 
Proposition 6.3.3 establishes that the polynomial ring F [x] is also a principal ideal domain. In contrast, note that in the ring Z[x], the ideal (2, x) is not principal, so also by Proposition 6.3.3, the ring Z[x] is not a Euclidean domain.
An important consequence of these results is the following characterization of irreducible polynomials.
Proposition 6.3.7
Let F be a field and p(x) ∈ F [x]. The polynomial p(x) is irreducible if and only if
F [x]/(p(x)) is a field.
Proof. (Left as an exercise for the reader. See Exercise 6.3.18.) 
The Euclidean division in F [x] is called polynomial division. The proof of Theorem 6.3.5 is nonconstructive; the existence of q(x) and r(x) follows from the Well-Ordering Principle of the integers, but the proof does not illustrate how to find q(x) and r(x). Polynomial division is sometimes taught in high school algebra courses but without justification of why it works. We review polynomial division here for completeness.
Constructive polynomial division relies on the following fact. If a(x), b(x) ∈ F [x] are such that deg b(x) ≥ deg a(x), then LT(b(x))/LT(a(x)) is a monomial and

p1 (x) = b(x) − (LT(b(x))/LT(a(x))) a(x)

has a degree lower than b(x) because the leading term of b(x) cancels out. By repeating this process on p1 (x) to obtain p2 (x) and so on, we ultimately obtain a polynomial that is 0 or is of degree less than a(x). We illustrate the division with two examples.
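The cancellation step just described translates directly into a division routine. The following is a minimal sketch for Fp [x] (function and variable names are ours), storing a polynomial as the list of its coefficients with the constant term first; it uses Python 3.8+ three-argument pow for the inverse of LT(a):

```python
def poly_divmod_mod_p(b, a, p):
    """Divide b by a in F_p[x].  Polynomials are coefficient lists,
    constant term first.  Returns (q, r) with b = a*q + r in F_p[x]
    and r = [] or deg r < deg a."""
    def trim(f):
        # drop zero leading (highest-degree) coefficients
        while f and f[-1] == 0:
            f.pop()
        return f
    b = trim([c % p for c in b])
    a = trim([c % p for c in a])
    if not a:
        raise ZeroDivisionError("division by the zero polynomial")
    inv_lead = pow(a[-1], -1, p)       # inverse of LT(a) in F_p
    q = [0] * max(len(b) - len(a) + 1, 0)
    r = b
    while len(r) >= len(a):
        shift = len(r) - len(a)
        coef = (r[-1] * inv_lead) % p  # the monomial LT(r)/LT(a)
        q[shift] = coef
        for i, c in enumerate(a):      # subtract coef * x^shift * a(x)
            r[i + shift] = (r[i + shift] - coef * c) % p
        trim(r)
    return q, r
```

Running it on the data of Example 6.3.9 below, poly_divmod_mod_p([3, 0, 2, 1, 4], [1, 4, 3], 5), reproduces the quotient 3x2 + 3x + 4 and remainder x + 4.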
Example 6.3.8. Consider the polynomials a(x) = 2x2 + 1 and b(x) = 3x4 − 2x + 7 in Q[x]. At the first step, we find a monomial by which to multiply a(x) in order to obtain the leading term of b(x). This is (3/2)x2 . As with long division in base 10, we put the monomial on top of the quotient bar. Then we subtract (3/2)x2 (2x2 + 1) = 3x4 + (3/2)x2 from b(x), leaving −(3/2)x2 − 2x + 7.
At this stage, we multiply a(x) by the monomial LT(−(3/2)x2 − 2x + 7)/LT(a(x)) = −3/4 to get a polynomial whose leading term is the same as the leading term of −(3/2)x2 − 2x + 7. We add the monomial −3/4 to the quotient terms and subtract (−3/4)(2x2 + 1) = −(3/2)x2 − 3/4, leaving −2x + 31/4.
The polynomial division algorithm ends at this stage since deg(−2x + 31/4) < deg(2x2 + 1). This work shows that

3x4 − 2x + 7 = (2x2 + 1)((3/2)x2 − 3/4) + (−2x + 31/4)

is the polynomial division of b(x) by a(x). 4
Example 6.3.9. As another example, we perform the polynomial division of 4x4 + x3 + 2x2 + 3 by 3x2 + 4x + 1 in F5 [x]. The successive steps are: subtracting 3x2 (3x2 + 4x + 1) = 4x4 + 2x3 + 3x2 leaves 4x3 + 4x2 + 3; subtracting 3x(3x2 + 4x + 1) = 4x3 + 2x2 + 3x leaves 2x2 + 2x + 3; and subtracting 4(3x2 + 4x + 1) = 2x2 + x + 4 leaves x + 4.
We read this as: 4x4 + x3 + 2x2 + 3 divided by 3x2 + 4x + 1 in F5 [x] has quotient q(x) = 3x2 + 3x + 4 and remainder r(x) = x + 4. 4
Exercises for Section 6.3
1. Perform the Euclidean division for β = 32 + 8i divided by α = 3 + 8i.
2. Perform the Euclidean division for β = 719 − 423i divided by α = 24 − 38i.
3. Perform the Euclidean Algorithm on β = 24 + 17i and α = 13 − 16i. Deduce the generator d of the
ideal I = (α, β).
4. Perform the Euclidean Algorithm on β = 14 + 23i and α = 42 + 3i. Deduce the generator d of the
ideal I = (α, β).
5. Perform the Extended Euclidean Algorithm associated to Exercise 6.3.3 to find a linear combination of α and β that gives a greatest common divisor of α and β.
6. Perform the Extended Euclidean Algorithm associated to Exercise 6.3.4 to find a linear combination of α and β that gives a greatest common divisor of α and β.
7. Perform polynomial division of 4x4 + 3x3 + 2x2 + x + 1 by 2x2 + x + 1 in Q[x].
8. Perform polynomial division of 4x4 + 3x3 + 2x2 + x + 1 by 2x2 + x + 1 in F7 [x].
9. Perform polynomial division of 4x4 + 3x3 + 2x2 + x + 1 by 2x2 + x + 1 in F13 [x].
10. Use the Euclidean Algorithm in Q[x] to find a generator of the principal ideal
I = (2x4 + 7x3 + 4x2 + 13x − 10, 3x4 + 5x3 − 16x2 + 14x − 4).
11. Prove that Z[√2] is a Euclidean domain with the norm N (a + b√2) = |a2 − 2b2 | as the Euclidean function.
12. Perform the Euclidean division of β = 10 + 13√2 by α = 2 + 3√2. (See Exercise 6.3.11.)
13. Perform the Euclidean division of β = 25 − 3√2 by α = −1 + 13√2. (See Exercise 6.3.11.)
14. Prove that Z[√−2] is a Euclidean domain with the norm N (a + b√−2) = a2 + 2b2 as the Euclidean function.
15. Let R be an integral domain. Prove that R[x] is a Euclidean domain if and only if R is a field.
16. In Q[x], determine the monic greatest common divisor d(x) of a(x) = x4 −2x+1 and b(x) = x2 −3x+2.
Using the Extended Euclidean Algorithm, write d(x) as a Q[x]-linear combination of a(x) and b(x).
17. In F2 [x], determine the greatest common divisor d(x) of a(x) = x4 + x3 + x + 1 and b(x) = x2 + 1.
Using the Extended Euclidean Algorithm, write d(x) as an F2 [x]-linear combination of a(x) and b(x).
18. Prove Proposition 6.3.7. [Hint: In a PID, an element is prime if and only if it is irreducible.] Conclude
that for all nonzero polynomials p(x) ∈ F [x], the quotient ring F [x]/(p(x)) is either a field or is not
an integral domain.
19. Let R be a Euclidean domain with Euclidean function d. By the well-ordering of Z, the set S = {d(r) | r ∈ R − {0}} has a least element n. Show that all elements s ∈ R such that d(s) = n are units.
20. Least Common Multiples. Let R be a Euclidean domain with Euclidean function d.
(a) Prove that in R, any two nonzero elements a and b have a least common multiple.
(b) Prove that least common multiples of a and b have the form ab/d, where d is a greatest common divisor.
6.4 Unique Factorization Domains
The Fundamental Theorem of Arithmetic states that any integer n ≥ 2 can be written as a product of positive prime numbers and that any such product of primes is unique up to reordering. This
This property can also be stated by saying that integers have unique prime factorizations. The
Fundamental Theorem of Arithmetic is taught early in a student’s education, as soon as students know what prime numbers are. However, as with many other algebraic properties of the integers, students are usually not shown a proof of the Fundamental Theorem of Arithmetic, so it may come as a surprise that unique factorization does not hold in every integral domain.
As we did in Section 6.3, it is common in ring theory, and in algebra more generally, to define a class of rings with specific properties and explore what further properties follow from the defining characteristics. This section introduces unique factorization domains: rings that possess a property like the Fundamental Theorem of Arithmetic.
6.4.1 – Definition and Examples
Definition 6.4.1
A unique factorization domain (abbreviated as UFD) is an integral domain R in which
every nonzero, nonunit element r ∈ R has the following two properties:
• r can be written as a finite product of irreducible elements r = p1 p2 · · · pn (not
necessarily distinct);
• for any other factorization into irreducibles, r = q1 q2 · · · qm , we have m = n and there is a reordering of the qi such that each qi is an associate of pi .
A unique factorization domain is alternatively called a factorial domain.
In a UFD, we say that the factorization of an element r into irreducible elements is unique “up
to reordering and associates.” The condition of uniqueness can be restated more precisely as follows.
If, as in the above definition,
r = p1 p2 · · · pn = q1 q2 · · · qm
then m = n and there exists a permutation π ∈ Sn such that qi ' pπ(i) for all i = 1, 2, . . . , n (i.e., qi
and pπ(i) are associates).
Proposition 6.4.2
In any UFD, a nonzero element is prime if and only if it is irreducible.
Proof. By Proposition 6.1.9, every prime element is irreducible. We prove the converse in a UFD.
Let R be a UFD and let r ∈ R be an irreducible element. Suppose that a, b ∈ R and that r|ab.
Then by definition of divisibility, there exists c ∈ R such that rc = ab. Suppose that a, b, and c
have the following factorizations into irreducible elements
a = p1 p2 · · · pm , b = q1 q2 · · · qn , c = r1 r2 · · · rk .
By definition of a UFD, since ab = rc, we have k = m + n − 1 and there is a reordering of the list

(p1 , p2 , . . . , pm , q1 , q2 , . . . , qn )

into

(r, r1 , r2 , . . . , rk ),

possibly up to multiplication by units. Thus, r is an associate of some pi or some qj , so r | pi for some i = 1, . . . , m or r | qj for some j = 1, . . . , n. If r divides some pi , then r | a, and if r divides some qj , then r | b. Hence, r is a prime element. 
Because of this proposition, in a UFD we call the factorization of an element r into irreducible
elements a prime factorization of r. Furthermore, the irreducible factors of an element are called
prime factors.
Example 6.4.3. Every field is a UFD trivially since every nonzero element is a unit. 4
Example 6.4.4. The Fundamental Theorem of Arithmetic (Theorem 2.1.22) is precisely the state-
ment that Z is a UFD. 4
Example 6.4.5. We will soon see that every Euclidean domain is a unique factorization domain. Hence, Z[i] is a UFD. The norm function on rings of the form Z[√D], where D is square-free, helps in determining the prime factorization of elements. Recall that in such rings: (1) γ is a unit if and only if N (γ) = 1, and (2) γ is irreducible if N (γ) is prime. (Note that (2) is not an if-and-only-if statement.) With this in mind, we propose to find a prime factorization of α = 6 + 5i and then of β = 7 − 11i.
First of all, note that N (α) = 62 + 52 = 61 is prime, so α is irreducible and we are done.
For β, we have N (β) = 72 + 112 = 170 = 2 × 5 × 17. This is not sufficient to conclude whether β is irreducible, but it does tell us that an irreducible factor must have a norm that is a divisor of 170. We can try to find prime factors by trial and error. Note that N (1 + i) = 2, so 1 + i is irreducible and could possibly be a factor. Dividing β by 1 + i gives

β = (1 + i)(−2 − 9i),

so 1 + i is indeed a prime factor. The norm N (−2 − 9i) = 65 gives some possibilities for the prime factors of −2 − 9i. We observe that N (2 + i) = 5. However,

(−2 − 9i)/(2 + i) = −13/5 − (16/5)i,

so 2 + i is not a prime factor. On the other hand, 2 − i, which is not an associate of 2 + i, has norm 5, and we find that

(−2 − 9i)/(2 − i) = 1 − 4i.

Hence, 2 − i is a prime factor, as is 1 − 4i since N (1 − 4i) = 17. Therefore, a prime factorization of β is

7 − 11i = (1 + i)(2 − i)(1 − 4i). 4
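A quick numerical check of this factorization, using Python complex numbers (the helper name is ours):

```python
def norm(z):
    """N(z) for a Gaussian integer stored as a Python complex number."""
    return round(z.real) ** 2 + round(z.imag) ** 2

factors = [1 + 1j, 2 - 1j, 1 - 4j]
product = 1
for f in factors:
    product *= f
# product recovers 7 - 11j, and the norms 2, 5, 17 multiply to N(7 - 11j) = 170
```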
Example 6.4.6. Example 6.1.10 shows that Z[√10] is not a UFD. The example established that the elements α, β, γ, δ are all irreducible, not associates of each other, and that 10 = αβ = γδ. 4
Example 6.4.7. As a result of the main theorem (Theorem 6.5.5) in the next section, we will see
that F [x, y], where F is a field, is a UFD. However, it is rather easy to construct examples of subrings
of F [x, y] that are not UFDs. For example, consider the ring R = F [x2 , xy, y 2 ]. All the constants
in F − {0} are units in R and R contains no polynomials of degree 1. Consequently, x2 , xy, and y 2
are irreducible elements in R. However, x2 y 2 can be factored into irreducible elements as
(x2 )(y 2 ) = x2 y 2 = (xy)(xy).
Furthermore, xy is not an associate of either x2 or y 2 . (If it were, x or y would need to be a unit,
which is not the case.) Hence, this gives two nonequivalent factorizations of x2 y 2 . 4
Figure 6.2: A visual interpretation of non-UFD, showing the irreducible elements 2, √−5, 3, 1 − √−5, and 1 + √−5 in Z[√−5].
Example 6.4.8. For an easier example of a ring that is not a UFD, consider R = Z[√−5]. Recall that an element a + b√−5 ∈ R is a unit if and only if N (a + b√−5) = a2 + 5b2 = 1. It is not hard to see that the only units in R are 1 and −1. It is also easy to see that there is no element γ ∈ R such that N (γ) = 2 or N (γ) = 3. Now consider the following two factorizations of the element 6:

6 = 2 × 3 = (1 + √−5)(1 − √−5).

We have N (2) = 4, N (3) = 9, and N (1 ± √−5) = 6. If 2, 3, 1 + √−5, or 1 − √−5 were reducible, then there would have to exist an element γ ∈ R of norm 2 or 3. Since that is not the case, all four elements are irreducible. Furthermore, since the only units are 1 and −1, neither 2 nor 3 is an associate of either 1 + √−5 or 1 − √−5. Hence, we have displayed two distinct factorizations of 6 and hence Z[√−5] is not a UFD. 4
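The claim that no element of R has norm 2 or 3 is a finite check: since N (a + b√−5) = a2 + 5b2 , any element of norm at most 3 must have b = 0 and a2 ∈ {2, 3}, so a small search box more than suffices. A sketch:

```python
# search for elements a + b*sqrt(-5) of norm 2 or 3 in a small box;
# norms grow like a^2 + 5b^2, so this range is already exhaustive
hits = [(a, b)
        for a in range(-3, 4)
        for b in range(-3, 4)
        if a * a + 5 * b * b in (2, 3)]
# hits stays empty: no element of Z[sqrt(-5)] has norm 2 or 3
```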
The ring Z[√−5] in the above example offers an opportunity to provide a visual interpretation of the criteria of a UFD.
Let R be an integral domain and consider the equivalence classes of associate elements. By
Proposition 6.1.4, the set of equivalence classes (R − {0})/ ' becomes a partial order under the
relation of divisibility. Consider the Hasse diagram of this partial order and define ~v (a) as the
displacement vector (in R2 ) from the location of 1 to the element a ∈ (R − {0})/ '. In particular,
~v (1) = ~0. Suppose all the irreducible elements are on a first level (a horizontal line) above the least
element 1.
If R is a UFD, then the Hasse diagram of the poset ((R − {0})/ ∼, |) can be constructed in such
a way that if a ∈ (R − {0})/ ' has a factorization into irreducibles as a = r1 r2 · · · rm , then the
vector from 1 to a is the vector
~v (a) = ~v (r1 ) + ~v (r2 ) + · · · + ~v (rm ),
regardless of the choice of vectors ~v (r) for each irreducible r. This claim holds because (1) the
Hasse diagram is for classes of associates and (2) the reordering of irreducibles is tantamount to the
commutativity of vector addition. Then each element a ∈ R is located with respect to 1 by a vector
~v (a) that is an integral linear combination of vectors of the form ~v (r) where r is an irreducible.
If R is not a UFD, then two nonequivalent factorizations of an element a ∈ R lead to two different expressions of ~v (a) as an integral linear combination of the vectors corresponding to irreducible elements. Thus, the Hasse diagram cannot be constructed as described above unless certain linear relations hold among the vectors ~v (ri ) corresponding to irreducible elements ri .
For example, Figure 6.2 shows a few of the irreducible elements in Z[√−5]. Furthermore, we placed these irreducible elements “randomly.” The edges are dashed and grayed to represent multiplication by 2, 3, 1 + √−5, or 1 − √−5. The diagram has 6 located so that ~v (6) = ~v (2) + ~v (3). However, 6 = (1 − √−5)(1 + √−5) is another factorization into irreducible elements, but in this diagram

~v (6) ≠ ~v (1 − √−5) + ~v (1 + √−5).
This can be seen directly by the fact that the dashed edges do not form a parallelogram.
6.4.2 – UFDs and Other Classes of Integral Domains
For the following important proposition, we first need a lemma about PIDs. The lemma involves
the ascending chain condition that we discuss at length in Section 12.1.
Lemma 6.4.9
If R is a PID, then every chain of ideals I1 ⊆ I2 ⊆ · · · ⊆ Ik ⊆ · · · eventually terminates,
i.e., there exists a k such that for all n ≥ k, In = Ik .
Proof. Let R be a PID and let I1 ⊆ I2 ⊆ · · · ⊆ Ik ⊆ · · · be an ascending chain of ideals. By Exercise 5.5.30, the set

I = I1 ∪ I2 ∪ I3 ∪ · · ·
is an ideal of R. Since R is a PID, I = (a) for some a. However, a ∈ Ik for some k. But then I = (a) ⊆ In for all n ≥ k, and since In ⊆ I, we must have In = (a) = I for all n ≥ k. Hence, the chain terminates. 
Proposition 6.4.10
Every PID is a UFD.
Proof. We first show by contradiction that every PID satisfies the first axiom for a UFD. Suppose that r ∈ R cannot be written as a finite product of irreducibles. This tells us first that r is not irreducible, so we can write r = r1 b1 where neither factor is a unit. The assumption also implies that at least one of these factors cannot be written as a finite product of irreducibles, say r1 . Then as ideals (r) ⊊ (r1 ). Furthermore, we can write r1 = r2 b2 and once again at least one of these factors cannot be so written, say r2 . Continuing with this reasoning, we define an ascending chain of ideals
(r) ⊊ (r1 ) ⊊ (r2 ) ⊊ · · ·
which never terminates. This is a contradiction by Lemma 6.4.9. Hence, we conclude that every
element in a PID can be written as a finite product of irreducible elements.
We now need to show the second axiom of UFD. One proceeds by induction on the minimum
number of elements in a decomposition of an element r. Suppose that r has a factorization into
irreducibles that has a single factor. Then r itself is an irreducible element. By the definition of
an irreducible element, if r = ab, then either a or b is a unit and hence the other element is an
associate of r. Hence, if there is a factorization with 1 element, all factorizations into irreducibles
have 1 element.
For the induction hypothesis, suppose that if r has a factorization into n irreducibles then all of
its factorizations into irreducibles involve n irreducibles and that the irreducibles can be rearranged
to be unique up to associates. Consider now an element that requires a minimum of n + 1 irreducible
factors for a factorization. Assume that we have two factorizations

r = p1 p2 · · · pn+1 = q1 q2 · · · qm ,

with m ≥ n + 1. Now p1 divides q1 q2 · · · qm or in other words, q1 q2 · · · qm ∈ (p1 ). However, in


any PID, an irreducible is a prime element so since this is a prime ideal, we must have one of the
qj ∈ (p1 ), which means qj | p1 . After renumbering we can assume that q1 divides p1 . Thus, q1 a = p1
but since p1 is irreducible and q1 is not a unit, then a is a unit. We now have

p1 p2 · · · pn+1 = ap1 q2 · · · qm =⇒ p2 p3 · · · pn+1 = aq2 q3 · · · qm

since we are in an integral domain. We apply the induction hypothesis on the factorizations
p2 · · · pn+1 = aq2 · · · qm and can conclude that the second criterion for UFD holds for n + 1. By
induction, the second axiom for UFDs hold for all n ∈ N∗ and the proposition follows. 
Proposition 6.3.3 along with Proposition 6.4.10 show that some of the classes of integral domains
that we have introduced are subclasses of one another. We can summarize the containment of some
of the ring classes introduced so far with the diagram
fields ⊊ Euclidean domains ⊊ PIDs ⊊ UFDs ⊊ integral domains. (6.8)
The propositions and examples provided so far have not shown strict containment between Euclidean
domains and PIDs or between PIDs and UFDs. Example 6.5.7 gives examples of UFDs that are not
PIDs.
As a consequence of Proposition 6.4.2, prime elements and irreducible elements are equivalent in
Euclidean domains and in PIDs. Furthermore, as particular examples, since every Euclidean domain
is a unique factorization domain, the ring of Gaussian integers Z[i] and the polynomial ring F [x]
over a field F are UFDs.
In the next section, we will establish a necessary and sufficient condition on R for R[x] to be a
UFD, but we can already give a characterization of when R[x] is a PID.
Proposition 6.4.11
If R is a commutative ring such that the polynomial ring R[x] is a PID, then R is necessarily
a field.
Proof. Suppose that R[x] is a PID. Then the subring R is an integral domain. By Exercise 5.7.2, we see that (x) is a nonzero prime ideal. We can also see this by pointing out that two polynomials a(x) and b(x) with nonzero constant terms a0 and b0 are such that their product a(x)b(x) has the nonzero constant term a0 b0 . Thus, a(x) ∉ (x) and b(x) ∉ (x) implies that a(x)b(x) ∉ (x). The contrapositive of this last statement establishes that (x) is a prime ideal.
By Proposition 5.7.10, since R[x] is a PID, the nonzero prime ideal (x) is in fact a maximal ideal. Therefore, R ≅ R[x]/(x) is a field. 
6.4.3 – UFDs and Greatest Common Divisors
In Euclidean domains, we can find a greatest common divisor of two elements a and b by performing the Euclidean algorithm. In a PID, if a and b are any two elements and (a) + (b) = (d), then d is a greatest common divisor of a and b. Greatest common divisors also exist in UFDs, but a different method is necessary to find a greatest common divisor of two elements. Before proving this claim, we introduce the ordr function.
Definition 6.4.12
Let R be an integral domain and let r ∈ R be an irreducible element. The function ordr : R − {0} → N is defined by ordr (a) = n whenever n is the least nonnegative integer such that rn+1 does not divide a. In other words, ordr (a) = n means that rk | a for 0 ≤ k ≤ n and rk ∤ a if k > n.
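In Z, where divisibility is computable, ordr (a) is simply the multiplicity of the prime r in a; a minimal sketch (the function name is ours):

```python
def ord_r(r, a):
    """ordr(a) in Z: the largest n with r**n dividing a
    (a nonzero, r an irreducible integer with |r| > 1)."""
    n = 0
    while a % r == 0:  # peel off one factor of r at a time
        a //= r
        n += 1
    return n
```

For instance, 48 = 2^4 · 3 gives ord_2(48) = 4, ord_3(48) = 1, and ord_5(48) = 0.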
Lemma 6.4.13
Let a and b be nonzero elements in a UFD R. Then a | b if and only if ordr (a) ≤ ordr (b)
for all irreducible elements r ∈ R.
Proof. First suppose that a | b. Let r be any irreducible element of R and let k = ordr (a). Then r^k | a and, by transitivity of divisibility, r^k | b. Hence, ordr (a) ≤ ordr (b).
Conversely, suppose that ordr (a) ≤ ordr (b) for all irreducible elements r. Let a = p1 p2 · · · pm and b = q1 q2 · · · qn be factorizations into irreducible elements. Let S = {r1 , r2 , . . . , rℓ } be a (finite) set of irreducible elements, none of which are associates of each other, such that each pi and each qj is an associate of some element in S. Then we can write

a = u r1^α1 r2^α2 · · · rℓ^αℓ and b = v r1^β1 r2^β2 · · · rℓ^βℓ (6.9)

for some units u, v ∈ U (R), where αk = ordrk (a) and βk = ordrk (b). By hypothesis, αk ≤ βk for all k, so

b = a (u^−1 v) r1^(β1−α1) r2^(β2−α2) · · · rℓ^(βℓ−αℓ)

and thus a | b. 
Some explanations behind the above proof are in order.
When working with factorizations into irreducibles in a UFD, it is convenient to write two elements a and b as in (6.9). In this expression, the integers αk for 1 ≤ k ≤ ℓ may be 0, if rk is an associate of no pi but only of some qj . Similarly for βk .
When considering prime factorization in Z, we typically only use the positive prime numbers
and the units can only be 1 or −1. However, in an arbitrary UFD, there may exist an infinite,
possibly uncountably infinite, number of units. Consequently, there may not be a natural manner
to select a preferred element out of each ' equivalence class. In fact, doing so would require the use
of a choice function, which invokes the Axiom of Choice, an altogether heavy-handed approach for
such a simple proof. Instead, the set S in the proof of Lemma 6.4.13 is a complete set of distinct
'-representatives on the set of irreducibles in the factorizations of a and b, which by definition is a
finite set. Hence, the above proof has no need to invoke the Axiom of Choice.
Proposition 6.4.14
Let R be a unique factorization domain. For all a, b ∈ R, there exists a greatest common
divisor of a and b.
Proof. Write a and b as in (6.9). By Lemma 6.4.13, the element

d = r1^min(α1,β1) r2^min(α2,β2) · · · rℓ^min(αℓ,βℓ)

is a divisor of both a and b. Let d′ be any other common divisor of a and b. Each irreducible in a factorization of d′ must be an associate of some rk in the factorization of a. Hence, we can write

d′ = w r1^γ1 r2^γ2 · · · rℓ^γℓ ,

where w is a unit. Furthermore, by Lemma 6.4.13, γk ≤ αk and γk ≤ βk . Consequently, γk ≤ min(αk , βk ) and, by Lemma 6.4.13 again, d′ | d. Thus, d is a greatest common divisor. 

By Proposition 6.1.16, any two elements a and b in a UFD have a least common multiple. In a
parallel fashion, using the expressions in (6.9), it is easy to prove that
max(α1 ,β1 ) max(α2 ,β2 ) max(α` ,β` )
m = r1 r2 · · · r`

is a least common multiple of a and b. (See Exercise 6.4.7.)
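These min/max formulas translate directly into code once the factorizations are known. A sketch in Z (names ours), representing a factorization (6.9) as a dictionary mapping each irreducible rk to its exponent:

```python
from math import prod

def gcd_and_lcm(fa, fb):
    """fa, fb: dicts {irreducible: exponent} as in (6.9).
    Returns (gcd, lcm): min of exponents for gcd, max for lcm."""
    support = set(fa) | set(fb)   # all irreducibles appearing in either
    g = prod(r ** min(fa.get(r, 0), fb.get(r, 0)) for r in support)
    m = prod(r ** max(fa.get(r, 0), fb.get(r, 0)) for r in support)
    return g, m
```

For example, 360 = 2^3 · 3^2 · 5 and 84 = 2^2 · 3 · 7 yield gcd 12 and lcm 2520.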
Proposition 6.4.14 states that UFDs, and consequently Euclidean domains and PIDs, are a
subclass of gcd-domains. It turns out that there exist integral domains that are not UFDs but are
gcd-domains. Hence, the class of UFDs is a strict subclass of gcd-domains.
6.4.4 – Irreducible Elements in Z[i]
The ring of Gaussian integers, Z[i], is important for its applications to number theory. In this section, we study algebraic properties of Z[i] in further depth; in particular, we characterize the irreducible elements and give a consequence for number theory.
First, note that complex conjugation is multiplicative, so an element in Z[i] is irreducible if and only if its complex conjugate is irreducible.
Lemma 6.4.15
If an element a + bi ∈ Z[i] with ab ≠ 0 is prime, then a² + b² is a prime number in Z.

Proof. The element a² + b² = (a + bi)(a − bi) is in the prime ideal (a + bi). Assume that a² + b² = mn
is a composite integer with factors m, n ≥ 2. Then since (a + bi) is a prime ideal, m ∈ (a + bi) or
n ∈ (a + bi). Without loss of generality, suppose that m ∈ (a + bi). The integer m cannot be an
associate of a + bi. Furthermore, there exists z ∈ Z[i] with (a + bi)z = m, so zn = a − bi. But n
is not a unit either, which makes a − bi reducible. This is a contradiction, and hence the assumption
that a² + b² is a composite integer is false. The lemma follows. □

Lemma 6.4.16
Let a + bi ∈ Z[i] with ab ≠ 0. Then a + bi is prime if and only if a² + b² = p is a prime
number with p ≡ 1 (mod 4) or p = 2.

Proof. Suppose that a + bi is prime in Z[i]. Using modular arithmetic modulo 4, it is easy to see that
for a, b ∈ Z, the sum a² + b² is never congruent to 3 modulo 4. (See Exercise 2.2.8.) Furthermore, if
n ≡ 0 (mod 4), then n is divisible by 4, and hence is composite. By Lemma 6.4.15, we deduce that
a² + b² is a prime integer congruent to 1 or 2 modulo 4. However, the only prime number that is
congruent to 2 modulo 4 is 2.
Conversely, suppose that a² + b² = p is a prime number. Assume that a + bi = αβ for some
α, β ∈ Z[i]. Then p = N(αβ) = N(α)N(β), and hence either N(α) = 1 or N(β) = 1. Hence, by
Proposition 6.1.7, either α or β is a unit, and we deduce that a + bi is irreducible. □

Note that if a + bi is a prime element with a² + b² = 2, then a + bi is one of the four elements
±1 ± i.
Now we consider elements of the form a or bi. Obviously, bi is an associate to the integer b
so without loss of generality we only consider elements a + 0i ∈ Z[i]. Note that if an integer n is
composite in Z, it cannot be prime in Z[i] either. Hence, we restrict our attention to prime numbers.

Lemma 6.4.17
If p is a prime number in Z with p ≡ 3 (mod 4), then p is also prime in Z[i].

Proof. Assume p is not irreducible in Z[i]. Then p = αβ with neither α nor β a unit. We have
p² = N(p) = N(α)N(β), so N(α) = N(β) = p, since neither N(α) nor N(β) can be 1. However,
N(α) is the sum of two squares, and (by Exercise 2.2.8) the sum of two squares cannot be congruent
to 3 modulo 4. Hence, the assumption leads to a contradiction and the lemma follows. □

Lemma 6.4.18
In Z[i], the integer 2 factors into two irreducibles 2 = (1 + i)(1 − i).

The last lemma turns out to be the most difficult and requires a reference to a result in number
theory.

Lemma 6.4.19
If p is a prime number in Z with p ≡ 1 (mod 4), then p is not prime in Z[i] but has the
prime factorization of the form p = (a + bi)(a − bi).

Proof. In 1770, Lagrange proved that if p = 4n + 1 is a prime number in Z, then the congruence
equation x² ≡ −1 (mod p) has a solution. In other words, there exists an integer m such that
m² + 1 is divisible by p. ([53, Theorem 11.5]) Now consider such a p as an element of Z[i]. Note
that m² + 1 = (m + i)(m − i). The integer p cannot divide m + i or m − i, because otherwise p
would have to divide the imaginary part of m + i or m − i, which is ±1. Hence, since Z[i] is a unique
factorization domain, p cannot be prime in Z[i].
The norm of p is N(p) = p², and since p is not prime in Z[i], p must factor into p = αβ with
N(α) = N(β) = p. Writing α and β in polar coordinates, α = √p e^{iθ} and β = √p e^{iφ}. Since αβ = p,
we have θ + φ = 2πk, so φ = −θ up to a multiple of 2π. Thus, β is the complex conjugate of α. □
The above five lemmas culminate in the following proposition.

Proposition 6.4.20
The prime (irreducible) elements in Z[i] are
• associates of integers of the form p + 0i, where p is prime with p ≡ 3 (mod 4);
• associates of 1 + i or 1 − i;
• elements a + bi such that a² + b² is a prime number congruent to 1 modulo 4.
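Proposition 6.4.20 yields a concrete test for small Gaussian integers. The sketch below is illustrative only; the helper `is_prime_int` (plain trial division) is our assumption, not part of the text:

```python
# Sketch: decide whether a + bi is prime in Z[i], per Proposition 6.4.20.
def is_prime_int(n):
    # trial-division primality test for ordinary integers (helper)
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def is_gaussian_prime(a, b):
    if a == 0 or b == 0:
        # associates of an integer n: prime iff n is a prime with n = 3 mod 4
        n = abs(a) or abs(b)
        return is_prime_int(n) and n % 4 == 3
    # ab != 0: prime iff the norm a^2 + b^2 is a prime number
    # (a sum of two squares is never 3 mod 4, so this covers 2 and p = 1 mod 4)
    return is_prime_int(a * a + b * b)

print(is_gaussian_prime(1, 1))   # True: 1 + i has norm 2
print(is_gaussian_prime(3, 0))   # True: 3 is prime and 3 = 3 mod 4
print(is_gaussian_prime(5, 0))   # False: 5 = (2 + i)(2 - i)
print(is_gaussian_prime(2, 1))   # True: norm 5 is prime
```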

Exercises for Section 6.4


1. In the ring Z[i], write a factorization of 11 + 16i.
2. In the ring Z[i], write a factorization of 9 − 19i.
3. Write a factorization of 12 + 3√2 in the UFD Z[√2].
4. Write a factorization of −10 + 13√2 in the UFD Z[√2].
5. Consider factorizations in Z[√−5]. (See Example 6.4.8.)
(a) Prove that there is no element α ∈ Z[√−5] such that N(α) = 7, N(α) = 11, or N(α) = 13.
(b) Deduce that 7, 11, 3 + √−5, 1 + 2√−5 are irreducible elements.
(c) Prove that 9 + 4√−5 is an irreducible element.
6. Let R and S be two factorization domains and let f : R − {0} → S − {0} be a surjective multiplicative
function (f (ab) = f (a)f (b) for all a, b ∈ R) such that f (1) = 1.
(a) Prove that for all a ∈ R, a is a unit in R if and only if f (a) is a unit in S.
(b) Prove that if f (a) is irreducible in S, then a is irreducible in R.
(c) Find an example in which a is irreducible but f (a) is not.
(d) Explain how a prime factorization of f (a) may help in determining a prime factorization of a.
7. Use the factorizations in (6.9) to show that

m = r_1^{max(α_1,β_1)} r_2^{max(α_2,β_2)} · · · r_ℓ^{max(α_ℓ,β_ℓ)}

is a least common multiple of a and b.
8. Let p be a prime number in Z. Recall that the localization of Z by the prime ideal (p), denoted by
Z(p) , consists of all fractions whose denominator is not divisible by p. Discuss what variations can
exist in the factorizations of elements in Z(p) .
9. Consider the quotient ring R = R[x, y]/(x³ − y⁵). Prove that x̄ and ȳ are irreducible elements in R.
Deduce that R is not a UFD.
10. Consider the quotient ring R = R[x, y, z]/(x² − yz). Prove that x̄, ȳ, and z̄ are irreducible elements
in R. Deduce that R is not a UFD.
11. Let F be a field. Show that the subring F[x⁴, x²y, y²] of the polynomial ring F[x, y] is not a UFD.
12. Let R be a UFD and let p(x) ∈ R[x]. Prove that if p(ci ) = 0 for n distinct constants c1 , c2 , . . . , cn ∈ R,
then p(x) = 0 or deg p(x) ≥ n. Conclude that if p(x) is a polynomial that is 0 or of degree less than
n that has n distinct roots, then p(x) is the 0 polynomial. [Hint: Work in the field of fractions of R
and then use Gauss’ Lemma.]

13. Let R be a UFD and let D be a multiplicatively closed set that does not contain 0. Prove that the
ring of fractions D−1 R is a UFD.
14. Let R be an integral domain and let r ∈ R be an irreducible element. Prove that ordr (ab) = ordr (a) +
ordr (b) for all a, b ∈ R − {0}.
15. Let R and r be as in Exercise 6.4.14. Let (R − {0})/∼ be the set of equivalence classes of associate
nonzero elements in R.
(a) Prove that ord_r : (R − {0})/∼ → N given by ord_r([a]) := ord_r(a) for all a ∈ R is a well-defined
function.
(b) Prove that ord_r : (R − {0})/∼ → N is a monotonic function between the posets ((R − {0})/∼, |)
and (N, ≤).
16. Let R be a UFD and let a ∈ R − {0}. Let S_a be a set of irreducible elements that divide a and such
that no two elements in S_a are associates. Prove that S_a is a finite set and that

a ∼ ∏_{r∈S_a} r^{ord_r(a)}.

17. Let R be an integral domain. Suppose that for all a ∈ R − {0}, any set S of irreducible elements
that divide a in which no two elements are associates is finite. Prove that R is a unique factorization
domain if and only if for all a ∈ R − {0} and all such sets S,

a ∼ ∏_{r∈S} r^{ord_r(a)}.

[Hint: Use Exercise 6.4.16.]

6.5
Factorization of Polynomials
Early on, students of mathematics learn strategies to factor polynomials or to find roots of a poly-
nomial. Many of the theorems introduced in elementary algebra assume the coefficients are in Z,
Q or R. In this section we review many theorems concerning factorization or irreducibility in R[x]
where R is an integral domain.

6.5.1 – Polynomial Rings and Unique Factorization


It does not make sense to discuss factorization of polynomials unless R[x] is a unique factorization
domain. The first lemma addresses this condition.

Lemma 6.5.1
If R[x] is a UFD, then R is a UFD.

Proof. The ring R of coefficients is a subring of R[x]. By degree considerations and Proposition
5.2.6, if p(x) = c is a constant polynomial and if p(x) = a(x)b(x), then both a(x) and b(x) have
degree 0. Thus, if R[x] is a UFD, then every c ∈ R has a unique factorization into irreducible
elements of R[x] but each of these elements has to have degree 0. Thus, R is a UFD. 
Because of this lemma, we henceforth let R be a unique factorization domain.

Proposition 6.5.2 (Gauss’ Lemma)


Let R be a UFD with field of fractions F. If p(x) ∈ R[x] is reducible in F[x], then p(x)
is reducible in R[x]. Furthermore, if p(x) = A(x)B(x) in F[x], then in R[x], we have
p(x) = a(x)b(x), where a(x) = uA(x) and b(x) = vB(x) for some u, v ∈ F.

Proof. Consider the equation p(x) = A(x)B(x), where A(x) and B(x) are elements in F[x]. Let d_A
be a least common multiple of all the denominators appearing in A(x), and similarly for d_B. Set
a′(x) = d_A A(x) and b′(x) = d_B B(x), which are polynomials in R[x]. Then d_A d_B p(x) = a′(x)b′(x),
where now a′(x), b′(x) ∈ R[x]. Setting d = d_A d_B, we have dp(x) ∈ R[x].
If d is a unit in R, then we are done by taking a(x) = d⁻¹a′(x) and b(x) = b′(x). This gives
p(x) = a(x)b(x) with a(x), b(x) ∈ R[x].
If d is not a unit, consider the prime factorization of d ∈ R, namely d = p_1 p_2 · · · p_n. For each
i ∈ {1, 2, . . . , n}, the ideal p_iR[x] is a prime ideal in R[x], since p_i is irreducible in R and therefore
irreducible in the UFD R[x]. By Proposition 5.6.11, R[x]/(p_iR[x]) ≅ (R/p_iR)[x], which is an integral
domain, since p_iR[x] is a prime ideal. Considering the expression dp(x) = a′(x)b′(x) reduced in the
quotient ring, we have

0 = a′(x) b′(x).

Thus, one of these polynomials in the quotient ring is 0. Therefore, it is possible to partition
I = {1, 2, . . . , n} into two subsets I_a and I_b such that if i ∈ I_a, then a′(x) = 0 in (R/p_iR)[x], and if
i ∈ I_b, then b′(x) = 0 in (R/p_iR)[x]. Then all the coefficients of a′(x) are multiples of

∏_{i∈I_a} p_i

and similarly for b′(x). Thus, the polynomials

a(x) = (∏_{i∈I_a} p_i)⁻¹ a′(x) = d_A (∏_{i∈I_a} p_i)⁻¹ A(x)

and b(x) = (∏_{i∈I_b} p_i)⁻¹ b′(x) = d_B (∏_{i∈I_b} p_i)⁻¹ B(x)

satisfy a(x), b(x) ∈ R[x] and a(x)b(x) = (d/d)A(x)B(x) = A(x)B(x) = p(x). □

An immediate consequence of Gauss’ Lemma is that if a polynomial with integer coefficients


factors over Q, then it factors over Z as well.

Corollary 6.5.3
Let R be a UFD and let F be its field of fractions. Let p(x) ∈ R[x] be such that its
coefficients have a greatest common divisor of 1. Then p(x) is irreducible in R[x] if and
only if it is irreducible in F [x]. In particular, a monic polynomial is irreducible in R[x] if
and only if it is irreducible in F [x].

Proof. (Left as an exercise for the reader. See Exercise 6.5.10.) 

Example 6.5.4. Consider the polynomial p(x) = 6x² − x − 1 in Z[x]. If we consider p(x) as an
element of the bigger ring Q[x], it can be factored as

p(x) = (x − 1/2)(6x + 2).

This factorization is not in Z[x], but it can be changed to p(x) = (2x − 1)(3x + 1) in Z[x]. Note that
in Q[x], the unique factorization of p(x) is

p(x) = 6(x − 1/2)(x + 1/3),

where 6 is a unit in Q. 4

Theorem 6.5.5
R is a UFD if and only if R[x] is a UFD.

Proof. Lemma 6.5.1 already gave one direction of this proof. Gauss’ Lemma allows us to prove the
converse.
Suppose now that R is a UFD and let F be its field of fractions. Recall that F [x] is a Euclidean
domain so it is a UFD. Let p(x) ∈ R[x] and let d be the greatest common divisor of the coefficients
of p(x) so that p(x) = dp2 (x) where the coefficients of p2 (x) have a greatest common divisor of 1.
Since R is a UFD, and d can be factored uniquely into irreducibles in R, it suffices to prove that
p2 (x) can be factored uniquely into irreducibles in R[x].
Since F [x] is a UFD, p2 (x) can be factored uniquely into irreducibles in F [x] and by Gauss’
Lemma, there is a factorization of p2 (x) in R[x]. Since the greatest common divisor of coefficients
of p2 (x) is 1, then the greatest common divisor of each of the factors of p2 (x) in R[x] is 1. By
Corollary 6.5.3, the factors of p2 (x) in R[x] are irreducible. Thus, p(x) can be written as a product
of irreducible elements in R[x].
Since R[x] is a subring of F [x], then the factorization of p2 (x) in R[x] is a factorization into
irreducible elements in F [x], which is unique up to rearrangement and multiplication by units.
There exist fewer units in R than in F so the uniqueness of the factorization also holds in R[x]. 

Theorem 6.5.5 establishes that the algebraic context in which to discuss factorization of poly-
nomials is when the ring of coefficients is itself a UFD. The theorem also has consequences for
multivariable polynomial rings.

Corollary 6.5.6
If R is a UFD, then the polynomial ring R[x1 , x2 , . . . , xn ] with a finite number of variables
is a UFD.

Proof. Theorem 6.5.5 establishes the induction step from R[x1 , x2 , . . . , xn−1 ] to R[x1 , x2 , . . . , xn ] for
all n ≥ 1 and hence the corollary follows by induction on n. 

Example 6.5.7. Note that Z[x], Z[x, y], etc. are UFDs by the above theorems. However, they are
not PIDs and thus give simple examples of rings that are UFDs but not PIDs. 4

6.5.2 – Irreducibility Tests


Unless otherwise stated, in the rest of the section, R is a UFD.
No discussion about factorization of polynomials is complete without some comment about how
to determine if polynomials are irreducible. Determining if a polynomial p(x) ∈ R[x] is irreducible is
a challenging problem. Consequently, this brief section can only offer a few comments for polynomials
of low degree and some strategies for polynomials of degree 4 or higher.
We first deal with irreducible factors of degree 0.

Definition 6.5.8
A polynomial p(x) ∈ R[x] is called primitive if the coefficients of p(x) are relatively prime.

In the language of unique factorization domains, a polynomial is primitive if and only if it does
not have irreducible factors of degree 0. Note that if F is a field, then every polynomial in F [x] is
primitive.
With polynomials in Z[x], the content of a polynomial p(x), denoted by c(p), is defined as the
greatest common divisor of the coefficients of p(x) multiplied by the sign of the leading coefficient.
Similarly, we may refer to the content of a polynomial p(x) ∈ R[x], though this is only well-defined up
to multiplication by a unit. Consequently, a polynomial p(x) ∈ R[x] can be written p(x) = c(p)p₀(x),
where p₀(x) is a primitive polynomial and is called the primitive part of p(x).
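Over Z, the content and primitive part are straightforward to compute. A short sketch, using the sign-of-the-leading-coefficient convention just stated (representing a polynomial by its coefficient list is our assumption):

```python
from math import gcd
from functools import reduce

# Sketch: content and primitive part of a polynomial in Z[x].
# Coefficients are listed from the constant term up: [a0, a1, ..., an].
def content(coeffs):
    c = reduce(gcd, (abs(a) for a in coeffs))
    # multiply by the sign of the leading coefficient
    return c if coeffs[-1] > 0 else -c

def primitive_part(coeffs):
    c = content(coeffs)
    return [a // c for a in coeffs]

# p(x) = -6x^2 + 4x - 2 has content -2 and primitive part 3x^2 - 2x + 1
p = [-2, 4, -6]
print(content(p))         # -2
print(primitive_part(p))  # [1, -2, 3]
```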

Proposition 6.5.9
Let p(x) ∈ R[x] and let F be the field of fractions of R. The polynomial p(x) has a
factor of degree 1 if and only if it has a root in F. Furthermore, if the root is r/s, then
p(x) = (sx − r)q(x) for some q(x) ∈ R[x].

Proof. If p(x) has a factor of degree 1, say (sx − r), then r/s is a root of p(x) when viewed as an
element in F[x].
For the converse, suppose that p(x) has a root α = r/s in F. Consider the polynomial division
of p(x) by d(x) = x − α in the Euclidean domain F[x]. Since the remainder must have degree less
than deg d(x) = 1, there exist unique q(x) ∈ F[x] and r ∈ F such that p(x) = (x − α)q(x) + r.
However, p(α) = 0, so 0 = 0·q(α) + r and thus r = 0. Therefore, p(x) = (x − α)q(x) in F[x].
By Gauss’ Lemma, p(x) = (sx − r)q(x) for some polynomial q(x) ∈ R[x]. 

Corollary 6.5.10
Let p(x) be a nonzero polynomial in R[x]. The number of distinct roots in the field of
fractions is less than or equal to the degree of p(x).

Proof. Let F be the field of fractions of R. If α1 , α2 , . . . , αm are distinct roots of p(x) in F with
αi = ri /si , then
p(x) = (s1 x − r1 )(s2 x − r2 ) · · · (sm x − rm )q(x)
for some q(x) ∈ R[x]. Hence, deg p(x) ≥ m. 

The following proposition is sometimes presented in elementary algebra courses in the context of
polynomials with integer coefficients. The proposition provides a short list of all the possible linear
irreducible factors of a polynomial.

Proposition 6.5.11 (Rational Root Theorem)

Let F be the field of fractions of R. Suppose that p(x) ∈ R[x] is written

p(x) = a_n xⁿ + · · · + a_1 x + a_0.

Then the only roots of p(x) in F are of the form u(e/d), where u is a unit, e divides a_0, and d divides a_n.

Proof. Let r/s be a root of p(x) in F (with p(x) viewed as an element of F[x]) and suppose that r
and s have no common divisor. Then

a_n (rⁿ/sⁿ) + a_{n−1} (r^{n−1}/s^{n−1}) + · · · + a_1 (r/s) + a_0 = 0,

which, after multiplying by sⁿ, gives

a_n rⁿ + a_{n−1} s r^{n−1} + · · · + a_1 s^{n−1} r + a_0 sⁿ = 0. (6.10)

Then
s(a_{n−1} r^{n−1} + · · · + a_1 s^{n−2} r + a_0 s^{n−1}) = −a_n rⁿ.
By unique factorization in R, all the prime factors of s divide a_n rⁿ, but since s and r are relatively
prime, all the prime factors of s must be associates to the prime factors of a_n. Thus, s | a_n. With an
identical argument applied to (6.10), we can show that r | a_0. The result follows. □
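For R = Z, the candidate roots named in the proposition can be enumerated mechanically. A sketch, assuming coefficients are listed from the constant term up:

```python
from fractions import Fraction

# Sketch: over R = Z, enumerate the candidate roots u(e/d) given by the
# Rational Root Theorem: e divides a0, d divides an, and the units are +-1.
def divisors(n):
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]

def candidate_roots(coeffs):
    # coeffs = [a0, a1, ..., an] with a0 and an nonzero
    a0, an = coeffs[0], coeffs[-1]
    cands = {Fraction(u * e, d) for e in divisors(a0)
             for d in divisors(an) for u in (1, -1)}
    return sorted(cands)

# p(x) = 2x^3 - 7x + 3: the candidates are +-1, +-3, +-1/2, +-3/2
print(candidate_roots([3, -7, 0, 2]))
```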

Proposition 6.5.11 also makes it simple to determine if a given quadratic or cubic polynomial is
irreducible.

Proposition 6.5.12
A primitive polynomial p(x) ∈ R[x] of degree 2 or 3 is reducible in R[x] if and only if it has
a root in the field of fractions F .

Proof. Suppose a primitive polynomial p(x) of degree 2 or 3 is reducible. Then p(x) = a(x)b(x) with
a(x), b(x) ∈ R[x], not units. Since deg a(x), deg b(x) ≥ 1 and since deg a(x)+deg b(x) = deg p(x) ≤ 3,
then deg a(x) = 1 or deg b(x) = 1. By Proposition 6.5.9, p(x) has a root in F .
Conversely, suppose that p(x) has a root r/s ∈ F. Then by Proposition 6.5.9, p(x) = (sx − r)q(x).
Since p(x) is of degree 2 or 3, then deg q(x) is equal to 1 or 2, and hence q(x) is not a unit. Thus,
p(x) is reducible. 

Example 6.5.13. We show that p(x) = 2x³ − 7x + 3 is irreducible in Z[x]. By Gauss' Lemma, p(x)
can factor over Q if and only if it can factor over Z. We just need to check if it has roots in Q to
determine whether it factors over Q. The only possible roots according to Proposition 6.5.11 are

±1, ±3, ±1/2, ±3/2.

It is easy to verify that none of these eight fractions is a root of p(x). Then by Proposition 6.5.12,
p(x) is irreducible in Z[x]. 4
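The verification in this example is mechanical; a brief sketch with exact rational arithmetic:

```python
from fractions import Fraction

# Sketch: verify that p(x) = 2x^3 - 7x + 3 has none of the eight candidate
# rational roots, so it is irreducible in Z[x] (Propositions 6.5.11 and
# 6.5.12, since p is primitive of degree 3).
def p(x):
    return 2 * x**3 - 7 * x + 3

candidates = [Fraction(n) for n in (1, -1, 3, -3)]
candidates += [Fraction(e, 2) for e in (1, -1, 3, -3)]
print(all(p(x) != 0 for x in candidates))   # True: no rational roots
```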

Example 6.5.14. Consider the polynomial x³ + 2x² + 2x + 3 in F5[x]. As mentioned earlier, a
field is trivially a UFD. Furthermore, the field of fractions of a field F is F itself. Consequently,
Proposition 6.5.12 still applies. In this situation, we could still refer to Proposition 6.5.11 to look for
roots of the polynomial but, since every nonzero element in a field is a unit, applying the proposition
implies that we must simply check all elements of F5 to see if they are roots. Calculating directly,

0³ + 2·0² + 2·0 + 3 = 3 ≠ 0,
1³ + 2·1² + 2·1 + 3 = 3 ≠ 0,
2³ + 2·2² + 2·2 + 3 = 3 ≠ 0,
3³ + 2·3² + 2·3 + 3 = 4 ≠ 0,
4³ + 2·4² + 2·4 + 3 = 2 ≠ 0.

We observe that no element of the field is a root of the polynomial. So by Proposition 6.5.12, since
the polynomial is a cubic and has no roots, the polynomial is irreducible. 4
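The same table can be produced by direct computation; a brief sketch:

```python
# Sketch: a cubic over a finite field is irreducible iff it has no root
# (Proposition 6.5.12).  Check x^3 + 2x^2 + 2x + 3 over F_5 directly.
def f(x, p=5):
    return (x**3 + 2 * x**2 + 2 * x + 3) % p

values = {x: f(x) for x in range(5)}
print(values)                                # {0: 3, 1: 3, 2: 3, 3: 4, 4: 2}
print(all(v != 0 for v in values.values()))  # True: irreducible in F_5[x]
```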

Up to now, the propositions in this subsection have shown how to quickly determine if a polyno-
mial of degree 3 or less is irreducible. For polynomials of degree 4 or more in R[x], Proposition 6.5.11
helps quickly determine if a polynomial has a factor of degree 1. However, it requires more work
to determine if a polynomial has an irreducible factor of degree 2 or more. The following examples
illustrate what can be done for polynomials with coefficients in Z or in the finite field Fp .
Example 6.5.15. We propose to show that p(x) = x⁴ + x + 2 is irreducible as a polynomial in
F3[x]. By checking the three field elements 0, 1, 2 ∈ F3, it is easy to see that none of them is a root.
Hence, p(x) has no linear factors. Assume that p(x) is reducible. Then by degree considerations,
p(x) is the product of two quadratic polynomials. A priori, by considering the leading coefficient of
p(x), there appear to be two cases:

p(x) = (x² + ax + b)(x² + cx + d) and p(x) = (2x² + ax + b)(2x² + cx + d).

However, by factoring out a 2 from each of the terms of the second case, we see that it is equivalent
to the first case. Hence, we only need to consider the first situation. Expanding the product for
p(x) gives

p(x) = (x² + ax + b)(x² + cx + d) = x⁴ + (a + c)x³ + (b + d + ac)x² + (ad + bc)x + bd,


so the coefficients must satisfy

a + c = 0
b + d + ac = 0        (6.11)
ad + bc = 1
bd = 2.

The last of the four conditions gives (b, d) = (1, 2) or (2, 1). Applied to the second equation, we see
that ac = 0, so a = 0 or c = 0. This last result applied to the first equation shows that a = c = 0.
But then in the third equation, we deduce that 0 + 0 = 1, which is a contradiction. We conclude
that p(x) is irreducible. 4
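The case analysis above can also be confirmed by exhaustive search: since the second leading-coefficient case reduces to the first, it suffices to try all pairs of monic quadratics over F3. A sketch:

```python
from itertools import product

# Sketch: confirm Example 6.5.15 by brute force.  If x^4 + x + 2 factored
# over F_3 it would be a product of two monic quadratics
# (x^2 + ax + b)(x^2 + cx + d); try all 81 choices of (a, b, c, d).
def poly_mul_mod(u, v, p=3):
    # multiply coefficient lists (constant term first) modulo p
    out = [0] * (len(u) + len(v) - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            out[i + j] = (out[i + j] + ui * vj) % p
    return out

target = [2, 1, 0, 0, 1]          # x^4 + x + 2, constant term first
found = [(a, b, c, d)
         for a, b, c, d in product(range(3), repeat=4)
         if poly_mul_mod([b, a, 1], [d, c, 1]) == target]
print(found)   # []: no factorization into quadratics exists
```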

Example 6.5.16. We repeat the above example, but with q(x) = x⁴ + x + 2 ∈ Z[x]. By Proposition 6.5.11,
if q(x) has a linear factor, then it has one of the following roots: ±1 or ±2. A quick
calculation shows that none of these four numbers is a root of q(x), so q(x) has no factors of degree
1. Assume that q(x) is reducible. Then

q(x) = (ax² + bx + c)(dx² + ex + f).

From ad = 1, we deduce that (a, d) = (1, 1) or (−1, −1). As in the previous example, the case
(a, d) = (−1, −1) can be made equivalent to the case (a, d) = (1, 1) by multiplying both quadratic
factors by −1. Consequently, we again have equations similar to (6.11) but in Z:

b + e = 0                e = −b
c + f + be = 0    =⇒    c + f − b² = 0
bf + ce = 1              b(f − c) = 1
cf = 2,                  cf = 2.

From the third equation, f − c = 1 or −1. Furthermore, b = ±1, and hence the second equation gives
c + f = 1. Since we are in Z, the fourth equation implies that (c, f) can be one of the four pairs
(1, 2), (2, 1), (−1, −2), or (−2, −1), but none of these four options satisfies c + f = 1. This implies
a contradiction and hence we deduce that q(x) is irreducible in Z[x]. 4

The above two examples illustrate a similar strategy to checking if a quartic polynomial is
irreducible but applied to different coefficient rings. However, we could have immediately deduced
the result of Example 6.5.16 from Example 6.5.15 without as much work. The following proposition
generalizes this observation.

Proposition 6.5.17
Let I be a proper ideal in the integral domain R and let p(x) be a nonconstant monic
polynomial in R[x]. If the image of p(x) in (R/I)[x] under the reduction homomorphism is
an irreducible element, then p(x) is irreducible in R[x].

Proof. Suppose that p(x) = a(x)b(x), where a(x) and b(x) are not units in R[x]. Then the degrees
of a(x) and b(x) must be positive since if either a(x) or b(x) were constant, then since LC(p(x)) =
1 = LC(a(x))LC(b(x)), the constant polynomial would need to be a unit in R.
Now let ϕ : R[x] → (R/I)[x] be the reduction homomorphism. (Recall that ϕ maps the coefficients
of a polynomial to their images in R/I.) Since ϕ is a homomorphism, ϕ(p(x)) = ϕ(a(x))ϕ(b(x)).
We already established that the leading coefficients of a(x) and b(x) must be units, since the leading
coefficient of p(x) is 1. However, since I is a proper ideal of R, it contains no units. We deduce that
deg ϕ(a(x)) = deg a(x) ≥ 1 and deg ϕ(b(x)) = deg b(x) ≥ 1. Hence, neither ϕ(a(x)) nor ϕ(b(x)) is a
unit in (R/I)[x] and thus ϕ(p(x)) is reducible.
We have proven that if p(x) is reducible, then ϕ(p(x)) is reducible. The proposition is precisely
the contrapositive of this statement. 

The following example gives another application of Proposition 6.5.17 even as it illustrates some
more reasoning with factorization of polynomials.
Example 6.5.18. We show that q(x) = x⁴ + 3x³ + 22x² − 8x + 3 is irreducible in Z[x]. We consider
the polynomial modulo 2: q̄(x) = x⁴ + x³ + 1 ∈ F2[x]. It is obvious that neither 0 nor 1 in F2 is a
root of q̄(x). Hence, if q̄(x) is reducible in F2[x], then it must be a product of two quadratics, since
it does not have a factor of degree 1. However, we point out that F2[x] only has one irreducible
polynomial of degree 2, namely x² + x + 1. (There are only 3 other quadratic polynomials in F2[x],
namely x², x² + 1, and x² + x, each of which is reducible.) The only quartic polynomial that is the
product of two irreducible quadratic polynomials is
(x² + x + 1)(x² + x + 1) = x⁴ + x² + 1.
Hence, x⁴ + x³ + 1 is irreducible. We point out that our reasoning also establishes that x⁴ + x + 1
is irreducible in F2[x]. 4
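Proposition 6.5.17 makes this check finite: reduce modulo 2 and rule out every factorization over F2. The exhaustive search below is our illustration, not the text's method:

```python
from itertools import product

# Sketch: certify Example 6.5.18 via Proposition 6.5.17.  Reduce
# q(x) = x^4 + 3x^3 + 22x^2 - 8x + 3 modulo 2 and check that the image
# x^4 + x^3 + 1 has no nontrivial monic factorization over F_2.
def poly_mul_mod(u, v, p=2):
    # multiply coefficient lists (constant term first) modulo p
    out = [0] * (len(u) + len(v) - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            out[i + j] = (out[i + j] + ui * vj) % p
    return out

q = [3, -8, 22, 3, 1]              # constant term first
qbar = [c % 2 for c in q]          # [1, 0, 0, 1, 1] = x^4 + x^3 + 1

def monics(deg, p=2):
    # all monic polynomials of the given degree over F_p
    for lower in product(range(p), repeat=deg):
        yield list(lower) + [1]

# any factorization would involve a factor of degree 1 or 2
reducible = any(poly_mul_mod(a, b) == qbar
                for d in (1, 2)
                for a in monics(d)
                for b in monics(4 - d))
print(reducible)   # False: qbar is irreducible, hence q is irreducible in Z[x]
```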
For nearly all integral domains R, determining whether a polynomial in R[x] is irreducible is
generally a difficult problem. Proposition 6.5.17 offers a sufficient condition for deciding if a
polynomial is irreducible. Indeed, many theorems provide sufficient but not necessary conditions
that are relatively easy to verify. We conclude the section with one more sufficient condition for
irreducibility.

Theorem 6.5.19 (Eisenstein’s Criterion)


Let P be a prime ideal in an integral domain R and let

f (x) = xn + an−1 xn−1 + · · · a1 x + a0

/ P 2 . Then f (x) is
be a polynomial in R[x]. Suppose that ai ∈ P for all i < n and a0 ∈
irreducible in R[x].

Proof. Suppose that f(x) can be written as f(x) = a(x)b(x). Since a_0 ∉ P², we have a_0 ≠ 0, and hence
both a(x) and b(x) have nonzero constant terms.
The reduction homomorphism R[x] → R[x]/P[x] = (R/P)[x] gives

x̄ⁿ = ā(x)b̄(x)

in (R/P)[x]. We prove by contradiction that ā(x) and b̄(x) must each have a single term. Assume
that ā(x) or b̄(x) has more than one term. Since R/P is an integral domain, the product of leading
terms and the product of the terms of minimal degree in ā(x) and b̄(x) give two (nonzero) terms of
different degrees in the product ā(x)b̄(x). This contradicts the fact that ā(x)b̄(x) is a polynomial with
a single term. Consequently, both ā(x) and b̄(x) are polynomials with a single term. We conclude that
all the nonleading coefficients of a(x) and b(x) are in P.
Since f(x) is monic, the leading coefficients of a(x) and b(x) are units in R. Assume that a(x)
and b(x) are nonconstant polynomials. Then both have a unit as a leading coefficient and a constant
term in P. But then the constant term of a(x)b(x) is in P², which contradicts the hypothesis of the
theorem. Hence, a(x) or b(x) is a constant polynomial. Since the leading term of both of them is a
unit, a(x) or b(x) is a unit in R[x]. Thus, f(x) is irreducible. □
Example 6.5.20. As an easy application of Eisenstein's Criterion, note that f(x) = x⁷ + 2x⁶ +
12x⁴ + 10 ∈ Z[x] satisfies the condition of the theorem with the ideal P = (2). Hence, f(x) is
irreducible. Obviously, it would have been difficult to eliminate all possible factorizations as we did
in Examples 6.5.15 and 6.5.16. 4
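For R = Z and P = (p), Eisenstein's condition is a simple divisibility check on the coefficients. A sketch for monic polynomials:

```python
# Sketch: Eisenstein's Criterion over R = Z with P = (p).  Checks that the
# polynomial is monic, every lower coefficient lies in (p), and the
# constant term does not lie in (p^2).
def eisenstein(coeffs, p):
    # coeffs = [a0, a1, ..., an] with an = 1 (monic)
    return (coeffs[-1] == 1
            and all(a % p == 0 for a in coeffs[:-1])
            and coeffs[0] % (p * p) != 0)

# f(x) = x^7 + 2x^6 + 12x^4 + 10 from Example 6.5.20, with p = 2
f = [10, 0, 0, 0, 12, 0, 2, 1]
print(eisenstein(f, 2))   # True: f is irreducible in Z[x]
```

Note that a result of False is inconclusive: the criterion is sufficient but not necessary.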
Example 6.5.21. Consider the polynomial f(x) = x⁵ + (−1 + 7i)x³ + (4 + 7i)x + 5 in Z[i][x]. We have

−1 + 7i = (2 + i)(1 + 3i),
4 + 7i = (2 + i)(3 + 2i),
5 = (2 + i)(2 − i),

and 2 − i is not an associate of 2 + i. Hence, all the nonleading coefficients of f(x) are in the ideal
(2 + i) and 5 ∉ ((2 + i)²). Hence, f(x) is irreducible in Z[i][x]. 4

6.5.3 – Summary of Results


We conclude this section with a brief summary of results about polynomial rings over a ring. The
following lists relevant connections between R and R[x] for increasingly strict conditions on R[x].

• R[x] is an integral domain if and only if R is an integral domain. (Proposition 5.2.6)

• R[x] is a UFD if and only if R is a UFD. (Theorem 6.5.5)

• R[x] is a PID only if R is a field. (Proposition 6.4.11)

• If F is a field, then F [x] is not only a PID but a Euclidean domain. (Corollary 6.3.6)

6.5.4 – Useful CAS Commands


Many CASs have commands for working with polynomials. For example, there are a variety of
commands for sorting, expanding products of polynomials, and collecting like terms with a stated
variable as a reference. However, we list below a few commands that utilize powerful techniques
to perform the stated function. These commands are in the basic implementation.

Maple          Function
irreduc(a);    Tests whether a single-variable or multivariate polynomial over an
               algebraic number field is irreducible.
factor(a);     Computes the factorization of a single-variable or multivariate polynomial
               with integer, rational, numeric, or algebraic number coefficients.

Exercises for Section 6.5


1. Prove that the following polynomials in Z[x] are irreducible.
(a) 9x2 − 11x + 1
(b) 2x2 + 5x + 7
(c) x3 + 4x + 3
2. Prove that the following polynomials in Z[x] are irreducible.
(a) x3 + 2x2 + 3x + 4
(b) x4 + x3 + 1
(c) x4 + 3x2 − 6
3. For each of the following polynomials in F5 [x], decide if it is irreducible and if it is reducible give a
complete factorization.
(a) x2 + 3x + 4
(b) x3 + x2 + 2
(c) x4 + 3x3 + x2 + 3
4. For each of the following polynomials in F7 [x], decide if it is irreducible and if it is reducible give a
complete factorization.
(a) x2 + 3x + 4
(b) x3 + x2 + 2
(c) x4 + 3x3 + x2 + 3
5. Let p be a prime number. Prove that in Z[x], the polynomial x3 + nx + p is irreducible for all but at
most four values of n.
6. Prove that in Z[x], the polynomial x3 + px + q, where p and q are odd primes, is irreducible.
7. Find all quadratic, cubic, and quartic irreducible polynomials in F2 [x].

8. Let F be a finite field with q elements. By determining all the monic reducible polynomials of degree
2, prove that there are (1/2)(q² − q) monic irreducible quadratic polynomials in F[x].
9. List all irreducible monic quadratic polynomials in F5 [x].
10. Prove Corollary 6.5.3.
11. Let F be a field and let a be a nonzero element in F . Prove that if f (ax) is irreducible, then f (x) is
irreducible.
12. Prove a modification of Proposition 6.5.17 in which I is a prime ideal P in R and the polynomial p(x)
satisfies LC(p(x)) ∉ P.
13. Prove that f(x) = x³ + (2 + i)x + (1 + i) is irreducible in Z[i][x].
14. Let R be a UFD and let a ∈ R. Then p(x) ∈ R[x] is irreducible if and only if p(x + a) is irreducible.
15. Let p be a prime number in Z. Use Exercise 6.5.14 to prove that the polynomial x^{p−1} + x^{p−2} + · · · + x + 1
is irreducible in Z[x].
16. Consider the polynomial p(x) = x³ + 3x² + 5x + 5 in Z[x]. Find a shift of the variable x so that you
can then use Eisenstein's Criterion to show that p(x) is irreducible.
17. Let c1 , c2 , . . . , cn ∈ Z be distinct integers. Consider the polynomial

p(x) = (x − c1 )(x − c2 ) · · · (x − cn ) − 1

in Z[x].
(a) Prove that if p(x) = a(x)b(x), then a(x) + b(x) evaluates to 0 at ci for i = 1, 2, . . . , n.
(b) Deduce that if a(x) and b(x) are nonconstant, then a(x) + b(x) is the 0 polynomial in Z[x].
(c) Deduce that p(x) is irreducible.
[Hint: Exercise 6.4.12.]
18. Prove the following generalization of Eisenstein's Criterion. Let P be a prime ideal in an integral
domain R and let
f(x) = a_n xⁿ + a_{n−1}x^{n−1} + · · · + a_1 x + a_0
be a polynomial in R[x]. Suppose that: (1) a_n ∉ P; (2) a_i ∈ P for all i < n; and (3) a_0 ∉ P². Then
f(x) is not the product of two nonconstant polynomials.
19. Let R be a UFD and let F be its field of fractions. Suppose that p(x), q(x) ∈ F [x] and that p(x)q(x)
is in the subring R[x]. Prove that the product of any coefficient of p(x) with any coefficient of q(x) is
an element of R.
20. Let F be a finite field of order |F| = q and let p(x) ∈ F[x] with deg p(x) = n. Prove that F[x]/(p(x))
has qⁿ elements. [Hint: Exercise 5.6.9.]
21. Prove that for all primes p, there is a field with p² elements.
22. Let p(x) = x⁴ + Ax³ + Bx² + Cx + D be a monic quartic (degree 4) polynomial in Z[x].
(a) Suppose that p(x) factors into two quadratics p(x) = (x² + ax + b)(x² + cx + d). Prove that if
a = 0 or c = 0, then ABC = A²D + C².
(b) Prove that if p(x) factors into two quadratics and if ABC ≠ A²D + C², then there exists an
integer a ≠ 0 such that C² − 4aD(A − a) is a square integer.
(c) Deduce that if D < 0, there are only a finite number of possibilities for a.
(d) Use the previous two parts to prove that p(x) = x⁴ + 3x³ − 2x − 7 is irreducible in Z[x].
23. Let F be a field. Consider the derivative function D : F[x] → F[x] defined by D(a_0) = 0 and

D(a_n x^n + · · · + a_1 x + a_0) = (n · a_n)x^{n−1} + ((n − 1) · a_{n−1})x^{n−2} + · · · + (2 · a_2)x + a_1.

(a) From this definition, prove that D satisfies the differentiation rules

D(p(x) + q(x)) = D(p(x)) + D(q(x)) and D(p(x)q(x)) = D(p(x))q(x) + p(x)D(q(x))

for all p(x), q(x) ∈ F[x].
(b) Prove that if d(x) is an irreducible polynomial that is a common divisor of p(x) and D(p(x)), then d(x)^2 divides p(x).
304 CHAPTER 6. DIVISIBILITY IN COMMUTATIVE RINGS

24. Let R be a UFD and let P(x) ∈ R[x]. Let n ∈ N^* and define for this exercise P^n(x) to be the polynomial P(x) iterated n times, i.e.,

P^n(x) = P(P(· · · P(x) · · · ))   (n times).

Prove that if d | n in N^*, then P^d(x) − x divides P^n(x) − x.

6.6 RSA Cryptography
In Section 3.10, we presented the idea of public key cryptography: a protocol for two parties who communicate entirely publicly to select a key that will nonetheless stay secret (not easily obtainable). The Diffie-Hellman protocol relied on Fast Exponentiation to quickly calculate powers of group elements, whereas determining a from g and g^a is relatively slow. The RSA public key protocol, named after Ron Rivest, Adi Shamir, and Leonard Adleman, is a protocol between two parties A and B in which party B allows party A to generate a key that will allow party A to send a secret message to party B. Rivest, Shamir, and Adleman first introduced the protocol in the context of the integers, but it can be generalized to other rings. We will also first describe the protocol over Z and then generalize.

6.6.1 – RSA over Z


As we did with Diffie-Hellman, we introduce three parties—Alice, Bob, and Eve—to describe the
protocol. Alice wants to communicate secretly with Bob while Eve is attempting to intercept the
communication. For the moment, Alice aims for the more humble goal of simply getting some
information, possibly small, to Bob secretly. We indicate this message by m.
In the following exchange, items marked "(kept secret)" stay secret to that individual; everything else is heard by everyone, including Eve.

(1) Alice tells Bob: "I want to talk secretly."
(2) Bob chooses primes p ≠ q (kept secret) and sends n = pq.
(3) Bob chooses e ∈ N with gcd(e, (p − 1)(q − 1)) = 1 and sends e.
(4) Alice selects m ∈ Z/nZ (kept secret).
(5) Alice sends c = m^e.
(6) Bob calculates d = e^{−1} in Z/(p − 1)(q − 1)Z (kept secret).
(7) Bob decrypts c^d = m^{ed} = m in Z/nZ (kept secret).

(1) The protocol starts with Alice initiating and telling Bob that she wants to talk secretly.

(2) Bob secretly chooses two prime numbers p and q but sends the product n = pq to Alice.

(3) Bob also chooses an integer e that is relatively prime to (p − 1)(q − 1) and sends this to Alice.
Together, the pair (n, e) form the public key of the protocol.

(4) The message Alice will send to Bob consists of an element m ∈ Z/nZ.

(5) Alice sends to Bob the ciphertext c = m^e ∈ Z/nZ, where the power is performed with fast exponentiation.

(6) Bob calculates the inverse d of e in Z/(p − 1)(q − 1)Z so that ed ≡ 1 (mod (p − 1)(q − 1)). Since we are always concerned with implementing the calculations quickly, Bob can use the Extended Euclidean Algorithm to determine whether e is invertible in Z/(p − 1)(q − 1)Z and to calculate the inverse d. (See Example 2.2.11.)

(7) Then Bob calculates (using fast exponentiation) c^d = m^{ed} in Z/nZ. By the Chinese Remainder Theorem, there is an isomorphism

ϕ : Z/pqZ → (Z/pZ) ⊕ (Z/qZ).

Hence, by Proposition 5.1.14, U(pq) ≅ U(p) ⊕ U(q) and |U(pq)| = (p − 1)(q − 1). If m and n are relatively prime, then m ∈ U(Z/nZ) = U(n), so

m^{ed} = m^{1 + k|U(pq)|} for some k ∈ Z
       = m,

by Lagrange's Theorem. If gcd(m, n) ≠ 1, we still have the following cases. If m = 0 in Z/nZ, then c = m^e = 0 and c^d = m^{ed} = 0, so again c^d = m. If p | m but m ≠ 0 in Z/nZ, then under the isomorphism ϕ(m) = (0, h) for some h ∈ Z/qZ. Then

(0, h)^{1 + k(p−1)(q−1)} = (0, h(h^{q−1})^{k(p−1)}) = (0, h · 1^{k(p−1)}) = (0, h).

Hence, again m^{ed} = m in Z/nZ. Similarly, if q | m but m ≠ 0 in Z/nZ, then m^{ed} = m in Z/nZ. This allows Bob to recover m.

Because of their roles, the integer e is called the encryption key and d is called the decryption
key.
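In Python, steps (2) through (7) can be sketched as follows, using the key values that will appear in Example 6.6.1 below. The three-argument pow performs fast exponentiation, and pow(e, -1, k) (Python 3.8 and later) computes the modular inverse via the Extended Euclidean Algorithm.

```python
# A minimal sketch of steps (2)-(7) of RSA over Z, using the key
# values that appear in Example 6.6.1 below.

from math import gcd

# (2) Bob secretly chooses primes p != q and publishes n = pq.
p, q = 1759, 2347
n = p * q                                  # 4128373

# (3) Bob chooses e relatively prime to (p - 1)(q - 1) and publishes it.
e = 72569
assert gcd(e, (p - 1) * (q - 1)) == 1

# (4)-(5) Alice picks m in Z/nZ and sends the ciphertext c = m^e.
m = 302913
c = pow(m, e, n)                           # fast exponentiation

# (6) Bob computes d = e^{-1} in Z/(p-1)(q-1)Z.
d = pow(e, -1, (p - 1) * (q - 1))          # d == 656357

# (7) Bob decrypts: c^d = m^{ed} = m in Z/nZ.
assert pow(c, d, n) == m
```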
From Eve's perspective, she knows everything that Bob does except what p and q are separately. If p and q are very large prime numbers, then it is very quick for Bob to calculate the product n = pq, but it is very slow for Eve to find the prime factorization n = p × q. Furthermore, she would need to know p and q separately in order to calculate (p − 1)(q − 1), which she needs to determine d.
As opposed to Diffie-Hellman, in which both Alice and Bob know the secret key g^{ab}, only Bob knows the secret keys p and q. Diffie-Hellman is symmetric in the sense that Alice and Bob could both use g^{ab} for communication. In RSA, Alice is in the same situation as Eve in that she cannot easily determine d. Hence, the RSA protocol sets up a one-way secret communication: Only Alice can send a secret message to Bob.
Again, as mentioned in the Diffie-Hellman protocol, it may seem unsatisfactory that it is theoretically possible for Eve to find d from the information passed in the clear. However, if the primes
p and q are large enough, it may take over 100 years with current technology to find the prime
factorization of n and hence determine p and q separately. Very few secrets need to remain secret
for a century so the protocol is secure in this sense.
Even if n is large, it is not likely that a long communication can be encoded into a single element
m ∈ Z/nZ. Alice and Bob can use two strategies to allow for Alice to send a long communication
to Bob.

One strategy involves deciding upon an injective function H from the message space M into the set of finite sequences of elements in Z/nZ. Then the message, regardless of alphabet, can be encoded as a string of elements (m_1, m_2, . . . , m_ℓ) in Z/nZ and Alice sends the sequence (m_1^e, m_2^e, . . . , m_ℓ^e) to Bob. Bob then decodes each element in the string as described above to recover (m_1, m_2, . . . , m_ℓ). Then, since H is an injective function, Bob can find the unique preimage of (m_1, m_2, . . . , m_ℓ) under H and thereby recover the message in the message space M.
A second strategy uses the RSA protocol only to exchange a key for some subsequent encryption
algorithm. In other words, Alice and Bob agree (publicly) on some other encryption algorithm that
requires a secret key and they use RSA to decide what key to use for that algorithm. Essentially,
Alice is saying to Bob: “Let’s use m as the secret key.”
As one last comment about speed of the protocol, prime factorization is always slow so whenever
Bob needs to calculate the greatest common divisor of two integers, he uses the Euclidean Algorithm
and when he calculates the inverse of e in Z/(p − 1)(q − 1)Z he uses the Extended Euclidean
Algorithm. The astute reader might wonder why we might not use Lagrange's Theorem to calculate d. Since gcd(e, (p − 1)(q − 1)) = 1, then e ∈ U((p − 1)(q − 1)) so d = e^{|U((p−1)(q−1))| − 1}. However, |U((p − 1)(q − 1))| = φ((p − 1)(q − 1)), where φ is Euler's totient function. Calculating the totient function of an integer requires one to perform the prime factorization of that integer.
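The Extended Euclidean Algorithm that Bob relies on can be sketched as follows; this iterative version is a standard formulation (not code from the text) that returns the Bézout coefficients, from which the inverse of e is read off.

```python
def extended_euclid(a, b):
    """Return (g, x, y) with g = gcd(a, b) and a*x + b*y = g."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        quotient = old_r // r
        old_r, r = r, old_r - quotient * r
        old_x, x = x, old_x - quotient * x
        old_y, y = y, old_y - quotient * y
    return old_r, old_x, old_y

def inverse_mod(e, modulus):
    """The inverse of e in Z/(modulus)Z, or None when gcd(e, modulus) > 1,
    in which case Bob simply picks another e."""
    g, x, _ = extended_euclid(e, modulus)
    return x % modulus if g == 1 else None
```

For the numbers of Example 6.6.1 below, inverse_mod(72569, 4124268) returns 656357, matching the Bézout identity 1 = 72569 × 656357 − 4124268 × 11549.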

Example 6.6.1. For a first example, we use small prime numbers and illustrate the full process.
In practice, of course, one uses computer programs to execute these calculations.
Suppose that Alice wants to communicate secretly with Bob and they agree to use RSA. Bob
selects p = 1759 and q = 2347 and sends n = 4128373 to Alice. He also sends the encryption key
e = 72569. Alice and Bob agree to encode strings of characters in the following way. Each character will correspond to a digit by

{ ' ', A, B, . . . , Z, '.', ',' } −→ {0, 1, 2, . . . , 26, 27, 28}.

Observe that the first character of the alphabet here is the space character. Note, they only use an alphabet of 29 characters. Since 29^4 = 707281 < n, each block of four characters (c_1, c_2, c_3, c_4) is converted to the integer

c_4 × 29^3 + c_3 × 29^2 + c_2 × 29 + c_1

and then viewed as an element of Z/4128373Z. If necessary, a message can be padded at the end
with spaces so that the message has 4k characters in it and then corresponds to a sequence of length
k of elements in Z/4128373Z.
Alice wants to tell Bob “Hello there.” Her string of character numbers is

(8, 5, 12, 12, 15, 0, 20, 8, 5, 18, 5, 27)

and her message as a string of elements in Z/4128373Z is (m_1, m_2, m_3), where

m_1 = 12 × 29^3 + 12 × 29^2 + 5 × 29 + 8 = 302913,
m_2 = 8 × 29^3 + 20 × 29^2 + 0 × 29 + 15 = 211947,
m_3 = 27 × 29^3 + 5 × 29^2 + 18 × 29 + 5 = 663235.

After using fast modular exponentiation, Alice sends to Bob

(c_1, c_2, c_3) = (m_1^e, m_2^e, m_3^e) = (1318767, 3245763, 2570792).

For completeness, we show all the steps in performing the Extended Euclidean Algorithm to
find d. The first sequence of integer divisions performs the Euclidean Algorithm until we obtain a

remainder of 1:
4124268 = 72569 × 56 + 60404
72569 = 60404 × 1 + 12165
60404 = 12165 × 4 + 11744
12165 = 11744 × 1 + 421
11744 = 421 × 27 + 377
421 = 377 × 1 + 44
377 = 44 × 8 + 25
44 = 25 × 1 + 19
25 = 19 × 1 + 6
19 = 6 × 3 + 1.
If Bob picked e at random, then it is possible that gcd(e, (p − 1)(q − 1)) > 1. In this case, Bob would
identify this error during the Euclidean Algorithm and he would simply pick another e. When we
get a remainder of 1, we are done since there is no smaller positive integer. Going backwards, we
get the following linear combinations of the intermediate remainders:
1 = 19 − 6 × 3
1 = 19 − (25 − 19 × 1) × 3 = 19 × 4 − 25 × 3
1 = (44 − 25 × 1) × 4 − 25 × 3 = 44 × 4 − 25 × 7
1 = 44 × 4 − (377 − 44 × 8) × 7 = 44 × 60 − 377 × 7
1 = (421 − 377) × 60 − 377 × 7 = 421 × 60 − 377 × 67
1 = 421 × 60 − (11744 − 421 × 27) × 67 = 421 × 1869 − 11744 × 67
1 = (12165 − 11744 × 1) × 1869 − 11744 × 67 = 12165 × 1869 − 11744 × 1936
1 = 12165 × 1869 − (60404 − 12165 × 4) × 1936 = 12165 × 9613 − 60404 × 1936
1 = (72569 − 60404 × 1) × 9613 − 60404 × 1936 = 72569 × 9613 − 60404 × 11549
1 = 72569 × 9613 − (4124268 − 72569 × 56) × 11549 = 72569 × 656357 − 4124268 × 11549.
Considering the last expression modulo 4124268, we get 1 ≡ 72569 × 656357 (mod 4124268), so 656357 is the inverse of 72569 modulo 4124268. Thus, Bob calculated that d = 656357.
In order to decrypt Alice's message, using fast modular exponentiation, Bob calculates in Z/4128373Z that

(c_1^d, c_2^d, c_3^d) = (302913, 211947, 663235).

Knowing the process by which Alice compressed her message into a string of elements in Z/4128373Z, Bob can recover Alice's message of “HELLO THERE.” ♦
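The bookkeeping in this example can be checked mechanically. The sketch below assumes the block encoding described above (space = 0, A-Z = 1-26, '.' = 27, ',' = 28, four characters packed per base-29 block) and replays the full round trip; pow(e, -1, k) is Python's built-in modular inverse.

```python
# Sketch: replaying Example 6.6.1.  Each block of four character digits
# (c1, c2, c3, c4) is packed into c4*29^3 + c3*29^2 + c2*29 + c1.

ALPHABET = " ABCDEFGHIJKLMNOPQRSTUVWXYZ.,"   # character i <-> digit i

def encode_blocks(text, block=4, base=29):
    digits = [ALPHABET.index(ch) for ch in text.upper()]
    digits += [0] * (-len(digits) % block)   # pad with spaces
    return [sum(d * base**i for i, d in enumerate(digits[k:k + block]))
            for k in range(0, len(digits), block)]

p, q, e = 1759, 2347, 72569
n = p * q                                    # 4128373
d = pow(e, -1, (p - 1) * (q - 1))            # 656357

blocks = encode_blocks("HELLO THERE.")       # [302913, 211947, 663235]
cipher = [pow(m, e, n) for m in blocks]      # what Alice sends
assert [pow(c, d, n) for c in cipher] == blocks   # Bob recovers the blocks
```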

6.6.2 – RSA in Other Rings


We would like to generalize the RSA protocol to other rings. To do so, we must study the protocol
to determine what procedures are involved and what is the proper algebraic context in which we
can perform such procedures.
The concept of irreducible elements does not have an equivalent in group theory, so we work first of all in the context of a commutative ring R. Also, the concept of irreducibility requires the existence of an identity. It might be possible to use a ring with zero divisors, but the protocol assumes that for some m there is a power k > 1 high enough that m^k = m. Consequently, nilpotent elements would cause problems for this property and we are inclined to require that R be an integral domain. In the algorithm, in order for m^k = m for high enough k in R/I, we required that R/(p) be a finite ring for the prime p and similarly for q. In attempting to generalize RSA, we could either base the definition on irreducible elements or maximal ideals. In this textbook, we use the following definition.

Definition 6.6.2
An integral domain R that is not a field is called an RSA domain if for every maximal ideal M ⊆ R, the quotient ring R/M is a finite field.

The ring of integers Z is an RSA domain but R[x] is not. Indeed, for the irreducible element x^2 + 1, R[x]/(x^2 + 1) ≅ C, which is not finite. On the other hand, as Exercise 5.6.10 hints, for any prime element π in Z[i], the elements in Z[i]/(π) are the points in C with integral components that are contained in a square with edge along the segment from 0 to π. Hence, Z[i]/(π) is finite. Furthermore, for every finite field F, any maximal ideal M in F[x] is principal so, by Exercise 6.6.5, F[x]/M is finite.
With the notion of an RSA domain, we adapt the RSA algorithm as follows.

(1) Alice tells Bob: "I want to talk secretly. Let's use the RSA domain R."
(2) Bob chooses maximal ideals M_1 ≠ M_2 (kept secret) and sends the ideal I = M_1 M_2.
(3) Bob chooses e ∈ N with gcd(e, (|R/M_1| − 1)(|R/M_2| − 1)) = 1 and sends e.
(4) Alice selects m ∈ R/I (kept secret).
(5) Alice sends c = m^e.
(6) Bob calculates d = e^{−1} in Z/(|R/M_1| − 1)(|R/M_2| − 1)Z (kept secret).
(7) Bob decrypts c^d = m^{ed} = m in R/I (kept secret).

(1) The protocol starts with Alice initiating and telling Bob that she wants to talk secretly.
(2) Bob secretly chooses two maximal ideals M_1 and M_2 but sends the product ideal I = M_1 M_2 to Alice. If R is a PID (as all of the above examples are), then Bob can work with the generating elements of any of the ideals.
(3) Bob also chooses an integer e that is relatively prime to (|R/M_1| − 1)(|R/M_2| − 1) and sends this to Alice. Together, the pair (I, e) form the public key of the protocol.
(4) The message Alice will send to Bob consists of an element m ∈ R/I.
(5) Alice sends to Bob the ciphertext c = m^e ∈ R/I, where the power is performed with fast exponentiation.
(6) Bob calculates the inverse d of e in Z/(|R/M_1| − 1)(|R/M_2| − 1)Z so that ed ≡ 1 (mod (|R/M_1| − 1)(|R/M_2| − 1)). Since we are always concerned with implementing the calculations quickly, Bob can use the Extended Euclidean Algorithm.
(7) Then Bob calculates (using fast exponentiation) c^d = m^{ed} in R/I. Since M_1 and M_2 are distinct maximal ideals, then M_1 + M_2 = R and hence they are comaximal. By the Chinese Remainder Theorem, there is an isomorphism

ϕ : R/I → (R/M_1) ⊕ (R/M_2).

Hence, by Proposition 5.1.14, U(R/I) ≅ U(R/M_1) ⊕ U(R/M_2) and, since R/M_1 and R/M_2 are fields, we have |U(R/I)| = (|R/M_1| − 1)(|R/M_2| − 1). If m = 0 ∈ R/I, then c = m^e = 0 and c^d = 0 = m. If m ∈ U(R/I), then

c^d = m^{ed} = m^{1 + k|U(R/I)|} for some k ∈ Z.

By Lagrange's Theorem, we deduce that m^{ed} = m. If ϕ(m) = (0, h) with h ∈ U(R/M_2), then

ϕ(m)^{ed} = (0, h)^{1 + k(|R/M_1|−1)(|R/M_2|−1)} = (0, h(h^{|R/M_2|−1})^{k(|R/M_1|−1)}) = (0, h · 1^{k(|R/M_1|−1)}) = (0, h) = ϕ(m).

Hence, again m^{ed} = m in R/I, and similarly if ϕ(m) = (h, 0) with h ∈ U(R/M_1). This allows Bob to recover m.

In practice, it is convenient to work in an algebraic context in which there is a simple and perhaps
unique way to determine a representative of cosets in R/I. In particular, when calculating the powers
m^e or c^d in R/I using fast exponentiation, it is convenient for memory storage to constantly reduce the power to a smallest equivalent expression in R/I. Euclidean domains offer such a context. Since Euclidean domains are PIDs, every ideal I is equal to (a) for some element a. Then, while performing the fast exponentiation algorithm, each time we take a power of m, we replace it with the remainder from the Euclidean division by a.
Some RSA domains that are also Euclidean domains include Z[i] and F [x], where F is a finite
field.
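In F_p[x], for example, this reduction is ordinary polynomial division with remainder by a (monic) generator. The sketch below represents a polynomial by its list of coefficients in increasing degree (an illustrative choice, with helper names of my own) and performs fast exponentiation in F_31[x]/(n(x)) for the modulus n(x) and message m(x) of Exercise 9 below, reducing after every multiplication.

```python
# Sketch: fast exponentiation in F_31[x]/(n(x)), reducing by Euclidean
# division at every step.  A polynomial is a list of coefficients in
# increasing degree; n(x) is monic, so the division needs no
# leading-coefficient inversion.

P = 31                                    # the field F_31

def poly_mod(a, n):
    """Remainder of a upon Euclidean division by the monic polynomial n."""
    a = [c % P for c in a]
    while a and a[-1] == 0:
        a.pop()
    while len(a) >= len(n):
        shift, lead = len(a) - len(n), a[-1]
        for i, c in enumerate(n):
            a[shift + i] = (a[shift + i] - lead * c) % P
        while a and a[-1] == 0:
            a.pop()
    return a

def poly_mul_mod(a, b, n):
    prod = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            prod[i + j] = (prod[i + j] + x * y) % P
    return poly_mod(prod, n)

def poly_pow_mod(base, exp, n):
    result, base = [1], poly_mod(base, n)
    while exp:
        if exp & 1:
            result = poly_mul_mod(result, base, n)
        base = poly_mul_mod(base, base, n)
        exp >>= 1
    return result

# n(x) = x^5 + 4x^4 + 9x^3 + 16x^2 + 25x + 3 and m(x) = x^3 + 2x,
# as in Exercise 9 below.
n = [3, 25, 16, 9, 4, 1]
m = [0, 2, 0, 1]
```

For instance, poly_mul_mod(m, m, n) computes the reduced representative of m(x)^2 in F_31[x]/(n(x)).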
As an example, we adapt Example 6.6.1 to a scenario in Z[i] with different prime numbers. To
simplify the example, we leave off the issue of encoding characters into the quotient ring R/I.

Example 6.6.3. Alice indicates that she wants to communicate secretly with Bob and they agree
to use RSA over Z[i]. Bob selects p = 15 + 22i and q = 7 + 20i. According to Example 6.4.4, since
N (p) = 709 and N (q) = 449 are prime, p and q are primes in the Gaussian integers. Bob sends
n = pq = −335 + 454i to Alice as well as the encryption key e = 2221 = (100010101101)_2, where the latter expression is in binary for use with the fast exponentiation algorithm (Section 3.10.2).
Alice wants Bob to receive the number m = 67 + 232i. Running the fast exponentiation algorithm, Alice sets the power variable π of m ∈ Z[i]/(n) initially as π = 1. Note that 11 is the highest nonzero power of 2 in the binary expansion of e = 2221. Also, in the following calculations, though we do not use the bar notation ā, numbers are understood as elements in Z[i]/(n).

• b_11 = 1 so π := π^2 m = 67 + 232i.
• b_10 = 0 so π := π^2 = (67 + 232i)^2 = −49335 + 31088i = 77 + 234i, where the last equality holds after performing the Euclidean division of π^2 by n in Z[i].
• b_9 = 0 so π := π^2 = −48827 + 36036i = 206 − 6i.
• b_8 = 0 so π := π^2 = 42400 − 2472i = −12 − 110i.
• b_7 = 1 so π := π^2 m = −1413532 − 2596912i = 154 + 67i.
• b_6 = 0 so π := π^2 = 19227 + 20636i = −4 + 135i.
• b_5 = 1 so π := π^2 m = −969443 − 4296848i = −207 + 24i.
• b_4 = 0 so π := π^2 = 42273 − 9936i = −192 + 100i.
• b_3 = 1 so π := π^2 m = 10708688 + 3659648i = 96 + 143i.
• b_2 = 1 so π := π^2 m = −7122403 − 766504i = −77 − 72i.
• b_1 = 0 so π := π^2 = 745 + 11088i = −132 − 77i.
• b_0 = 1 so π := π^2 m = −3945931 + 4028816i = 51 + 104i.

This is the ciphertext c = m^e = 51 + 104i in Z[i]/(−335 + 454i) that Alice sends in the clear to Bob.
On Bob's side, in order to calculate the decryption key d, he first needs to determine |Z[i]/(p)| and |Z[i]/(q)|. By Exercise 6.6.6, the order of each of these quotient rings is |p|^2 and |q|^2, respectively. Hence, d is the inverse of e = 2221 modulo

(|p|^2 − 1)(|q|^2 − 1) = 317184.

Performing the Extended Euclidean Algorithm, Bob finds that d = 208933. In order to recover m, he will calculate c^d using fast modular exponentiation in the finite ring Z[i]/(n), during which, like Alice, he takes Euclidean remainders upon division by n at each stage of the algorithm. Since in binary d = (110011000000100101)_2, the for loop in the algorithm will take only 18 steps. ♦

Exercises for Section 6.6


1. Implementing RSA over Z, take the role of Alice. Bob sends you n = 28852217 and e = 33. Suppose that you wish to send the plaintext number m = 45678 to Bob. Calculate the corresponding ciphertext c = m^e in Z/nZ.
2. Implementing RSA over Z, take the role of Alice. Bob sends you n = 5352499 and e = 451. Suppose that you wish to send the plaintext number m = 87542 to Bob. Calculate the corresponding ciphertext c = m^e in Z/nZ.
3. Implementing RSA over Z, take the role of Alice. Bob sends you n = 5352499 and e = 451. Suppose
that you wish to send the message “FLY AT NIGHT” to Bob using the string-to-number sequence
encoding as given in Example 6.6.1. Show that you can use 3 blocks of 4 letters. Then compute the
three ciphertext numbers in Z/nZ.
4. Implementing RSA over Z, take the role of Bob. You select p = 131 and q = 211 so that n = 27641.
(a) Show that e = 191 is an acceptable encryption key for the RSA algorithm.
(b) Find (expressed with 0 ≤ d < (p − 1)(q − 1) = 27300) the multiplicative inverse of e in Z/27300Z.
(c) Suppose that Alice sends a message using 29 characters as in Example 6.6.1, also compressing strings of 3 letters into elements of Z/27641Z as in the example. From the following ciphertext

(26799, 26841, 22169, 9764, 3426)

recover the plaintext and interpret the result into English.


5. Let F be a finite field. Prove that for every q(x) ∈ F[x], the order of the quotient ring F[x]/(q(x)) is |F|^{deg q(x)}.
6. Consider the Euclidean domain Z[i] and let I = (z) be a proper nontrivial ideal. (Since Z[i] is a Euclidean domain, it is a PID.)
(a) Prove that
{m + ni | m, n ∈ Z with 0 ≤ m, n ≤ |z|}
is a complete set of distinct representatives for the cosets in Z[i]/I.
(b) Conclude that the quotient ring contains |Z[i]/I| = z · z̄ = |z|^2 elements.
7. Implementing RSA over Z[i], you take the role of Alice. Bob sends you n = 236 − 325i and e = 14.
(a) What is the ciphertext c for the message m = 101 + 3i?
(b) Suppose that you wish to send the message “FLY AT NIGHT” to Bob. You agree with Bob to convert strings of letters to strings of Gaussian integers by first mapping the characters in { ' ', A, B, C, . . . , Y, Z, ',', '.' } to the integers 0 through 28, and then mapping any pair of such integers (a, b) to a + bi. Convert the English message to strings of Gaussian integers and determine the string of ciphertext Gaussian integers.
8. Implementing RSA over F_31[x], you take the role of Bob. Suppose that you pick polynomials f(x) = x^3 + x + 12 and g(x) = x^2 + 4x + 8.

(a) First prove that f (x) and g(x) are irreducible.


(b) Prove that any number e that is relatively prime to 2, 3, 5, and 331 is a suitable encryption key.
(c) Find the decryption key d given the encryption key e = 17.
9. Implementing RSA over F_31[x], you take the role of Alice. Suppose that Bob has sent you

n(x) = x^5 + 4x^4 + 9x^3 + 16x^2 + 25x + 3,

where all the coefficients are understood to be in F_31. Let m(x) = x^3 + 2x. Calculate m(x)^3 in F_31[x]/(n(x)).

6.7 Algebraic Integers
From a historical perspective, various other branches of mathematics—geometry, number theory,
analysis, and so forth—shaped various areas in algebra. Number theory motivated considerable
investigation in ring theory, and in particular topics related to divisibility. It is difficult to briefly
summarize the goal of number theory since it includes many different directions of investigations:
approximations of real numbers by rationals, distribution of the prime numbers in Z, multivariable
equations where we seek only integer solutions (Diophantine equations), and so on. In current termi-
nology, algebraic number theory is a branch of algebra applied particularly to studying Diophantine
equations and related generalizations.
This section offers a glimpse into algebraic number theory as an application of algebra internal
to mathematics. A more complete introduction to algebraic number theory requires field theory
(Chapter 7) including Galois theory (Chapter 11). For further study in algebraic number theory,
the reader may consult [46].
Since algebraic number theory is a vast area, in this section we content ourselves with introducing algebraic integers and illustrating how unique factorization in the Gaussian integers answers an otherwise challenging problem in number theory. In so doing, we will show the historical motivation
behind certain topics in ring theory.

6.7.1 – Algebraic Integers


Consider the integers Z as a subring in its field of fractions Q. Now consider a larger field K
containing Q. We would like to imagine a ring R in K that includes Z and, in some intuitive way,
serves the role in K that Z serves in Q. Obviously, the integers Z form a subring of K. The field of
fractions of Z is Q so, among other things, we would like R to be an integral domain whose field of
fractions is K.


We typically do not study this construction where K is any field but we often restrict our
attention to subfields of C.

Definition 6.7.1
A field K that is a subfield of C and contains Q as a subfield is called a number field.

As we will see more in depth in Section 7.2, the adjective “algebraic” pertains to an element being the solution to a polynomial. So an algebraic characterization of Z within Q has to do with how integers arise as roots of polynomials in Q[x]. Now the solution set to any polynomial p(x) ∈ Q[x] is equal to the set of solutions of q(x) = cp(x) ∈ Z[x], where c is the least common multiple of all the denominators of the coefficients in p(x). Then Proposition 6.5.11 shows that if q(x) has only integers for roots, then its leading coefficient is 1 or −1, which is tantamount to being monic. This motivates the following definition.

Definition 6.7.2
Let K be a number field. The set of algebraic integers in K, denoted O_K, is the set of elements in K that are solutions to monic polynomials p(x) ∈ Z[x].

It is not at all obvious from this definition that O_K is a ring. We first establish an alternate characterization of algebraic integers in a number field K before establishing this key result.

Lemma 6.7.3
Suppose α ∈ K. Then α ∈ O_K if and only if (Z[α], +) is a finitely generated free abelian group.

Proof. First suppose that α ∈ O_K. Then there exists a monic polynomial p(x) ∈ Z[x] such that p(α) = 0. Hence, there exist a positive integer n and c_0, c_1, . . . , c_{n−1} ∈ Z such that

α^n = −(c_{n−1} α^{n−1} + · · · + c_1 α + c_0).

Then by an induction argument, every power α^k with k ≥ n can be written as a Z-linear combination of 1, α, α^2, . . . , α^{n−1}. Thus,

Z[α] ⊆ {c_{n−1} α^{n−1} + · · · + c_1 α + c_0 | c_0, c_1, . . . , c_{n−1} ∈ Z}.

The right-hand side is a finitely generated free abelian group, so by Theorem 4.5.9, Z[α] is also a finitely generated free abelian group.
Conversely, suppose that Z[α] is a finitely generated free abelian group. Then there exist a_1, a_2, . . . , a_n ∈ Z[α] such that every element p(α) ∈ Z[α] can be written as a linear combination

p(α) = c_1 a_1 + c_2 a_2 + · · · + c_n a_n with c_i ∈ Z.

Note that for all a_i, the element αa_i ∈ Z[α]. Hence, for each i = 1, 2, . . . , n, there exist integers m_ij such that

αa_i = Σ_{j=1}^{n} m_ij a_j.    (6.12)

Viewing M = (m_ij) as a matrix and a = (a_1, a_2, . . . , a_n) ∈ C^n as the vector whose ith coordinate is a_i, (6.12) can be written as

αa = Ma.

Since a ≠ 0, a is an eigenvector of the matrix M and α is an eigenvalue. Thus, α is a root of the equation det(xI − M) = 0. Since M is a matrix of integers, det(xI − M) is a polynomial in Z[x]. By considering the cofactor expansion of the determinant, it is easy to see that det(xI − M) is monic. Hence, α ∈ O_K. □

Theorem 6.7.4
The set O_K of algebraic integers in K is a subring of K.

Proof. Let α, β be arbitrary elements in O_K. From Lemma 6.7.3, let a_1, a_2, . . . , a_n ∈ K and b_1, b_2, . . . , b_m ∈ K be such that every element in Z[α] is an integer linear combination of a_1, a_2, . . . , a_n and every element in Z[β] is an integer linear combination of b_1, b_2, . . . , b_m. Then every element in the ring Z[α, β] can be written as

q_n(α)β^n + q_{n−1}(α)β^{n−1} + · · · + q_1(α)β + q_0(α)

for some q_i(x) ∈ Z[x]. Also by Lemma 6.7.3, we deduce that every element in Z[α, β] is an integer linear combination of the mn elements a_i b_j with 1 ≤ i ≤ n and 1 ≤ j ≤ m. In particular, every element in Z[α − β] and in Z[αβ] can be written as a linear combination of the a_i b_j. Hence, both α − β and αβ are in O_K and so O_K is a subring of K. □

By virtue of Theorem 6.7.4, we call O_K the ring of algebraic integers in K.
To illustrate Lemma 6.7.3, consider the element φ = (1 + √5)/2, often called the golden ratio, in the ring Q(√5). One of the historical defining properties of the golden ratio is that 1/φ = φ − 1, which can be rewritten as φ^2 = φ + 1. It is easy to show that

φ^n = f_n φ + f_{n−1},

where f_n is the nth term of the Fibonacci sequence. Consequently, for every polynomial p(x) ∈ Z[x], the element p(φ) can be written as

p(φ) = c_1 + c_2 φ, for some c_1, c_2 ∈ Z.

Thus, Z[φ] is a finitely generated free abelian group of rank 2. Lemma 6.7.3 allows us to conclude that φ is an algebraic integer in Q(√5).
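The identity φ^n = f_n φ + f_{n−1} is easy to confirm by computing in Z[φ] using only the relation φ^2 = φ + 1; representing a + bφ as the integer pair (a, b) below is an illustrative choice, not notation from the text.

```python
# Sketch: arithmetic in Z[phi] using only the relation phi^2 = phi + 1.
# An element a + b*phi is stored as the pair (a, b).

def mul(u, v):
    (a1, b1), (a2, b2) = u, v
    # (a1 + b1*phi)(a2 + b2*phi) = a1*a2 + (a1*b2 + a2*b1)*phi + b1*b2*phi^2,
    # and phi^2 = phi + 1:
    return (a1 * a2 + b1 * b2, a1 * b2 + a2 * b1 + b1 * b2)

def phi_power(n):
    """phi^n as a pair (a, b); the claim is (a, b) = (f_{n-1}, f_n)."""
    result = (1, 0)
    for _ in range(n):
        result = mul(result, (0, 1))    # multiply by phi
    return result

def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# phi^10 = f_10 * phi + f_9 = 55*phi + 34
assert phi_power(10) == (fib(9), fib(10)) == (34, 55)
```

Since every power of φ is a Z-combination of 1 and φ, this also makes the rank-2 claim for Z[φ] concrete.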

6.7.2 – Quadratic Integer Rings


Definition 6.7.2 and Lemma 6.7.3 do not offer a tractable way of determining O_K. We now determine all integer rings in quadratic extensions of Q.
Let D be a square-free integer and consider the ring Q[√D]. If α = a + b√D ∈ Q[√D], we call the element ᾱ = a − b√D the conjugate of α. Note that αᾱ = a^2 − Db^2 ∈ Q. Since D is square-free, we know that αᾱ ≠ 0 if α ≠ 0. Therefore, every nonzero element in Q[√D] is invertible via

α^{−1} = ᾱ/(αᾱ).

This shows that Q[√D] is a field, so we write Q(√D).
The ring Z[√D] = {a + b√D | a, b ∈ Z} is a subring of Q(√D). However, this does not imply that it is the full ring of algebraic integers in Q(√D).

Theorem 6.7.5
Let D be a square-free integer. Then

O_{Q(√D)} = Z[ω] = {a + bω | a, b ∈ Z},

where

ω = √D             if D ≡ 2, 3 (mod 4),
ω = (1 + √D)/2     if D ≡ 1 (mod 4).
Proof. Let α = a + b√D ∈ Q(√D). Then α is a root of m_α(x) = x^2 − 2ax + (a^2 − Db^2). Let p(x) ∈ Q[x] be any polynomial that has α as a root. Then we know from elementary algebra that the quadratic conjugate ᾱ must also be a root. But

(x − α)(x − ᾱ) = m_α(x),

so from Proposition 6.5.9 we conclude that if p(x) has α as a root, then m_α(x) divides p(x). Write

p(x) = m_α(x)q(x).

We now assume that p(x) is monic and in Z[x]. Gauss' Lemma implies that m_α(x) ∈ Z[x]. Hence, instead of considering all polynomials p(x), we only need to consider m_α(x). However, since m_α(x) ∈ Z[x], we must have 2a ∈ Z and a^2 − Db^2 ∈ Z.
Write a = s/2 for s ∈ Z. Then we also have

s^2/4 − Db^2 = t ∈ Z.    (6.13)

If b is the fraction b = p/q in lowest terms, then after clearing the denominators we get q^2 s^2 − 4Dp^2 = 4q^2 t. Reducing this equation mod 4, we find that q^2 s^2 ≡ 0 (mod 4). This leads to two nonexclusive cases:

Case 1: q ≡ 0 (mod 2). That t in (6.13) is an integer implies that q = 2. So

s^2 − Dp^2 = 4t,

where s and p are odd. Considering this equation mod 4 again, we see that this can only hold if D ≡ 1 (mod 4). Hence, if D ≡ 1 (mod 4), the algebraic integers in Q(√D) are the elements (s + p√D)/2 with s ≡ p (mod 2), which we can write more succinctly as Z[ω] where ω = (1 + √D)/2.

Case 2: s ≡ 0 (mod 2). Then a ∈ Q is in fact an integer, which implies that b is also an integer. So if Case 1 does not hold, then Z[√D] = O_{Q(√D)}.

The theorem follows. □



Rings of the form Z[√D] have entered into our study of rings and divisibility in integral domains. This theorem shows that only when D ≡ 2, 3 (mod 4) is Z[√D] the ring of algebraic integers in Q(√D). For example, if D = −1, since D ≡ 3 (mod 4), we see that the Gaussian integers are the ring of algebraic integers in Q(i). In contrast, if D = 5, then the ring of algebraic integers in Q(√5) is Z[(1 + √5)/2].

We define the field norm N : Q(√D) → Q by

N(a + b√D) = N(α) = αᾱ = (a + b√D)(a − b√D) = a^2 − Db^2.

It is easy to check that N is multiplicative in the sense that N(αβ) = N(α)N(β) for all α, β ∈ Q(√D). It is also easy to check that for all square-free D, the field norm takes integer values on the ring of integers O_{Q(√D)}.
The above two properties lead to the following fact: An element α ∈ O_{Q(√D)} is a unit if and only if N(α) = ±1, and the inverse of α is

α^{−1} = ᾱ/(αᾱ).

Example 6.7.6 (Eisenstein Integers). With D = −3, the algebraic integers in Q(√−3) form the ring Z[ω], where ω = (1 + i√3)/2. Note that the cube roots of unity are the roots of the equation x^3 − 1 = 0, which are given by

(x − 1)(x^2 + x + 1) = 0 ⇐⇒ x = 1 or x = (−1 ± i√3)/2.

Writing ζ = (−1 + i√3)/2, we see that ω = ζ + 1, so O_{Q(√−3)} = Z[ζ]. As an element of Z[ζ], the norm function is

N(a + bζ) = (a − b/2)^2 + 3(b/2)^2 = a^2 − ab + b^2.

The units in Z[ζ] satisfy a^2 − ab + b^2 = 1. Solving as a quadratic for a in terms of b requires the discriminant b^2 − 4(b^2 − 1) ≥ 0, which leads to b^2 ≤ 4/3. Similarly for a, we must have a^2 ≤ 4/3. This leads to only nine pairs (a, b) to check, and we find that the six units are

±1, ±ζ = ±(−1 + i√3)/2, ±(1 + ζ) = ±(1 + i√3)/2.

The ring Z[(−1 + i√3)/2] is called the ring of Eisenstein integers. ♦
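The unit search at the end of the example is a finite check; this brute-force sketch enumerates the nine integer pairs allowed by a^2 ≤ 4/3 and b^2 ≤ 4/3, i.e. a, b ∈ {−1, 0, 1}, against the norm condition.

```python
# Sketch: find the units of the Eisenstein integers Z[zeta] by solving
# N(a + b*zeta) = a^2 - a*b + b^2 = 1 over the nine candidate pairs.

units = [(a, b)
         for a in (-1, 0, 1)
         for b in (-1, 0, 1)
         if a * a - a * b + b * b == 1]

# The six units are +-1, +-zeta, and +-(1 + zeta):
assert len(units) == 6
assert set(units) == {(1, 0), (-1, 0), (0, 1), (0, -1), (1, 1), (-1, -1)}
```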

6.7.3 – Integral Closure


From the perspective of abstract algebra, it is undesirable to study a construction on rings in only one particular case, here the integers. The following definitions generalize the construction of algebraic integers to the integral closure of one ring in another. A more general treatment of integral closure over commutative rings requires module theory. (See Exercise 10.3.31.)

Definition 6.7.7
Let R ⊆ S, where R and S are commutative rings with an identity. An element s ∈ S is
called integral over R if there exists a monic polynomial f (x) ∈ R[x] such that f (s) = 0.
The ring S is called integral over R if every element in S is integral over R.

Definition 6.7.8
If R ⊆ S where R and S are as in the previous definition, then R is called integrally closed
in S if every element in S that is integral over R belongs to R. If R is an integral domain,
we simply say that R is integrally closed (without “in S”) if R is integrally closed in its
field of fractions.

Example 6.7.9. For example, Z is not integrally closed in Q(i) because i ∈ Q(i) is a root of the monic polynomial x² + 1 and i ∉ Z. In contrast, as we will see in the following proposition, Z[i] is integrally closed. 4

Proposition 6.7.10
If K is a number field, then the ring of algebraic integers OK is integrally closed.

Proof. Since OK is a subring of K, the field of fractions of OK is a subfield of K. Let c be an element in the field of fractions of OK (and hence in K) that is the root of a monic polynomial f(x) = x^n + a_{n−1}x^{n−1} + · · · + a₁x + a₀ with aᵢ ∈ OK.
By Lemma 6.7.3, each of the rings Z[aᵢ] is a finitely generated free abelian group. It is possible to prove by induction that Z[a₀, a₁, . . . , a_{n−1}] is also a finitely generated free abelian group. (See Exercise 6.7.2.) Since c^n = −(a_{n−1}c^{n−1} + · · · + a₁c + a₀), for every polynomial p(x) ∈ OK[x] we can write p(c) as

q₀ + q₁c + · · · + q_{n−1}c^{n−1}

for some elements q₀, q₁, . . . , q_{n−1} ∈ Z[a₀, a₁, . . . , a_{n−1}]. So if {γ₁, γ₂, . . . , γ_ℓ} is a generating set of the finitely generated free abelian group Z[a₀, a₁, . . . , a_{n−1}], then

{γᵢc^j | 1 ≤ i ≤ ℓ, 0 ≤ j ≤ n − 1}

is a generating set of Z[a0 , a1 , . . . , an−1 , c]. Since (Z[c], +) is a subgroup of (Z[a0 , a1 , . . . , an−1 , c], +),
it is finitely generated by Theorem 4.5.9. By Lemma 6.7.3, we conclude that c ∈ OK . Hence, OK is
integrally closed. 

6.7.4 – Sum of Squares


In 1640, Fermat proved which integers can be written as a sum of two squares. Though this represented a landmark result in number theory, mathematicians still actively study problems concerning
the sum of squares or the sum of nth powers. Long before Fermat, mathematicians had found all
Pythagorean triples, integer triples (a, b, c) that solve a² + b² = c². In 1770, Lagrange proved that every nonnegative integer can be written as a sum of four squares. In the same year, Waring asked if for all
integers k ≥ 2, there was a number g(k) such that every positive integer could be written as a sum
of g(k) kth powers. This became known as Waring’s problem and was proved in the affirmative in
1909 by Hilbert. Many variations on sums of squares problems and Waring’s problem still drive
considerable investigation.
One variant to the sum of squares problem has to do with deciding in how many ways an integer
can be written as a sum of two squares. Interestingly enough, the algebraic properties of Z[i] allow
us to answer this question.
We have previously seen that Z[i] is a UFD by virtue of being a Euclidean domain. Furthermore,
Section 6.4.4 gave a characterization of the prime elements in Z[i]. The question of whether an
integer n can be written as a sum of two squares is equivalent to asking whether there is a Gaussian
integer a + bi with norm N (a + bi) = a2 + b2 = n.
Since Z[i] is a UFD, then every element a + bi can be written as a product of irreducible elements

a + bi = ρ1 ρ2 · · · ρr

in a unique way (up to reordering and multiplication by units). Now consider also the unique
factorization of n in N

n = p1 p2 · · · pm = N (a + bi) = N (ρ1 )N (ρ2 ) · · · N (ρr ).

If for some i, pᵢ ≡ 3 (mod 4), then the prime pᵢ must divide some N(ρⱼ). By Proposition 6.4.20, we must have ρⱼ = pᵢ and N(ρⱼ) = pᵢ². If pᵢ ≡ 1, 2 (mod 4), then also by Proposition 6.4.20, pᵢ factors into two irreducible elements that are complex conjugates of each other. This proves the following theorem of Fermat.

Theorem 6.7.11 (Fermat’s Sum of Two Squares Theorem)


An integer n can be written as a sum of two squares if and only if ord_p(n) is even for all primes p with p ≡ 3 (mod 4).
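As a sanity check on the theorem, one can compare the divisibility criterion against a direct search for small n. This sketch is our illustration, the helper names are hypothetical, and the factorization uses plain trial division.

```python
from math import isqrt

def is_sum_of_two_squares(n):
    """Direct search for x with n - x^2 a perfect square."""
    return any(isqrt(n - x * x) ** 2 == n - x * x for x in range(isqrt(n) + 1))

def fermat_criterion(n):
    """ord_p(n) is even for every prime p with p = 3 (mod 4)."""
    d = 2
    while d * d <= n:
        e = 0
        while n % d == 0:
            n //= d
            e += 1
        if d % 4 == 3 and e % 2 == 1:
            return False
        d += 1
    # Any leftover factor n > 1 is prime and appears to the first power.
    return n % 4 != 3

assert all(is_sum_of_two_squares(n) == fermat_criterion(n) for n in range(1, 500))
```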

However, the prime factorization in Z[i] leads to a stronger result.

Theorem 6.7.12
Given a positive integer n satisfying Theorem 6.7.11, the equation x² + y² = n with (x, y) ∈ Z² has 4(a₁ + 1)(a₂ + 1) · · · (a_ℓ + 1) solutions, where aᵢ = ord_{pᵢ}(n) for all the primes pᵢ dividing n such that pᵢ ≡ 1 (mod 4).

Proof. Counting the number of ways n can be written as a sum of two squares is the same as finding all distinct elements A + Bi ∈ Z[i] such that N(A + Bi) = n. Suppose that the prime
factorization of n in N is

n = 2^k p₁^{a₁} · · · p_ℓ^{a_ℓ} q₁^{2b₁} · · · q_m^{2b_m},

where pᵢ ≡ 1 (mod 4) and qⱼ ≡ 3 (mod 4). Then the prime factorization of n in Z[i] is

n = (1 + i)^k (1 − i)^k π₁^{a₁} π̄₁^{a₁} · · · π_ℓ^{a_ℓ} π̄_ℓ^{a_ℓ} q₁^{2b₁} · · · q_m^{2b_m},

where each πᵢ is an irreducible element in Z[i] such that N(πᵢ) = πᵢπ̄ᵢ = pᵢ. We notice that (1 − i) = −i(1 + i), so not only is 1 − i the conjugate of 1 + i, it is also an associate. However, this does not occur for any of the irreducible πᵢ.
A Gaussian integer α such that N(α) = αᾱ = n can only be of the form

α = u(1 + i)^k π₁^{c₁} π̄₁^{d₁} · · · π_ℓ^{c_ℓ} π̄_ℓ^{d_ℓ} q₁^{b₁} · · · q_m^{b_m},

where u is a unit and where, for each 1 ≤ i ≤ ℓ, the nonnegative integers cᵢ and dᵢ satisfy cᵢ + dᵢ = aᵢ.
For each i, there are aᵢ + 1 ways to choose the pair (cᵢ, dᵢ). By the unique factorization in Z[i], for all such pairs (cᵢ, dᵢ) and for all i, the resulting Gaussian integers α are distinct. There are 4 units in Z[i], so there are exactly 4(a₁ + 1)(a₂ + 1) · · · (a_ℓ + 1) Gaussian integers α such that N(α) = n. 

To illustrate Theorem 6.7.12, consider 325 = 5² × 13. In Z[i], we have 5 = (2 + i)(2 − i) and 13 = (3 + 2i)(3 − 2i). The six nonassociate Gaussian integers with a norm of 325 are

(2 + i)(2 + i)(3 + 2i) = 1 + 18i,
(2 + i)(2 − i)(3 + 2i) = 15 + 10i,
(2 − i)(2 − i)(3 + 2i) = 17 − 6i,
(2 + i)(2 + i)(3 − 2i) = 17 + 6i,
(2 + i)(2 − i)(3 − 2i) = 15 − 10i,
(2 − i)(2 − i)(3 − 2i) = 1 − 18i.

Multiplying by the three nontrivial units −1, i, and −i gives the 3 × 6 = 18 other solutions to x² + y² = 325.
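A brute-force enumeration confirms this count. The following sketch (ours, not from the text) lists all integer solutions of x² + y² = 325 and checks that there are 4 · 3 · 2 = 24 of them.

```python
from math import isqrt

def two_square_solutions(n):
    """All pairs (x, y) in Z^2 with x^2 + y^2 = n, found by brute force."""
    r = isqrt(n)
    return [(x, y) for x in range(-r, r + 1) for y in range(-r, r + 1)
            if x * x + y * y == n]

# 325 = 5^2 * 13, with 5 and 13 congruent to 1 (mod 4), so Theorem 6.7.12
# predicts 4 * (2 + 1) * (1 + 1) = 24 solutions.
sols = two_square_solutions(325)
assert len(sols) == 24
assert (1, 18) in sols and (15, 10) in sols and (17, -6) in sols
```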

6.7.5 – Historical Perspective


The above section illustrates the interaction between ring theory and number theory. Problems from classical number theory, from the question of how many ways an integer can be expressed as the sum of two squares to efforts to prove Fermat's Last Theorem, motivated many areas of investigation in ring theory. Such efforts led to studying rings of algebraic integers OK.
Rings of algebraic integers and other integral domains that arise naturally in number theory
are not always unique factorization domains. Some mathematicians perceived this discovery as an
unfortunate result since there was an obstruction to using a strategy that some hoped might be
effective in solving some of these problems in classical number theory.
In 1879, in his third edition of Über die Theorie der ganzen algebraischen Zahlen, Dedekind coined
the term “ideal”[19]. It is thought that this term gained appeal in the mathematical community
because he proved that ideals in OK can be written uniquely as a product of prime ideals. Though rings of algebraic integers are not always UFDs, working with ideals instead of actual elements in OK recovers a sort of unique factorization, but into prime ideals.

Exercises for Section 6.7


1. Let R be a UFD and let F be its field of fractions. Prove that R is integrally closed in its field of fractions. [This result inserts into the chain in (6.8) the class of domains that are integrally closed, between UFDs and integral domains.]
2. Suppose that a₁, a₂, . . . , aₙ ∈ C are such that the groups (Z[aᵢ], +) are finitely generated free groups of ranks r₁, r₂, . . . , rₙ. Prove that the abelian group (Z[a₁, a₂, . . . , aₙ], +) is finitely generated of rank less than or equal to r₁r₂ · · · rₙ.
3. This exercise guides a proof that for D = −3, −7, −11 the quadratic integer ring O_{Q(√D)} is a Euclidean domain with its norm N.

(a) Let D be a negative integer and set ω = (1 + √D)/2. Show that as a subset of C, the quadratic integer ring Z[ω] consists of the vertices of congruent parallelograms that cover the plane.

(b) Consider a parallelogram with vertices 0, 1, ω, and ω + 1. Determine the furthest an interior point P can be from the nearest of the four vertices. [Hint: Show first that P must be the circumcenter for one of the triangles of the parallelogram.]
(c) Use the previous part to prove that every element in C is at most (1 + |D|)/(4√|D|) away from some element in Z[ω].
(d) Show that (1 + |D|)²/16|D| < 1 for D = −3, −7, −11.
(e) Show that d(z) = N(z) = |z|² is a Euclidean function on Z[ω].
[With previous results, this exercise shows that OQ(√D) is a Euclidean domain with the norm function
as the Euclidean function for D = −1, −2, −3, −7, −11. It turns out, though it is more difficult to
prove, that these are the only negative values of D for which OQ(√D) is a Euclidean domain.]
4. Do the Euclidean division of (21 + 13√−7)/2 by (3 + 5√−7)/2 in Z[(1 + √−7)/2]. (See Exercise 6.7.3.)

5. Consider the element (1/6)(4 + 4 · 28^{1/3} + 28^{2/3}) in the field F = Q(∛28). Show that this is an algebraic integer in F by showing that it solves a monic cubic polynomial in Z[x].
6. Find all ways to write 91 as a sum of two squares.
7. Find all ways to write 338 as a sum of two squares.
8. Find all ways to write 61,000 as a sum of two squares.

9. Prove that if α is an algebraic integer in C, then ⁿ√α is also an algebraic integer for any positive integer n.

6.8 Projects
Project I. The Ring Z[√2]. In Example 5.2.3 we briefly encountered the ring Z[√2].
Recall that the norm function on Z[√2] is N(a + b√2) = |a² − 2b²|. Show first that α ∈ Z[√2] is a unit if and only if N(α) = 1. Primes in Z are not necessarily still irreducible in Z[√2]. For example, 7 = (3 + √2)(3 − √2). However, 3 + √2 is irreducible since N(3 + √2) = 7, so if 3 + √2 = αβ, then N(α)N(β) = 7, so either N(α) = 1 or N(β) = 1, and one of them would be a unit.
Here are a few questions to pursue: Find some units in this ring. Can you find patterns in the units, like some process that may give you many units? Investigate the irreducible elements in Z[√2]. Try to find some. Make sure the ones you list are not associates of each other (i.e., multiples of each other via a unit). Try to find any patterns as to which elements in Z[√2] are irreducible. (You could plot elements a + b√2 as points (a, b) on a graph and look for patterns.)

Project II. Non-UFD Rings. Find examples of non-UFD rings. Give a number of examples of
nonequivalent factorizations into irreducibles. Prove that certain elements are irreducible. Can
you find nonequivalent factorizations of the same number in which the number of irreducible factors is different? (Suggestion: Focus on rings of the form Z[√D] or O_{Q(√D)}.)

Project III. Eisenstein Integers. Revisit Example 6.7.6 about Eisenstein integers. Look for
irreducible elements in the ring of Eisenstein integers. The Eisenstein integers form a hexagonal
lattice in C. Can you discern any patterns in the irreducible elements in this lattice?

Project IV. RSA in Fp [x]. (For students who have addressed Project VII in Chapter 4.) Modify
the theory for RSA to work over the ring Fp [x]. What takes the role of the primes and the
product of two primes? Decide whether RSA over Z or RSA over Fp [x] is better and state
your reasons why.

Project V. Factorizations of Polynomials in a non-UFD. Take R to be any ring that is not a unique factorization domain. Feel free to take a specific R. Study the factorization of polynomials in R[x]. (Discuss irreducibility in R[x], illustrate nonunique factorization, and so forth.)
Project VI. Quadratic Integer Rings. Consider some positive square-free integers D. Attempt
to determine whether some OQ(√D) might be Euclidean domains (for some Euclidean function)
or PIDs.
Project VII. Power Series as Euclidean Domains. If F is a field, then the ring of formal
power series F[[x]] is a Euclidean domain with the Euclidean function d given by the smallest power of x occurring in the power series. (Check this first.) Choose a field F. Give examples of
interesting power series divisions and power series greatest common divisors using the division
and the Euclidean Algorithm. Find some examples of pairs of power series p and q that are
relatively prime and use the Extended Euclidean Algorithm to find power series s and t such
that sp + tq = 1.
7. Field Extensions

Introductory courses in modern algebra usually present groups, rings, and fields as the three most
important algebraic structures. Though fields are a particular class of rings, they possess unique
properties that lead to many fruitful investigations. This is why they are often viewed as their
own algebraic structure. Chapter 6 studied properties of divisibility in commutative rings. Since
every nonzero element in a field has a multiplicative inverse, questions concerning divisibility are
not interesting: Every nonzero element is an associate to every other. Nonetheless, fields possess a
rich structure.
Field theory has a number of applications internal to mathematics and valuable for digital communication and information security. However, the study of polynomial equations (e.g., how to solve them, attempts to find patterns in the roots) motivated the development of field theory.
Matching up our study of fields to the list of key themes as outlined in the preface, this chapter
focuses on general properties, key examples, important objects, ways to conveniently describe fields,
and subobjects (subfields). Chapter 11 on Galois theory focuses on properties of homomorphisms
between fields. Both chapters introduce many applications. In fact, as we will see in this chapter
and in Chapter 11, field theory answered many problems in mathematics that had been unsolved
for centuries, not only in classical algebra but also in Euclidean geometry.

7.1 Introduction to Field Extensions
We have already seen that a field is a commutative ring with an identity in which every nonzero
element has an inverse. We have not, however, yet investigated subfields, homomorphisms between
fields, how to concisely describe (or generate) a field, and other important issues as outlined in the
preface of the book. The particular properties of these aspects of field theory are what warrant
studying fields in their own right.

7.1.1 – The Algebraic Structure of a Field Extension


The first distinctive aspect about fields is that homomorphisms between them have a very elementary
structure.

Proposition 7.1.1
A homomorphism of fields ϕ : F → F′ either is identically 0 or is injective.

Proof. The kernel Ker ϕ is an ideal in F . However, the only two ideals in F are (0) and F itself. If
Ker ϕ = (0), then ϕ is injective. If Ker ϕ = F , then ϕ is identically 0. 
In previous sections, we termed an injective homomorphism (in group theory or ring theory) an embedding. So we call an injective homomorphism an embedding of F into F′. By the first isomorphism theorem for rings, when there is an embedding of F in F′, there exists a subring of F′ that is isomorphic to F. Consequently, we can view F as a subfield of F′. Thus, the existence of nontrivial homomorphisms between fields is tantamount to containment.
Recall that the characteristic char(R) of a ring R is either 0 or the least positive integer n such
that n · 1 = 0 if such an n exists. By Exercise 5.1.21, the characteristic of an integral domain is
either 0 or a prime number p. Since fields are integral domains, this result applies. The concept of
characteristic of a field leads to a slightly more nuanced concept.


Definition 7.1.2
The prime subfield of a field F is the subfield generated by the multiplicative identity.

In other words, the prime subfield is the smallest (by inclusion) field in F that contains the
identity. Suppose that a field has positive characteristic p. Then the prime subfield must contain
the elements
0, 1, 2 · 1, . . . , (p − 1) · 1

and p · 1 = 0. However, the multiplication on these elements as defined by distributivity gives this
set of elements the structure of Fp = Z/pZ. On the other hand, if we suppose that the field F has
characteristic 0, then F must contain

. . . , −(3 · 1), −(2 · 1), −1, 0, 1, (2 · 1), (3 · 1), . . . .

Therefore, Z is contained in F. But then the field F must also contain the field of fractions of Z, namely Q. Thus, Q is the prime subfield of F. We have proven the following proposition.

Proposition 7.1.3
Let F be a field. The prime subfield of F is Q if and only if char(F ) = 0 and the prime
subfield of F is Fp if and only if char(F ) = p.

We have encountered a few fields before. For example, Q, R, and C are fields of characteristic 0
while the finite field Fp is by definition of characteristic p.

Definition 7.1.4
If K is a field containing F as a subfield, then K is called a field extension (or simply
extension) of F . This relationship of extension is often denoted by K/F .

The notation for a field extension resembles the usual notation for a quotient ring, but the two constructions are not related. There is never a confusion of notation because the construction of quotient fields never occurs in field theory. Indeed, the only ideals in a field are the trivial ideal and the whole field, so the only resulting quotient rings are the trivial field and the field itself.
By Proposition 7.1.3, every field is a field extension of either Q or Fp for some prime p.
Previous sections illustrated a few ways to construct some field extensions. These will become
central to our study of field extensions.
The first method uses a commutative ring generated by elements and then passes to the associated field of fractions. In other words, suppose that F is a field contained in some integral domain R. If α ∈ R − F, then F[α] is the subring of R generated by F and α. See Section 5.2.1. Since F[α] is an integral domain, we can take the field of fractions F(α) of F[α]. See Section 6.2. In this way, we construct the field extension K = F(α) of F. This construction extends to fields generated by F and subsets S ⊆ R − F such that F(S) is the field of fractions of the integral domain F[S].
For example, the fields Q(√2) or Q(∛7, i) are field extensions of Q, inside C.
This method presupposes that F is a subring of some integral domain R and hence is a subfield
of the field of fractions of R.
The second method involves quotient rings of a polynomial ring. Let F be a field. Since F [x]
is a Euclidean domain, it is also a PID so every ideal I in F [x] is of the form I = (p(x)) for some
polynomial p(x) ∈ F [x]. By Proposition 5.7.10, the ideal (p(x)) is maximal if and only if it is prime
if and only if p(x) is irreducible. So if p(x) is an irreducible polynomial of degree greater than 0, then the quotient ring F[x]/(p(x)) is a field. Furthermore, the inclusion of F into F[x]/(p(x)) is an injection, so F is a subfield of F[x]/(p(x)).

Example 7.1.5. Consider the polynomial p(x) = x² − 5. By Proposition 6.5.11, it has no rational roots, so by Proposition 6.5.12 p(x) is irreducible in Q[x]. Then Q[x]/(p(x)) ≅ Q[√5] is a field. It is obvious by construction that Q[√5] is an integral domain. To show that every nonzero element is invertible, consider α = a + b√5 ≠ 0. Then

1/α = (a − b√5)/((a + b√5)(a − b√5)) = (a − b√5)/(a² − 5b²).    (7.1)

Since there is no rational number p/q such that p²/q² = 5, the denominator of this expression is a nonzero rational number, so (7.1) gives a formula for the inverse. 4
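Formula (7.1) translates directly into a short computation with exact rational arithmetic. The sketch below is our illustration, with hypothetical function names; elements a + b√5 are stored as coefficient pairs.

```python
from fractions import Fraction

def inv(a, b):
    """Inverse of a + b*sqrt(5), a, b rational, via formula (7.1)."""
    d = Fraction(a * a - 5 * b * b)   # nonzero because sqrt(5) is irrational
    return (Fraction(a) / d, Fraction(-b) / d)

def mul(x, y):
    """Multiply two elements of Q[sqrt(5)] given as coefficient pairs."""
    (a1, b1), (a2, b2) = x, y
    return (a1 * a2 + 5 * b1 * b2, a1 * b2 + a2 * b1)

alpha = (2, 3)                      # 2 + 3*sqrt(5)
assert mul(alpha, inv(*alpha)) == (1, 0)
assert mul((Fraction(1, 2), 1), inv(Fraction(1, 2), 1)) == (1, 0)
```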

One may attempt to generalize the idea in Example 7.1.5 and ask if a ring such as Q[∛7] is also a field. This turns out to be true, but the proof becomes more difficult than the proof in the above Example 7.1.5. For example, finding the inverse of an element such as 1 + 3∛7 − (1/2)(∛7)² is not as simple.

7.1.2 – Field Extensions as Vector Spaces


In many linear algebra courses, because of the importance for applications to science, students
usually encounter vector spaces with the assumption that the scalars are real numbers. However,
the definition for a vector space over R can be generalized to a vector space V over a field F . See
Definition 10.2.1 and Section 10.2. Many of the definitions, algorithms, and constructions introduced
for vector spaces over R can be adapted to vector spaces over a field, in particular:
• systems of linear equations (with coefficients and variables in F );
• the Gauss-Jordan elimination algorithm and the resulting reduced row echelon form for ma-
trices in Mm×n (F );
• linear combinations;
• linear independence;
• subspaces and the span of sets of vectors;
• a basis of a subspace;
• the dimension of a vector space;
• coordinates with respect to a basis;
• linear transformations T : V → V ;
• the determinant of an n × n matrix;
• Cramer’s rule.
In contrast, geometrical interpretations of linear transformations, of the determinant, and of bi-
linear forms do not directly generalize to vector spaces over arbitrary fields. Furthermore, though
the definition of eigenvalues and eigenvectors generalizes immediately, the process of solving the
characteristic equation requires more discussion over an arbitrary field than over R.

Proposition 7.1.6
Let K be a field extension of F . Then K is a vector space over the field F .

Proof. With the addition, (K, +) is an abelian group. Furthermore, by distributivity and associativity of the multiplication in K, the following properties hold:
• r(α + β) = rα + rβ for all r ∈ F and all α, β ∈ K;
• (r + s)α = rα + sα for all r, s ∈ F and all α ∈ K;
• r(sα) = (rs)α for all r, s ∈ F and all α ∈ K;
• 1α = α for all α ∈ K.
These observations establish all the axioms of a vector space over F and the proposition follows. 

The great value of this simple proposition is that it makes it possible to use the theory of vector
spaces to derive information and structure about extensions of F . The concept of degree is an
important application of this connection between vector spaces over F and extensions of F .

Definition 7.1.7
If the extension K/F has a finite basis as an F -vector space, then the degree of the extension
K/F , denoted [K : F ], is the dimension of K as a vector space over F . In other words
[K : F ] = dimF K. If the extension K/F does not have a finite basis, we say that the
degree [K : F ] is infinite.

Example 7.1.8. Consider the extension Q(√5) over Q. According to Example 7.1.5, Q(√5) = Q[√5], so every element in Q(√5) can be written uniquely as a + b√5 for some a, b ∈ Q. Hence, {1, √5} is a basis for Q(√5) over Q. Thus,

[Q(√5) : Q] = 2. 4

Example 7.1.9 (Complex Numbers). We know that R ⊆ C and C is a field, so C is a field extension of R. Moreover, since every complex number can be written uniquely as a + bi for a, b ∈ R, the set {1, i} is an R-basis of C. Thus, [C : R] = 2. 4

The two methods outlined above for constructing field extensions of a given field F appear quite
different. However, a first key result that emerges from the identification of a field extension of F
with a vector space over F is that these two methods are in fact the same. We develop this result
in the theorems below.

Theorem 7.1.10
If [F (α) : F ] = n is finite, then α is the root of some irreducible polynomial p(x) ∈ F [x] of
degree n. Furthermore, F (α) = F [α].

Proof. If [F(α) : F] = 1, then F(α) = F, so α ∈ F and the corresponding polynomial is p(x) = x − α. We assume from now on that [F(α) : F] > 1.
The field F (α) includes, among other elements, linear combinations of the form

a₀ + a₁α + a₂α² + · · · + a_kα^k

where ai ∈ F . All these linear combinations are polynomials in F [x] evaluated at α. Since F (α)
has dimension n as a vector space over F, the set {1, α, α², . . . , αⁿ} is linearly dependent. Thus,
there exists some nontrivial polynomial q(x) of degree at most n such that q(α) = 0. Let p(x) be a
polynomial of least degree such that p(α) = 0 and let us call deg p(x) = d. (The fact that such a
polynomial exists follows from the well-ordering principle of the integers and the fact that [F (α) : F ]
is finite.)
We claim that p(x) is irreducible in F [x]. Suppose that p(x) is not irreducible. Then p(x) =
p1 (x)p2 (x) with p1 (x), p2 (x) ∈ F [x] with positive degrees. Thus, evaluating p(x) at the root α gives

0 = p(α) = p1 (α)p2 (α).

Since there are no zero divisors in F(α), p₁(α) = 0 or p₂(α) = 0. This contradicts the fact that p(x) is a polynomial in F[x] of minimal degree for which α is a root. Hence, we conclude that p(x) is irreducible.

Writing p(x) = c₀ + c₁x + · · · + c_d x^d, we note that c_d ≠ 0, so

α^d = −c₀/c_d − (c₁/c_d)α − · · · − (c_{d−1}/c_d)α^{d−1}.

This expresses α^d as a linear combination of {1, α, . . . , α^{d−1}}. By a recursion argument, we can see that for all m ≥ 0, the element α^m can also be written as a linear combination of {1, α, . . . , α^{d−1}}. Hence, the set of powers of α spans F[α] as a vector space over F. However, a priori F[α] is only a subset of F(α).
By definition, every element in F (α) can be written as a rational expression of α, namely

γ = a(α)/b(α), where a(x), b(x) ∈ F[x] and b(α) ≠ 0.

Suppose also that a(x) and b(x) are chosen such that b(x) has minimal degree and γ = a(α)/b(α). Performing the Euclidean division of p(x) by b(x), we get p(x) = b(x)q(x) + r(x), where deg r(x) < deg b(x) or r(x) = 0. Assume that r(x) ≠ 0. Then

γ = a(α)/b(α) = a(α)q(α)/(b(α)q(α)) = a(α)q(α)/(p(α) − r(α)) = −a(α)q(α)/r(α).

Hence, a(α)/b(α) can be written as a₂(α)/b₂(α) where deg b₂(x) < deg b(x). This contradicts the choice that b(x) has minimal degree. Consequently, r(x) = 0 and hence b(x) divides p(x). Since p(x) is irreducible and b(α) ≠ 0, this forces b(x) to be a constant. Consequently, F(α) ⊆ F[α] and therefore F(α) = F[α].
Consequently, it also follows that d = dimF F (α) = n and p(x) is an irreducible polynomial such
that p(α) = 0. 

Definition 7.1.11
Let F be a field. An extension field K over F is called simple if K = F (α) for some α ∈ K.

It is important to note as a contrast that F(α) is not necessarily equal to F[α] if [F(α) : F] is infinite, which may occur if α is not the root of a polynomial in F[x]. For example, keeping t as a free parameter, F[t] is a subring of F(t). Furthermore, t is not a unit in F[t], whereas in F(t) the multiplicative inverse of the polynomial t is the rational expression 1/t. Hence, F[t] is a strict subring of F(t).
Theorem 7.1.10 may feel unsatisfactory because the hypotheses assumed that α was some element
in a field extension of F that remained unspecified. So one might naturally ask whether there exists
a field extension of F that contains some α such that p(α) = 0. We have already seen that the answer
to this question is yes and we encapsulate the result in the following converse to Theorem 7.1.10.

Theorem 7.1.12
Let p(x) ∈ F[x] be an irreducible polynomial of degree n. Then K = F[x]/(p(x)) is a field in which the element θ = x + (p(x)) satisfies p(θ) = 0. Furthermore, the elements

1, θ, θ², . . . , θ^{n−1}

form a basis of K as a vector space over F. So [K : F] = n and K = F[θ].

Example 7.1.13. As an example of constructing a field extension, let F = F2 and consider the
polynomial p(x) = x3 + x + 1. Since p(x) has no roots in F2 and since it is a cubic, it is irreducible
by Proposition 6.5.12. Hence, K = F₂[x]/(x³ + x + 1) is a field extension of F₂ with [K : F₂] = 3. Consequently, K is a finite field containing 2³ = 8 elements. 4
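This field of 8 elements can be realized concretely on a computer. In the sketch below (our illustration, not from the text), an element a₂θ² + a₁θ + a₀ is encoded as the 3-bit integer with bits (a₂, a₁, a₀), and multiplication reduces modulo θ³ + θ + 1.

```python
# Elements of K = F_2[x]/(x^3 + x + 1) encoded as 3-bit integers:
# bit i holds the coefficient of theta^i, so 0b101 means theta^2 + 1.
def gf8_mul(a, b):
    """Carry-less multiplication followed by reduction modulo x^3 + x + 1."""
    p = 0
    for i in range(3):
        if (b >> i) & 1:
            p ^= a << i
    for i in (4, 3):                     # eliminate the degree-4 and degree-3 terms
        if (p >> i) & 1:
            p ^= 0b1011 << (i - 3)       # 0b1011 encodes x^3 + x + 1
    return p

# Every one of the 7 nonzero elements has exactly one multiplicative inverse,
# confirming that K is a field with 8 elements.
for a in range(1, 8):
    assert sum(1 for b in range(1, 8) if gf8_mul(a, b) == 1) == 1
```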

Let L be a field extension of F and let α ∈ L. If α is a root of the irreducible polynomial p(x) ∈ F[x], then F(α) = F[α]. It is easy to understand the addition (and subtraction) in F[α], but the multiplication requires some simplifications, and the process of finding inverses in F[α] is not obvious.
One method to find inverses of elements q(α) ∈ F[α] comes from the fact that F[x] is a Euclidean domain. Every element in K ≅ F[x]/(p(x)) is of the form q(x) + (p(x)), where q(x) can be chosen so that deg q(x) < deg p(x). Since p(x) is irreducible in F[x], the polynomials p(x) and q(x) do not have a common divisor of positive degree. Hence, performing the Extended Euclidean Algorithm leads us to find a(x) and b(x) such that

a(x)q(x) + b(x)p(x) = 1

in F[x]. In the quotient ring K, this implies that the coset of a(x)q(x) equals 1. Thus, in K, a(α)q(α) = 1, so that a(α) is the inverse of q(α). This method is not the simplest method to find inverses in K. The example below illustrates this method and a faster algorithm that uses linear algebra.
Example 7.1.14. Let F = Q. Consider p(x) = x³ − 2. This is irreducible in Q[x]. Then K = Q[x]/(x³ − 2) is a field, and we denote by θ an element in K such that θ³ − 2 = 0. (For simplicity, we could assume that θ = ∛2 ∈ C, but ∛2 is not the only complex number that solves x³ − 2 = 0.) Then from the above theorems,

K = {a + bθ + cθ² | a, b, c ∈ Q}.

The fact that K is isomorphic to a subfield of C is irrelevant to this construction.
Let α = 3 − θ + θ² and β = 5 + 3θ − (1/2)θ². The addition in K occurs component-wise, so

α + β = 8 + 2θ + (1/2)θ².

For the product, we remark that θ³ = 2, so also θ⁴ = 2θ, and thus

αβ = 15 + 9θ − (3/2)θ² − 5θ − 3θ² + (1/2)(2) + 5θ² + 3(2) − (1/2)(2θ)
   = 22 + 3θ + (1/2)θ².
To find the inverse of α via the Extended Euclidean Algorithm, we first find a linear combination between x³ − 2 and x² − x + 3. The Euclidean Algorithm gives

x³ − 2 = (x² − x + 3)(x + 1) + (−2x − 5)
x² − x + 3 = (−2x − 5)(−(1/2)x + 7/4) + 47/4,

and taking this process backwards leads to

1 = (4/47)(47/4) = (4/47)[(x² − x + 3) − (−2x − 5)(−(1/2)x + 7/4)]
  = (4/47)[(x² − x + 3) − ((x³ − 2) − (x² − x + 3)(x + 1))(−(1/2)x + 7/4)]
  = (4/47)(x² − x + 3)[1 + (x + 1)(−(1/2)x + 7/4)] − (4/47)(x³ − 2)(−(1/2)x + 7/4)
  = (1/47)(11 + 5x − 2x²)(x² − x + 3) − (1/47)(−2x + 7)(x³ − 2).

Consequently, in K,

(3 − θ + θ²)⁻¹ = 11/47 + (5/47)θ − (2/47)θ².
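Both computations in this example can be verified with exact rational arithmetic by multiplying in K and reducing with θ³ = 2. This sketch is our illustration; the helper name `kmul` is hypothetical.

```python
from fractions import Fraction

def kmul(p, q):
    """Multiply elements a0 + a1*theta + a2*theta^2 of K, using theta^3 = 2."""
    r = [Fraction(0)] * 5
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += Fraction(a) * Fraction(b)
    # Reduce: theta^3 = 2 and theta^4 = 2*theta.
    return [r[0] + 2 * r[3], r[1] + 2 * r[4], r[2]]

alpha = [3, -1, 1]                                  # 3 - theta + theta^2
alpha_inv = [Fraction(11, 47), Fraction(5, 47), Fraction(-2, 47)]
assert kmul(alpha, alpha_inv) == [1, 0, 0]

beta = [5, 3, Fraction(-1, 2)]                      # 5 + 3*theta - (1/2)*theta^2
assert kmul(alpha, beta) == [22, 3, Fraction(1, 2)] # the product computed above
```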

Linear algebra offers an easier way to find inverses (and, more generally, to compute division). Suppose that α⁻¹ = a₀ + a₁θ + a₂θ². Then this element satisfies

(3 − θ + θ²)(a₀ + a₁θ + a₂θ²) = 1
⟺ 3a₀ + (3a₁ − a₀)θ + (3a₂ − a₁ + a₀)θ² + (−a₂ + a₁)θ³ + a₂θ⁴ = 1
⟺ 3a₀ + (3a₁ − a₀)θ + (3a₂ − a₁ + a₀)θ² + (−a₂ + a₁)(2) + a₂(2θ) = 1
⟺ (−2a₂ + 2a₁ + 3a₀) + (2a₂ + 3a₁ − a₀)θ + (3a₂ − a₁ + a₀)θ² = 1.

Consequently, the coefficients a₀, a₁, a₂ satisfy the system of linear equations

−2a₂ + 2a₁ + 3a₀ = 1
2a₂ + 3a₁ − a₀ = 0
3a₂ − a₁ + a₀ = 0.

Using a computer or calculator (or working by hand) gives

     [−2  2  3 | 1]   [1 0 0 | −2/47]
rref [ 2  3 −1 | 0] = [0 1 0 |  5/47]
     [ 3 −1  1 | 0]   [0 0 1 | 11/47]

Interpreting this calculation gives precisely the same result for (3 − θ + θ²)⁻¹ as above. 4
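The same system can be solved exactly by Gauss-Jordan elimination over Q. The sketch below is our illustration, keeping the column order (a₂, a₁, a₀) used in the augmented matrix above; for this particular matrix every pivot is nonzero, so no row swaps are needed.

```python
from fractions import Fraction

# Augmented matrix of the system, columns ordered (a2, a1, a0) as in the text.
M = [[Fraction(c) for c in row] for row in
     [[-2, 2, 3, 1],
      [2, 3, -1, 0],
      [3, -1, 1, 0]]]

# Gauss-Jordan elimination without pivoting.
for i in range(3):
    M[i] = [c / M[i][i] for c in M[i]]          # scale the pivot row
    for k in range(3):
        if k != i:                               # clear the other rows
            M[k] = [c - M[k][i] * d for c, d in zip(M[k], M[i])]

a2, a1, a0 = (M[r][3] for r in range(3))
assert (a0, a1, a2) == (Fraction(11, 47), Fraction(5, 47), Fraction(-2, 47))
```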

7.1.3 – Isomorphisms of Fields


Theorems 7.1.12 and 7.1.10 indirectly lead to an interesting consequence. Let L be a field extension
of a field F . Suppose that α, β ∈ L such that α and β both are roots of the same irreducible
polynomial p(x). Then we have isomorphisms of fields

F [α] ∼
= F [x]/(p(x)) ∼
= F [β].

In fact, the (composition) isomorphism described above satisfies f : F [α] → F [β] with f (c) = c for
all c ∈ F , f (α) = β, and all other values of f resulting from the axioms of a homomorphism.
It is not at all uncommon that F [α] 6= F [β] so this isomorphism is not trivial or even an
automorphism. Recall from group theory that an automorphism is an isomorphism from a field to
itself.
We have shown that there is a close connection between properties of subfields and morphisms
(homomorphisms between fields). The only nontrivial morphisms are injections, which give
embeddings, some of which are isomorphisms. Despite, or perhaps because of, this restriction, the
study of field extensions and automorphisms of fields is rich and has many applications.

Exercises for Section 7.1


1. Consider the ring F = F5 [x]/(x2 + 2x + 3) and call θ the element x in F .
(a) Prove that F is a field.
(b) Prove that every element of F can be written uniquely as aθ + b, with a, b ∈ F5 .
(c) Let α = 2θ + 3 and β = 3θ + 4. Calculate (i) α + β; (ii) αβ; (iii) α/β.
2. Let α = 1 + ³√2 − 3(³√2)² and β = 3 + (³√2)² in Q(³√2). Calculate (i) αβ; (ii) α/β; (iii) β/α.
3. Consider the ring F = Q[x]/(x3 + 3x + 1) and call θ the element x in F .
(a) Prove that F is a field (which we can write as Q(θ)).
(b) Prove that every element of F can be written uniquely as aθ² + bθ + c, with a, b, c ∈ Q.
(c) Let α = 2θ2 − 1 and β = θ2 + 5θ − 3. Calculate (i) α + β; (ii) αβ; (iii) α/β.
4. Let F = F7 and consider the irreducible polynomial p(x) = x³ − 2. Write θ for the element x in
F7 [x]/(p(x)). Find the inverse of θ² − θ + 3.
5. In Q(⁴√10), find the inverse of 1 + ⁴√10.

6. Consider the field of order 8 constructed in Example 7.1.13. Call θ an element in F such that
θ3 + θ + 1 = 0, so that we can write F = F2 [θ].
(a) Let α = θ2 + 1, β = θ2 + θ + 1, and γ = θ. Calculate: (i) αγ + β; (ii) α/γ; (iii) α2 + β 2 + γ 2 .
(b) Solve for x in terms of y in the equation y = αx + β.
(c) Prove that the function f : F → F defined by f (α) = α3 is a cyclic permutation on F .
7. Consider the field F of order 8 constructed in Example 7.1.13. Prove that U (F ) is a cyclic group.
8. Prove that {1, √2, √3} are linearly independent in C as a vector space over Q.
9. Prove that {1, √3, i, i√3} are linearly independent in C as a vector space over Q.
10. Consider the ring K = Q[√2, √5].
(a) Prove that K = Q[√2 + √5] and prove also that this is a field. Indicate [K : Q].
(b) Set γ = √2 + √5. Show that B1 = {1, √2, √5, √10} and B2 = {1, γ, γ², γ³} are two bases of K
as a vector space over Q.
(c) Determine the change of coordinate matrix from the basis B2 to B1 coordinates.
(d) Use part (c) to write 2 + 3γ² − 7γ³ in the basis B1 .
(e) Use part (c) to write −3 + √2 − √5 + 7√10 as a linear combination of {1, γ, γ², γ³}.
11. Construct a field of 9 elements and write down the addition and multiplication tables for this field.
12. Consider the field F3 (t) of rational expressions with coefficients in F3 . Let
    α = (2t + 1)/(t + 2),   β = 1/(2t² + 1),   γ = (t + 1)/(t² + 1).
Calculate (a) α + β; (b) βγ; (c) αγ/β.
13. Prove that an automorphism of a field F leaves the prime subfield of F invariant.
14. Prove that the function f : Q[√5] → Q[√5] defined by f (a + b√5) = a − b√5 is an automorphism.
15. Prove that there exists an isomorphism of fields f : Q(π) → Q(π) that maps π to −π.
16. Let K = F (α) where α is the root of some irreducible polynomial p(x) ∈ F [x]. Suppose that p(x) =
an xn + · · · + a1 x + a0 . Show that the function fα : K → K defined by fα (x) = αx is a linear
transformation and that the matrix of fα with respect to the ordered basis (1, α, α2 , . . . , αn−1 ) is
    [ 0  0  0  · · ·  0   −a0 /an   ]
    [ 1  0  0  · · ·  0   −a1 /an   ]
    [ 0  1  0  · · ·  0   −a2 /an   ]
    [ :  :  :   . .   :      :      ]
    [ 0  0  0  · · ·  1  −an−1 /an  ]
17. (Analysis) Prove that the only continuous automorphism on the field of real numbers is the identity
function.
18. Let ϕ : F → F′ be an isomorphism of fields. Let p(x) ∈ F [x] be an irreducible polynomial and let
p′(x) be the polynomial obtained from p(x) by applying ϕ to the coefficients of p(x). Let α be a root
of p(x) in some extension of F and let β be a root of p′(x) in some extension of F′. Prove that there
exists an isomorphism
    Φ : F (α) → F′(β)
such that Φ(α) = β and Φ(c) = ϕ(c) for all c ∈ F .
19. Let D be a square-free integer and let K = Q[√D]. Prove that the function f : Q[√D] → M2 (Q)
defined by

                      [ a  Db ]
    f (a + b√D) =     [ b   a ]

is an injective ring homomorphism. Conclude that M2 (Q) contains a subring isomorphic to Q[√D].

20. Consider the field Q[³√2].
(a) Prove that the function ϕ : Q[³√2] → M3 (Q) defined by

                               [ a  2c  2b ]
    ϕ(a + b ³√2 + c (³√2)²) =  [ b   a  2c ]
                               [ c   b   a ]

is an injective homomorphism. Conclude that M3 (Q) contains a subring that is a field isomorphic
to Q[³√2].


(b) Use this homomorphism and matrix inverses to find the inverse of 3 − ³√2 + 5 (³√2)².
21. Consider the field of rational expressions K1 = Q(x) with coefficients in Q and also the field K2 =
Q(√p | p is prime). Prove that K1 and K2 are extensions of Q of infinite degree. Prove also that K1
and K2 are not isomorphic.

7.2 Algebraic Extensions
Section 7.1 introduced field extensions and emphasized the properties that follow from viewing an
extension of a field F as a vector space over F . Theorem 7.1.10 brought together two disparate
ways of constructing field extensions. It is also precisely this theorem that connects field theory
so closely with the study of polynomial equations. This section further develops consequences of
Theorem 7.1.10 by studying field extensions K/F as a field K containing roots of polynomials in
F [x].

7.2.1 – Algebraic Elements


Let F be a field and let K be an extension of F .

Definition 7.2.1
An element α ∈ K is called algebraic over F if α is a root of some nonzero polynomial
f (x) ∈ F [x]. If α ∈ K is not algebraic over F , then α is called transcendental over F .
If every element of K is algebraic over F , then the extension K/F is called an algebraic
extension.

Consider the fields Q ⊆ R. The element √2 in R is algebraic over Q because it is a root of
x2 − 2. As another example, note that it is easy to show that cos(3θ) = 4 cos3 θ − 3 cos θ. Hence,
setting θ = π/9, we see that

    4 cos³(π/9) − 3 cos(π/9) = cos(π/3) = 1/2.

Hence, though we do not know the value of cos(π/9), we see that it is a root of the cubic equation
4x³ − 3x − 1/2 = 0 and so it is algebraic over Q. By an abuse of language, if we say that a number is
algebraic (with no other qualifiers) we usually imply that K = C and F = Q.
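Although we do not know cos(π/9) in closed form, the claim that it satisfies the cubic is easy to sanity-check numerically (a check of ours, not from the text):

```python
import math

# The triple-angle identity 4cos^3(t) - 3cos(t) = cos(3t) makes cos(pi/9)
# a root of 4x^3 - 3x - 1/2, since cos(pi/3) = 1/2.
x = math.cos(math.pi / 9)
residual = 4 * x**3 - 3 * x - 0.5
assert abs(residual) < 1e-12  # zero up to floating-point rounding
```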
A first important property about algebraic elements is that for each algebraic element α there
exists a naturally preferred polynomial with α as a root.

Proposition 7.2.2
Let K/F be a field extension and let α ∈ K be algebraic over F . There exists a unique
monic irreducible polynomial mα,F (x) ∈ F [x] such that α is a root of mα,F (x).

Proof. Consider the set of polynomials S = {p(x) ∈ F [x] − {0} | p(α) = 0}. Since α is algebraic over
F , the set S is nonempty. By the well-ordering principle, S contains an element p(x) of least degree n.
Assume p(x) is reducible with p(x) = p1 (x)p2 (x), where p1 (x) and p2 (x) are nonconstant and hence
of degree less than n. Then 0 = p(α) = p1 (α)p2 (α) so p1 (α) = 0 or p2 (α) = 0. So p1 (x) ∈ S or
p2 (x) ∈ S, but this contradicts the minimality of n. Thus, any polynomial in S of minimum degree is irreducible.
Now let

a(x) = an xn + · · · + a1 x + a0 and b(x) = bn xn + · · · + b1 x + b0



be two polynomials of least degree in S. Then α is also a root of q(x) = bn a(x) − an b(x) so q(x)
is either 0 or in S. However, the subtraction cancels the leading terms so q(x) is either 0 or has
deg q(x) < n. Since n is the least degree of any polynomial in S, we conclude that q(x) = 0.
Thus, bn a(x) = an b(x) and any two polynomials in S of least degree are constant multiples of each other.
Consequently, there exists a unique monic polynomial of least degree in S, and it is irreducible.

The notation mα,F (x) indicates the dependence of the polynomial on the specific field of
coefficients. The polynomial may change based on the context of the field of coefficients but, as the
following corollary shows, the corresponding polynomials in different fields are related.

Corollary 7.2.3
Let F ⊆ L ⊆ K be a chain of fields and suppose that α ∈ K is algebraic over F . Then α
is algebraic over L and mα,L (x) divides mα,F (x) in L[x].

Proof. Both polynomials mα,L (x) and mα,F (x) are in L[x] and have α as a root. The polynomial
division in L[x] of the two polynomials gives

mα,F (x) = mα,L (x)q(x) + r(x)

where r(x) = 0 or deg r(x) < deg mα,L (x). However, since α is a root of both polynomials, we
deduce that r(α) = 0. Since deg mα,L (x) is the least degree of a nonzero polynomial in L[x] that
has α as a root, then r(x) = 0 and hence mα,L (x) divides mα,F (x). 

Definition 7.2.4
The polynomial mα,F (x) is called the minimal polynomial for α over F . The degree of the
algebraic element α over F is the degree deg mα,F (x).

The proof of Theorem 7.1.10 already established the following proposition but we restate it in
this context to make the connection explicit.

Proposition 7.2.5
Let α be algebraic over F . Then F (α) ≅ F [x]/(mα,F (x)) and [F (α) : F ] = deg mα,F (x).

This proposition illustrates the reason for using the term “degree” as opposed to just “dimension”
for the quantity [F (α) : F ].

Example 7.2.6. Consider the element α = ³√7 over Q. It is a root of x³ − 7; however, we do not
yet know that this is the minimal polynomial m³√7,Q (x). By Proposition 6.5.11, we can tell that
x³ − 7 does not have a rational root. Since it is a cubic, and a cubic with no rational root has no
linear factor over Q, we deduce that it is irreducible. Hence, mα,Q (x) = x³ − 7. △
Example 7.2.7. Consider the element α = √2 + √3 ∈ C. We determine the degree and the minimal
polynomial over Q. Note that α² = 2 + 2√6 + 3, so then α² − 5 = 2√6. Hence,

    (α² − 5)² = 24 =⇒ α⁴ − 10α² + 25 = 24 =⇒ α⁴ − 10α² + 1 = 0.

So α is a root of p(x) = x⁴ − 10x² + 1. We do not yet know if p(x) is the minimal polynomial of α,
since we have not checked if it is irreducible. By the Rational Root Theorem, since neither 1
nor −1 is a root of p(x), the polynomial p(x) has no rational roots and hence no linear factors. If p(x) is reducible,
then it must be the product of two quadratic polynomials. Without loss of generality,
we can assume the polynomials are monic. Furthermore, by Gauss’ Lemma, if p(x) factors over Q,
then it must factor over Z. Hence, we see that if p(x) factors, then there are two cases:

p(x) = (x2 + ax + 1)(x2 + bx + 1) or p(x) = (x2 + cx − 1)(x2 + dx − 1).



These two cases require, respectively,

    a + b = 0              c + d = 0
    ab + 2 = −10     or    cd − 2 = −10.

The first case leads to (a, b) = ±(2√3, −2√3) and the second gives (c, d) = ±(2√2, −2√2). None of
these values are in Q, so p(x) is irreducible and mα,Q (x) = x⁴ − 10x² + 1. We deduce that the degree
of √2 + √3 over Q is 4.
Now consider the field L = Q(√2). Assume that √3 is an element of Q(√2). Then there exist
rational numbers r and s such that √3 = r + s√2. Squaring both sides and then rearranging (one
checks that rs ≠ 0) gives √2 = (3 − r² − 2s²)/(2rs). This is a contradiction because we know that
√2 is not a rational number. Consequently, √3 ∉ Q(√2), and thus α ∉ Q(√2) so deg mα,L (x) > 1.
From our previous calculation, we see that in L[x],

    mα,Q (x) = (x² − 2√2x − 1)(x² + 2√2x − 1).

Hence, α is a root of one of those two quadratics. By direct observation, we find that mα,L (x) =
x² − 2√2x − 1.
To take a different approach in looking for mα,K (x), where K = Q(√3), notice that α − √3 = √2.
After squaring, α² − 2√3α + 3 = 2, so α is a root of x² − 2√3x + 1. Since α ∉ Q(√3), then α is
not a root of a degree-1 polynomial in K[x], so we must have mα,K (x) = x² − 2√3x + 1. Again, we
see that mα,K (x) divides mα,Q (x) since

    mα,Q (x) = mα,K (x)(x² + 2√3x + 1). △
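The two minimal polynomials found in this example can be sanity-checked numerically (a check of ours, not from the text):

```python
import math

alpha = math.sqrt(2) + math.sqrt(3)
quartic = alpha**4 - 10 * alpha**2 + 1                 # m_{alpha,Q} evaluated at alpha
quadratic = alpha**2 - 2 * math.sqrt(3) * alpha + 1    # m_{alpha,K} for K = Q(sqrt 3)
assert abs(quartic) < 1e-9
assert abs(quadratic) < 1e-12
```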

To recapitulate some of our results, Theorem 7.1.10 established that if α is algebraic over F then
[F (α) : F ] is finite. Conversely, if [F (α) : F ] = n is finite, then the set {1, α, α², . . . , α^n} of n + 1
elements is linearly dependent, so there exist ci ∈ F , not all zero, such that

    cn α^n + · · · + c1 α + c0 = 0.

Thus, α is algebraic. Notice that this reasoning extends further and supports the following propo-
sition.

Proposition 7.2.8
If the extension K/F is of finite degree then every element in K is algebraic and hence K
is an algebraic extension.

We should underscore that this implication is not an “if and only if” statement. Indeed, it is easy
to construct algebraic extensions that are not of finite degree. However, in order to make examples
of this precise, we need a few more facts about degrees.

7.2.2 – Properties of the Degree of an Extension


Even though it seems like a simple concept, properties of the degree of field extensions have far-
reaching consequences. As we will see, these consequences become the foundation for the solutions to
mathematical questions that had been unsolved for hundreds of years (and some for over a thousand)
prior to the discovery of field theory. We begin to develop those properties now.

Theorem 7.2.9
Let F ⊆ K ⊆ L be fields. The degrees of the extensions satisfy

[L : F ] = [L : K][K : F ].

Proof. Suppose first that either [L : K] or [K : F ] is infinite. If [K : F ] is infinite, then there does
not exist a finite basis for K as a vector space over F . Since L contains K as a subspace, dim L ≥
dim K, so L does not have a finite basis over F either and [L : F ] is infinite. If [L : K] is
infinite, then L does not have a finite basis as a vector space over K. If L possessed a finite basis as
a vector space over F , then since F ⊆ K, this basis would at least span L over K, so L would have
a finite basis over K. Hence, if [L : K] is infinite, then so is [L : F ].
Now suppose that [L : K] = n and [K : F ] = m. Let {α1 , α2 , . . . , αn } be a basis of L over K
and let {β1 , β2 , . . . , βm } be a basis of K over F . Every element x ∈ L can be written as
    x = Σ_{i=1}^n ci αi

for ci ∈ K. Furthermore, for each i we can write the elements ci as


    ci = Σ_{j=1}^m dij βj

with dij ∈ F . Then


    x = Σ_{i=1}^n Σ_{j=1}^m dij αi βj .

Therefore, the field L is spanned as a vector space over F by {αi βj | 1 ≤ i ≤ n, 1 ≤ j ≤ m}.


On the other hand, consider a linear combination of the form

    0 = Σ_{i=1}^n Σ_{j=1}^m dij αi βj = Σ_{j=1}^m ( Σ_{i=1}^n dij αi ) βj .

Since {β1 , β2 , . . . , βm } forms a basis of K over F , then for each j,

    Σ_{i=1}^n dij αi = 0.

Since the set {α1 , α2 , . . . , αn } forms a basis of L over K, it is linearly independent so dij = 0 for all pairs
(i, j). Thus,
{αi βj | 1 ≤ i ≤ n, 1 ≤ j ≤ m}
is a linearly independent set. Hence, it forms a basis of L over F and so

[L : F ] = dimF L = nm = [L : K][K : F ]. 

Though Theorem 7.2.9 describes how the degree evaluates on extensions of extensions, it can
also be used to deduce information about field containment as the following example shows.
Example 7.2.10. We prove that √7 ∉ Q(³√7). The minimal polynomial of ³√7 over Q is x³ − 7 so
[Q(³√7) : Q] = 3. Assume that √7 ∈ Q(³√7). Then

    Q ⊆ Q(√7) ⊆ Q(³√7).

So by Theorem 7.2.9,

    [Q(³√7) : Q(√7)][Q(√7) : Q] = [Q(³√7) : Q].

Hence, 3 = 2[Q(³√7) : Q(√7)], which is a contradiction because degrees of extensions are integers.
Thus, √7 ∉ Q(³√7).
Note that this reasoning is much easier than proving directly that there do not exist a, b, c ∈ Q
such that √7 = a + b ³√7 + c (³√7)². △

Example 7.2.11. In Example 7.2.7 we found that mα,Q (x) = x⁴ − 10x² + 1 for α = √2 + √3.
Hence, [Q(α) : Q] = 4. Since Q ⊆ Q(√2) ⊆ Q(α), and [Q(√2) : Q] = 2, then by Theorem 7.2.9,
[Q(α) : Q(√2)] = 2. Hence, α is the root of an irreducible quadratic polynomial in Q(√2)[x]. △

Definition 7.2.12
A field extension K over F is said to be a simple field extension if K = F (α) for some element
α ∈ K. Moreover, the element α is called a primitive element of K over F .
The extension K is said to be finitely generated if K = F (α1 , α2 , . . . , αk ) for some elements
α1 , α2 , . . . , αk ∈ K.

This definition makes no assumption that the generating elements α1 , α2 , . . . , αk are algebraic
over F .
Note that if F is a field then F (α, β) = F (α)(β), or more precisely the field extension over F
generated by α and β is equal to the field extension over F (α) generated by β. Of course, this also
implies that F (α, β) = F (β)(α).
Suppose now that α1 , α2 , · · · , αk are all algebraic over a field F and have degree deg αi = ni .
We define a chain of subfields Fi by F0 = F and

Fi = F (α1 , α2 , · · · , αi ) for 1 ≤ i ≤ k.

We then have
F = F0 ⊆ F1 ⊆ F2 ⊆ · · · ⊆ Fk .
For all i we have ni = deg mαi ,F (x). Furthermore, Fi = Fi−1 (αi ) so by Corollary 7.2.3, mαi ,Fi−1 (x)
divides mαi ,F (x). Thus, [Fi : Fi−1 ] ≤ ni . Therefore,

[F (α1 , α2 , . . . , αk ) : F ] = [Fk : Fk−1 ] · · · [F2 : F1 ][F1 : F0 ] ≤ nk · · · n2 n1 .

This gives an upper bound for [Fk : F ]. However, Theorem 7.2.9 gives a lower bound on [Fk : F ]
because F ⊆ F (αi ) ⊆ Fk , so each ni divides [Fk : F ]. This establishes the following important theorem.

Theorem 7.2.13
A field extension K/F is finite if and only if K is generated by a finite number of algebraic
elements over F . If these algebraic elements have degrees n1 , n2 , . . . , nk over F , then

lcm(n1 , n2 , . . . , nk ) | [K : F ] and [K : F ] ≤ n1 n2 · · · nk .

It is not always easy to determine when [K : F ] is strictly less than n1 n2 · · · nk or when it is an


equality. The following examples and some of the exercises explore a variety of situations related to
this theorem.
Example 7.2.14. Consider the field K = Q(√2, √5). The elements √2 and √5 both have degree
2 over Q. By Theorem 7.2.13, [K : Q] is a multiple of 2 and less than or equal to 4. Hence, [K : Q]
can be 2 or 4. Since √5 ∉ Q(√2), the degree [K : Q(√2)] is strictly greater than 1, so [K : Q] > 2.
Consequently, [K : Q(√2)] = 2 and [K : Q] = 4. △
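The degree count [K : Q] = 4 can be made concrete by computing in coordinates. Here is a sketch (not from the text; the representation and the `mul` helper are ours) of exact arithmetic in K = Q(√2, √5) with respect to the basis {1, √2, √5, √10}:

```python
# An element a + b*sqrt2 + c*sqrt5 + d*sqrt10 of K is stored as the tuple (a, b, c, d).
# Multiplication stays inside rational coordinates, illustrating that K is spanned
# over Q by the four basis elements.
from fractions import Fraction

def mul(u, v):
    a, b, c, d = (Fraction(x) for x in u)
    e, f, g, h = (Fraction(x) for x in v)
    # Reduce using sqrt2^2 = 2, sqrt5^2 = 5, sqrt2*sqrt5 = sqrt10,
    # sqrt2*sqrt10 = 2*sqrt5, sqrt5*sqrt10 = 5*sqrt2, sqrt10^2 = 10.
    return (a*e + 2*b*f + 5*c*g + 10*d*h,
            a*f + b*e + 5*c*h + 5*d*g,
            a*g + c*e + 2*b*h + 2*d*f,
            a*h + d*e + b*g + c*f)

gamma = (0, 1, 1, 0)          # gamma = sqrt2 + sqrt5
square = mul(gamma, gamma)    # equals (7, 0, 0, 2), i.e. 7 + 2*sqrt10
```

For instance, (√2 + √5)² comes out as 7 + 2√10, so products of the basis elements never leave the 4-dimensional rational span.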
Example 7.2.15. Following Example 7.2.7, it is straightforward to show that α = √2 + √3 and
β = √2 + √5 both have degree 4 over Q. Theorem 7.2.13 tells us that [Q(α, β) : Q] is a multiple of
4 but is less than or equal to 16. Hence, we deduce that [Q(α, β) : Q] is equal to 4, 8, 12, or 16.
Now β is a root of the polynomial x² − 2√2x − 3 ∈ Q(α)[x]. Without knowing whether this
polynomial is irreducible in Q(α)[x], we can deduce that [Q(α, β) : Q(α)] ≤ 2. Hence, [Q(α, β) : Q]
is equal to 4 or 8.
We can prove that √5 ∉ Q(α). Assume that for some a, b, c, d ∈ Q,

    √5 = a + b√2 + c√3 + d√6.

Then squaring this expression gives

    5 = a² + 2b² + 3c² + 6d² + 2ab√2 + 2ac√3 + 2ad√6 + 2bc√6 + 4bd√3 + 6cd√2
    =⇒ (5 − a² − 2b² − 3c² − 6d²) − (2ad + 2bc)√6 = (2ab + 6cd)√2 + (2ac + 4bd)√3.

After squaring once more, we obtain an equation for √6 in terms of rational numbers. This is a
contradiction since √6 cannot be expressed as a rational number. Since √5 ∉ Q(α), then β ∉ Q(α)
and [Q(α, β) : Q(α)] > 1. This allows us finally to deduce that [Q(α, β) : Q] = 8. △
Example 7.2.16. As another example, it is not hard to show that Q(⁴√2, ⁶√2) is in fact equal to
the simple extension Q(¹²√2). Thus,

    [Q(⁴√2, ⁶√2) : Q] = 12,

which is less than 4 × 6 = 24, the product of the degrees of the generating algebraic elements. This
gives another example where [K : Q] is strictly less than the product of the degrees of the generating
algebraic elements. △

Theorem 7.2.13, along with other theorems on degrees, leads to a powerful corollary that would
be rather difficult to prove directly from the definition of algebraic elements.

Corollary 7.2.17
Let α and β be two algebraic elements over a field F . Then the following elements are also
algebraic:
    α + β,   α − β,   αβ,   α/β (for β ≠ 0).

Proof. Suppose that α and β are algebraic over F with degrees n1 and n2 , respectively. By Theo-
rem 7.2.13, [F (α, β) : F ] ≤ n1 n2 . Let γ be α + β, α − β, αβ, or α/β. Then F (γ) is a subfield of
F (α, β). By Theorem 7.2.9, [F (γ) : F ] divides [F (α, β) : F ] so [F (γ) : F ] = d is finite. By Theo-
rem 7.1.10, γ is the root of an irreducible polynomial of degree d in F [x]. Hence, γ is algebraic. 

Example 7.2.18. Consider the element γ = (1 + √2)/(1 + √3). This is an element in Q(√2, √3),
which has degree 4 over Q. Thus, γ is algebraic. For completeness, we can look for the minimal
polynomial of γ. We first have

    γ(1 + √3) = 1 + √2 =⇒ γ − 1 = √2 − γ√3.

Squaring both sides gives

    γ² − 2γ + 1 = 2 − 2√6γ + 3γ² =⇒ 2√6γ = 2γ² + 2γ + 1.

After squaring both sides again, we deduce that γ is the root of

    x⁴ + 2x³ − 4x² + x + 1/4.

This is the unique monic irreducible polynomial mγ,Q (x). △
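As a sanity check on the two squarings (a check of ours, not from the text), one can verify numerically that γ = (1 + √2)/(1 + √3) is a root of the quartic x⁴ + 2x³ − 4x² + x + 1/4:

```python
import math

gamma = (1 + math.sqrt(2)) / (1 + math.sqrt(3))
residual = gamma**4 + 2 * gamma**3 - 4 * gamma**2 + gamma + 0.25
assert abs(residual) < 1e-12  # zero up to floating-point rounding
```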

Let K be a field and consider the set of subfields of K. Consider the relation “is an extension of
finite degree of” on the set of subfields of K. Theorem 7.2.9 proves that this relation is transitive.
Since this relation also satisfies antisymmetry and reflexivity, we deduce that it is a partial order
on the subfields of K.
We can now mention a few field extensions that are algebraic and of infinite degree. For example,
it is possible to show that Q(√2, √3, √5, √7, . . .) is an algebraic extension of infinite degree over Q.
(See Exercise 7.2.19.) This also provides an example of a field extension that is not finitely generated
but still algebraic.

Another important example of an algebraic extension of infinite degree is the subfield of C of all
algebraic numbers, denoted Q̄, i.e., complex numbers that are roots of polynomials in Q[x]. The set
of algebraic numbers forms a field by virtue of Corollary 7.2.17. It is easy to see that Q̄ does not
have finite degree over Q since, for every positive integer n, it contains the field Q(ⁿ√2), which has
degree n over Q. Thus, [Q̄ : Q] is greater than every positive integer n, so this degree is infinite.
An interesting property about algebraic numbers is that Q̄ is a countable set. (See Exer-
cise 7.2.23.) To many who first encounter it, the result that Q̄ is countable feels counterintuitive
since the set of reals is uncountable and every real number can be approximated to arbitrary preci-
sion by rational numbers. As we think of all the possibilities covered by algebraic numbers and how
few numbers we know for certain to be transcendental, it seems even more counterintuitive that Q̄
is a countable subset of the uncountable set C.

7.2.3 – The Lattice of Algebraic Extensions


The previous section established many properties about the degree of a finite field extension. We
already pointed out that algebraic extensions need not have finite degree. The goal of this section
is to study the relation of algebraic extensions regardless of whether the extensions are finite. The
main theorem (Theorem 7.2.24) summarizes the section by establishing that the relation of algebraic
extensions is a lattice.
Let L be an extension of a field F . Let Alg(L/F ) be the set of all subfields of L that are algebraic
extensions of F .

Proposition 7.2.19
Define the relation ≼ on Alg(L/F ) by K1 ≼ K2 if K2 is an algebraic extension of K1 . Then
≼ is a partial order on Alg(L/F ).

Proof. For all K ∈ Alg(L/F ), it is obvious that K is an algebraic extension of itself. Hence, the
relation ≼ is reflexive. If K1 ≼ K2 and K2 ≼ K1 , then K1 ⊆ K2 and K2 ⊆ K1 so K1 = K2 . Hence,
≼ is antisymmetric.
Suppose that K3 is an algebraic extension of K2 , which in turn is an algebraic extension of K1 .
Let α ∈ K3 . Since K3 is algebraic over K2 , α is the root of its minimal polynomial
    mα,K2 (x) = cn x^n + · · · + c1 x + c0 ∈ K2 [x].
Since K2 is algebraic over K1 , each coefficient ci is algebraic over K1 . By Theorem 7.2.13, the
degree [K1 (c0 , c1 , . . . , cn ) : K1 ] is at most (degK1 c0 )(degK1 c1 ) · · · (degK1 cn ); in particular, it is
finite. Furthermore, α is a root of a polynomial of degree n over K1 (c0 , c1 , . . . , cn ), so by Theorem 7.2.9,
    [K1 (c0 , . . . , cn , α) : K1 ] = [K1 (c0 , . . . , cn , α) : K1 (c0 , . . . , cn )][K1 (c0 , . . . , cn ) : K1 ] ≤ n [K1 (c0 , . . . , cn ) : K1 ].
Hence, K1 (α), being a subfield of K1 (c0 , . . . , cn , α), is a finite extension of K1 , and so α is algebraic
over K1 by Proposition 7.2.8. Thus, K3 is an algebraic extension of K1 and ≼ is transitive.
For any two subfields K1 and K2 of L that are algebraic extensions of F , the intersection K1 ∩ K2
is again a subfield of L that is an algebraic extension of F . It is the greatest lower bound of K1
and K2 with respect to the partial order of “algebraic extension of.”
Let L be an extension of a field F and let K1 , K2 be two subfields of L that are algebraic
extensions of F . It is easy to show that the intersection K1 ∩ K2 is a field extension of F . In general,
the union K1 ∪ K2 is not another field extension. In order to show that the partial order of algebraic
extensions has a least upper bound for any two algebraic extensions of F , we need to introduce the
composite of fields.

Definition 7.2.20
Let K1 and K2 be two subfields of any field E. Then the composite field K1 K2 is the
smallest subfield of E (by inclusion) that includes both K1 and K2 .

Proposition 7.2.21
Let K1 and K2 be two finite extensions of a field F , both contained in a field extension L.
Then [K1 K2 : F ] ≤ [K1 : F ][K2 : F ].

Proof. (Left as an exercise for the reader. See Exercise 7.2.13.) 


Note that if K is a field extension of F and α, β ∈ K, then the composite field F (α)F (β) =
F (α, β). However, the construction of a composite field can be far more general than the composite
of two simple field extensions.
It is possible to characterize the elements in the composite of two subfields of a field L.

Proposition 7.2.22
Let K1 and K2 be two subfields of a field L. Let γ ∈ K1 K2 . Then
    γ = (α1 β1 + α2 β2 + · · · + αm βm )/(a1 b1 + a2 b2 + · · · + an bn )            (7.2)

for some integers m and n and for some elements α1 , α2 , . . . , αm , a1 , a2 , . . . , an ∈ K1 and
β1 , β2 , . . . , βm , b1 , b2 , . . . , bn ∈ K2 .

Proof. Let S be the set of all elements in L of the form (7.2), assuming the denominator is nonzero.
For all α ∈ K1 , we have α = (α · 1)/(1 · 1) ∈ S. Hence, K1 ⊆ S and similarly K2 ⊆ S.
Since K1 K2 contains K1 and K2 and is a field, S ⊆ K1 K2 . Performing distributivity on a
product of linear combinations
(α1 β1 + α2 β2 + · · · + αm βm )(a1 b1 + a2 b2 + · · · + an bn )
produces a linear combination (with possibly mn terms) of products of elements from K1 and K2 .
Consider the difference of two elements in S. By performing cross-multiplication and distributivity
on products of linear combinations, one recovers another expression of the form (7.2). Hence, by the
One-Step Subgroup Criterion, (S, +) is a subgroup of (L, +). Similarly, the division of two nonzero
elements in S is again an element in S. Thus, S is a subring of L, containing the identity and closed
under taking inverses. Thus, S is a subfield of L. Since K1 K2 is the smallest subfield of L containing
both K1 and K2 , then K1 K2 ⊆ S. Consequently, S = K1 K2 . 

Proposition 7.2.23
Let L be an extension of a field F and let K1 and K2 be two subfields of L that are algebraic
over F . Then K1 K2 is another algebraic extension of F .

Proof. Let γ ∈ K1 K2 . Then, by Proposition 7.2.22, γ is equal to an expression of the form (7.2).
However, by a repeated application of Corollary 7.2.17, since αi , βi , aj , bj with i = 1, 2, . . . , m and
j = 1, 2, . . . , n are algebraic, then γ is also algebraic. Thus, K1 K2 is an algebraic extension of F . 
We summarize the results of this section into a concise theorem.

Theorem 7.2.24
Let L be an extension of a field F . The relation “is an algebraic extension of” on the set of
algebraic extensions of F in L is a partial order. Furthermore, this partial order is a lattice
such that for any two algebraic extensions K1 and K2 of F in L, the least upper bound is
K1 K2 and the greatest lower bound is K1 ∩ K2 .

For any two subfields K1 and K2 of L that are algebraic extensions of F , the Hasse diagram
of the lattice Alg(L/F ) includes the following subdiagram illustrating the least upper bound K1 K2
and the greatest lower bound K1 ∩ K2 .

          K1 K2
         /     \
       K1       K2
         \     /
        K1 ∩ K2

7.2.4 – Transcendental Numbers (Optional)


The set of transcendental numbers in C is precisely C − Q̄. At first pass, it is hard to imagine
numbers that are not solutions to an algebraic equation.
For a simple example of a transcendental extension, consider the field of rational expressions
F (x) in the variable x over the field F . In this notation, it is understood that the variable is not
the root of any polynomial equation. Hence, x ∈ F (x) is transcendental over F . Though easy to
understand, this example feels artificial since x is not a number and Q(x) is not a subfield of C.
In general, it is hard to show that a given complex number is transcendental. In 1882, Lindemann
proved that π is transcendental [60]. This is tantamount to saying that [Q(π) : Q] is infinite. Since
π ∈ R, then Q(π) is a subfield of R. By Theorem 7.2.9, we deduce that [R : Q] is infinite as well. The
extension Q(π)/Q is an example of an extension that is finitely generated but of infinite degree.
Among the list of 23 Hilbert Problems, which David Hilbert posed to the mathematical commu-
nity in 1900, the seventh problem asked: If a is algebraic with a ≠ 0, 1 and if b is an irrational
algebraic number, then is a^b transcendental? The Gelfond-Schneider Theorem, proved independently
by the namesakes in 1934, answered Hilbert’s seventh problem in the affirmative [29]. For example,
7^√3 is transcendental.
The following theorem, Liouville’s Theorem on Diophantine approximation, gives a strategy to
find some transcendental numbers.

Theorem 7.2.25 (Liouville’s Theorem)


Let α be an algebraic number (over Q) of degree n > 1. There exists a real number A > 0
such that for all integers p and q > 0,

    |α − p/q| > A/q^n .

Proof. Let mα (x) be the minimal polynomial of α over Q. Let c be the least common multiple of the
denominators of the coefficients of mα (x) and set f (x) = cmα (x). Then f (x) ∈ Z[x] is a polynomial
of degree n, with α as a root, with integer coefficients, and such that the greatest common divisor
of the coefficients is 1.
Let δ be any positive real number less than the distance between α and any other root, namely
0 < δ < min(|α − α1 |, |α − α2 |, . . . , |α − αk |),
where α1 , α2 , . . . , αk are the roots of f (x) that are different from α. Let M be the maximal value of
|f ′(x)| over the interval [α − δ, α + δ] and let A be a real number with 0 < A < min(δ, 1/M ).
Let p/q be an arbitrary rational number. We consider two cases.
Case 1. Suppose p/q ∉ [α − δ, α + δ]. Then |α − p/q| > δ > A ≥ A/q^n .

Case 2. Now suppose that p/q ∈ [α − δ, α + δ]. By the mean value theorem, there exists a
number c between p/q and α such that

    f ′(c) = (f (α) − f (p/q))/(α − p/q) = −f (p/q)/(α − p/q),

which implies that

    |α − p/q| = |f (p/q)|/|f ′(c)|.
Since |f ′(c)| ≤ M , we have 1/|f ′(c)| ≥ 1/M . Hence,

    |α − p/q| ≥ |f (p/q)|/M > A|f (p/q)|.

By the definition of δ, the only root of f (x) in [α − δ, α + δ] is α, so in particular f (p/q) ≠ 0.


Since f is of degree n and f (x) ∈ Z[x], the number q^n f (p/q) is a nonzero integer, so |f (p/q)| ≥ 1/q^n .
Hence,

    |α − p/q| > A|f (p/q)| ≥ A/q^n ,

and the theorem follows.

Liouville’s Theorem offers a strategy to prove that some numbers are transcendental: find an
irrational number α that violates the conclusion of the theorem for every degree n > 1. The following
corollary constructs a specific family of transcendental numbers using this strategy.

Corollary 7.2.26
Let b be a positive integer greater than 2 and let {ak }k≥1 be a sequence whose values are
in {0, 1, 2, . . . , b − 1} and that is not eventually 0. Then the series

    Σ_{k=1}^∞ ak / b^{k!}

converges to a transcendental number.

Proof. (Left as an exercise for the reader. See Exercise 7.2.28.) 
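To see the strategy in action, take b = 10 and every ak = 1 (the classical Liouville constant). The sketch below (ours, not from the text) verifies with exact fractions that the truncations approximate the sum far better than Liouville's inequality permits for an algebraic number of any small degree:

```python
from fractions import Fraction
from math import factorial

def partial(N):
    # Exact N-th partial sum of sum_{k>=1} 10^(-k!); its denominator is q = 10^(N!).
    return sum(Fraction(1, 10 ** factorial(k)) for k in range(1, N + 1))

L = partial(8)  # stands in for the full sum; the omitted tail is smaller than 10^(-9!)
for n in range(1, 6):
    s = partial(n + 1)               # rational approximation p/q with q = 10^((n+1)!)
    q = 10 ** factorial(n + 1)
    # Even with A = 1, the Liouville bound |L - p/q| > 1/q^n fails for this p/q,
    # so L cannot be algebraic of degree n.
    assert abs(L - s) < Fraction(1, q ** n)
```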

Exercises for Section 7.2


1. Find the minimal polynomial of √10 + √11 over Q.
2. Find the minimal polynomial of √5 + ³√2 over Q.

3. Find the minimal polynomial of (2 + √7)/(1 + √2) over Q(√2).
4. Determine the degree of 1 − 2 ³√7 + 10 (³√7)² over Q.
5. Consider the field F = F2 [x]/(x3 + x + 1) with θ an element in F such that θ3 + θ + 1 = 0. Find the
minimal polynomial of θ2 + 1 over F2 .
6. Consider the field F = F2 [x]/(x3 + x + 1) with θ an element in F such that θ3 + θ + 1 = 0. In F [x], the
polynomial x+θ divides x3 +x+1. Find the polynomial q(x) ∈ F [x] such that x3 +x+1 = (x+θ)q(x).
7. Consider the field F = F5 [x]/(x3 +2x+4) with θ an element in F such that θ3 +2θ +4 = 0. In F [x], the
polynomial x−θ divides x3 +2x+4. Find the polynomial q(x) ∈ F [x] such that x3 +2x+4 = (x−θ)q(x).
8. Let K/F be a field extension and let α ∈ K. Prove that a polynomial p(x) ∈ F [x] satisfies p(α) = 0
if and only if mα,F (x) divides p(x) in F [x].
9. (Palindromic polynomials) A palindromic polynomial in Q[x] is a polynomial

p(x) = aₙxⁿ + · · · + a₁x + a₀ ∈ Q[x]

such that ai = an−i for all i = 0, 1, . . . , n.


 
(a) Let q(x) ∈ Q[x] be a polynomial of degree n. Prove that xⁿ q(x + 1/x) is a palindromic polynomial of degree 2n.
(b) Show that every palindromic polynomial p(x) ∈ Q[x] of even degree 2n can be written as xⁿ q(x + 1/x) for some q(x) ∈ Q[x].
(c) Use this to solve the equation x⁴ − 3x³ − 2x² − 3x + 1 = 0.
10. (Even Quartic Polynomials)
(a) Consider a complex number of the form √(a + b√c), where a, b, c ∈ Q. Prove that √(a + b√c) is the root of an even (all powers are even) quartic polynomial.
(b) Prove that the roots of any even quartic polynomial are of the form ±√(a ± b√c) with a, b, c ∈ Q.
(c) Let α = √(a + b√c). Show that all the roots of mα,Q(x) are in Q(α) if and only if √(a² − cb²) ∈ Q.
11. Let L be an algebraic extension of a field F and let K1, K2 be two subfields of L containing F. Prove that K1 ∩ K2 is a field extension of F and that [K1 ∩ K2 : F] divides gcd([K1 : F], [K2 : F]).
12. Let F = Q(√r1, √r2, . . . , √rn) where ri ∈ Q.
(a) Prove that [F : Q] = 2ᵏ for some nonnegative integer k.
(b) Deduce that ∛7 ∉ F.
13. Prove Proposition 7.2.21.
14. Suppose that L/F is a field extension of degree p, with p prime. Prove that any subfield K of L
containing F is either L or F .
15. Let F be a field and consider a simple extension F(α) such that [F(α) : F] is odd. Prove that F(α) = F(α²).
16. Prove that the composite field of Q(√2) and Q(∛3) is Q(⁶√72).
17. Let K/F be a field extension and let α, β ∈ K with degrees n1 and n2 respectively over F . Show that
if gcd(n1 , n2 ) = 1, then [F (α, β) : F ] = n1 n2 .
18. Prove that [Q(x, √(1 − x²)) : Q(x)] = 2. [Hint: Use the fact that Q[x] is a UFD.]
19. Let pn be the nth prime number (so p1 = 2, p2 = 3, p3 = 5, and so on).
(a) Prove that √pn ∉ Q(√p1, √p2, . . . , √p_{n−1}). [Hint: Use a proof by induction on n, with F_{n−1} = Q(√p1, √p2, . . . , √p_{n−1}) and F_n = Q(√p1, √p2, . . . , √pn).]
(b) Deduce that [Q(√p1, √p2, . . . , √pn) : Q] = 2ⁿ for all positive integers n.

(c) Deduce that Q( p | p ≥ 2 is a prime) is an algebraic extension of Q of infinite degree.
[This Exercise is motivated by [54].]
20. Show that ∛2 ∉ Q(∛3). Prove also that [Q(∛2, ∛3) : Q] = 9.
21. Let S = { ⁿ√2 | n ∈ Z with n ≥ 2 }. Prove that √3 ∉ Q[S].
22. Prove that cos(kπ/n) is algebraic for all positive integers k and n.
23. In this exercise, we prove that the set of algebraic numbers Q is countable. Recall that Q is countable
and review Exercise 1.2.7.
(a) Prove that if A1, A2, . . . , An are countable sets, then the Cartesian product A1 × A2 × · · · × An is a countable set.
(b) Prove that the set of polynomials Q[x] is a countable set.
(c) Since the algebraic numbers consist of all the roots of polynomials in Q[x], deduce with a proper proof that Q is countable.
24. Prove that the field of fractions of the algebraic integers (introduced in Section 6.7) is the field of
algebraic numbers Q.
25. Let F be a field and let α be an algebraic element over F for which we may not know the minimal
polynomial mα,F (x). Consider the linear transformation fα : F (α) → F (α) defined by fα (x) = αx in
Exercise 7.1.16.
(a) Prove that α is an eigenvalue of fα .
(b) Deduce that α is a root of the characteristic polynomial for the linear transformation fα .
26. Use the result of Exercise 7.2.25 to find the minimal polynomial of β = 1 + ∛7 − (∛7)² over Q. [Hint: View β as an element in Q(∛7), with respect to the basis {1, ∛7, (∛7)²}.]
27. Use the result of Exercise 7.2.25 to find the minimal polynomial of β = 2 − 3√2 + ∜2 over Q. [Hint: View β as an element in Q(∜2), with respect to the basis {1, ∜2, √2, (∜2)³}.]
28. Prove Corollary 7.2.26.

7.3
Solving Cubic and Quartic Equations
As early as middle school and certainly in high school, students encounter the quadratic formula.
The solutions to the generic quadratic equation ax² + bx + c = 0, with a ≠ 0, are

x = (−b ± √(b² − 4ac)) / (2a).
The original interest in solving quadratic equations came from applications to geometry. There is
historical evidence that as early as 400 B.C.E. Babylonian scholars knew the strategy of completing
the square to solve a quadratic equation. Solutions to the quadratic equation appeared in a variety
of forms throughout history.
Subsequent generations of scholars attempted to find formulas for the roots of equations of higher
degree. One can approach the problem of finding solutions in a variety of ways: radical expressions,
trigonometric sums, hypergeometric functions, continued fractions, etc. However, historically, by a
formula for the roots of a polynomial equation, people understood an expression in terms of radicals of
algebraic combinations of the coefficients of the generic polynomial. For centuries, mathematicians
only made progress on particular cases. Then, in 1545, Cardano published formula solutions for
both the cubic and the quartic equation in Ars Magna. Though Cardano often receives the credit,
Tartaglia (a colleague) and Ferrari (a student) contributed significantly.
We propose to look at some formulas for the solutions to the cubic and the quartic equation and
discuss the merits of the approach.
Throughout this section, we assume that the polynomials are in R[x] but the strategies can be
generalized to C[x].

7.3.1 – The Cubic Equation

Before we develop a method to solve the cubic equation, we should look at the simple case x³ = a, where a ∈ Q. Obviously, this equation has one real root, namely x = ∛a. We then have

x³ − a = 0  ⟺  (x − ∛a)(x² + ∛a·x + (∛a)²) = 0.    (7.3)

Solving the quadratic factor, we find that the three roots of this cubic are x = ∛a, ∛a·ω, ∛a·ω², where ω is the complex number

ω = (−1 + i√3)/2.
The complex number ω is called a (primitive) third root of unity since it satisfies ω³ = 1 and since ω does not solve the equation xⁿ − 1 = 0 for any positive integer n less than 3. Note that ω² = ω⁻¹ = ω̄ = −1 − ω.
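These identities for ω are easy to confirm numerically. The quick check below is our own sanity test, not part of the derivation:

```python
import cmath

# omega = (-1 + i*sqrt(3))/2, the primitive third root of unity from the text.
omega = (-1 + cmath.sqrt(-3)) / 2

assert abs(omega ** 3 - 1) < 1e-12                  # omega^3 = 1
assert abs(omega ** 2 - (-1 - omega)) < 1e-12       # omega^2 = -1 - omega
assert abs(omega ** 2 - omega.conjugate()) < 1e-12  # omega^2 = conj(omega)
assert abs(omega ** 2 - 1 / omega) < 1e-12          # omega^2 = omega^{-1}
```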
Without loss of generality, we can suppose that the cubic equation is already monic, i.e., one has
already divided the polynomial equation by the leading coefficient. Therefore, we propose to solve

x³ + ax² + bx + c = 0.    (7.4)

As a first step, we change variables by setting x = y − a/3. This shift of variables has a similar
effect as completing the square in solving the quadratic formula. We have
(y − a/3)³ + a(y − a/3)² = y³ − ay² + (a²/3)y − a³/27 + ay² − (2a²/3)y + a³/9.
This change of variables leads to an equation in y equivalent to the original equation but that does
not involve a quadratic term. We get
y³ + py + q = 0    (7.5)

where
p = b − a²/3   and   q = c + (2a³ − 9ab)/27.
Cardano’s strategy introduces two variables u and v, related to each other by
    u + v = y,
    3uv + p = 0.

In other words, u and v are the two roots of the quadratic equation t² − yt − p/3 = 0. Plugging
y = u + v into (7.5) gives

u³ + 3uv(u + v) + v³ + p(u + v) + q = 0
⟺ u³ + v³ + (3uv + p)(u + v) + q = 0
⟺ u³ + v³ + q = 0.

Multiplying through by u³ gives u⁶ + u³v³ + qu³ = 0, but since 3uv + p = 0, we get

u⁶ + qu³ − p³/27 = 0.
This becomes a quadratic equation in u³ with the two solutions

u³ = −q/2 ± √(q²/4 + p³/27).    (7.6)
By (7.3), the possible values of u are
u = ωⁱ · ∛( −q/2 ± √(q²/4 + p³/27) )   for i = 0, 1, 2.    (7.7)

Note that u³ and v³ solve the system of equations

    u³ + v³ = −q,
    27u³v³ = −p³,

from which we see that u³ and v³ are the two distinct roots of (7.6). We give u the + sign and v the − sign. However, the identity 3uv = −p leads to precisely three combinations of the possible powers
on ω. The three roots of the cubic equation (7.5) are
y1 = u0 + v0 = ∛(−q/2 + √(q²/4 + p³/27)) + ∛(−q/2 − √(q²/4 + p³/27)),

y2 = ωu0 + ω²v0 = ((−1 + i√3)/2)·∛(−q/2 + √(q²/4 + p³/27)) + ((−1 − i√3)/2)·∛(−q/2 − √(q²/4 + p³/27)),

y3 = ω²u0 + ωv0 = ((−1 − i√3)/2)·∛(−q/2 + √(q²/4 + p³/27)) + ((−1 + i√3)/2)·∛(−q/2 − √(q²/4 + p³/27)).

The three roots of the original cubic equation (7.4) are given by xi = yi − a/3.
The square root that appears in formula (7.7) indicates that there may be a bifurcation in the behavior of the solutions depending on whether the expression under the square root is positive or negative. Indeed, the expression under the square root plays a similar role for the cubic equation as b² − 4ac plays in the quadratic formula.
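The derivation above translates into a short numerical routine. The following Python sketch is our own illustration (the function name, branch choices, and tolerances are ours): it takes the principal complex cube root for u0 and recovers v from the constraint 3uv = −p, which pairs the cube roots correctly.

```python
import cmath

OMEGA = complex(-0.5, 3 ** 0.5 / 2)  # primitive third root of unity

def solve_cubic(a, b, c):
    """Roots of x^3 + a x^2 + b x + c = 0 by Cardano's method (x = y - a/3)."""
    p = b - a ** 2 / 3
    q = c + (2 * a ** 3 - 9 * a * b) / 27
    disc = q ** 2 / 4 + p ** 3 / 27          # the quantity under the square root
    u0 = (-q / 2 + cmath.sqrt(disc)) ** (1 / 3)
    if abs(u0) < 1e-12:                      # only happens when p = 0; take the - sign
        u0 = (-q / 2 - cmath.sqrt(disc)) ** (1 / 3)
    roots = []
    for k in range(3):
        u = u0 * OMEGA ** k
        v = -p / (3 * u) if abs(u) > 1e-12 else 0.0  # enforce 3uv = -p
        roots.append(u + v - a / 3)
    return roots
```

For instance, `solve_cubic(0, -3, -1)` returns three numerically real roots of x³ − 3x − 1 = 0, matching the need to pass through complex numbers even when all roots are real.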

Definition 7.3.1
When a cubic equation is written in the form y³ + py + q = 0, the expression

∆ = −27q² − 4p³

is called the discriminant of the cubic polynomial.

The reader might wonder why we define the discriminant as above rather than the quantity
q²/4 + p³/27, which arose naturally from Cardano's method. The concept of discriminant has a
more general definition (see Definition 11.5.12) so we have stated the definition of the discriminant
of a cubic to conform to the more general definition.

Theorem 7.3.2
Consider the cubic equation y³ + py + q = 0 with p, q ∈ R. Then
• if ∆ > 0, then the cubic equation has 3 real roots;

• if ∆ = 0, then the cubic equation has a double root;


• if ∆ < 0, then the cubic equation has 1 real root and 2 complex roots that are
conjugate to each other.

Proof. Suppose that ∆ > 0. Then


s r s r s r
3 q q2 p3 3 q ∆ 3 q ∆
− ± + = − + − = − + −
2 4 27 2 4 · 27 2 108
is a complex number with a nontrivial imaginary component. (In order to make sense of the cube
root for complex numbers, we can assume that we choose the complex number with an angle θ such
that −π/3 ≤ θ ≤ π/3.) Furthermore, its complex conjugate is precisely
∛(−q/2 − √(−∆/108)).
Thus, y1 = u0 + v0 = 2 Re(u0), twice the real part of u0.
For the other roots, ωu0 corresponds to the rotation of u0 around the origin by an angle of 2π/3, and we notice that ω²v0 is the complex conjugate of ωu0. Hence, y2 = 2 Re(ωu0). Similarly, y3 = 2 Re(ω²u0). The values u0, ωu0, and ω²u0 are the vertices of an equilateral triangle with orthocenter at the origin. Since the point u0, whose polar angle satisfies −π/3 ≤ θ ≤ π/3, is not on the x-axis, no two of the three vertices are reflections of each other across the x-axis. Hence, Re(u0), Re(ωu0), and Re(ω²u0) are all distinct real numbers. Hence, if ∆ > 0, the cubic equation has 3 real roots.
Suppose that ∆ = 0. Then u0 = v0, and they are both real. We obtain the roots

y1 = 2u0,   y2 = ωu0 + ω²v0 = (ω + ω²)u0 = −u0 = y3.

If u0 = 0, then we have a triple root at yi = 0; otherwise, we have two distinct real roots, one of which is a double root.
Suppose that ∆ < 0. Then √(−∆/108) is a real number and both

u0 = ∛(−q/2 + √(−∆/108))   and   v0 = ∛(−q/2 − √(−∆/108))

are distinct real numbers. Then y1 ∈ R whereas

y2 = −(1/2)(u0 + v0) + i(√3/2)(u0 − v0)   and   y3 = −(1/2)(u0 + v0) − i(√3/2)(u0 − v0)
are two complex roots that are conjugate to each other. 
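The trichotomy of Theorem 7.3.2 is easy to spot-check numerically. The helper below is our own sketch; the two nonzero examples are taken from Examples 7.3.3 and 7.3.5 of the text:

```python
def cubic_discriminant(p, q):
    """Discriminant Delta = -27 q^2 - 4 p^3 of y^3 + p*y + q (Definition 7.3.1)."""
    return -27 * q ** 2 - 4 * p ** 3

# Three real roots (Delta > 0): x^3 - 3x - 1.
assert cubic_discriminant(-3, -1) == 81
# One real root and a conjugate pair (Delta < 0): y^3 + 6y - 2.
assert cubic_discriminant(6, -2) == -972
# A double root (Delta = 0): y^3 - 3y + 2 = (y - 1)^2 (y + 2).
assert cubic_discriminant(-3, 2) == 0
```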

Theorem 7.3.2 assumes that p and q are real numbers. The solutions for the cubic equation are also correct when p and q are complex numbers. In this latter case, in the calculation for u0, any of the three possible values of the cube root of a complex number will recover all three distinct roots.
Example 7.3.3. Consider the equation x³ − 3x − 1 = 0. Cardano's solution for the cubic involves

u³ = −q/2 ± √(q²/4 + p³/27) = 1/2 ± (√3/2)i.

Though Cardano did not have complex numbers at his disposal, we can write u³ = cos(π/3) + i sin(π/3) = e^{iπ/3}. Thus, u0 = e^{iπ/9} and v0 = e^{−iπ/9}. The roots of the equation are

x1 = e^{iπ/9} + e^{−iπ/9} = 2 cos(π/9),
x2 = e^{i2π/3}e^{iπ/9} + e^{−i2π/3}e^{−iπ/9} = 2 cos(7π/9),
x3 = e^{−i2π/3}e^{iπ/9} + e^{i2π/3}e^{−iπ/9} = 2 cos(5π/9).    △
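As a quick numerical sanity check of this example (the check is ours, not the book's):

```python
import math

# The three roots claimed in Example 7.3.3 for x^3 - 3x - 1 = 0.
roots = [2 * math.cos(k * math.pi / 9) for k in (1, 7, 5)]
assert all(abs(r ** 3 - 3 * r - 1) < 1e-12 for r in roots)
```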
The proof of Theorem 7.3.2 indicates that Cardano's formula is not particularly easy to deal with. If the cubic has three real roots, then ∆ > 0, so q²/4 + p³/27 must be negative, which makes

√(q²/4 + p³/27)

an imaginary number. It is precisely in this case that the solution to the cubic has three real roots. In particular, in order to find these real roots, we must pass into the complex numbers.
Example 7.3.4. Consider the cubic equation x³ − 15x − 20 = 0. We have ∆ = 2700, so the equation should have three real roots. Also,

u0 = ∛(10 + 5i)   and   v0 = ∛(10 − 5i).

Writing these complex numbers in polar form (see Appendix A.1) gives

10 + 5i = √125 e^{i arctan(1/2)}  ⟹  ∛(10 + 5i) = √5 e^{i arctan(1/2)/3},
10 − 5i = √125 e^{−i arctan(1/2)}  ⟹  ∛(10 − 5i) = √5 e^{−i arctan(1/2)/3}.

In particular, one of the solutions is

∛(10 + 5i) + ∛(10 − 5i) = 2√5 cos((1/3) arctan(1/2)).

In a similar manner, we can find trigonometric expressions for the other two roots.    △
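The trigonometric expression obtained here can be verified numerically (a check of ours, not part of the text):

```python
import math

# One root of x^3 - 15x - 20 = 0 from Example 7.3.4.
x1 = 2 * math.sqrt(5) * math.cos(math.atan(1 / 2) / 3)
assert abs(x1 ** 3 - 15 * x1 - 20) < 1e-9
```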
Example 7.3.5. Consider the polynomial equation x³ + 6x² + 18x + 18 = 0. Setting x = y − 2, we get the equation

y³ + 6y − 2 = 0.

The discriminant is

∆ = −4p³ − 27q² = −4 × 216 − 27 × 4 = −972,

so there will be two complex roots and one real root. We calculate

u0 = ∛(−q/2 + √(−∆/108)) = ∛(1 + 3) = ∛4,   v0 = ∛(−q/2 − √(−∆/108)) = ∛(1 − 3) = −∛2,

so the three roots of the original equation are

x1 = −2 + ∛4 − ∛2,
x2 = −2 + ((−1 + i√3)/2)∛4 − ((−1 − i√3)/2)∛2 = −2 + (1/2)(−∛4 + ∛2) + i(√3/2)(∛4 + ∛2),
x3 = −2 + ((−1 − i√3)/2)∛4 − ((−1 + i√3)/2)∛2 = −2 + (1/2)(−∛4 + ∛2) − i(√3/2)(∛4 + ∛2).    △

7.3.2 – The Quartic Equation


Consider the generic quartic equation

x⁴ + ax³ + bx² + cx + d = 0,    (7.8)

where we can assume the polynomial is monic after dividing by the leading coefficient. As with the
cubic equation, the change of variables x = y − a/4 eliminates the cubic term and changes (7.8) into

y⁴ + py² + qy + r = 0,    (7.9)

for p, q, and r depending on a, b, c, and d. We propose to solve (7.9). We follow the strategy
introduced by Ferrari in which we rewrite (7.9) as

y⁴ = −py² − qy − r    (7.10)

and add an expression that simultaneously makes both sides into perfect squares. Because y 4 is
alone on one side, we are limited to what we can add to create a perfect square. We choose to add
the quantity
ty² + t²/4    (7.11)

so that

y⁴ + ty² + t²/4 = (y² + t/2)².
The trick to this method is to choose a value of t that makes the right-hand side into a perfect
square as well. Adding (7.11) on the right-hand side of (7.10) gives
(t − p)y² − qy + (t²/4 − r).    (7.12)

Now a quadratic expression Ax² + Bx + C is the square of a linear expression if and only if B² − 4AC = 0. Hence, for (7.12) to be a perfect square, t must satisfy

q² − 4(t − p)(t²/4 − r) = 0  ⟺  t³ − pt² − 4rt + (4rp − q²) = 0.

This is called the resolvent equation for the quartic equation (7.9). We can solve for t using the
solution method for the cubic, and in fact any of the three solutions work for the rest of the algorithm
to finish solving the quartic. So when t solves the resolvent equation, (7.10) becomes
(y² + t/2)² = (my + n)²

for some m and n that depend on p, q, and r. Then we have


(y² + t/2)² − (my + n)² = 0  ⟹  (y² − my + t/2 − n)(y² + my + t/2 + n) = 0.

So we now are reduced to solving two quadratic equations.
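Ferrari's procedure can likewise be sketched in code. This Python version is our own illustration: it finds a real root of the resolvent cubic by bisection (any of the three roots would do, as noted above) and then solves the two resulting quadratics with complex arithmetic.

```python
import cmath

def solve_depressed_quartic(p, q, r):
    """Roots of y^4 + p y^2 + q y + r = 0 via Ferrari's resolvent cubic."""
    # Resolvent: t^3 - p t^2 - 4 r t + (4 r p - q^2) = 0.
    g = lambda t: t ** 3 - p * t ** 2 - 4 * r * t + (4 * r * p - q * q)
    # A real cubic always has a real root; bracket it with a Cauchy-style bound.
    hi = 1.0 + abs(p) + 4 * abs(r) + abs(4 * r * p - q * q)
    lo = -hi
    for _ in range(200):                     # bisection to machine precision
        mid = (lo + hi) / 2
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    t = (lo + hi) / 2
    # The right-hand side (t - p) y^2 - q y + (t^2/4 - r) is now (m y + n)^2.
    m = cmath.sqrt(t - p)
    n = -q / (2 * m) if abs(m) > 1e-9 else cmath.sqrt(t * t / 4 - r)
    roots = []
    for sgn in (+1, -1):
        # Solve y^2 - sgn*m*y + (t/2 - sgn*n) = 0 by the quadratic formula.
        bq, cq = -sgn * m, t / 2 - sgn * n
        d = cmath.sqrt(bq * bq - 4 * cq)
        roots += [(-bq + d) / 2, (-bq - d) / 2]
    return roots
```

For the quartic y⁴ + y² + 6y + 1 = 0 treated in Example 7.3.6 below, the bisection lands on the resolvent root t = 4 and the four roots agree with the radical expressions found there.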

Example 7.3.6. Consider the quartic equation y⁴ + y² + 6y + 1 = 0. The resolvent equation is t³ − t² − 4t − 32 = 0. We could solve this via Cardano's method. However, by trying the rational roots that are possible by virtue of the Rational Root Theorem, we find that t = 4 is a solution to the resolvent equation. So following Ferrari's method, we add 4y² + 4 to the equation

y⁴ = −y² − 6y − 1

to get

y⁴ + 4y² + 4 = 3y² − 6y + 3  ⟹  (y² + 2)² = (√3·y − √3)².

Hence,

(y² + 2)² − (√3·y − √3)² = 0  ⟹  (y² + √3·y + 2 − √3)(y² − √3·y + 2 + √3) = 0.

Now applying the quadratic formula to the two separate quadratic polynomials, we get the four roots:

y² + √3·y + 2 − √3 = 0  ⟹  y = (1/2)(−√3 ± √(3 − 4(2 − √3))) = (1/2)(−√3 ± √(4√3 − 5)),
y² − √3·y + 2 + √3 = 0  ⟹  y = (1/2)(√3 ± √(3 − 4(2 + √3))) = (1/2)(√3 ± √(−4√3 − 5)).

The first two roots are real and the last two roots are complex.    △

Exercises for Section 7.3

For Exercises 7.3.1 to 7.3.11 solve the equation using Cardano-Ferrari methods. For the solutions to a cubic
equation, if all the roots are real, then write the solutions without reference to complex numbers.
1. x³ − 15x + 10 = 0
2. x³ + 6x − 2 = 0
3. x³ − 9x + 10 = 0
4. x³ − 6x + 4 = 0
5. x³ − 12x + 8 = 0
6. x³ − 12x + 16 = 0
7. x³ + 3x² + 12x + 4 = 0
8. x³ − 9x² + 24x − 16 = 0
9. x⁴ + 4x² + 12x + 7 = 0
10. x⁴ + 4x² − 3x + 1 = 0
11. x⁴ − 4x³ + 4x² − 8x + 4 = 0
12. Consider the polynomial p(x) = x³ − 6x² + 11x − 6.

(a) Solve the equation via Cardano’s method.


(b) Find the rational roots of this polynomial by the Rational Root Theorem.
(c) Decide which rational root corresponds to which solution via Cardano’s method.

13. Let n be a real number and consider the polynomial

p(x) = x³ − (3n + 3)x² + (3n² + 6n + 2)x − (n³ + 3n² + 2n).

Apply Cardano’s method to solve this equation. After finding the roots, explain why it was so easy
to solve.
14. Prove that a palindromic polynomial of odd degree has −1 as a root. Use this and Exercise 7.2.9 to
find all the roots to x⁵ + 2x⁴ + 3x³ + 3x² + 2x + 1 = 0.
15. Consider the polynomial p(x) = x⁶ + 4x⁴ + 4x² + 1. Use either the strategy provided by Exercise 7.2.9 to find the roots, or use Cardano's method to solve the equation in x² to find all the roots. Which do you think is easier?

7.4
Constructible Numbers
7.4.1 – Euclidean Geometry
In classical geometry, one of the most common types of exercises requests the student to construct
certain geometrical figures. Practical problems in surveying and architecture were likely the original
purpose for geometric constructions. All ancient civilizations possessed some geometric knowledge
but each expressed their scholarship in different ways.
Such construction problems ask for a method to create a specific configuration (circle, triangle,
line segment, point, etc.), with a specified property, and using specified tools (compass, straightedge,
ruler, and so on) at one’s disposal. The mathematics in some cultures would outline a recipe involving
specific numbers and then conclude, “and by this procedure, we have constructed a such and such
with such and such properties.” The numbers used needed to be generic enough that the validity of
the construction did not rely on any particular properties of those numbers.
Greek mathematics, exemplified by Euclid’s Elements, overlaid the practical geometric problems
with a philosophical approach. Instead of providing a recipe for a geometric construction (and merely claiming that it works), they defined their terms and common notions, and then, starting from a small
list of five postulates, proved propositions about geometric objects using logic. Some geometry
propositions in Euclid’s Elements establish certain measure relationships while others state “it is
possible to construct...” a specific configuration. For example, Proposition 12 in Book I states that
it is possible to draw a straight line perpendicular to a given infinite straight line L through a given
point not on L. The proof provides the construction using a compass and a straightedge and also
establishes through logic that the construction produces the described configuration.
It is particularly interesting that propositions in the Elements never refer to specific distance
values or angle values. (The Elements do refer to right angles and rational multiples thereof but no
angle is ever measured in degrees, radians, or any other unit.) Effectively, the propositions are true
regardless of any units used. Perhaps because of this feature, the geometric constructions assumed
the use of a straightedge, a ruler without distance markings.
Solutions to many such construction problems became jewels in the crown of Greek mathematics
and served as examples for the purity of proofs for many generations of mathematics education. The
ability to construct a circle inscribed in a triangle or the problem of constructing a regular pentagon
are interesting, though still elementary, examples of these achievements.
A few problems stymied mathematicians for centuries and even millennia. For example, Propo-
sition 9 in Book I of the Elements gives a construction of how to bisect a given angle α. More
specifically, given two lines that meet at a point P and span an angle α between them, construct a
line through P that makes equal angles with the other two lines. However, the problem of trisecting
the angle (constructing a line that cuts an angle by a third) remained an open problem for many
centuries. A few other problems that remained open for just as long included: constructing a regular heptagon; (“squaring the circle”) constructing a square with the same area as a given circle; and (“doubling the cube”) given a line segment a, constructing a line segment b such that the cube with side b has twice the volume of the cube with side a.
To the surprise of many, a large number of these open problems in geometry were either resolved
or proved impossible using field theory.

7.4.2 – Constructible Numbers


A first step to bringing algebra and geometry closer is to put a numerical value of the types of
segments that can be created via a straightedge and compass construction. By labeling one preferred
line segment as a reference length, we still do not need to refer to any particular units of measure.

Definition 7.4.1
The set of constructible numbers is the set of real numbers a ∈ R such that, given a segment OR, it is possible to construct with a straightedge and compass a segment OA such that, as distances, OA = |a| · OR.

In this section, we will denote the set of constructible numbers by C.


Recall that in a Euclidean construction, it is only possible to draw a line (with a straightedge)
when we have two distinct points. Also, it is only possible to draw a circle (with a compass) with
center O and with radius OA where O and A are points already obtained in the construction or
specified in the hypotheses. In particular, it is not allowed in a construction to pick up the compass
and retain the radius. Since a Euclidean construction problem only uses a straightedge and a
compass, then the points that arise in a construction come from the intersection of two lines, a line
and a circle, or two circles. In Definition 7.4.1, the only geometric objects specified in the hypothesis
are two distinct points O and R.
By Proposition 2 in Book I of Elements, it suffices to construct any segment CD of length |a|·OR
to satisfy the requirement of Definition 7.4.1.

Proposition 7.4.2
Let a and b be any nonnegative constructible numbers. Then
a ± b,   ab,   a/b (if b ≠ 0),   √a
are also constructible.

Proof. Fix two points O and R in the plane and let L be the line through O and R. Let A and B
be two points on the line L with lengths OA = aOR and OB = bOR and such that A and B are
on the same side of O as R is. Construct the circle Γ of center O and radius OA. The circle Γ intersects L in two points, one of which is A. Call A′ the other point. Since O is between A′ and B, then for distances

A′B = A′O + OB = (a + b)·OR.

Hence, a + b ∈ C. Suppose without loss of generality that A is between O and B. Then

OB = OA + AB  ⟹  AB = (b − a)·OR.
Hence, b − a ∈ C.

[Figure: the line L with the points A′, O, R, A, B marked in order.]

Next, we prove that if a, b ∈ C, then ab ∈ C. Let A and B be points on the line L with lengths OA = a·OR and OB = b·OR such that A and B are on the same side of O as R. Construct (via Proposition 11 of Book I in Elements) the line L′ that is perpendicular to L and that goes through O. Construct the circle Γ of center O and radius OR. It intersects L′ in two points. Pick one of these intersections and call it R′. Construct also the circle Γ′ of center O and radius OB. It intersects L′ in two points. Call B′ the point that is on the same side of L as R′.
Construct the line L2 through R′ and A. Construct (via Proposition 31 of Book I in Elements) the line L3 parallel to L2 going through B′. Since L2 intersects L (in A) and L3 is parallel to L2, then L3 intersects L in a point we call C.

[Figure: the product construction. On L′ ⊥ L through O lie R′ (with OR′ = OR) and B′ (with OB′ = b·OR); the line L3 through B′ parallel to L2 = R′A meets L at C with OC = ab·OR.]

By Thales’ Theorem,

OC/OA = OB′/OR′  ⟹  OC/(a·OR) = (b·OR)/OR  ⟹  OC = ab · OR.
Hence, ab ∈ C.
The proof that C is closed under division is similar. We suppose that we have already constructed the points O, R, R′, A, B, and B′, and the lines L and L′. Now construct the line L2 through A and B′. Construct also (via Proposition 31 of Book I in Elements) the line L3 parallel to L2 going through R′. L3 intersects L in a point that we call C.

[Figure: the quotient construction. The line L3 through R′ parallel to L2 = AB′ meets L at C with OC = (a/b)·OR.]

By Thales’ Theorem,

OA/OC = OB′/OR′  ⟹  OC = (a · OR · OR)/(b · OR) = (a/b) · OR.
OC OR b · OR b
Hence, a/b is a constructible number.
Finally, we prove that √a ∈ C for all positive a ∈ C. Construct a point A on the line L = ←→OR such that AO = a · OR and O is between A and R. In particular, AR = (a + 1) · OR. Construct (via Proposition 10 in Book I of the Elements) the midpoint M of the segment AR. Construct the circle Γ of center M and radius MA. Construct (via Proposition 11 in Book I of the Elements) the line L′ that is perpendicular to L and passes through O. The line L′ intersects Γ in two points. Call one of them P. We claim that the distance OP is equal to √a · OR.

[Figure: the circle Γ on diameter AR, with AO = a · OR and OR the unit segment; the perpendicular L′ to L at O meets Γ at P.]
The radius MP satisfies MP = MA = ((a + 1)/2) · OR, and we also have

OM = MR − OR = ((a + 1)/2) · OR − OR = ((a − 1)/2) · OR.

Now MOP is a right triangle with ∠MOP as the right angle. Hence, by Pythagoras,

OP = √( ((a + 1)/2)² · OR² − ((a − 1)/2)² · OR² )
   = (1/2) · OR · √((a + 1)² − (a − 1)²) = (1/2) · OR · √(4a) = √a · OR.

Thus, √a is a constructible number.
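The final computation can be replayed numerically. This small sketch is ours (the function name is hypothetical); it sets OR = 1 and reproduces OP = √a:

```python
import math

def op_length(a, OR=1.0):
    """Length OP from the construction: AO = a*OR, OR the unit segment,
    M the midpoint of AR, and the perpendicular to L erected at O."""
    MP = (a + 1) / 2 * OR                 # the radius: MP = MA
    OM = abs(a - 1) / 2 * OR              # distance from O to the center M
    return math.sqrt(MP ** 2 - OM ** 2)   # Pythagoras in triangle MOP

assert abs(op_length(7.0) - math.sqrt(7)) < 1e-12
assert abs(op_length(0.25) - 0.5) < 1e-12
```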

As an example of a construction problem, we provide a compass and straightedge construction for a regular pentagon.

Example 7.4.3. The angle α = 2π/5 satisfies cos 3α = cos(2π − 3α) = cos 2α. Using addition formulas, we find that cos 3α = 4 cos³α − 3 cos α and cos 2α = 2 cos²α − 1. Thus, cos α solves the equation

4x³ − 2x² − 3x + 1 = 0  ⟺  (x − 1)(4x² + 2x − 1) = 0.

Since cos(α) ≠ 1, cos(α) must be a root of 4x² + 2x − 1. Using the quadratic formula and reasoning that cos(2π/5) > 0, we deduce that

cos(2π/5) = (−1 + √5)/4.

Having a value for cos(2π/5) suggests the following construction of the regular pentagon.

• Pick a center O and radius OR.

• Construct a circle Γ1 of center O and radius OR.


• Let L be the line ←→OR.

• Let Q be an intersection of L with Γ1 so that RQ is a diameter of Γ1 .

• Construct the line L′ perpendicular to L at O and let R′ and Q′ be the intersection points of L′ with Γ1.

[Figure: the circle Γ1 with diameter QR on L and diameter R′Q′ on L′.]

• Construct (via Proposition 10 in Book I of the Elements) the midpoint M of OQ.

• Construct the circle Γ2 of center M and radius MQ′.

• Γ2 intersects L in two points. Call D the point between M and R.

[Figure: the circles Γ1 and Γ2, with OM = (1/2)·OR, MQ′ = √((1/2)² + 1)·OR = (√5/2)·OR, and OD = (√5/2 − 1/2)·OR = ((√5 − 1)/2)·OR.]

• Construct the midpoint E of the segment OD.

• Construct the line L″ that is perpendicular to L through the point E.

• L″ intersects Γ1 in two points. Call one of them A1.

[Figure: the point A1 on Γ1 above E, with OE/OA1 = (√5 − 1)/4 = cos(2π/5).]

• Since OE/OA1 = (√5 − 1)/4 = cos(2π/5), then ∠EOA1 = 2π/5. Consequently, the segment RA1 is one edge of a regular pentagon.

• Construct a circle of center A1 and radius A1R. This circle intersects Γ1 in two points: R and another point we call A2. Note that A1A2 = A1R.

• Construct a circle of center A2 and radius A2A1. This circle intersects Γ1 in two points: A1 and another point we call A3. Note that A2A3 = A2A1.

• Construct a circle of center A3 and radius A3A2. This circle intersects Γ1 in two points: A2 and another point we call A4. Note that A3A4 = A3A2.

• The polygon RA1 A2 A3 A4 is a regular pentagon.

[Figure: the completed regular pentagon RA1A2A3A4 inscribed in Γ1.]    △
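The chain of lengths in this construction can be checked against cos(2π/5) numerically (the check is ours; OR is normalized to 1):

```python
import math

OR = 1.0                   # normalize the reference length
OM = OR / 2                # M is the midpoint of OQ
MQp = math.hypot(OM, OR)   # MQ' = sqrt((1/2)^2 + 1^2)*OR = (sqrt(5)/2)*OR
OD = MQp - OM              # ((sqrt(5) - 1)/2)*OR
OE = OD / 2                # E is the midpoint of OD

# OE / OA1 = (sqrt(5) - 1)/4 = cos(2*pi/5), since OA1 = OR.
assert abs(OE - math.cos(2 * math.pi / 5)) < 1e-12
```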

Proposition 7.4.2 is already quite interesting. Since 1 ∈ C, then using addition, subtraction, and division repeatedly, we deduce that Q ⊆ C. Since C is closed under taking square roots of positive numbers, we can also construct numbers like

√3,   √(10 − 2√3),   or   (1/7)√(2 + √(3 + √5)).
Consider the set C′ defined recursively by 1 ∈ C′ and, for all positive a, b ∈ C′, then a ± b, ab, a/b, and √a are also in C′. Proposition 7.4.2 establishes that C′ ⊆ C.

However, the proposition falls short of the whole story. It does not yet determine whether C′ ⊆ C is a strict subset inclusion or whether C′ = C. In other words, can Euclidean constructions lead to constructible numbers other than those in C′? A further application of algebra answers this question.

7.4.3 – Geometric Constructions and Algebra


The introduction of Cartesian coordinates provided a framework for algebraic methods to address
geometric problems. For a general algebraic study of construction problems, we need to analyze the
algebra behind intersections of lines and circles.
Points in the Euclidean plane are in one-to-one correspondence with pairs (x, y) ∈ R2 .

Proposition 7.4.4
Let P be a point with coordinates (x0 , y0 ). The segment OP is constructible if and only if
the numbers x0 and y0 are constructible.

Proof. We assume that O is the intersection of the x-axis and the y-axis and suppose that R is on
the x-axis.
Suppose that the point P is constructible, in the sense that the segment OP is constructible.
Construct (via Proposition 11 in Book I of the Elements) the line L1 perpendicular to the x-axis
through the point P . The line L1 intersects the x-axis in a point P1 with coordinates (x0 , 0). Since
the segment OP1 is constructible and OP1 = x0 · OR, then x0 ∈ C. Following a similar construction
for the projection of P onto the y-axis, we deduce that x0 and y0 are constructible numbers.
Conversely, since x0 and y0 are constructible, we can construct P1 on the x-axis and P2 on the
y-axis such that OP1 = x0 · OR and OP2 = y0 · OR. Construct the line L1 perpendicular to the
x-axis that goes through the point P1 and construct also the line L2 perpendicular to the y-axis that
goes through the point P2 . Call P the intersection of L1 and L2 . We have given a construction for
the segment OP . Furthermore, since OP1 P P2 is a rectangle, the coordinates of P are (x0 , y0 ). 

When tracing a line with a straightedge, we always draw a line that passes through two already
given points. The equation for a line through (x1 , y1 ) and (x2 , y2 ) is
y = y1 + ((y2 − y1)/(x2 − x1))·(x − x1)  ⟺  (x2 − x1)(y − y1) = (y2 − y1)(x − x1).
When using a compass, we trace out a circle with center A with radius AB where A and B are
points already obtained in the construction or specified in the hypotheses. The equation for a circle
of center (x0 , y0 ) and radius of length r is

(x − x0)² + (y − y0)² = r².

Beyond these tracing operations, we also will consider the intersection points between: (1) two lines;
(2) two circles; (3) a line and a circle.
Let P1, P2, P3, and P4 be four points obtained by some compass and straightedge construction
from the initial segment OR. Suppose that the coordinates of Pi are (xi , yi ). Now let L1 be the line
through P1 and P2 and let L2 be the line through P3 and P4 . Assuming that L1 and L2 are not
parallel, the point Q of intersection between L1 and L2 satisfies the system of linear equations
(
(y2 − y1 )x − (x2 − x1 )y = x1 (y2 − y1 ) − y1 (x2 − x1 )
(y4 − y3 )x − (x4 − x3 )y = x3 (y4 − y3 ) − y3 (x4 − x3 ).

Via Cramer’s rule, if xi , yi ∈ F for i = 1, 2, 3, 4, where F is some field extension of Q, then the point
Q has coordinates in F .
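The field-theoretic point, that the intersection has coordinates in the same field F as the data, can be made concrete with exact rational arithmetic: with `Fraction` inputs, every operation in Cramer's rule stays in Q. The function below is our own sketch:

```python
from fractions import Fraction as F

def line_intersection(P1, P2, P3, P4):
    """Intersection of line P1P2 with line P3P4 via Cramer's rule.

    With Fraction inputs, every field operation stays in Q, so the
    output coordinates are again rational."""
    (x1, y1), (x2, y2) = P1, P2
    (x3, y3), (x4, y4) = P3, P4
    a1, b1 = y2 - y1, -(x2 - x1)
    c1 = x1 * (y2 - y1) - y1 * (x2 - x1)
    a2, b2 = y4 - y3, -(x4 - x3)
    c2 = x3 * (y4 - y3) - y3 * (x4 - x3)
    det = a1 * b2 - a2 * b1          # nonzero iff the lines are not parallel
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)

# The diagonals of the unit square meet at (1/2, 1/2), exactly.
P = line_intersection((F(0), F(0)), (F(1), F(1)), (F(0), F(1)), (F(1), F(0)))
assert P == (F(1, 2), F(1, 2))
```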
Consider now the intersection of two circles. Let P1 and P2 be two distinct points obtained
by some compass and straightedge construction from the initial segment OR. Suppose that the
coordinates of Pi are (xi , yi ) where xi , yi are in some field extension F of Q. Suppose also that r1
and r2 are two radii that are constructible numbers. The intersection points of the circle Γ1 with
center P1 and radius r1 and the circle Γ2 with center P2 and radius r2 satisfy the system of equations
(
(x − x1 )2 + (y − y1 )2 = r12
(x − x2 )2 + (y − y2 )2 = r22 .

The difference of the two equations is

    (x² − 2x1 x + x1²) + (y² − 2y1 y + y1²) − (x² − 2x2 x + x2²) − (y² − 2y2 y + y2²) = r1² − r2²
    ⇐⇒ 2(x2 − x1)x + 2(y2 − y1)y = r1² − r2² + x2² + y2² − x1² − y1².

Depending on whether x2 − x1 ≠ 0 or y2 − y1 ≠ 0, it is possible to write y as an expression ax + b,
where a, b ∈ F , or to write x as an expression cy + d, where c, d ∈ F . Without loss of generality,
assume the former case. Replacing y with ax + b in the equation for either of the circles leads
to a quadratic equation in x. We interpret the situation when the quadratic equation has no real
solutions as when the two circles do not intersect. Otherwise, if the circles do intersect, both the
points of intersection have coordinates in a field extension K of F with [K : F ] = 1 or 2.
Now consider the intersection of a circle and a line. Let Γ be a circle with center (x0 , y0 ) and
radius r and let (x1 , y1 ) and (x2 , y2 ) be two distinct points. Suppose that there is a field extension F
of Q such that xi , yi ∈ F and r ∈ F . The points of intersection of Γ and the line L through (x1 , y1 )
and (x2 , y2 ) satisfy the system of equations
(
(x − x1 )2 + (y − y1 )2 = r2
(y2 − y1 )x − (x2 − x1 )y = x1 (y2 − y1 ) − y1 (x2 − x1 ).

As in the previous case, either x1 ≠ x2 , in which case we can solve the equation for the line for y
in terms of x, or y1 ≠ y2 , in which case we can solve the equation for the line for x in terms of
y. Then the equation for the circle leads to a quadratic equation, either in y or in x. If there is no
solution in real numbers to the equation, then we interpret this case to mean that Γ and L do not
intersect. If the equation has solutions, then these solutions are in a field extension K of F with
[K : F ] = 1 or 2.
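To see the degree-2 bound concretely, here is a short Python sketch (an illustration with hypothetical helper names, not from the text) that substitutes the line through two points into a circle's equation; the resulting quadratic has coefficients in the base field, and its discriminant is the only quantity whose square root can leave that field:

```python
from fractions import Fraction as Fr

def line_circle_quadratic(center, r_sq, p1, p2):
    # Assumes x1 != x2, so the line through p1 and p2 is y = m*x + k.
    # Substituting into (x - a)^2 + (y - b)^2 = r_sq yields A*x^2 + B*x + C = 0
    # with A, B, C in the same field F as the input data.
    (a, b), (x1, y1), (x2, y2) = center, p1, p2
    m = (y2 - y1) / (x2 - x1)
    k = y1 - m * x1
    A = 1 + m * m
    B = -2 * a + 2 * m * (k - b)
    C = a * a + (k - b) ** 2 - r_sq
    return A, B, C

# Unit circle against the line y = x: the quadratic is 2x^2 - 1 = 0, so the
# intersection coordinates x = ±1/√2 live in a degree-2 extension of Q.
A, B, C = line_circle_quadratic((Fr(0), Fr(0)), Fr(1), (Fr(0), Fr(0)), (Fr(1), Fr(1)))
print(A, B, C, B * B - 4 * A * C)  # 2 0 -1 8
```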
This discussion leads to the following strengthening of Proposition 7.4.2.

Theorem 7.4.5
If a real number α ∈ R is constructible then [Q(α) : Q] = 2^k for some nonnegative integer k.
Furthermore, the set of constructible numbers is exactly the set C ′ of real numbers defined
recursively as the set that contains 1 and such that for any positive elements a, b ∈ C ′, the
numbers a ± b, ab, a/b, and √a are in C ′.

Proof. First, suppose that α is a constructible number. In the Euclidean construction of a segment
OP with OP = α · OR, we construct a sequence of points P1 , P2 , . . . , Pn with Pn = P and such that
every controlling parameter of any geometric object (center and radius of a circle, two points of a
line) is O, R, or one of these points. Let αi be the constructible number such that OPi = αi · OR.
Then

[Q(α) : Q] = [Q(α) : Q(αn−1 )][Q(αn−1 ) : Q(αn−2 )] · · · [Q(α2 ) : Q(α1 )][Q(α1 ) : Q].

Furthermore, from the above discussion, we know that each [Q(αi ) : Q(αi−1 )] is 1 or 2. Hence,
[Q(α) : Q] = 2^k for some nonnegative integer k.
Proposition 7.4.2 established that C ′ ⊆ C. Conversely, at each stage of a Euclidean construction,
[Q(αi ) : Q(αi−1 )] is 1 or 2. If [Q(αi ) : Q(αi−1 )] = 1, then a point obtained from previous
points at the ith stage of the construction has coordinates that result from addition, subtraction,
multiplication, or division of coordinates of previous points. If [Q(αi ) : Q(αi−1 )] = 2, then αi is the
root of some quadratic polynomial with coefficients in Q(αi−1 ). In particular, αi is the sum of an
element in Q(αi−1 ) with the square root of an element in Q(αi−1 ). Hence, we conclude that C = C ′.

One of the profound consequences of Theorem 7.4.5 is that it gives a way to show that certain
geometric configurations cannot be obtained by a compass and straightedge construction. For example,
Exercise 7.4.3 guides a proof that, unlike a regular pentagon, a regular heptagon cannot be constructed
with a compass and a straightedge. The following three corollaries, innocuous as they may seem,
answered long-standing open problems in geometry when first stated.

Corollary 7.4.6
It is impossible to double the cube by a compass and straightedge construction.

Proof. Let OR be one edge of a cube C. A cube with double the volume would have an edge of
length ∛2 · OR. However, [Q(∛2) : Q] = 3. By Theorem 7.4.5, ∛2 is not a constructible number, so
it is impossible to construct a segment of length ∛2 · OR with a compass and straightedge. □

Corollary 7.4.7
It is impossible to square the circle by a compass and straightedge construction.

Proof. Recall that “squaring the circle” refers to the construction, starting from a circle Γ with center
O and radius OR, of a square whose area is equal to that of Γ. The area of Γ is π · OR². The area
of a square is a², where a is the length of the side. Constructing a square as desired would
lead to constructing a line segment of length √π · OR. However, by Lindemann’s theorem that π is
transcendental, [Q(√π) : Q] is infinite. Hence, by Theorem 7.4.5, √π is not a constructible number
and thus it is impossible to construct a segment of the desired length. □

For the last corollary we consider, we state a generalization of Theorem 7.4.5. The proof follows
from the same procedure as that given for Theorem 7.4.5.

Theorem 7.4.8
Suppose that O, R, C1 , C2 , . . . , Cn are points given in the plane such that OCi = γi OR.
If a point A can be obtained from O, R, C1 , C2 , . . . , Cn with a compass and straightedge
construction and OA = αOR, then

[Q(α, γ1 , γ2 , . . . , γn ) : Q(γ1 , γ2 , . . . , γn )] = 2k

for some nonnegative integer k.

Corollary 7.4.9
It is impossible to trisect every angle using a compass and straightedge construction.

Proof. Let θ = ∠AOR be an angle. There is no assumption that OA is constructible from OR.


Using angle addition formulas, it is easy to show that for any angle α,

    cos(3α) = 4 cos³ α − 3 cos α   =⇒   cos θ = 4 cos³(θ/3) − 3 cos(θ/3).

So cos(θ/3) is a root of the polynomial 4x3 − 3x − cos θ whose coefficients are in the field Q(cos θ).
For an arbitrary angle θ, 4x3 − 3x − cos θ is irreducible in Q(cos θ)[x] so

[Q(cos(θ/3)) : Q(cos θ)] = 3.

By Theorem 7.4.8, it is impossible to construct a point C from O, R, and A using a compass and
straightedge such that ∠ROC = (1/3)∠ROA. □
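For the concrete angle θ = π/3 we have cos θ = 1/2, so the cubic becomes 4x³ − 3x − 1/2 = 0, or equivalently 8x³ − 6x − 1 = 0 after clearing denominators. A cubic with rational coefficients is irreducible over Q exactly when it has no rational root, and the rational root theorem leaves only finitely many candidates to test. The following Python sketch (illustrative, not from the text) performs that check:

```python
from fractions import Fraction as Fr

def rational_roots(coeffs):
    # Rational root theorem: any rational root p/q (in lowest terms) of an
    # integer polynomial a_n x^n + ... + a_0 has p | a_0 and q | a_n.
    def divisors(n):
        n = abs(n)
        return [d for d in range(1, n + 1) if n % d == 0]

    def value(x):
        v = Fr(0)
        for c in coeffs:       # Horner evaluation, coeffs = [a_n, ..., a_0]
            v = v * x + c
        return v

    cands = {Fr(s * p, q) for p in divisors(coeffs[-1])
             for q in divisors(coeffs[0]) for s in (1, -1)}
    return sorted(x for x in cands if value(x) == 0)

# 8x^3 - 6x - 1 has no rational root, hence is irreducible over Q (degree 3),
# so cos(20°) has degree 3 over Q and the 60° angle cannot be trisected.
print(rational_roots([8, 0, -6, -1]))  # []
```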

Exercises for Section 7.4


1. Find a compass and straightedge construction for the number √(3 + √5).
2. Discuss how to construct a regular dodecagon (12 sides) with a compass and straightedge.
3. This exercise guides a proof that it is impossible to construct a regular heptagon with a compass and
straightedge. Set α = 2π/7.
(a) Prove that cos(3α) = cos(4α).
(b) Deduce that cos α solves 8x3 + 4x2 − 4x − 1 = 0.
(c) Deduce that [Q(cos(α)) : Q] = 3 and explain clearly why this means that the regular heptagon
is not constructible with a compass and a straightedge.
4. Prove that it is impossible to construct a regular 9-gon with a compass and straightedge.
5. Given a circle Γ, is it possible to construct a circle with double the area? If so, provide a compass and
straightedge construction.
6. For each nonnegative integer n, let Tn (x) be the Chebyshev polynomial defined by Tn (cos θ) = cos(nθ)
for all θ.
(a) Prove that Tmn (x) = Tm (Tn (x)).
(b) Use this to find the four roots of T4 (x).
(c) Deduce that T4 (x) is irreducible in Q[x].
(d) Also use (a) to find all 6 roots of T6 (x).
7. Is it possible to construct, using a compass and a straightedge, a triangle with sides in the ratio of
2 : 3 : 4 that has the same area of a given square? If so, suppose that one side of the square is a
segment OR; describe a compass and straightedge construction for the desired triangle. [Hint: Use
Heron’s Formula.]
8. Is it possible to construct, using a compass and a straightedge, a triangle with angles in the ratio of
2 : 3 : 4 that has the same area of a given square? If so, suppose that one side of the square is a
segment OR; describe a compass and straightedge construction for the desired triangle.

7.5 Cyclotomic Extensions
In the study of polynomial equations of higher order, there is arguably a simplest equation of a given
degree, namely
z n − 1 = 0.
The roots of this polynomial are called the nth roots of unity. Without an understanding of the
properties of the roots of this polynomial, we should not expect to have a clear understanding of
roots of polynomials of degree n. This section studies the roots of unity and extensions of Q that
involve adjoining a root of unity.

7.5.1 – Primitive Roots


Borrowing from the polar expression of complex numbers, we know that we can write 1 as e^{2πki} for
any k ∈ Z. If a complex number z = re^{iθ} satisfies z^n = 1, then

    r^n e^{inθ} = 1 · e^{2πki} .

With the condition that r is a positive real number, r = 1 and nθ = 2πk for some k. Thus,

    θ = 2πk/n,   for k ∈ Z.

Figure 7.1: The 10th roots of unity e^{2πik/10} , k = 0, 1, . . . , 9, on the unit circle

The values k = 0, 1, . . . , n − 1 give n distinct complex numbers. Since a nonzero polynomial of degree
n in F [x] can have at most n roots in the field F , this gives all the roots. The nth roots of unity are

    e^{2πik/n} = cos(2πk/n) + i sin(2πk/n)   for k = 0, 1, 2, . . . , n − 1.
We will often denote ζn = e^{2πi/n} so that the nth roots of unity are ζn^k . As in Figure 7.1, the elements
ζn^k , with k = 0, 1, . . . , n − 1, form the vertices of a regular n-gon on the unit circle.
The set of nth roots of unity, denoted by μn , forms a subgroup of (C∗ , ×). Indeed, 1 ∈ μn , so μn
is nonempty, and ζn^a (ζn^b )^{−1} = ζn^{a−b} ∈ μn , so μn is a subgroup of (C∗ , ×) by the One-Step
Subgroup Criterion. Furthermore, μn is isomorphic to Z/nZ via a ↦ ζn^a .

Definition 7.5.1
A primitive nth root of unity is an nth root of unity that generates µn .

By Proposition 3.3.7, Z/nZ can be generated by a if and only if gcd(a, n) = 1. Therefore, the
primitive roots of unity are of the form ζn^a where 1 ≤ a ≤ n with gcd(a, n) = 1.
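This characterization is easy to verify computationally: ζn^a generates μn exactly when multiplication by a permutes the exponents modulo n, which happens exactly when gcd(a, n) = 1. A short Python sketch (illustrative, not from the text):

```python
from math import gcd

def generates(a, n):
    # ζ_n^a generates μ_n iff its powers ζ_n^(a*k mod n) hit every nth root
    # of unity, i.e. iff a*k mod n runs over all residues 0, ..., n-1.
    return {a * k % n for k in range(n)} == set(range(n))

n = 10
print([a for a in range(1, n + 1) if generates(a, n)])   # [1, 3, 7, 9]
print([a for a in range(1, n + 1) if gcd(a, n) == 1])    # [1, 3, 7, 9]
```

Both lists agree, and their common length is φ(10) = 4, the number of primitive 10th roots of unity.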
Any solution to the equation xn − 1 = 0 over a field F is an element of finite order in the
multiplicative group of units U (F ). Combining results from group theory with field theory gives the
following result about the group of units in any field F .

Proposition 7.5.2
Let F be a field. Any finite group Γ in U (F ) is cyclic.

Proof. Let |Γ| = n. Since Γ is a finite abelian group, the Fundamental Theorem of Finitely Generated
Abelian Groups applies. Suppose that its invariant factor decomposition is

    Γ ≅ Z_{n1} × Z_{n2} × · · · × Z_{nℓ} .
Since ni+1 | ni for 1 ≤ i ≤ ℓ − 1, x^{n1} − 1 = 0 for all x ∈ Γ. Assume Γ is not cyclic. Then n1 < n
and all n elements of Γ solve x^{n1} − 1 = 0. This contradicts the fact that the number of distinct roots
of a polynomial is less than or equal to its degree (Corollary 6.5.10). Hence, Γ is cyclic. □

Definition 7.5.3
Let p be a prime number. An integer a whose residue in Z/pZ generates U (Fp ) is called a
primitive root modulo p.

Primitive roots modulo p are not unique in Fp . Since U (Fp ) is cyclic, U (Fp ) ≅ Z_{p−1} , so
there are φ(p − 1) generators. However, the existence of a primitive root modulo p is not at all
obvious directly from modular arithmetic.
Example 7.5.4. Consider the prime p = 17. Then 2 is not a primitive root modulo 17 because 2
has order 8 and thus does not generate U (17), which has order 16. On the other hand, whether by
hand or assisted by a computer, we can show that 3 has order 16 in U (17). Hence, 3 is a primitive
root modulo 17. 4
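The order computations in this example can be reproduced with a few lines of Python (an illustrative sketch, not part of the text):

```python
def order_mod(a, p):
    # Multiplicative order of a modulo p, assuming gcd(a, p) = 1.
    k, x = 1, a % p
    while x != 1:
        x = x * a % p
        k += 1
    return k

p = 17
print(order_mod(2, p))   # 8  -- so 2 does not generate U(17)
print(order_mod(3, p))   # 16 -- so 3 is a primitive root modulo 17
# All phi(16) = 8 primitive roots modulo 17:
print([a for a in range(1, p) if order_mod(a, p) == p - 1])
# [3, 5, 6, 7, 10, 11, 12, 14]
```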

7.5.2 – Cyclotomic Polynomials

Definition 7.5.5
The nth cyclotomic polynomial Φn (x) is the monic polynomial whose roots are the primitive
nth roots of unity. Namely,

    Φn (x) = ∏_{1≤a≤n, gcd(a,n)=1} (x − ζn^a ).

A priori, the cyclotomic polynomials are elements of C[x], but we will soon see that Φn (x) ∈
Z[x] and satisfies many additional properties.
We note right away that deg Φn (x) = φ(n) where φ is Euler’s totient function. If we write

    x^n − 1 = ∏_{1≤i≤n} (x − ζn^i ) = ∏_{d|n} ∏_{1≤i≤d, gcd(i,d)=1} (x − ζn^{(n/d)i} ) = ∏_{d|n} ∏_{1≤i≤d, gcd(i,d)=1} (x − ζd^i ),

then we deduce the following implicit formula for the cyclotomic polynomials,

    x^n − 1 = ∏_{d|n} Φd (x).    (7.13)

The product identity (7.13) provides a recursive formula for Φn (x), starting with Φ1 (x) = x − 1.
For example, we have

    x² − 1 = Φ1 (x)Φ2 (x)   =⇒   Φ2 (x) = (x² − 1)/(x − 1) = x + 1.

By the same token, we have

    x³ − 1 = Φ1 (x)Φ3 (x)   =⇒   Φ3 (x) = (x³ − 1)/(x − 1) = x² + x + 1.

A few other examples of the nth cyclotomic polynomials are

    Φ4 (x) = (x⁴ − 1)/(Φ2 (x)Φ1 (x)) = (x⁴ − 1)/(x² − 1) = x² + 1,
    Φ5 (x) = (x⁵ − 1)/Φ1 (x) = x⁴ + x³ + x² + x + 1,
    Φ6 (x) = (x⁶ − 1)/(Φ3 (x)Φ2 (x)Φ1 (x)) = (x³ + 1)/(x + 1) = x² − x + 1,
    Φ7 (x) = (x⁷ − 1)/Φ1 (x) = x⁶ + x⁵ + x⁴ + x³ + x² + x + 1,
    Φ8 (x) = (x⁸ − 1)/(Φ4 (x)Φ2 (x)Φ1 (x)) = (x⁸ − 1)/(x⁴ − 1) = x⁴ + 1.
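The recursion in (7.13) translates directly into an algorithm: divide x^n − 1 by the product of the Φd (x) over the proper divisors d of n. The following Python sketch (illustrative; polynomials are ascending coefficient lists) carries this out with exact integer arithmetic:

```python
def poly_mul(p, q):
    # Multiply polynomials stored as ascending coefficient lists [a0, a1, ...].
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def poly_divexact(num, den):
    # Exact division of integer polynomials (den monic, num = den * quotient).
    num = num[:]
    quot = [0] * (len(num) - len(den) + 1)
    for k in range(len(quot) - 1, -1, -1):
        c = num[k + len(den) - 1]
        quot[k] = c
        for j, b in enumerate(den):
            num[k + j] -= c * b
    assert all(v == 0 for v in num), "division was not exact"
    return quot

def cyclotomic(n, _cache={}):
    # Phi_n(x) from x^n - 1 = prod_{d | n} Phi_d(x), formula (7.13).
    if n not in _cache:
        den = [1]
        for d in range(1, n):
            if n % d == 0:
                den = poly_mul(den, cyclotomic(d))
        _cache[n] = poly_divexact([-1] + [0] * (n - 1) + [1], den)
    return _cache[n]

print(cyclotomic(6))                             # [1, -1, 1], i.e. x^2 - x + 1
print(cyclotomic(105)[7], cyclotomic(105)[41])   # -2 -2
```

The last line confirms the coefficients of x⁷ and x⁴¹ in Φ105 (x) discussed below.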

The first few calculations hint at a number of properties that cyclotomic polynomials might
satisfy. We might hypothesize that the Φn (x) are in Z[x] and furthermore, that they only involve
coefficients of 0, 1, and −1. We might also hypothesize that the polynomials are palindromic. Also,
since each Φn (x) is obtained by dividing out of x^n − 1 all the factors that we know must divide x^n − 1,
we might also hope that the Φn (x) are irreducible. Some of these hypotheses are true and some are
not.

Proposition 7.5.6
The cyclotomic polynomial Φn (x) is a monic polynomial in Z[x] of degree φ(n).

Proof. We have already shown that these polynomials have degree φ(n). We need to prove that
Φn (x) ∈ Z[x]. We prove this by strong induction, noticing that Φ1 (x) = x − 1 ∈ Z[x].
Suppose that Φk (x) is monic and in Z[x] for all k < n. Let f (x) be the polynomial defined by

    f (x) = ∏_{d|n, d<n} Φd (x).

By induction, this polynomial is monic and has coefficients in Z. We know that x^n − 1 = Φn (x)f (x)
and so by polynomial division in Q[x], Φn (x) is a polynomial with coefficients in Q. By Gauss’
Lemma we conclude that Φn (x) ∈ Z[x]. 

Proposition 7.5.7
The polynomial Φn (x) is palindromic.

Proof. If z = ζn^a ∈ C is a root of Φn (x), then 1/z = ζn^{−a} = ζn^{n−a} is also a root of Φn (x), since
n − a is relatively prime to n whenever a is. Therefore, Φn (x) = x^{φ(n)} Φn (1/x). However, given any
polynomial p(x) ∈ Q[x], x^{deg p} p(1/x) is the polynomial obtained from p(x) by reversing the order of
the coefficients. □

It is possible to prove that, as polynomials, x^{gcd(m,n)} − 1 is the monic greatest common divisor
of x^m − 1 and x^n − 1. We say that the sequence of polynomials {x^n − 1}_{n=1}^∞ is a strong divisibility
sequence in Z[x]. In [8], the authors proved that the strong divisibility property is sufficient to define
the cyclotomic polynomials Φn (x) in Z[x] that satisfy the recursive formula (7.13), without reference
to roots of unity.
The hypothesis that all the coefficients of Φn (x) are −1, 0, or 1 is not true. The first cyclotomic
polynomial that has a coefficient different from −1, 0, or 1 is Φ105 (x). As Exercise 7.5.15 shows, it
is not a coincidence that the integer 105 happens to be the first positive integer that is the product
of three distinct odd primes. A direct calculation gives

Φ105 (x) = 1 + x + x2 − x5 − x6 − 2x7 − x8 − x9 + x12 + x13 + x14 + x15 + x16 + x17 − x20
− x22 − x24 − x26 − x28 + x31 + x32 + x33 + x34 + x35 + x36 − x39 − x40
− 2x41 − x42 − x43 + x46 + x47 + x48 .

The coefficients of x7 and of x41 are −2.

Theorem 7.5.8
For all n ∈ N∗ , the cyclotomic polynomial Φn (x) is an irreducible polynomial in Z[x] of
degree φ(n).

Proof. What is left to show is that Φn (x) is irreducible. The key step is to show that if p is any
prime not dividing n and ζ is a root of an irreducible factor of Φn (x) over Q, then ζ^p is a root of
the same factor.

Suppose that Φn (x) factors into f (x)g(x) over Q and, without loss of generality, suppose that
f (x) is irreducible with deg f (x) ≥ 1. Let ζ be a primitive nth root of unity that is a root of f (x).
Then ζ^p is also a primitive nth root of unity (since p ∤ n), so it is a root of f (x) or of g(x).
Assume now that g(ζ^p ) = 0. Then ζ is a root of g(x^p ) and, since f (x) is the minimal polynomial
of ζ, we have g(x^p ) = f (x)h(x). Reducing modulo p into Fp , we get

    f̄(x) h̄(x) = ḡ(x^p ) = (ḡ(x))^p ,

where the last equality holds by the Frobenius homomorphism. Therefore, f¯(x) and ḡ(x) have an
irreducible factor in common in the UFD Fp [x]. This implies that Φn (x) has a multiple root in
an extension of Fp , and hence that x^n − 1 has a multiple root there as well. We prove that this leads to a
contradiction. Recall the polynomial derivative D described in Exercise 6.5.23. If x^n − 1 has a
multiple root in some field extension of Fp , then this multiple root must be a root of x^n − 1 and of
the derivative D(x^n − 1) = nx^{n−1} . However, since p ∤ n, the only root of D(x^n − 1) is 0, whereas
0 is not a root of x^n − 1. So by contradiction, we know that ζ^p is not a root of g(x). Therefore, ζ^p
must be a root of f (x).
Now let a be any integer that is relatively prime to n. Then we can write a = p1 p2 · · · pk as a
product of not necessarily distinct primes that are relatively prime to n, and hence do not
divide n. From the above paragraph, if ζ is a root of f (x), then ζ^{p1} is also a root of f (x). Then
ζ^{p1 p2} = (ζ^{p1})^{p2} is also a root of f (x). By induction, we deduce that ζ^a is a root of f (x). This
means that every primitive nth root of unity is also a root of f (x), so g(x) is a unit and f (x) is an
associate of Φn (x). Hence, Φn (x) is irreducible. □

Definition 7.5.9
The field extension Q(ζn ) is called the nth cyclotomic extension of Q.

Since the roots of Φn (x) are powers of ζn , all the roots of Φn (x) are in the cyclotomic
extension Q(ζn ). Theorem 7.5.8 immediately gives the following result.

Corollary 7.5.10
For any integer n ≥ 2, the degree of the cyclotomic field over Q is

[Q(ζn ) : Q] = φ(n).

Example 7.5.11. In Exercise 2.1.16, we proved that if 2^n − 1 is prime then n is prime. The converse
implication is not necessarily true. With cyclotomic polynomials at our disposal, we see that

    2^n − 1 = ∏_{d|n} Φd (2) = (2 − 1) ∏_{d|n, d>1} Φd (2) = ∏_{d|n, d>1} Φd (2).

Therefore, if n is not prime, then 2^n − 1 is certainly not prime because it is divisible by the proper
factor Φd (2) for any divisor d of n with 1 < d < n. 4
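Evaluating the recursion (7.13) at x = 2 factors a Mersenne-type number without any polynomial algebra. A Python sketch (illustrative, not from the text) for n = 12:

```python
from math import prod

def phi_at_2(n, _cache={}):
    # Phi_n(2) computed from 2^n - 1 = prod_{d | n} Phi_d(2).
    if n not in _cache:
        v = 2 ** n - 1
        for d in range(1, n):
            if n % d == 0:
                v //= phi_at_2(d)
        _cache[n] = v
    return _cache[n]

n = 12
factors = [phi_at_2(d) for d in range(2, n + 1) if n % d == 0]
print(factors)                      # [3, 7, 5, 3, 13]
print(prod(factors) == 2 ** n - 1)  # True: 4095 = 3 * 7 * 5 * 3 * 13
print(phi_at_2(11))                 # 2047 = 23 * 89, composite even though 11 is prime
```

The last line illustrates the failure of the converse: n = 11 is prime, yet 2¹¹ − 1 = Φ11 (2) = 2047 is composite.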

7.5.3 – Möbius Inversion


Let {bn }n≥1 be a sequence in a ring R. We define the associated product sequence {an }n≥1 by

    an = ∏_{d|n} bd .    (7.14)

Now given a product sequence, it is possible to recover the original sequence bn .



Theorem 7.5.12 (Möbius Inversion)

If a sequence {an }n≥1 is given by (7.14), then

    bn = ∏_{d|n} (a_{n/d})^{μ(d)} ,

where μ is the Möbius function defined on the positive integers by

    μ(n) = 1          if n = 1,
    μ(n) = 0          if n is not square-free,
    μ(n) = (−1)^ℓ     if n = p1 p2 · · · pℓ , where the pi are distinct primes.

Proof. The proof involves manipulations of double products. We begin with the product on the
right of the Möbius inversion formula,

    ∏_{d|n} (a_{n/d})^{μ(d)} = ∏_{d|n} ( ∏_{e|(n/d)} be )^{μ(d)} = ∏_{d|n} ∏_{e|(n/d)} (be)^{μ(d)} .

Note that the pairs (d, e) with d | n and e | (n/d) are the same as those with e | n and d | (n/e).
Hence,

    ∏_{d|n} (a_{n/d})^{μ(d)} = ∏_{e|n} ∏_{d|(n/e)} (be)^{μ(d)} = ∏_{e|n} (be)^{Σ_{d|(n/e)} μ(d)} .    (7.15)

Now if k = 1, then Σ_{d|k} μ(d) = μ(1) = 1. On the other hand, suppose k is any integer with
k > 1. Then the nonzero terms in Σ_{d|k} μ(d) correspond to products of distinct prime divisors of
k. Suppose that p1 , p2 , . . . , pr are the distinct prime divisors of k. In the sum Σ_{d|k} μ(d), we group
together terms arising from products of i distinct prime divisors of k. Then

    Σ_{d|k} μ(d) = Σ_{i=0}^{r} (r choose i)(−1)^i = (1 − 1)^r = 0.

Hence, we deduce that

    Σ_{d|k} μ(d) = 1 if k = 1, and 0 otherwise.

We conclude from (7.15) that

    ∏_{d|n} (a_{n/d})^{μ(d)} = bn

and the theorem follows. □
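The key identity Σ_{d|k} μ(d) = 0 for k > 1 (and = 1 for k = 1) is easy to spot-check numerically. The following Python sketch (illustrative, not from the text) computes μ by trial division and evaluates the divisor sums for small k:

```python
def mobius(n):
    # Möbius function by trial division: 0 if n has a squared prime factor,
    # otherwise (-1)^(number of distinct prime factors).
    count, p = 0, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            count += 1
        p += 1
    if n > 1:
        count += 1
    return (-1) ** count

def mu_divisor_sum(k):
    return sum(mobius(d) for d in range(1, k + 1) if k % d == 0)

print([mobius(n) for n in range(1, 11)])          # [1, -1, -1, 0, -1, 1, -1, 0, 0, 1]
print([mu_divisor_sum(k) for k in range(1, 13)])  # [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```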

Though the Möbius inversion formula has many applications in number theory, we presented it
here to provide an alternative nonrecursive formula for the cyclotomic polynomials.

Proposition 7.5.13
For all positive integers n,

    Φn (x) = ∏_{d|n} (x^{n/d} − 1)^{μ(d)} .

Proof. Follows from (7.13) and the Möbius inversion formula. 



7.5.4 – Useful CAS Commands


The following Maple commands are in the numtheory package.

    primroot(p);        Returns the least positive primitive root modulo p.
    primroot(n,p);      Returns the least positive primitive root a modulo p with n < a < p.
    cyclotomic(n,x);    Returns the nth cyclotomic polynomial.
    mobius(n);          Returns the Möbius function of n.

Exercises for Section 7.5


1. Show explicitly that 2 is a primitive root modulo 11. Use this to find all primitive roots modulo 11.
2. Use technology to find all primitive roots modulo 23.
3. Calculate Φ12 (x). Also find the roots of Φ12 (x) with radicals. [Hint: Exercise 7.2.9.]
4. Calculate Φ14 (x).
5. Prove that Φ_{2^n} (x) = x^{2^{n−1}} + 1.
6. Prove that if n is odd, then Φ2n (x) = Φn (−x).
7. Show that if ζn is a primitive nth root of unity, then all the roots of Φn (x) are in Q(ζn ).
8. Suppose that m and n are positive relatively prime integers. Let ζn be a primitive nth root of unity
and let ζm be a primitive mth root of unity. Prove that ζn ζm is a primitive mnth root of unity.
9. Prove that if ζn is a primitive nth root of unity and d | n, then ζn^d is a primitive (n/d)th root of unity.
10. Prove that in the ring Z[x], the sequence of polynomials {x^n − 1}_{n=1}^∞ satisfies the identity

    gcd(x^m − 1, x^n − 1) = x^{gcd(m,n)} − 1,

where for the greatest common divisor we take the monic polynomial.
11. Suppose that p | n. Prove that Φpn (x) = Φn (xp ).
12. Suppose n = p1^{α1} p2^{α2} · · · pℓ^{αℓ} , where the pi are distinct primes. Prove that

    Φn (x) = Φ_{p1 p2 ···pℓ} (x^{p1^{α1−1} p2^{α2−1} ···pℓ^{αℓ−1}} ).

[Hint: Use Exercise 7.5.11.]


13. Use Exercise 7.5.12 to determine Φ225 (x).
14. Determine Φ60 (x). [Hint: Use previous exercises in this section.]
15. In 1883, Migotti proved that if p and q are distinct primes then Φpq (x) only involves the coefficients
of −1, 0, and 1. (In 1996, Lam and Leung offered a much shorter proof of this fact [41].) Use this
theorem and previous exercises to prove that if Φn (x) has coefficients besides −1, 0, or 1, then it must
be divisible by at least three distinct odd primes.
16. Suppose that p is a prime number such that p | Φn (2). Prove that p | Φpn (2). Deduce that p | Φ_{p^α n} (2)
for all positive integers α.
17. Let p be an odd prime dividing n. Suppose that a ∈ Z satisfies Φn (a) ≡ 0 (mod p). Prove that a is
relatively prime to p and that the order of a in U (p) is precisely n.
18. Let m and n be positive integers and let l = lcm(m, n) and d = gcd(m, n). Prove that
(a) Q(ζm )Q(ζn ) = Q(ζl ) as a composite field;
(b) Q(ζm ) ∩ Q(ζn ) = Q(ζd ).
19. Prove that the Möbius inversion formula can also be written as

    an = ∏_{d|n} bd   =⇒   bn = ∏_{d|n} ad^{μ(n/d)} .

Exercises 7.5.20 through 7.5.25 deal with dynatomic polynomials defined as follows. Let F be a field and let
P (x) ∈ F [x]. Consider sequences in F that satisfy the recurrence relation xn+1 = P (xn ). A fixed point is an
element c in F or an extension of F that satisfies P (c) − c = 0. A 2-cycle is such a sequence that satisfies
x2 = x0 . An element on a 2-cycle satisfies the equation P (P (x)) − x = 0. However, fixed points also solve
the equation P (P (x)) − x = 0. Consequently, elements that are on a 2-cycle but are not fixed points are
roots of

    (P (P (x)) − x) / (P (x) − x).

An n-cycle is a recurrence sequence as defined above such that xn = x0 . Points on n-cycles satisfy P^n (x) − x = 0,
where by P n (x) we mean P (x) iterated n times. For example, P 3 (x) = P (P (P (x))). For any d that divides
n, all the points on a d-cycle also satisfy P n (x) − x = 0. Similar to cyclotomic polynomials, we define the
nth dynatomic polynomial of P (x) recursively by ΦP,1 (x) = P (x) − x and

    P^n (x) − x = ∏_{d|n} ΦP,d (x).

An n-cycle that is not also a d-cycle for any proper divisor d of n is called a primitive n-cycle. Points on a
primitive n-cycle must be roots of ΦP,n (x). It is possible, though not easy, to prove that ΦP,n (x) ∈ F [x] for
all P (x) ∈ F [x] [8].
20. Prove that polynomial iteration satisfies (a) P^n (P^m (x)) = P^{m+n} (x); (b) (P^n )^m (x) = P^{mn} (x); and
(c) deg P^n (x) = k^n where deg P (x) = k.
21. Suppose that deg P (x) = m > 1. Prove that

    deg ΦP,n (x) = Σ_{d|n} μ(d) m^{n/d} .

Deduce that a sequence satisfying x_{n+1} = P (x_n ) can have at most (1/n) Σ_{d|n} μ(d) m^{n/d} primitive n-cycles.

22. Let Q(x) = x2 − 2. Calculate ΦQ,2 (x), ΦQ,3 (x), and ΦQ,4 (x).
23. Let P (x) = x3 − 2. Calculate ΦP,2 (x), ΦP,3 (x), and ΦP,4 (x).
24. Let P (x) = x2 − 2x + 2. Calculate ΦP,2 (x), ΦP,3 (x), and ΦP,4 (x).
25. Let Q(x) = x² − 5/4. Prove by direct calculation that for this particular polynomial, ΦQ,2 (x) divides
ΦQ,4 (x).

7.6 Splitting Fields and Algebraic Closure

7.6.1 – Splitting Fields
Let K be a field extension of F . Proposition 7.2.2, one of the key propositions of the section, established
that for any element α ∈ K that is an algebraic element over F , there exists a unique monic
polynomial mα,F (x) of minimal degree in F [x] such that α is a root of mα,F (x). The motivating
observation of this section is that though F (α) contains the root α, it does not necessarily contain
all the roots of mα,F (x).

Example 7.6.1. Let F = Q and consider α = ∛7 ∈ R. The minimal polynomial for α is f (x) =
x³ − 7. However, the three roots of this polynomial are

    ∛7,   ∛7 · (−1 + i√3)/2,   ∛7 · (−1 − i√3)/2.

Obviously, Q(∛7) is a subfield of R but the other roots of the minimal polynomial are not in R. 4

Definition 7.6.2
A field extension K of F is called a splitting field for the polynomial f (x) ∈ F [x] if f (x)
factors completely into linear factors in K[x] but f (x) does not factor completely into linear
factors in F ′ [x] for any field F ′ with F ⊆ F ′ ⊊ K.

If K is an extension of F , we will also use the terminology that f (x) ∈ F [x] splits completely in
K to mean that f (x) factors into linear factors in K[x]. However, this does not mean that K is a
splitting field of f (x) but simply that K contains a splitting field of f (x).

Theorem 7.6.3
For any field F , if f (x) ∈ F [x], there exists an extension K of F that is a splitting field for
f (x). Furthermore, [K : F ] ≤ n! where n = deg f (x).

Proof. We proceed by induction on the degree of f . If deg f = 1, then F contains the root of f (x)
so F itself is a splitting field for f (x) and the degree of F over itself is 1.
Suppose that the theorem is true for all polynomials of degree less than or equal to n. Let f (x)
be a polynomial of degree n + 1. If f (x) is reducible, then f (x) = a(x)b(x) where deg a(x) = k with
1 ≤ k ≤ n. By induction, both a(x) and b(x) have splitting fields, E1 and E2 . Then the composite
of these two fields, E1 E2 , is a splitting field for f (x). Furthermore, by the induction hypothesis and
Proposition 7.2.21, [E1 E2 : F ] is less than or equal to [E1 : F ][E2 : F ] = k!(n + 1 − k)! ≤ (n + 1)!.
Suppose now that f (x) is irreducible. Then F ′ = F [x]/(f (x)) is a field extension of F in which
the element x̄ is a root of f (x). Note that [F ′ : F ] = n + 1. In F ′ [t], (t − x̄) is a linear factor of
f (t). One obtains the factorization as follows. If

    f (x) = Σ_{i=0}^{n+1} ai x^i ,

then f (t) = 0 is equivalent to f (t) − f (x̄) = 0 so

    0 = Σ_{i=1}^{n+1} ai t^i − Σ_{i=1}^{n+1} ai x̄^i = (t − x̄) Σ_{i=1}^{n+1} ai ( Σ_{j=0}^{i−1} t^j x̄^{i−1−j} )
      = (t − x̄) Σ_{i=0}^{n} Σ_{j=0}^{i} a_{i+1} t^j x̄^{i−j} = (t − x̄) Σ_{j=0}^{n} ( Σ_{i=j}^{n} a_{i+1} x̄^{i−j} ) t^j .

Therefore, in F ′ [t], f (t) factors as f (t) = (t − x̄)q(t) where q(t) has degree n. By the induction
hypothesis, q(t) has a splitting field K over F ′ such that [K : F ′ ] ≤ n!. Therefore, f (x) splits
completely in K. Also,

[K : F ] = [K : F ′ ][F ′ : F ] = (n + 1)[K : F ′ ] ≤ (n + 1) · n! = (n + 1)!.

By induction, the theorem holds for all fields and for all polynomials f (x). 

We would like to shift the notion of a splitting field of a polynomial f (x) over F into a property
of a field extension, without necessarily referring to a specific polynomial f (x).

Definition 7.6.4
A normal extension is an algebraic extension K of F that is the splitting field for some
collection (not necessarily finite) of polynomials fi (x) ∈ F [x].

Example 7.6.5. The splitting field of f (x) = x² − x − 1 over Q is Q((1 + √5)/2). The degree of the
extension is 2. 4

In fact, a splitting field of any quadratic polynomial f (x) ∈ F [x] is F (α) where α is one of the
roots.
Example 7.6.6. Consider the polynomial f (x) = x³ − 7. Example 7.6.1 lists the three roots of
f (x) in C. A splitting field K for f (x) must contain all three of the roots. In particular, K must
contain ∛7 and ζ3 = (−1 + √−3)/2. This is sufficient, so a splitting field is K = Q(∛7, ζ3 ). Recall
that ζ3 is algebraic of degree 2 with minimal polynomial x² + x + 1. Furthermore, ζ3 is not a
real number, so it is not an element of Q(∛7), which is a subfield of R. Hence, [K : Q(∛7)] = 2 and
so the degree of the extension is

    [K : Q] = [Q(∛7, ζ3 ) : Q] = [K : Q(∛7)][Q(∛7) : Q] = 2 · 3 = 6. 4

Example 7.6.7. We point out that the extension Q(∛7) is not a normal extension of Q because it
contains ∛7 but not the other two roots of the minimal polynomial x³ − 7. 4
Example 7.6.8. Consider the cubic polynomial p(x) = x³ + 6x² + 18x + 18 ∈ Q[x] in Example 7.3.5.
We prove that the splitting field K of p(x) has degree [K : Q] = 6. Let x1 = −2 + ∛4 − ∛2. Note
that x1 has degree 3 over Q since the polynomial is irreducible (by Eisenstein’s Criterion with the
prime 2). However, x1 ∈ R, whereas x2 and x3 have imaginary components. Thus, x2 , x3 ∉ Q(x1 )
and the splitting field of p(x) is a nontrivial extension of Q(x1 ). We can conclude from Theorem 7.6.3
that the splitting field of p(x) has degree 6. However, we can also tell that

    p(x) = (x − x1 ) (x² + (6 + x1 )x − 18/x1 ).

So x2 and x3 are the roots of the quadratic factor. 4
Example 7.6.9. Consider the polynomial g(x) = x⁴ + 2x² − 2. We can find the roots by first solving
a quadratic polynomial for x². Thus,

    x² = (−2 ± √(4 + 8))/2 = −1 ± √3.

Thus, the four roots of g(x) are ±√(−1 + √3) and ±√(−1 − √3). Since g(x) is irreducible, we have
[Q(√(−1 + √3)) : Q] = 4. It is easy to tell that √(−1 − √3) ∉ Q(√(−1 + √3)) because Q(√(−1 + √3))
is a subfield of R whereas √(−1 − √3) is a complex number. Noticing first that √3 ∈ Q(√(−1 + √3)),
as √3 = (√(−1 + √3))² + 1, we see that √(−1 − √3) is an algebraic element over Q(√(−1 + √3))
satisfying the polynomial equation

    x² + 1 + √3 = 0.

Thus, a splitting field of g(x) over Q is K = Q(√(−1 + √3), √(−1 − √3)). Furthermore,

    [K : Q] = [K : Q(√(−1 + √3))] [Q(√(−1 + √3)) : Q] = 2 · 4 = 8.

This degree is a strict divisor of the upper bound 4! = 24 as permitted by Theorem 7.6.3. 4
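The algebra in this example can be sanity-checked numerically with complex floating-point arithmetic. The following Python sketch (illustrative, not from the text) verifies that the four claimed roots satisfy g and that √3 = (√(−1 + √3))² + 1:

```python
import cmath

g = lambda z: z ** 4 + 2 * z ** 2 - 2
r = cmath.sqrt(-1 + cmath.sqrt(3))   # the real root sqrt(-1 + sqrt(3))
s = cmath.sqrt(-1 - cmath.sqrt(3))   # a purely imaginary root sqrt(-1 - sqrt(3))

# All four claimed roots satisfy g to floating-point accuracy.
print(all(abs(g(z)) < 1e-9 for z in (r, -r, s, -s)))   # True

# sqrt(3) lies in Q(sqrt(-1 + sqrt(3))) because sqrt(3) = r^2 + 1.
print(abs(r ** 2 + 1 - cmath.sqrt(3)) < 1e-9)          # True
```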
From the examples, the splitting field of some polynomials seems a natural construction so it
may seem puzzling why we have been saying “a” splitting field. From the construction of a splitting
field as described in Theorem 7.6.3 it is not obvious that splitting fields are unique. The following
theorem establishes this important property.

Theorem 7.6.10
Let ϕ : F ≅ F ′ be an isomorphism of fields. Let f (x) ∈ F [x] and let f ′ (x) ∈ F ′ [x] be the
polynomial obtained by applying ϕ to the coefficients of f (x). Let E be a splitting field
for f (x) over F and let E ′ be a splitting field for f ′ (x) over F ′ . Then the isomorphism ϕ
extends to an isomorphism σ : E ≅ E ′ such that σ|F = ϕ.

Proof. We proceed by induction on the degree n of f (x).


If n = 1, then E = F and E ′ = F ′ and we can use σ = ϕ as a trivial extension.
Suppose, as induction hypothesis, that the theorem holds for any field F , any isomorphism ϕ,
and any polynomial f (x) with deg f (x) < n. Let p(x) be an irreducible factor of f (x) in F [x] with
deg p(x) ≥ 2 and let p′ (x) be the corresponding irreducible factor of f ′ (x) in F ′ [x]. If α is a root
of p(x) in E and β a root of p′ (x) in E ′ , then by Exercise 7.1.18 there exists a field isomorphism
σ′ : F (α) ≅ F ′ (β) that extends ϕ. Call F1 = F (α) and F1′ = F ′ [x]/(p′ (x)) = F ′ (β). Then, over
F1 , we have f (x) = (x − α)f1 (x) while over F1′ we have f ′ (x) = (x − β)f1′ (x) for polynomials
f1 (x) ∈ F1 [x] and f1′ (x) ∈ F1′ [x], both of degree n − 1.
We know that f1 (x) splits completely in E but we can also conclude that the field E is a splitting
field for f1 (x) over F1 . Indeed, if L were any field F1 ( L ( E over which f1 (x) split completely,
then since α ∈ L, f (x) would also split completely in L. Since a splitting field is a minimal field
extension over which a polynomial splits, E must be the splitting field of f1 (x) over F1 . The same
holds for E 0 as a splitting field of f10 (x) over F10 .
By the induction hypothesis, there exists a field isomorphism σ : E ∼ = E 0 that extends σ 0 and
therefore σ also extends ϕ. 

Corollary 7.6.11
Any two splitting fields for a polynomial f (x) ∈ F [x] over a field F are isomorphic.

Because of Corollary 7.6.11, we talk about the splitting field.

Example 7.6.12 (Cyclotomic Fields). Recall that cyclotomic extensions are extensions of Q
that contain the nth roots of unity, i.e., the roots of xⁿ − 1 = 0. As in Section 7.5, we call ζₙ
the complex number

ζₙ = e^(2πi/n) = cos(2π/n) + i sin(2π/n).

Then all the nth roots of unity are of the form ζₙᵏ for 0 ≤ k ≤ n − 1, so all of them lie in Q(ζₙ).
Consequently, Q(ζₙ) is the splitting field of xⁿ − 1 and, more precisely, of the cyclotomic
polynomial Φₙ(x). △
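The factorization of xⁿ − 1 into cyclotomic polynomials can be verified directly. A small sketch with sympy (assumed available, not part of the text):

```python
from sympy import symbols, cyclotomic_poly, divisors, expand, prod

x = symbols('x')
n = 12

# x^n - 1 is the product of the cyclotomic polynomials Phi_d(x)
# over the divisors d of n, so Q(zeta_n) splits x^n - 1.
product = prod(cyclotomic_poly(d, x) for d in divisors(n))
assert expand(product - (x**n - 1)) == 0

print(cyclotomic_poly(n, x))   # x**4 - x**2 + 1
```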

In some examples and exercises, we have seen that on occasion an extension F (α, β) of F is
nonetheless a primitive extension with F (α, β) = F (γ). Splitting fields allow us to prove that this
always happens under certain circumstances.

Theorem 7.6.13 (Primitive Element Theorem)


If char F = 0 and if α and β are algebraic over F , then there exists γ ∈ F (α, β) such that
F (α, β) = F (γ).

Proof. Let K be the splitting field of mα,F (x)mβ,F (x). Let α1 , α2 , . . . , αm be the roots of mα,F (x)
in K and let β1 , β2 , . . . , βn be the roots of mβ,F (x). We assume that α = α1 and β = β1 . Every
field of characteristic 0 has an infinite number of elements. Choose some element d ∈ F such that
d ≠ (α − αᵢ)/(βⱼ − β) for i ≥ 1 and j > 1.

Given this choice of d, set γ = α + dβ.


Obviously, F (γ) is a subfield of F (α, β). We wish to show that the converse inclusion holds:
F (α, β) ⊆ F (γ). In the polynomial ring F (γ)[x], both mβ,F (x) and p(x) = mα,F (γ − dx) satisfy

mβ,F (β) = 0 and p(β) = mα,F (α) = 0.

Hence, both polynomials are divisible by mβ,F(γ)(x), the minimal polynomial of β over F(γ).


366 CHAPTER 7. FIELD EXTENSIONS

Now, since mβ,F(γ)(x) divides mβ,F(x) in F(γ)[x], and mβ,F(x) splits completely in K[x], the
polynomial mβ,F(γ)(x) also splits completely in K[x]. Furthermore, the zeros of mβ,F(γ)(x) must
also be zeros of mβ,F(x), namely among β₁, β₂, . . . , βₙ. However, a zero of mβ,F(γ)(x) must also
be a zero of p(x), namely some x₀ such that γ − dx₀ = αᵢ or, in other words, some x₀ such that

α + dβ = αᵢ + dx₀ ⇐⇒ α − αᵢ = d(x₀ − β).

By the definition of d, the only x₀ that satisfies this and is a root of mβ,F(x) is x₀ = β. Thus, in K[x]
and hence also in F(α, β)[x], we deduce that β is the only root of mβ,F(γ)(x). Since it is irreducible,
mβ,F(γ)(x) = x − β. This shows that β ∈ F(γ). From this, we also deduce that α = γ − dβ ∈ F(γ).
Hence, F (α, β) ⊆ F (γ) and the theorem follows. 
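For instance, with F = Q, α = √2, and β = √3, the choice d = 1 already works, and γ = √2 + √3 generates Q(√2, √3). A sketch with sympy (assumed available):

```python
from sympy import sqrt, symbols, minimal_polynomial, expand

x = symbols('x')

# gamma = sqrt(2) + sqrt(3) should be a primitive element for Q(sqrt2, sqrt3).
gamma = sqrt(2) + sqrt(3)
m = minimal_polynomial(gamma, x)

# Degree 4 equals [Q(sqrt2, sqrt3) : Q], so Q(gamma) = Q(sqrt2, sqrt3).
print(m)   # x**4 - 10*x**2 + 1
```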

7.6.2 – Algebraic Closure


As we consider algebraic extensions of a field, from an intuitive perspective, we often think of
adjoining a collection of algebraic elements to some base field. Is there ever a situation in which
there is nothing else we can adjoin? From another perspective, given a field F , is there some
extension of F in which every polynomial of F[x] splits completely?

Definition 7.6.14
Let F be a field. A field L is called an algebraic closure of F if L is algebraic over F and
if every polynomial f (x) ∈ F [x] splits completely in L.

A related notion is the following property of a field within itself.

Definition 7.6.15
A field F is said to be algebraically closed if every polynomial f (x) ∈ F [x] has a root in F .

Notice that if F is algebraically closed, then every polynomial f (x) ∈ F [x] has a root α in F .
Consequently, in F [x], the polynomial factors f (x) = (x − α)p(x) for some p(x) ∈ F [x]. Since p(x)
and any subsequent factors must have a root, then by induction, every polynomial splits completely.
Consequently, a field F is algebraically closed if it is an algebraic closure of itself. This remark
motivates the following easy proposition.

Proposition 7.6.16
If L is an algebraic closure of F , then L is algebraically closed.

Proof. Let f(x) ∈ L[x] and let α be a root of f(x) in some extension of L. Then α gives an algebraic extension L(α) of L.
However, since L is algebraic over F , by Theorem 7.2.24, L(α) is an algebraic extension of F . Thus,
α is algebraic over F and hence α ∈ L. This shows that L is algebraically closed. 

The concept of an algebraic closure of a field is a rather technical one. Though the previous
portion of this section outlined how to construct a splitting field of a polynomial over a field, finding
a field extension in which all polynomials split completely poses a problem of construction. Indeed,
though Definition 7.6.14 defines the notion of algebraic closure, it is not at all obvious that an
algebraic closure exists for a given field F . Also, if an algebraic closure of F exists, it is not readily
apparent whether algebraic closures are unique. This section provides answers to these questions
but the proofs of some of the results depend on Zorn’s Lemma, which is equivalent to the Axiom of
Choice.
It is also not at all clear that any algebraically closed fields exist. From the properties of complex
numbers, the quadratic formula, and Cardano’s cubic and quartic formula, one may hypothesize
that the field of complex numbers is algebraically closed. Indeed, as early as the 17th century,
mathematicians, including the likes of Euler, Laplace, Lagrange, and d’Alembert, attempted to
prove this. The first rigorous proof was provided by Argand in 1806. Since then, mathematicians

have discovered proofs involving techniques from disparate branches of mathematics. Because of its
importance in algebra and the difficulty of proving it, the fact that C is an algebraically closed field
became known as the Fundamental Theorem of Algebra.

Theorem 7.6.17 (Fundamental Theorem of Algebra)


The field C is algebraically closed.

The “simplest” proof of the Fundamental Theorem of Algebra uses theorems from complex
analysis that are outside the scope of this text. We will provide an algebraic proof in Section 11.5,
but it requires more theory than we have yet provided. Consequently, for the moment, we accept
this result without proof.
It should not surprise us that analysis might be required to prove that C is algebraically closed.
The construction of C depends on the construction of R and properties of real functions are precisely
the purview of analysis. For example, the concept of continuity leads to the Intermediate Value
Theorem, which can be used to show that any polynomial p(x) ∈ R[x] of odd degree has a root. It
is the notion of continuity that ensures the polynomial does not change signs without having a root.
Theorem 7.6.3 showed how to construct a splitting field K of a single polynomial
f (x) ∈ F [x] over the field F . An algebraic closure of a field F must essentially be a splitting field
for all polynomials in F [x]. It is hard to imagine what such a field would look like and how to
describe such a field. If we had only a finite number of polynomials f1 (x), f2 (x), . . . , fk (x), then the
composite of the k splitting fields, which is also the splitting field of f1 (x)f2 (x) · · · fk (x), contains
all the roots of these polynomials. However, F [x] contains an infinite number of polynomials, so we
are faced with a problem of constructibility.
To keep track of all the polynomials in F [x], Artin devised the strategy of introducing a separate
variable for each polynomial. We give his proof below.

Theorem 7.6.18
For any field F , there exists an algebraically closed field K containing F .

Proof. Let P be the set of associate classes of irreducible elements in F [x]. Every class in P can
be represented by a unique monic nonconstant irreducible polynomial p(x). Let S be a set of
indeterminate symbols that is in bijection with P via [p] ↔ xp , where p(x) is monic. Consider the
multivariable polynomial ring F [S]. In F [S] consider the ideal

I = (p(xp ) | [p] ∈ P and p(x) is monic).

We first prove that I is a proper ideal of F [S]. Assume that I = F [S]. Then there exist monic
irreducible polynomials p₁(x), p₂(x), . . . , pₙ(x) and polynomials g₁, g₂, . . . , gₙ ∈ F[S] such that

g1 p1 (xp1 ) + g2 p2 (xp2 ) + · · · + gn pn (xpn ) = 1. (7.16)

The polynomials g1 , g2 , . . . , gn can involve only a finite number of variables x1 , x2 , . . . , xm , some of


which must be the variables xp1 , xp2 , . . . , xpn . Let L be a field extension of F in which αi is a root
of pᵢ(x) for each i = 1, 2, . . . , n. Evaluating the expression (7.16) at some point (c₁, c₂, . . . , cₘ) ∈ Lᵐ
for which the variable corresponding to xpi is αi , we get 0 = 1 in L. This contradicts the assumption
that I = F [S]. Hence, I is a proper ideal.
Since I is a proper ideal, by Krull’s Theorem (Theorem 5.7.2), I is contained in a maximal ideal
M ∈ F [S]. Then the quotient ring K1 = F [S]/M is a field containing F as a subfield. Furthermore,
for each monic irreducible polynomial p(x) ∈ F [x], the element xp in K1 is a root of p(x). Hence,
every polynomial f (x) ∈ F [x] has a root in K1 . This does not yet show that F is algebraically closed
since the roots are in a field extension.
Repeat the same construction now with K1 serving the role of F . This constructs a field extension
K2 of K1 in which every polynomial q(x) ∈ K1 [x] has a root in K2 . Similarly, for all integers i > 0,

construct the field extension Ki+1 of Ki in the same way. This iterated construction creates a chain
of nested field extensions of F ,

F = K0 ⊆ K1 ⊆ K2 ⊆ · · · ⊆ Kn ⊆ · · ·

in which every polynomial q(x) ∈ Kᵢ[x] has a root in Kᵢ₊₁. Let K be the union of all the fields:

K = ⋃_{i≥0} Kᵢ.

The field K is an extension of F. Let q(x) ∈ K[x] be a polynomial

q(x) = qₖxᵏ + · · · + q₁x + q₀.

For 0 ≤ ℓ ≤ k, the coefficient qℓ lies in Kᵢ for some index i = iℓ. If N is the maximum of
{i₀, i₁, . . . , iₖ}, then q(x) ∈ KN[x]. Then q(x) has a root in KN+1, which is in K. Thus, K is an algebraically closed
field that is an extension of F . 

It is interesting to observe that the existence of a maximal ideal M containing I follows from
Zorn’s Lemma.
The field K constructed in the above proof may seem woefully large. Indeed, the strategy of
the proof simply provides a well-defined construction of a field extension that is large enough to
be algebraically closed. However, this could be far larger than an algebraic closure. The following
proposition pares down the algebraically closed field K to an algebraic closure of F .

Proposition 7.6.19
Let L be an algebraically closed field and let F be a subfield of L. The set K of elements
in L that are algebraic over F is an algebraic closure of F .

Proof. By definition, K is algebraic over F. Furthermore, every polynomial f(x) ∈ F[x] ⊂ L[x]
splits completely over L. But each root α of f(x) is algebraic over F and so is an element of K. Therefore,
all the linear factors (x − α) of the factorization of f(x) are in K[x]. Hence, f(x) splits completely
in K[x], and so K is an algebraic closure of F. □

Theorem 7.6.18 coupled with Proposition 7.6.19 establish the existence of algebraic closures for
any field F . This has not yet answered the important question of whether algebraic closures of a
field are unique (up to isomorphism). In order to establish this, we need an intermediate theorem.

Theorem 7.6.20
Let F be a field, let E be an algebraic extension of F and let f : F → L be an embedding
(injective homomorphism) of F into an algebraically closed field L. Then there exists an
embedding λ : E → L that extends f , i.e., λ|F = f .

Proof. Let S be the set of all pairs (K, σ), where K is a field with F ⊆ K ⊆ E and σ extends
f to an embedding of K into L. We define a partial order ≼ on S where (K₁, σ₁) ≼ (K₂, σ₂) means
that K₁ ⊆ K₂ and σ₂ extends σ₁, i.e., σ₂|K₁ = σ₁. The set S is nonempty since it contains the pair
(F, f). For any chain

{(Kᵢ, σᵢ)}_{i∈I}

in the poset (S, ≼), define K′ = ⋃_{i∈I} Kᵢ. Every element α ∈ K′ is in Kᵢ for some i ∈ I. Define the
function σ′ : K′ → L by σ′(α) = σᵢ(α) if α ∈ Kᵢ. This function is well-defined because if Kᵢ ⊆ Kⱼ,
then σⱼ|Kᵢ = σᵢ, so σⱼ(α) = σᵢ(α). Therefore, the choice of index i used to define σ′(α) is irrelevant.
The pair (K′, σ′) is an upper bound for the described chain. Consequently, Zorn's Lemma applies
and we conclude that S contains maximal elements.

For a maximal element (K, λ) in S, the field K is a subfield of E and the function λ : K → L is
an embedding of K into L that extends f. Assume that there exists α ∈ E − K. Since E is algebraic
over F, by Corollary 7.2.3, it is algebraic over K. By Exercise 7.1.18, λ : K → L can be extended
to an embedding K(α) → L, contradicting the maximality of the pair (K, λ). Hence, E − K = ∅, so
K = E and the function λ : E → L is an embedding that extends f. □
Theorem 7.6.20 gives the following important proposition.

Proposition 7.6.21
Let F be a field and let E and E′ be two algebraic closures of F. Then E and E′ are
isomorphic.

Proof. Let f : F → E′ be an embedding of F into E′. By Proposition 7.6.16, E′ is algebraically
closed. Since E is algebraic over F, by Theorem 7.6.20, there exists an embedding λ : E → E′ that
extends f. Since E is algebraically closed, so is λ(E). Since E′ is algebraic over f(F), by
Corollary 7.2.3, E′ is algebraic over λ(E). As an algebraic extension of an algebraically
closed field, E′ = λ(E). Thus, λ is a surjective embedding, so λ is an isomorphism. □
In light of this proposition, algebraic closures of a field are unique up to isomorphism. Consequently, we talk about the algebraic closure of a field F and we denote it by F̄. The fact that the
algebraic closure of the algebraic closure of F is just the algebraic closure of F can be succinctly
stated as F̄̄ = F̄.
The field of complex numbers C is a field extension of Q that is algebraically closed. By Proposition 7.6.19, the set of algebraic elements in the extension C/Q is the algebraic closure of Q. The
field Q̄ is called the field of algebraic numbers. Though this describes the algebraic closure of Q, the
extension C is not necessarily the field K constructed in the proof of Theorem 7.6.18. Hence, though
we use Proposition 7.6.19 to show that the algebraic numbers are the algebraic closure Q̄, we did
not need the construction in the proof of Theorem 7.6.18 but rather the Fundamental Theorem of
Algebra.
In contrast, we point out that the Fundamental Theorem of Algebra does not help us to construct
F̄₂. Instead, we must refer to Theorem 7.6.18. The field F̄₂ is an algebraically closed field of
characteristic 2. Proposition 7.6.19 leads to the intuitive perspective that F̄₂ is a field that contains
all the roots of all polynomials f(x) ∈ F₂[x]. This is interesting because a priori, from the proof
of Theorem 7.6.18, one needs to worry about the roots of polynomials with coefficients in every
algebraic extension of F₂ being back in F̄₂.
We observe that the algebraic closure of a field F does not necessarily have infinite degree. For
example, the algebraic closure of R is C but [C : R] = 2.

Exercises for Section 7.6


1. Find the splitting field of x⁴ − 3x² + 1 ∈ Q[x].
2. Find the splitting field of x⁶ − 2x³ − 1 ∈ Q[x].
3. Find the splitting field of (x² − 2)(x³ − 2) ∈ Q[x].
4. Find the splitting field of (x³ − 2)(x³ − 7) ∈ Q[x].
5. Find the splitting field of x⁶ − 5 ∈ Q[x] and determine the degree of the splitting field over Q.
6. Describe the splitting field of x⁴ + 2x² + 1 ∈ F₇[x] as a quotient ring of F₇[x].
7. Let F be a field and let a ∈ F. Show that the splitting field of a polynomial p(x) is the same as the
splitting field of the polynomial q(x) = p(x − a).
8. Let p be a prime number and let q be a prime number in N. Prove that the splitting field of xᵖ − q is
an extension of degree p(p − 1) over Q.
9. Let F be a field and let f(x) be an irreducible cubic polynomial in F[x]. Prove that the splitting field
K of f(x) has degree [K : F] = 3 if the discriminant ∆ is the square of an element in F, and
[K : F] = 6 otherwise.

10. Let p(x) ∈ F [x] be a polynomial of degree n and let K be the splitting field of p(x) over F . Prove
that [K : F ] in fact divides n!.
11. Let p(x), q(x) ∈ F[x] be two polynomials with deg p(x) = m and deg q(x) = n. Notice that p(q(x))
is a polynomial of degree mn. Prove that the splitting field E of p(q(x)) has a degree that satisfies
[E : F] ≤ m!(n!)ᵐ. Prove also that for m, n ≥ 2, this quantity strictly divides (mn)!.
12. Let P(x) be a polynomial in F[x]. Suppose that a dynatomic polynomial ΦP,n(x) has degree k. (See
Exercises 7.5.20 through 7.5.25.) Prove that if the roots of the dynatomic polynomial are only primitive
n-cycles, then k is divisible by n and the splitting field E of ΦP,n(x) has degree [E : F]
less than or equal to

k(k − n)(k − 2n) · · · (2n) · n · 1.

13. Let p(x) ∈ Q[x] be a palindromic polynomial of even degree 2n. Let K be the splitting field of p(x).
Prove that [K : Q] ≤ 2ⁿ n!. [Hint: See Exercise 7.2.9.] [Note: This degree is less than the value of
(2n)! allowed by Theorem 7.6.3.]
14. Prove that a field F is algebraically closed if and only if the irreducible polynomials in F[x]
are precisely the polynomials of degree 1.
15. Prove that a field F is algebraically closed if and only if it has no proper algebraic extension.
16. Let K be an algebraic extension of a field F. Prove that K̄ = F̄.
17. Prove that there exists no algebraically closed field F such that Q ⊊ F ⊊ Q(π).

7.7 Finite Fields
Fields of characteristic 0 and fields of characteristic p have a number of qualitative differences. This
section builds on theorems of Section 7.6 to analyze finite fields. In particular, the main theorem
of this section is that finite fields of a given cardinality are unique up to isomorphism. However, in
order to establish this foundational result, we must take a detour into the concept of separability.

7.7.1 – Separable and Inseparable Extensions


Let F be a field. Let f(x) ∈ F[x] be a polynomial of degree m and let K be a splitting field of f(x)
over F. Suppose that f(x) factors into linear terms in K[x] as

f(x) = aₘ(x − α₁)^(n₁) (x − α₂)^(n₂) · · · (x − αₖ)^(nₖ),

where αᵢ ≠ αⱼ for i ≠ j and n₁ + n₂ + · · · + nₖ = deg f(x) = m.

Definition 7.7.1
A root αᵢ is said to have multiplicity nᵢ if (x − αᵢ)^(nᵢ) divides f(x) in K[x] but (x − αᵢ)^(nᵢ+1)
does not divide f(x). If nᵢ > 1, we will say that αᵢ is a multiple root.

Since K[x] is a UFD, we can use the function ord_π : K[x] → N and say that α is a root of f(x) if
ord_(x−α) f(x) > 0 and that the multiplicity of α is n = ord_(x−α) f(x). According to the definition,
α is a multiple root whenever ord_(x−α) f(x) > 1.

Definition 7.7.2
A polynomial f (x) ∈ F [x] is called separable if it has no multiple roots in its splitting field
over F .
7.7. FINITE FIELDS 371

Definition 7.7.3
An algebraic extension K/F is called separable if for all α ∈ K, the minimal polynomial
mα,F (x) is a separable polynomial. An algebraic extension that is not separable is called
inseparable.

It may at first seem difficult to imagine a field extension that is not separable. In this section,
we will show that many field extensions that we have studied so far are separable. The following
example illustrates an inseparable extension.

Example 7.7.4. Consider the infinite field F = F₃(x), which has characteristic 3. Consider also
the field extension K = F[∛x]. The element ∛x ∉ F and ∛x has minimal polynomial m(t) = t³ − x.
However,

m(t) = t³ − 3t²∛x + 3t(∛x)² − x = (t − ∛x)³,

so ∛x is a triple root of its own minimal polynomial. Hence, K is an inseparable extension. △
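The failure mechanism is visible at the level of formal derivatives: in characteristic 3, Dₜ(t³ − x) = 3t² = 0, so the minimal polynomial shares all of its roots with its (zero) derivative. A quick check with sympy (assumed available), keeping only the t³ term since the constant contributes nothing to the derivative:

```python
from sympy import symbols, Poly

t = symbols('t')

# Over F_3 the formal derivative of t^3 is 3*t^2 = 0, which is what
# makes t^3 - x inseparable over F_3(x).
p = Poly(t**3, t, modulus=3)
print(p.diff(t).is_zero)   # True
```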

Exercise 6.5.23 presented the concept of the derivative of a polynomial. In essence, let p(x) ∈ F[x].
We define the derivative of p(x) with respect to x as the polynomial Dₓ(p(x)) := p′(x), where p′(x)
is the derivative encountered in calculus. We know that deg Dₓ(p(x)) < deg p(x) regardless of the
field. Furthermore, the derivative of a polynomial satisfies the addition rule and the Leibniz rule for
multiplication. The polynomial derivative is particularly useful for the following proposition.

Proposition 7.7.5
A polynomial f (x) ∈ F [x] is separable if and only if f (x) and Dx f (x) are relatively prime.

Proof. Suppose that f(x) is not separable. Then there exists a root α of f(x) such that f(x) =
(x − α)²q(x) in the splitting field K of f(x). Then by the properties of the derivative,

Dₓ(f(x)) = 2(x − α)q(x) + (x − α)²Dₓ(q(x)).

We see that α is a root of Dₓ(f(x)), so mα,F(x) divides Dₓ(f(x)). Then mα,F(x) divides both f(x)
and Dₓ(f(x)), so these two polynomials are not relatively prime.

Conversely, suppose that f(x) and Dₓ(f(x)) are not relatively prime. Then there exists a monic
irreducible polynomial a(x) of degree at least 1 that divides them both. Let α be a root of a(x)
in the splitting field K of f(x). Then f(x) = (x − α)q(x) for some polynomial q(x) ∈ K[x]. Thus,

Dₓ(f(x)) = q(x) + (x − α)Dₓ(q(x)).

Since (x − α) divides Dₓ(f(x)), then (x − α) divides

q(x) = Dₓ(f(x)) − (x − α)Dₓ(q(x)).

Consequently, q(x) = (x − α)g(x) for some polynomial g(x) ∈ K[x] and we deduce that

f(x) = (x − α)²g(x).

Then f(x) is not separable. □
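Proposition 7.7.5 gives a mechanical separability test: compute gcd(f, Dₓf). A sketch over Q with sympy (assumed available; sympy's gcd works over the rationals by default):

```python
from sympy import symbols, gcd, diff

x = symbols('x')

def is_separable(f):
    # f is separable iff gcd(f, f') is a nonzero constant
    return gcd(f, diff(f, x)).is_constant()

assert is_separable(x**4 + 2*x**2 - 2)             # no repeated roots
assert not is_separable((x**2 - 2)**2 * (x + 1))   # double roots at +-sqrt(2)
```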

This proposition leads immediately to the following important proposition.

Proposition 7.7.6
If char F = 0, then every irreducible polynomial is separable.

Proof. Let a(x) be an irreducible polynomial in F[x]. If deg a(x) = 1, then a(x) is trivially separable.
Suppose that deg a(x) ≥ 2. If LT(a(x)) = aₙxⁿ, then the leading term of Dₓ(a(x)) is naₙx^(n−1).
Hence, deg Dₓ(a(x)) = n − 1 ≥ 1. Since a(x) is irreducible, any polynomial b(x) that divides a(x)
must be either a nonzero constant multiple of a(x) or a nonzero constant. If b(x) also divides Dₓ(a(x)),
then deg b(x) ≤ n − 1. Hence, b(x) cannot be a nonzero constant multiple of a(x), so it must be a
nonzero constant. Thus, Dₓ(a(x)) and a(x) are relatively prime and so by Proposition 7.7.5, a(x)
is separable. □

Corollary 7.7.7
Let F be a field of characteristic 0. Every algebraic extension of F is separable.

Proof. Let K be an algebraic extension of F and let α ∈ K. Then mα,F (x) is irreducible and by
Proposition 7.7.6, mα,F (x) is separable. Thus, K is separable. 

The proof of Proposition 7.7.6 might not work on all polynomials in F[x] when F has positive
characteristic. In characteristic 0, if deg a(x) = n ≥ 1, then we know that deg Dₓ(a(x)) = n − 1.
However, if char F = p and deg a(x) = pk, then the derivative of the leading term is

Dₓ(LT(a(x))) = Dₓ(aₚₖ x^(pk)) = pk aₚₖ x^(pk−1) = 0.

Hence, deg Dₓ(a(x)) < n − 1. Furthermore, any monomial whose power is a multiple
of p has a derivative that is identically 0. This leads to the following important point.

Proposition 7.7.8
Let F be a field of characteristic p. Suppose that a(x) is irreducible. Then a(x) is separable
if and only if one of the monomials of a(x) has a degree that is not a multiple of p. Furthermore, for any irreducible polynomial, there exist an irreducible separable polynomial
b(x) and a nonnegative integer k such that

a(x) = b(x^(pᵏ)).

Proof. By Proposition 7.7.5, a(x) is not separable if and only if a(x) and Dₓ(a(x)) are divisible by
a common factor of degree greater than 0 in F[x]. However, since a(x) is irreducible, the only divisor of a(x)
of degree greater than 0 is a nonzero multiple of itself. Hence, a(x) is not separable if and only
if a(x) divides Dₓ(a(x)). Since either Dₓ(a(x)) = 0 or deg Dₓ(a(x)) < deg a(x), we conclude that
a(x) is not separable if and only if Dₓ(a(x)) = 0, if and only if all monomials of a(x) have a degree
that is a multiple of p. This proves the first claim of the proposition.

Consequently, if a(x) is not separable, then a(x) = a₁(xᵖ) for some polynomial a₁(x).
Let k be the greatest nonnegative integer such that pᵏ divides the degree of every monomial of a(x).
Then a(x) = b(x^(pᵏ)) and at least one term of b(x) has a degree not divisible by p. By the first part
of the proposition, b(x) is then separable. Furthermore, if b(x) were reducible with b(x) = b₁(x)b₂(x), then
a(x) would be reducible with a(x) = b₁(x^(pᵏ))b₂(x^(pᵏ)). By the contrapositive, since a(x) is irreducible,
b(x) is irreducible. □

7.7.2 – Classification of Finite Fields


As we will soon see, our strategy to classify all finite fields relies on the notion of separability.
Consequently, we are now in a position to establish the main result of this section.
Every field F of positive characteristic has char F = p where p is a prime number. By Proposition 7.1.3, F is an extension of the finite field Fₚ of p elements, and there is a unique field of order
p up to isomorphism. Since a field extension K/F is a vector space, a finite field K with degree
[K : Fₚ] = n has |K| = pⁿ elements. This proves the first important result on finite fields.

Proposition 7.7.9
Let F be a finite field with |F| = q. Then q = pⁿ for some prime p and some positive
integer n. In this case, F is an extension of Fₚ of degree n.

Exercise 5.4.8 introduced the Frobenius homomorphism σₚ : R → R on a ring R of characteristic
p, defined by σₚ(α) = αᵖ. If F is a field of characteristic p, then σₚ is an injective homomorphism.
As we will see, this is an important function in the context of finite fields. If F is a finite field of
characteristic p, then σₚ is also an automorphism.

Definition 7.7.10
If F is finite the function σp : F → F is called the Frobenius automorphism. If F is not
finite, σp is called the Frobenius endomorphism.

By Fermat's Little Theorem (Theorem 2.2.16), the Frobenius automorphism σₚ is the identity
function on Fₚ. However, on field extensions of Fₚ, the automorphism is nontrivial. For example,
consider the field of order 9 defined by F = F₃[x]/(x² + x + 2). Let us call θ the element corresponding
to x in F. Notice that θ² = 2θ + 1. Then F = {a + bθ | a, b ∈ F₃}. Obviously, σ₃(a) = a for all
a ∈ F₃. However,

σ₃(θ) = θ³ = (2θ + 1)θ = 2θ² + θ = 2(2θ + 1) + θ = 2θ + 2.
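This computation can be replayed in plain Python. The sketch below (the helper names mul and frobenius are mine, not from the text) represents a + bθ ∈ F₉ as a pair (a, b) and reduces with θ² = 2θ + 1:

```python
# Elements of F_9 = F_3[x]/(x^2 + x + 2) written as a + b*theta, stored
# as pairs (a, b), with theta^2 = 2*theta + 1 and arithmetic mod 3.
def mul(u, v):
    a, b = u
    c, d = v
    # (a + b*t)(c + d*t) = ac + (ad + bc)t + bd*t^2; substitute t^2 = 2t + 1
    return ((a * c + b * d) % 3, (a * d + b * c + 2 * b * d) % 3)

def frobenius(u):
    # sigma_3(u) = u^3
    return mul(u, mul(u, u))

theta = (0, 1)
print(frobenius(theta))    # (2, 2), i.e. sigma_3(theta) = 2*theta + 2
print(frobenius((2, 0)))   # (2, 0): sigma_3 fixes the prime field F_3
```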

The Frobenius automorphism helps us to establish the following proposition.

Proposition 7.7.11
Every irreducible polynomial over a finite field F is separable.

Proof. Let a(x) be an inseparable polynomial over a finite field F of characteristic p. By Proposition 7.7.8, a(x) = b(xᵖ) for some polynomial b(x). Then

a(x) = b(xᵖ) = bₙ(xᵖ)ⁿ + · · · + b₁xᵖ + b₀.

However, since the Frobenius automorphism is a bijection on the finite field, for each i = 0, 1, . . . , n
there exists cᵢ ∈ F such that cᵢᵖ = bᵢ. Hence,

a(x) = cₙᵖ(xⁿ)ᵖ + · · · + c₁ᵖxᵖ + c₀ᵖ
     = (cₙxⁿ)ᵖ + · · · + (c₁x)ᵖ + c₀ᵖ
     = (cₙxⁿ + · · · + c₁x + c₀)ᵖ = (c(x))ᵖ.

In particular, a(x) is reducible. By the contrapositive, if a(x) is irreducible then it is separable. □

Proposition 7.7.9 pointed out that any finite field has order pⁿ. The converse is the main theorem
of this section.

Theorem 7.7.12
For all primes p and for all positive integers n, there exists a unique (up to isomorphism)
finite field of order pn . Furthermore, every finite field is isomorphic to one of these.

Proof. Proposition 7.7.9 established the second part of the theorem. We need to prove the first part.
Consider the polynomial x^(pⁿ) − x ∈ Fₚ[x]. Then

Dₓ(x^(pⁿ) − x) = −1,

which has no roots. Hence, x^(pⁿ) − x and −1 are relatively prime and so x^(pⁿ) − x is separable.
Therefore, this polynomial has pⁿ distinct roots in its splitting field K.

Call S the set of distinct roots. Let α, β ∈ S. Then

(α − β)^(pⁿ) = σₚⁿ(α − β) = α^(pⁿ) + (−1)^(pⁿ) β^(pⁿ) = α − β,

where the last equality holds because α and β are in S and because (−1)^(pⁿ) = −1 for all primes
p (for p = 2, note that 1 = −1 in characteristic 2). Hence, (S, +) is a subgroup of (K, +) by the
One-Step Subgroup Criterion. Similarly, if α, β ∈ S − {0}, then

(α/β)^(pⁿ) = α^(pⁿ)/β^(pⁿ) = α/β,

where the last equality follows from the property that α, β ∈ S. Hence, (S − {0}, ×) is a subgroup
of (K − {0}, ×). Thus, S is a subfield of K. However, K is a smallest subfield by inclusion
in which x^(pⁿ) − x splits, and x^(pⁿ) − x splits completely in S, so S = K and the roots of
x^(pⁿ) − x are precisely all the elements of the splitting field K. Thus, we have established the
existence of a field of order pⁿ.
Conversely, let F be a field of cardinality pⁿ. The prime subfield of F is Fₚ and [F : Fₚ] = n. By
Proposition 7.5.2, U(F) is a cyclic group of order pⁿ − 1. Thus, every element of F − {0} satisfies
the polynomial equation

x^(pⁿ−1) − 1 = 0.

The polynomials x^(pⁿ−1) − 1 and x^(pⁿ) − x are in Fₚ[x]. Therefore, every element of F is a root
of x^(pⁿ) − x, and so F is the splitting field of x^(pⁿ) − x.
Finally, by Theorem 7.6.10, splitting fields are unique up to isomorphism, so any two fields of
cardinality pⁿ are isomorphic. □

This theorem allows us to make the following definition.

Definition 7.7.13
If q is a prime power q = pⁿ, we denote by Fq or Fpⁿ the unique field of order q.

The uniqueness of finite fields of a given finite cardinality is not obvious from how we construct a
finite field. As a simple example, let us consider the field of 8 elements. The polynomials x³ + x + 1
and x³ + x² + 1 are irreducible cubic polynomials in F₂[x]. Consequently, we could construct a field
of eight elements by

K₁ = F₂[x]/(x³ + x + 1)  or  K₂ = F₂[x]/(x³ + x² + 1).

Let us call α ∈ K₁ an element such that α³ + α + 1 = 0 and β ∈ K₂ such that β³ + β² + 1 = 0.
It is easy to check that α + 1 ∈ K₁ does not satisfy x³ + x + 1 = 0 but rather x³ + x² + 1 = 0.
Similarly, β + 1 ∈ K₂ does not satisfy x³ + x² + 1 = 0 but rather x³ + x + 1 = 0. Consequently,
K₂ ≅ F₂[α + 1] = K₁ and K₁ ≅ F₂[β + 1] = K₂. This shows that K₁ ≅ K₂ via the isomorphism that extends
the identity on F₂ via α ↦ β + 1.
To see the strategy of Theorem 7.7.12 at work, we remark that x⁸ − x factors into irreducibles
in F₂[x] as

x⁸ − x = x⁸ + x = x(x + 1)(x³ + x + 1)(x³ + x² + 1).

Theorem 7.7.12 established that the unique field of 8 elements is the splitting field of x⁸ + x. Using K₁
as a reference,

• 0 is the root of x = 0;

• 1 is the root of x + 1 = 0;

• α, α², and α² + α are roots of x³ + x + 1 = 0;

• α + 1, α² + 1, and α² + α + 1 are roots of x³ + x² + 1 = 0.
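This factorization can be confirmed with sympy (assumed available, not part of the text); over F₂ we have x⁸ − x = x⁸ + x:

```python
from sympy import symbols, factor_list, degree

x = symbols('x')

# Factor x^8 + x over F_2 and collect the degrees of its irreducible factors.
coeff, factors = factor_list(x**8 + x, x, modulus=2)
degs = sorted(degree(f, x) for f, mult in factors)
print(degs)   # [1, 1, 3, 3]: x, x + 1, and the two irreducible cubics
```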



Theorem 7.7.12 affirms that a similar partitioning occurs in the construction of every finite field.
The above remark about the factorization of x⁸ − x generalizes to any prime p and any field extension
of degree n. Denote by Ψp,n(x) the product of all monic irreducible polynomials of degree n in Fₚ[x]. Then

x^(pⁿ) − x = ∏_{d|n} Ψp,d(x).

By Theorem 7.5.12, applying Möbius inversion gives

Ψp,n(x) = ∏_{d|n} ( x^(p^(n/d)) − x )^(μ(d)),

where μ(n) is the Möbius function on positive integers. In particular, this implies that

deg Ψp,n(x) = ∑_{d|n} μ(d) p^(n/d).

However, each irreducible factor of Ψp,n (x) has degree n so we have proved the following result.

Proposition 7.7.14
There are
(1/n) ∑_{d|n} µ(d) p^{n/d}
monic irreducible polynomials of degree n in Fp [x].
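The count in Proposition 7.7.14 is easy to confirm by machine for small cases. The sketch below (the helper names are ours, not the book's) compares the formula with a brute-force enumeration of monic irreducible polynomials:

```python
# Sketch (helper names ours): compare the formula of Proposition 7.7.14 with a
# brute-force count of monic irreducible polynomials of degree n over F_p.
from itertools import product

def mobius(n):
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0                  # n has a squared prime factor
            result = -result
        d += 1
    return -result if n > 1 else result

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def count_by_formula(p, n):
    return sum(mobius(d) * p ** (n // d) for d in divisors(n)) // n

def poly_mul(a, b, p):                    # coefficient tuples, constant term first
    res = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            res[i + j] = (res[i + j] + x * y) % p
    return tuple(res)

def monic_polys(p, n):
    for lower in product(range(p), repeat=n):
        yield lower + (1,)                # monic of degree exactly n

def count_by_brute_force(p, n):
    # A monic polynomial of degree n is reducible iff it is a product of two
    # monic polynomials of positive degree.
    reducible = set()
    for i in range(1, n // 2 + 1):
        for f in monic_polys(p, i):
            for g in monic_polys(p, n - i):
                reducible.add(poly_mul(f, g, p))
    return p ** n - len(reducible)

for p, n in [(2, 2), (2, 3), (2, 4), (3, 2), (3, 3)]:
    assert count_by_formula(p, n) == count_by_brute_force(p, n)
assert count_by_formula(2, 3) == 2        # x^3 + x + 1 and x^3 + x^2 + 1
```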

We conclude the section with a brief comment on the subfield structure of finite fields.
Exercise 7.7.8 asks the reader to prove that, for any prime p, the field Fpd is a subfield of Fpn if
and only if d|n. Consequently, the Hasse diagram representing the subfield structure of Fpn is the
same as the Hasse diagram of the partial order of divisibility on the divisors of n. For example, if
n = 100, for any prime p the subfield structure of Fp100 has the following Hasse diagram.

Fp100

Fp20 Fp50

Fp4 Fp10 Fp25

Fp2 Fp5

Fp
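Since the subfield lattice mirrors the divisibility order on the divisors of n, the cover relations of the diagram above can be computed directly (a sketch with hypothetical helper names):

```python
# Sketch (hypothetical helpers): F_{p^d} sits inside F_{p^e} exactly when d | e,
# and F_{p^e} covers F_{p^d} in the Hasse diagram when no divisor of n lies
# strictly between d and e.

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def hasse_covers(n):
    ds = divisors(n)
    covers = []
    for d in ds:
        for e in ds:
            if d != e and e % d == 0:
                between = any(m not in (d, e) and e % m == 0 and m % d == 0 for m in ds)
                if not between:
                    covers.append((d, e))
    return covers

edges = hasse_covers(100)
assert len(edges) == 12                   # the 12 edges of the diagram for n = 100
assert (2, 4) in edges and (1, 100) not in edges
```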

Exercises for Section 7.7


1. Give the addition and multiplication tables for F9 .
2. Give the multiplication table for F16 .
3. Define F8 as F2 [x]/(x3 + x + 1). Let θ ∈ F8 be an element such that θ3 + θ + 1 = 0.
(a) Write down the addition and the multiplication tables for elements in this field.
(b) In F8 , solve the equation (θ2 + 1)(α − (θ + 1)) = θα + 1.
(c) Solve the following system of two linear equations in two variables in the field F8 :
θ2 α + (θ + 1)β = θ
(θ2 + 1)α + θ2 β = 1

4. Write x9 − x as a product of irreducible polynomials in F3 [x].


5. Show that every element of U (F32 ) besides the identity 1 is a generator of U (F32 ).
6. Let p be an odd prime. Prove that
x^{p−1} − 1 = ∏_{α∈U (Fp )} (x − α).

Deduce that (p − 1)! ≡ −1 (mod p). (This fact is called Wilson’s Theorem.) Prove also that if n is a
positive composite integer greater than 4, then (n − 1)! ≡ 0 (mod n).
7. Let a ∈ Fp − {0}. Prove that the polynomial xp − x + a ∈ Fp [x] is irreducible.
8. Suppose that d | n. Prove that Fpd ⊆ Fpn and that [Fpn : Fpd ] = n/d.
9. Consider the polynomial p(x) = x4 + x + 1 ∈ F2 [x].
(a) Show that p(x) is irreducible.
(b) Show that p(x) factors into two quadratics over F4 and exhibit these two quadratic polynomials.
10. Prove that a polynomial f (x) over a field F of characteristic 0 is separable if and only if it is the product
of irreducible polynomials that are not associates of each other. [Note: Consequently, separable
polynomials over a field of characteristic 0 are polynomials that are square-free in F [x].]
11. Find a generator of U (F27 ).
12. The polynomial p1 (x) = x2 + x + 1 ∈ F5 [x] is irreducible. Call θ an element in F25 = F5 [x]/(p1 (x))
that satisfies θ2 + θ + 1 = 0.
(a) Find all other irreducible monic quadratic polynomials in F5 [x].
(b) For each of the 10 polynomials found in the previous part, write the two roots in F25 as aθ + b
for a, b ∈ F5 .
13. Let q = pn . Prove that the Frobenius automorphism ϕ = σp : Fq → Fq is an Fp -linear transformation.
Prove also that ϕn is the identity transformation.
14. Consider the Frobenius map ϕ from the previous exercise. Determine the eigenvalues and all corre-
sponding eigenspaces for ϕ.
15. Consider the Frobenius automorphism σ3 : F9 → F9 . Show how σ3 maps the elements of F9 . [Hint:
Use the identification F9 = F3 [x]/(x2 + x + 2).]
16. Prove that (1 + x^p)^n = (1 + x)^{pn} in Fp [x]. Deduce that \binom{pn}{pk} ≡ \binom{n}{k} (mod p).

17. Let Fq be a finite field and let f (x) be an irreducible polynomial of degree n in Fq [x]. Suppose that
α is one of the roots of f (x) in the field Fqn . Prove that
α, α^q, α^{q^2}, . . . , α^{q^{n−1}}

are the n distinct roots of f (x) in Fqn .


18. Prove that a polynomial of degree m = 2^k in F2 [x] is irreducible if and only if it divides

(x^{2^{2^k}} + x)/(x^{2^{2^{k−1}}} + x).

7.8 Projects
Project I. Field Extensions in Mn (F ). Revisit Exercises 7.1.19 and 7.1.20. Try to generalize
these results to other simple extensions, or to any simple extension F (α) of a field F . Use your
results to illustrate interesting multiplications and divisions in the field F (α).

Project II. Cardano’s Triangle. Recall Cardano’s method to solve the cubic equation. When
the discriminant is negative, so that the equation has three real roots, a geometric interpretation
of the method shows the roots arising as the projections onto the x-axis of the vertices of some
equilateral triangle rotated around some point on the x-axis. (For the equation x3 + px + q = 0,
that point is the origin.) Explore the solution of the cubic from a geometric perspective. Can
you see how to start from the geometry of projecting the vertices of an equilateral triangle and
arrive at a cubic equation? Explain the solution to a cubic equation from this geometric perspective.
Project III. Cardano’s Method in C. Section 7.3 presented Cardano’s method for solving the
cubic and quartic equations with the assumption that the coefficients of the polynomial are
real. Discuss the method and the content of the section assuming that the coefficients of the
polynomial are complex numbers. How much changes and how much stays the same?

Project IV. Constructing a Regular 17-gon. The prime number 17 has φ(17) = 16 = 24 .
Hence, [Q(ζ17 ) : Q] = 16. Call ζ = ζ17 . Explain why Theorem 7.4.5 does not rule out the
possibility of constructing cos(2π/17) = (1/2)(ζ + ζ −1 ). In fact, we will see in Section 11.2 that
this guarantees that cos(2π/17) is constructible. Show that:
• α1 = ζ + ζ 2 + ζ 4 + ζ 8 + ζ 9 + ζ 13 + ζ 15 + ζ 16 is real and is the root of a quadratic polynomial
in Q;
• α2 = ζ + ζ 4 + ζ 13 + ζ 16 is real and is the root of a quadratic polynomial in Q(α1 );
• α3 = ζ + ζ 16 = 2 cos(2π/17) is the root of a quadratic polynomial in Q(α2 ).
 

Use this sequence to write cos(2π/17) as a combination of nested square root expressions. Also
use this sequence to find a straightedge and compass construction of the regular 17-gon. Justify
your construction.
Project V. Irreducible Polynomials in F2 [x]. In certain applications of cryptography, it is
particularly useful to have irreducible polynomials of degree n in F2 [x]. Providing at least one
for each n, attempt to find as many irreducible polynomials of degree k = 2, 3, . . . , n as you can.
Do you see any patterns in which polynomials will be irreducible? Can you devise a fast algorithm
to find an irreducible polynomial of degree n in F2 [x]?
Project VI. Epicycloids in Z/nZ? Let n be a somewhat large integer, say n ≥ 40, and consider
the group µn of nth roots of unity. This is a finite subgroup of (U (C), ×). Consider plotting
the elements of µn on the unit circle in C. For various n and for a given small positive integer
m, trace an edge between z and z m for all z ∈ µn . The edges create an envelope of a certain
epicycloid. Explain why this is true. Study properties of the epicycloid depending on m and
n. If this graph were created by nail and string artwork, is it ever possible to create the work
with a single piece of string? (Why or why not?)
Project VII. Frobenius Automorphism. For various values of p and n, find a matrix corre-
sponding to the Frobenius automorphism σp on Fpn as a linear transformation on Fpn as a
vector space over Fp . Can you identify patterns in this associated matrix?
8. Group Actions

Though this textbook waited until this point to introduce group actions, from a historical perspec-
tive, group actions came first and motivated group theory. Mathematicians did not define groups ex
nihilo and study their properties from their axioms. Evariste Galois, often credited with defining a
group in the modern sense (Definition 3.2.1), formalized the axioms of groups while studying proper-
ties of symmetry among roots of a polynomial. Subsequent work by mathematicians simultaneously
revealed the richness of the algebraic structure of groups, discovered group-theoretic patterns in
many areas, and developed Galois theory, which applies group actions to the study of polynomial
equations. This textbook covers group actions in this chapter and Galois theory in Chapter 11.
Historically, mathematicians arrived at groups as sets S of bijective functions f : X → X (that
perhaps preserved some interesting property), in which S is closed under composition and taking
function inverses. In broad strokes, group actions involve viewing a group G as a subgroup of the
group of permutations on a set. More precisely, if a group G acts on a set X, then each group
element is a bijective function on X. Section 8.1 defines group actions in the modern sense and
offers many examples.
The perspective of group actions, which simultaneously considers properties of the group G and of
the set X on which it acts, leads to more information than is available from the group itself.
Section 8.2 presents orbits and stabilizers, which specifically capture this interplay between the set
and the group. Section 8.3 presents some properties that are specific to transitive group actions,
including block structures in group actions.
The general theory of group actions proves to be particularly fruitful when we consider a group
acting on itself in some manner. Section 8.4 presents results pertaining to the action of a group on
itself by left multiplication and by conjugation, resulting in Cayley’s Theorem, Cauchy’s Theorem,
and the Class Equation. Section 8.5 introduces a specific action of a group on certain subsets
of its subgroups, which leads to Sylow’s Theorem, a profound result in group theory with many
consequences for the classification of groups.
Section 8.6 offers a brief introduction to the representation theory of finite groups. Though
representation theory is a broad branch of algebra, this section gives a glimpse into how the interplay
between a group acting on a set with some other structure often uncovers interesting results about
both structures. The section serves the dual role of further illustrating group actions and whetting
the reader’s appetite for further study in that area.

8.1 Introduction to Group Actions
To introduce group actions, we consider the dihedral group as first presented in Section 3.1. Chap-
ter 3 presented many examples involving the dihedral group simply in terms of internal group
structure. However, at the outset, we introduced Dn as a set of bijections on the vertices of the
regular n-gon. Hence, if we label the vertices of the regular n-gon as {1, 2, . . . , n}, then Dn can be
viewed as a subgroup of Sn , the set of bijections on the vertices.
Group actions generalize as broadly as possible the perspective of viewing groups as sets of
functions on a set. More precisely, we take the algebraic structure of groups and connect it to
the algebraic structure of sets by seeing how groups can be understood as transformations on a set.
We warn the reader that since group actions arise in so many different contexts within mathe-
matics, there exist a variety of different notations and expressions.


8.1.1 – Group Actions: Definition

Definition 8.1.1
A group action of a group G on a set X is a function from G × X → X, with outputs
written as g · x or simply gx, satisfying

(1) (Compatibility) g1 · (g2 · x) = (g1 g2 ) · x, for all g1 , g2 ∈ G and x ∈ X;


(2) (Identity) 1 · x = x, for all x ∈ X.
If there exists a group action of G on X, we say that G acts on X.

If a group G acts on a set X, then X is sometimes called a G-set. As another point of terminology,
the function G×X → X is sometimes also called a pairing. Some authors use the shorthand notation
G ↷ X to mean “the group G acts on the set X.”
The axioms capture the desired intuition for groups as sets of functions on X. In essence, every
group element behaves like a function on X in such a way that function composition corresponds to
the group operation and the identity of the group behaves as the identity function. More precisely,
for each g ∈ G, the operation g · x is a function we can denote by σg : X → X with σg (x) = g · x.
Recall that we denote by SX the set of bijective functions from a set X to itself.

Proposition 8.1.2
Let G be a group acting on a set X.

(1) For all g ∈ G, the function σg is a permutation of X.


(2) The map ρ : G → SX defined by ρ(g) = σg is a homomorphism.

Proof. For all g ∈ G, and for all x ∈ X,

σg−1 (σg (x)) = g −1 · (g · x) = (g −1 g) · x = 1 · x = x.

Similarly, σg (σg−1 (x)) = x. Hence, the function σg : X → X is bijective with inverse function
(σg )−1 = σg−1 .
To show that ρ is a homomorphism, let g1 , g2 ∈ G. Then ρ(g1 ) ◦ ρ(g2 ) is a bijection X → X such
that for all x,

ρ(g1 ) ◦ ρ(g2 )(x) = σg1 (σg2 (x)) by definition of ρ


= σg1 (g2 · x)
= g1 · (g2 · x)
= (g1 g2 ) · x by compatibility axiom
= σg1 g2 (x)
= ρ(g1 g2 )(x).

Thus, ρ(g1 g2 ) = ρ(g1 ) ◦ ρ(g2 ) and therefore ρ is a homomorphism. 
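The proposition can be verified mechanically on a small example. The sketch below (our own toy example, not from the text) checks both axioms of Definition 8.1.1 and the homomorphism property for Z4 acting on X = {0, 1, 2, 3} by g · x = (g + x) mod 4:

```python
# A sketch (our own toy example): Z_4 acting on X = {0, 1, 2, 3} by g . x = (g + x) mod 4.
G = range(4)                        # Z_4, with addition mod 4 as the group operation
X = range(4)
op = lambda g, h: (g + h) % 4
act = lambda g, x: (g + x) % 4

# The two axioms of Definition 8.1.1
assert all(act(g, act(h, x)) == act(op(g, h), x) for g in G for h in G for x in X)
assert all(act(0, x) == x for x in X)

def sigma(g):                       # sigma_g as a tuple: position x holds g . x
    return tuple(act(g, x) for x in X)

# Proposition 8.1.2(1): each sigma_g is a permutation of X
assert all(sorted(sigma(g)) == list(X) for g in G)

# Proposition 8.1.2(2): rho(g h) = rho(g) o rho(h)
compose = lambda s, t: tuple(s[t[x]] for x in X)
assert all(sigma(op(g, h)) == compose(sigma(g), sigma(h)) for g in G for h in G)
```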

In other words, actions of a group G on a set X are in one-to-one correspondence with homo-
morphisms from G to SX . Any homomorphism ρ : G → SX is called a permutation representation
because it relabels the elements of G with permutations. We say that a group action induces a
permutation representation of G. This inspires us to give an alternate definition for a group action
that is briefer than Definition 8.1.1.

Definition 8.1.3 (Alternate)


A group action is a triple (G, X, ρ) where G is a group, X is a set, and ρ : G → SX is a
homomorphism. The image element ρ(g)(x) is often simply denoted by gx.

In every group action, the group identity acts as the identity function on X. However, in an
arbitrary group action, many other group elements could have no effect on X. It is an important
special case when the group identity is the only group element that acts as the identity.

Definition 8.1.4
Suppose that a group action of G on X has permutation representation ρ. Then the
action is called faithful if Ker ρ = {1}. We also say that G acts faithfully on X.

Since Ker ρ = {1}, ρ is injective. This means that ρ(g) ≠ ρ(h) for all g ≠ h in G.
Therefore, the action is faithful if and only if each distinct group element corresponds to a different
function on X. Furthermore, by the First Isomorphism Theorem, if an action is faithful, then
G ∼= G/ Ker ρ ∼= Im ρ, which presents G as a subgroup of SX .
Definition 8.1.1 is sometimes called a left group action of G on the set X to reflect the notational
habit of applying functions on the left of the domain element. Beyond notation habit, in a left group
action, the composed element (g1 g2 ) · x involves first acting on x by g2 and then by g1 .

8.1.2 – Examples of Group Actions


Arguably, every application that involves groups to study some other problem involves a group
action. Consequently, there are countless examples of group actions. This section offers a few basic
examples.
Example 8.1.5 (Dihedral Group). The group Dn acts on the vertices of a regular n-gon, labeled
as shown below, by performing the corresponding geometric transformation.

Figure 8.1: A hexagon with vertices labeled 1 through 6 counterclockwise, starting with 1 at the right

With the specific example of a hexagon, if we label the vertices of the hexagon {1, 2, 3, 4, 5, 6} as
in Figure 8.1, then the corresponding permutation representation ρ satisfies
ρ(r) = (1 2 3 4 5 6)
ρ(s) = (2 6)(3 5).
Note that if we labeled the vertices of the hexagon differently, then we would induce a different
homomorphism of D6 into S6 .
It is evident from the construction that Dn acts faithfully on {1, 2, . . . , n}. Indeed, when defining
elements in Dn we only care about bijections on the set of vertices and we do not consider two
functions different if they have the same effect on all elements of {1, 2, . . . , n}. 4
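The permutation representation in this example can be checked by composing the permutations directly. The following sketch (helper names are ours) verifies the dihedral relations and the faithfulness claim:

```python
# Sketch verifying Example 8.1.5 (permutations written as dictionaries on vertices 1..6):
r = {1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 1}      # rho(r) = (1 2 3 4 5 6)
s = {1: 1, 2: 6, 3: 5, 4: 4, 5: 3, 6: 2}      # rho(s) = (2 6)(3 5)
e = {i: i for i in range(1, 7)}

def compose(f, g):                             # (f o g)(x) = f(g(x))
    return {x: f[g[x]] for x in g}

def power(f, n):
    out = e
    for _ in range(n):
        out = compose(out, f)
    return out

assert power(r, 6) == e and compose(s, s) == e
assert compose(s, compose(r, s)) == power(r, 5)    # the dihedral relation s r s = r^{-1}

# Faithfulness: the 12 elements r^k and r^k s give 12 distinct permutations.
words = [compose(power(r, k), f) for k in range(6) for f in (e, s)]
assert len({tuple(sorted(w.items())) for w in words}) == 12
```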
Example 8.1.6 (Permutation Group). As mentioned in the introduction to this section, the
permutation group Sn acts on the set {1, 2, . . . , n} by viewing each permutation σ ∈ Sn as a bijection

on {1, 2, . . . , n}. This example is not surprising since Sn was essentially defined by how bijections on
{1, 2, . . . , n} compose with each other. The permutation action of Sn on {1, 2, . . . , n} is faithful. 4

Example 8.1.7 (Linear Algebra). Let F be a field and consider the vector space V = F n over
the field F , for some positive integer n. Then the group GLn (F ) acts on V by multiplying a vector
by an invertible matrix. Explicitly, the pairing of the action GLn (F ) × V → V is A · ~v = A~v , where
the right-hand side is matrix-vector multiplication.
Note that though the notation A~v is familiar from linear algebra, this is the first time we give
precise algebraic context to matrix-vector multiplication.
We can show that this action is faithful by considering how GLn (F ) acts on the standard basis
vectors ei , where ei is the n-tuple that is all 0s except for a 1 in the ith entry. Recall that Aei is the
ith column of A. Therefore, if Aei = ei for all i = 1, 2, . . . , n, then the ith column of A is ei (as a
column), so A = I. 4

Example 8.1.8 (Trivial Action). Let G be a group and X any set. Then the action gx = x for
all g ∈ G and x ∈ X is called the trivial action of G on X. Every group element g acts as the identity
on X. In this sense, every group can act on every set. Intuitively, a trivial action is opposite from
a faithful action in that Ker ρ = G for a trivial action, whereas Ker ρ = {1} for a faithful action. 4

Example 8.1.9. Consider the group D6 and how it acts on the diagonals of the hexagon. The
diagonals of the hexagon are d1 = {1, 4}, d2 = {2, 5}, and d3 = {3, 6}. For any of these 2-element
subsets of vertices, any dihedral symmetry of the hexagon maps a diagonal into another diagonal.
Therefore, D6 acts on {d1 , d2 , d3 }.
This action is not faithful because r3 · di = di for i = 1, 2, 3. Note that s ∉ Ker ρ because even
though s · d1 = d1 , we also have s · d2 = d3 . The permutation representation ρ of this group action
is completely defined by ρ(r) = (1 2 3) and ρ(s) = (2 3). 4
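The induced action on diagonals can likewise be computed from the vertex permutations. The sketch below (reusing the labeling of Example 8.1.5) confirms ρ(r) = (1 2 3), ρ(s) = (2 3), and that r3 lies in the kernel:

```python
# Sketch (same vertex labeling as Example 8.1.5): the induced action of D_6 on
# the diagonals d1 = {1, 4}, d2 = {2, 5}, d3 = {3, 6}.
r = {1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 1}
s = {1: 1, 2: 6, 3: 5, 4: 4, 5: 3, 6: 2}
diagonals = {1: frozenset({1, 4}), 2: frozenset({2, 5}), 3: frozenset({3, 6})}

def compose(f, g):
    return {x: f[g[x]] for x in g}

def on_diagonals(g):
    """Return the permutation that g induces on the diagonal labels {1, 2, 3}."""
    out = {}
    for label, d in diagonals.items():
        image = frozenset(g[v] for v in d)
        out[label] = next(k for k, dk in diagonals.items() if dk == image)
    return out

r3 = compose(r, compose(r, r))
assert on_diagonals(r) == {1: 2, 2: 3, 3: 1}       # rho(r) = (1 2 3)
assert on_diagonals(s) == {1: 1, 2: 3, 3: 2}       # rho(s) = (2 3)
assert on_diagonals(r3) == {1: 1, 2: 2, 3: 3}      # r^3 in Ker rho: not faithful
```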

Example 8.1.10. Consider the group D5 and consider the set of 11 polygonal regions inside the
pentagon bordered by the complete graph on the set of vertices as shown below. The dihedral group
D5 acts on the set of polygonal regions. If we label the regions with the integers 1, 2 . . . , 11, the
action induces a homomorphism ρ : D5 → S11 .
[Figure: a regular pentagon with vertices v1 , . . . , v5 , with all diagonals drawn, dividing the interior
into 11 polygonal regions.] 4

Example 8.1.11 (Rigid Motions of the Cube). Consider the group G of rigid motions (solid
rotations) of the cube. There are many actions that are natural to consider. G acts on:

• the set of vertices of the cube (8 elements);

• the set of edges of the cube (12 elements);

• the set of faces of the cube (6 elements);

• the set of diagonals on the faces of the cube (12 elements);

• the set of diagonals through the centroid of the cube (4 elements);

• the set of segments that connect centroids of opposite faces on the cube (3 elements).

[Figure: a cube with vertices labeled 1 through 8, showing the 180◦ rotation about the line through
the midpoints M and M 0 of two centrally symmetric edges.]

Figure 8.2: Rigid motion that transposes two diagonals

As one action of particular interest, consider the action of G on the diagonals through the center
of the cube. This action corresponds to a homomorphism ρ : G → S4 . Let us label the vertices of the
cube by {1, 2, 3, 4, 5, 6, 7, 8} and label these long diagonals by d1 = {1, 7}, d2 = {2, 8}, d3 = {3, 5},
and d4 = {4, 6}. See Figure 8.2.
Pick an edge e. Let e0 be the edge of the cube that is centrally symmetric to e through the middle
(centroid) O of the cube. Let M and M 0 be the midpoints of e and e0 respectively. Let Re be the
rotation of the cube by 180◦ around the line (M M 0 ). The rigid motion Re interchanges the two
diagonals that are on the square defined by the edge e and e0 (the plane defined by e and O) but it
leaves unchanged the diagonals that are in the plane perpendicular to the plane defined by e and e0 .
Thus, ρ(Re ) is a transposition in S4 , the symmetric group on {d1 , d2 , d3 , d4 }. There are six pairs of
centrally symmetric edges, which lead to six distinct rigid motions of the form Re , which induce the
6 transpositions in S4 . Since S4 is generated by its transpositions, we deduce that ρ(G) = S4 .
Furthermore, it is not hard to show (see Exercise 3.3.27) that |G| = 24. Therefore, the
homomorphism ρ is a surjective function between two finite groups of the same size. We deduce
that ρ is bijective, so ρ is an isomorphism. We conclude that the group of rigid motions of the cube
is isomorphic to S4 . 4
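This argument can also be confirmed computationally. The sketch below assumes a particular labeling (bottom face 1–4, top face 5–8 with vertex i + 4 above vertex i), which may differ from Figure 8.2; it generates the rotation group from two face rotations and checks that it has order 24 and acts as all of S4 on the four long diagonals:

```python
# Sketch with an assumed labeling (bottom face 1-2-3-4, top face 5-6-7-8, vertex
# i+4 above vertex i). We generate the rotation group of the cube from two face
# rotations and examine its induced action on the long diagonals.
def perm(cycles):
    p = {i: i for i in range(1, 9)}
    for cyc in cycles:
        for i, v in enumerate(cyc):
            p[v] = cyc[(i + 1) % len(cyc)]
    return tuple(p[i] for i in range(1, 9))

a = perm([(1, 2, 3, 4), (5, 6, 7, 8)])    # quarter turn about the vertical face axis
b = perm([(1, 2, 6, 5), (4, 3, 7, 8)])    # quarter turn about a horizontal face axis

def compose(f, g):                        # tuples indexed by vertex - 1
    return tuple(f[g[i - 1] - 1] for i in range(1, 9))

identity = tuple(range(1, 9))
group, frontier = {identity}, [identity]
while frontier:                           # close up under the two generators
    g = frontier.pop()
    for gen in (a, b):
        h = compose(gen, g)
        if h not in group:
            group.add(h)
            frontier.append(h)
assert len(group) == 24                   # |G| = 24, as in Exercise 3.3.27

diagonals = [frozenset({1, 7}), frozenset({2, 8}), frozenset({3, 5}), frozenset({4, 6})]
def on_diagonals(g):
    return tuple(diagonals.index(frozenset(g[v - 1] - 0 for v in d)) for d in diagonals)

assert len({on_diagonals(g) for g in group}) == 24    # rho(G) is all of S_4
```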

Example 8.1.12 (Sets of Functions). Let Fun(X, Y ) be the set of functions from the set X to
a set Y .
If G acts on Y , then there is a natural action of G on Fun(X, Y ) via

(g · f )(x) = g · (f (x)).

It is an easy exercise to see that this is a group action. If G acts on X, then there also exists a
natural action of G on Fun(X, Y ) via

(g · f )(x) = f (g −1 · x). (8.1)

It is crucial that the right-hand side involve g −1 . We check the compatibility axiom for this action.
Let g, h ∈ G and let f ∈ Fun(X, Y ). Writing h · f = f 0 , then the function g · (h · f ) satisfies

(g · (h · f ))(x) = f 0 (g −1 · x) = f (h−1 · (g −1 · x))


= f ((h−1 g −1 ) · x) = f ((gh)−1 · x) = ((gh) · f )(x).

It is easy to check the identity axiom. Hence, (8.1) does indeed define an action on Fun(X, Y ). 4

Example 8.1.13 (Rearrangement of n-tuples). Let A be a set, let n be a positive integer, and
let X = An be the set of n-tuples of A. Consider the pairing Sn × X → X that permutes the entries
of (a1 , a2 , . . . , an ) ∈ An according to a permutation σ. In other words, in the action of σ on the

n-tuple (a1 , a2 , . . . , an ), the ith entry is sent to the σ(i)th position. Note that in σ · (a1 , a2 , . . . , an ),
the ith entry is the σ −1 (i)th entry of (a1 , a2 , . . . , an ). Thus,
σ · (a1 , a2 , . . . , an ) = (aσ−1 (1) , aσ−1 (2) , . . . , aσ−1 (n) ). (8.2)
We show that this defines a group action of Sn on X = An . First, for all τ, σ ∈ Sn , we have
τ · (σ · (a1 , a2 , . . . , an )) = τ · (aσ−1 (1) , aσ−1 (2) , . . . , aσ−1 (n) )
= (aσ−1 (τ −1 (1)) , aσ−1 (τ −1 (2)) , . . . , aσ−1 (τ −1 (n)) )
= (a(τ σ)−1 (1) , a(τ σ)−1 (2) , . . . , a(τ σ)−1 (n) )
= (τ σ) · (a1 , a2 , . . . , an ).
Also, 1 · (a1 , a2 , . . . , an ) = (a1 , a2 , . . . , an ) since it does not permute the elements.
If (8.2) seems counterintuitive at first, observe that as sets An is equal to Fun({1, 2, . . . , n}, A)
and the action described in (8.2) is precisely the action defined in Example 8.1.12.
In contrast, it is important to realize that the function Sn × X → X defined by
σ · (a1 , a2 , . . . , an ) = (aσ(1) , aσ(2) , . . . , aσ(n) )
is not a group action of Sn on X as it fails the compatibility axiom. 4
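The contrast between (8.2) and the naive formula can be tested exhaustively for n = 3. In the sketch below (the function names are ours), act_good implements (8.2) and act_bad the naive version:

```python
# Sketch (function names ours): act_good implements (8.2); act_bad is the naive
# sigma . (a_1, ..., a_n) = (a_{sigma(1)}, ..., a_{sigma(n)}). Permutations are
# 0-indexed tuples where sigma[i] is the image of i.
from itertools import permutations

def act_good(sigma, t):
    inv = [0] * len(sigma)
    for i, si in enumerate(sigma):
        inv[si] = i                     # inv is sigma^{-1}
    return tuple(t[inv[j]] for j in range(len(t)))

def act_bad(sigma, t):
    return tuple(t[sigma[j]] for j in range(len(t)))

def compose(tau, sigma):                # (tau o sigma)(i) = tau(sigma(i))
    return tuple(tau[sigma[i]] for i in range(len(sigma)))

t = ('a', 'b', 'c')
perms = list(permutations(range(3)))
good_ok = all(act_good(tau, act_good(sig, t)) == act_good(compose(tau, sig), t)
              for tau in perms for sig in perms)
bad_ok = all(act_bad(tau, act_bad(sig, t)) == act_bad(compose(tau, sig), t)
             for tau in perms for sig in perms)
assert good_ok and not bad_ok           # only (8.2) satisfies compatibility
```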
There are many other types of group actions of considerable interest. The examples provided so
far just scratch the surface. The following subsection presents a few important actions of a group
acting on itself. Following that, the reader is encouraged to peruse the exercises for many other
examples.

8.1.3 – Group Actions as an Algebraic Structure


With a fixed group G, we can understand group actions of G as an algebraic structure. However,
this perspective focuses on the sets and so it is common to refer to the algebraic structure of G-
sets. In light of this, it is natural to discuss subobjects and morphisms related to group actions.
Intuitively, a subobject of a G-set is a subset that is a G-set in its own right and a morphism between
G-sets should preserve the group action. We make these definitions precise.

Definition 8.1.14
Let G be a group and let X be a G-set. A G-subset of X is a subset S ⊆ X such that
g · x ∈ S for all g ∈ G and all x ∈ S. We also say that S is closed under the action of G or,
equivalently, that S is invariant under G.

Whenever a subset S of X is closed under the action of G, then the axioms of the action of G
on X restrict to S, giving S the structure of a G-set.
As an example, consider the plane R2 equipped with an origin O and with a labeled x-axis and
y-axis. Consider the natural action of the dihedral group Dn on R2 , where r corresponds to rotation
by 2π/n around the origin O and where s corresponds to reflection about the x-axis. Any Dn -subset
of R2 is a subset of R2 that has dihedral symmetry, i.e., is invariant under the action of Dn .
Example 8.1.15. Let G be the group of rigid motions of a cube described in Example 8.1.11 and
let V be the set of vertices of the cube. There is a natural action of G on V by how rotations map
the vertices. Namely, for every vertex v ∈ V , g · v is the image of the vertex v under the rotation g.
Let P(V ) be the set of subsets of vertices of V . We equip P(V ) with the G-action defined by
g · {x1 , x2 , . . . , xn } = {g · x1 , g · x2 , . . . , g · xn }.
The set of edges E of the cube is a G-subset of P(V ) since every solid rotation of the cube maps an
edge to another edge. The G-set P(V ) has many other G-subsets, e.g., the set of unordered pairs of
vertices {S ⊆ V | |S| = 2}, the set of faces, the set of long diagonals, etc. However, not all subsets
of P(V ) are G-subsets. For example, given a fixed vertex v0 , the singleton set {v0 } is not a G-subset
since any solid rotation g ∈ G that does not leave v0 fixed satisfies g · v0 ∈
/ {v0 }. 4

Definition 8.1.16
Let G be a group and let X and Y be two G-sets (i.e., there is an action of G on X and
on Y ). A G-set homomorphism between X and Y is a function f : X → Y such that

f (g · x) = g · f (x) for all g ∈ G and all x ∈ X.

An isomorphism of G-sets is a G-set homomorphism that is also bijective.

Exercises 8.1.13 and 8.1.14 establish some results about G-set homomorphisms that we might
expect from standard results about group homomorphisms and ring homomorphisms.
Note that in Definition 8.1.16 the group G acting on X and Y is the same. In this perspective, if G
and G0 are nonisomorphic groups, then we consider the collection of G-sets and G0 -sets as two distinct
algebraic structures. Because of this restriction, the above definition might feel unsatisfactory. For
example, suppose that a group G acts on a set X and a group H acts on a set Y . We might consider
the group actions as equivalent if the action is identical after a relabeling of the elements in G with
elements in H and a parallel relabeling of elements in X with elements in Y . To name this desired
phenomenon, we use the following definition.

Definition 8.1.17
A group action homomorphism between two group actions (G, X, ρ1 ) and (H, Y, ρ2 ) is a
pair (ϕ, f ), where ϕ : G → H is a homomorphism and f : X → Y is a function such that

f (g · x) = ϕ(g) · f (x) for all g ∈ G and all x ∈ X.

A group action isomorphism (or permutation isomorphism) is a group action homomor-


phism (ϕ, f ) such that ϕ is an isomorphism and f is a bijection.

If a group action isomorphism exists between two group actions, they are called isomorphic (or
permutation equivalent).
Example 8.1.18. Consider the natural action of GL2 (F2 ) on the vector space X = F22 of four
elements over F2 . Also consider the action of S3 on the set Y = {0, 1, 2, 3} by fixing 0 and permuting
{1, 2, 3} as usual. Let ϕ : GL2 (F2 ) → S3 be the isomorphism described in Example 3.7.20. Then
the bijection f : X → Y that maps
       
(0, 0) ↦ 0,   (1, 0) ↦ 3,   (0, 1) ↦ 2,   (1, 1) ↦ 1
makes the pair (ϕ, f ) into an isomorphism between group actions. 4

Exercises for Section 8.1


1. Consider the natural action of D7 on labeled vertices of a regular heptagon. Consider the induced
permutation representation ρ : D7 → S7 . Determine ρ(r) and ρ(s) with respect to your labeling of
the vertices and choice of reflection for s.
2. Let n be a positive integer and consider the group GLn (R). Prove that the pairing GLn (R) × R → R
defined by A · x = det(A)x is a group action. Prove also that it is not faithful.
3. Let G = D6 and let H = ⟨r2 ⟩. Since H ⊴ G, conjugation of G on H is an action of G on H. Label the
elements 1, r2 , and r4 of H as 1, 2, and 3 respectively and consider the induced permutation representation
ρ : D6 → S3 .
(a) Exhibit the images under ρ of all elements in D6 .
(b) State the kernel of ρ and show that the action is not faithful.
4. Let G be a group acting on a set X. Show that defining
def
g · S = {g · s | s ∈ S}
for all g ∈ G and all S ⊆ X induces an action of G on P(X).

5. Let F be a field, let G = GLn (F ), and let X = Mn×n (F ) be the set of n × n matrices with entries in
F . Discuss how the relation of similarity on square matrices in Mn×n (F ) is related to a group action
of G on X.
6. Fix a positive integer n. Let X = {1, 2, . . . , n} and consider the mapping Sn × P(X) → P(X) defined
by
σ · {x1 , x2 , . . . , xk } = {σ(x1 ), σ(x2 ), . . . , σ(xk )}.
(a) Prove that this pairing defines an action of Sn on P({1, 2, . . . , n}).
(b) For a given k with 0 ≤ k ≤ n, define Pk (X) as the set of subsets of X of cardinality k. Prove
that Pk (X) are closed under the action of Sn on P(X).
(c) Prove that a subset Y of P(X) is closed under the action of Sn if and only if Y is the union of
some Pk (X).
7. Consider the action defined in Exercise 8.1.6 where n = 4 and k = 2. The induced permutation
representation is a homomorphism ρ : S4 → S6 . Label the elements in P2 ({1, 2, 3, 4}) according to the
following chart.
label 1 2 3 4 5 6
subset {1, 2} {1, 3} {1, 4} {2, 3} {2, 4} {3, 4}
Give ρ(σ) as a permutation in S6 for the permutations σ = 1, (1 2), (1 2 3), (1 2)(3 4), and (1 2 3 4).
8. Let Pn be the set of polynomials in R[x] that have degree n or less (including the 0 polynomial). Consider
σ ∈ Sn+1 as a permutation on {0, 1, 2, · · · , n} and define

σ · (an xn + · · · + a1 x + a0 ) = aσ−1 (n) xn + aσ−1 (n−1) xn−1 + · · · + aσ−1 (1) x + aσ−1 (0) .

(a) Prove that this defines an action of Sn+1 on Pn .


(b) Decide with proof whether σ · (a(x) + b(x)) = σ · a(x) + σ · b(x).
(c) Decide with proof whether σ · (a(x)b(x)) = (σ · a(x))(σ · b(x)).
9. Suppose that H is a group acting on a set X and let ϕ : G → H be a group homomorphism. Prove
that the pairing G × X → X defined by g · x = ϕ(g) · x, where the action symbol on the right is the
action of H on X, defines an action of G on X.
10. Let G be a group acting on a set X and let ρ : G → SX be the induced permutation representation.
Prove that the mapping (G/ Ker ρ) × X → X defined by

(g Ker ρ) · x = g · x

is an action of G/ Ker ρ on X. Prove also that this action is faithful.


11. In this exercise, we explore the actions described in Example 8.1.12. Let X = Fun({0, 1, 2}, {0, 1, 2})
be the set of functions from {0, 1, 2} to itself. The set X has cardinality 27. We label the functions
in X by fk with 0 ≤ k ≤ 26 as the function fk (i) = ai where k = (a2 a1 a0 )3 as an integer expressed in
base 3. Hence, f14 satisfies
f14 (0) = 2, f14 (1) = 1, f14 (2) = 1
because in base 3, the integer 14 is 14 = (112)3 . Consider elements in S3 as permutations on {0, 1, 2}.
Using the described labeling of functions in X,
(a) Give ρ1 (σ) as a permutation on {0, 1, 2, . . . , 26} for the permutation representation ρ1 induced
from the action of S3 on X defined by (σ · f )(x) = σ · f (x).
(b) Give ρ2 (σ) as a permutation on {0, 1, 2, . . . , 26} for the permutation representation ρ2 induced
from the action of S3 on X defined by (σ · f )(x) = f (σ −1 · x).
12. Suppose that a group G acts on a set X and let Y = Fun(X, X) be the set of all functions from X to
X.
(a) Prove that the pairing ? : G × Y → Y defined by

(g ? f )(x) = g · f (g −1 · x)

is an action of G on Fun(X, X).


(b) Prove also that the subset of bijections from X to X is invariant under this action.
13. Let G be a group. Let φ : X → Y and ψ : Y → Z be G-set homomorphisms between G-sets. Prove
that the composition ψ ◦ φ is a G-set homomorphism.

14. Let f : X → Y be a G-set homomorphism.


(a) Prove that if S is a G-subset of X, then f (S) is a G-subset of Y .
(b) Prove that if T is a G-subset of Y , then f −1 (T ) is a G-subset of X.
15. Let V be the set of vertices of a cube and let D be the long diagonals. Define the function f : V → D
so that f (v) is the unique long diagonal that contains the vertex v. Let G be the group of rigid
motions of the cube and consider the natural actions of G on V and on D. Prove that f is a G-set
homomorphism.
16. Let X be a set and let G and H be subgroups of SX . Prove that the actions of G on X and H on X
are permutation isomorphic if and only if G and H are conjugate subgroups in SX .

8.2 Orbits and Stabilizers
The action of a group on a set creates some interaction between the algebraic structure of sets and
the algebraic structure of groups. Nontrivial actions of a group G on a set X connect information
about the set X with information about the group G in interesting ways.
For example, if G and X are finite and ρ : G → SX is the permutation representation induced
from an action of G on X, then by Lagrange’s Theorem, | Im ρ| divides |SX |. Furthermore, by the
First Isomorphism Theorem, Im ρ ≅ G/ Ker ρ, so | Im ρ| = |G|/| Ker ρ| and thus | Im ρ| also divides
|G|. Just this connection puts some constraints on what can occur for actions of a given group G
on a set X. For example, if G ≅ Zp , where p is a prime number and if |X| = n < p, then the only
action of G on X is trivial.

8.2.1 – Orbits
One of the first nonobvious connections between groups and sets is that a group action of G on X
defines an equivalence relation.

Proposition 8.2.1
Let G be a group acting on a nonempty set X. The relation defined by x ∼ y if and only
if y = g · x for some g ∈ G is an equivalence relation.

Proof. Let x ∈ X and let 1 be the identity in G. Since 1 · x = x, then x ∼ x for all x ∈ X. Hence,
∼ is reflexive.
Suppose that x, y ∈ X with x ∼ y. Then there exists g ∈ G such that y = g · x. Consequently,
g −1 · y = g −1 · (g · x) = (g −1 g) · x = 1 · x = x.
This shows that y ∼ x since x = g −1 · y.
Suppose that x, y, z ∈ X with x ∼ y and y ∼ z. Then for some group elements g, h ∈ G, we have
y = g · x and z = h · y. Then
z = h · (g · x) = (hg) · x.
This shows that x ∼ z and establishes transitivity. □
Definition 8.2.2
Let G be a group acting on a nonempty set X. The ∼-equivalence class {g · x | g ∈ G},
denoted by G · x (or more simply Gx), is called the orbit of G containing x.
388 CHAPTER 8. GROUP ACTIONS
Figure 8.3: Flows for a differential equation over R2
Recall that the equivalence classes of an equivalence relation form a partition of X. Consequently,
the orbits of G on X partition X.
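For finite groups the orbits can be computed directly. The following sketch (ours, not from the text) represents a group as a list of permutation dictionaries and checks that the orbits do partition the set:

```python
def orbit(G, x):
    """Orbit G . x = {g(x) : g in G}; assumes G lists every group element."""
    return {g[x] for g in G}

def orbits(G, X):
    """Distinct orbits; by Proposition 8.2.1 they partition X."""
    seen, parts = set(), []
    for x in X:
        if x not in seen:
            o = orbit(G, x)
            parts.append(o)
            seen |= o
    return parts

# Z4 = <(0 1 2 3)> acting on {0,...,4} and fixing 4:
r = {0: 1, 1: 2, 2: 3, 3: 0, 4: 4}
r2 = {k: r[r[k]] for k in r}
G = [{i: i for i in range(5)}, r, r2, {k: r[r2[k]] for k in r}]
assert sorted(map(sorted, orbits(G, range(5)))) == [[0, 1, 2, 3], [4]]
```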
The terminology of “orbit” might appear strange at first pass. Indeed, many algebraists use
the term freely without regularly recalling the etymology. The etymology evokes an application of
group actions to dynamical systems. Though the following example uses a fundamental result of
differential equations to illustrate the terminology, it is not essential for the reader to be familiar
with differential equations. Furthermore, we can only provide an intuitive treatment as the technical
details would take us far afield.
Example 8.2.3 (Dynamical Systems). Let X = Rn . A parametric curve in Rn has the form
~x(t) = (x1 (t), x2 (t), . . . , xn (t)) for real-valued functions xi (t). A first-order differential equation in
the vector function ~x(t) is an equation
~x ′(t) = F~ (~x, t)    (8.3)
where F~ is a function F~ : Rn+1 → Rn . More explicitly, a first-order differential equation in the
vector function ~x(t) is a system of differential equations

x1 ′(t) = F1 (x1 , x2 , . . . , xn , t)
x2 ′(t) = F2 (x1 , x2 , . . . , xn , t)
⋮
xn ′(t) = Fn (x1 , x2 , . . . , xn , t),
where each function Fi involves the n space variables and the parameter (time) variable t. A
solution to the differential equation (or system of parametric equations) is any parametric curve
~x(t) that satisfies (8.3) for all t in some nonempty interval.
Existence and uniqueness theorems (see [11, Theorem 7.1.1] or [3]) in the theory of differential
equations establish conditions on the functions Fi under which solutions exist and are unique given
an initial condition ~x(t0 ) = ~a for a given initial parameter value t0 and some initial condition ~a. This
leads to the concept of a flow which is defined as a one-parameter family of functions φt : Rn → Rn ,
with t ∈ R, such that
φt ◦ φs = φt+s
and φ0 is the identity function on Rn . The solutions to (8.3) give a flow on Rn by defining φs as

φs (~a) = ~x(s),
where ~x(t) is the unique solution to (8.3) with initial condition ~x(0) = ~a. A flow on Rn is an action
of the group (R, +) on the set Rn . Furthermore, for a fixed ~a, the flow φt (~a) describes the trajectory
(orbit) of a particle that is governed by the differential equation (8.3) as t evolves and that starts at
the initial point ~a. Therefore, the orbit for ~a as a group action is precisely the orbit as a trajectory
of a particle governed by this dynamical system starting at the point ~a.
Figure 8.3 shows the vector field F~ (x, y) = (0.8x(1 − y)^2 , 0.3y(x − 1)) along with the orbits
(trajectories) of two solutions of the differential equation

(x ′(t), y ′(t)) = F~ (x, y),

namely the solutions with ~x(0) = (0.3, 2) = ~a and ~x(0) = (0.3, 1) = ~b. These trajectories show the
flow φt of the differential equation applied to the two points ~a and ~b.
The fact that solutions to differential equations of this form (under appropriate conditions) form
a flow means that if we watch how points in Rn evolve during an interval of time t and then during
a subsequent interval of time s, then the points will have evolved as if we simply considered how
they evolved during an interval of time of duration t + s. 4
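The flow property is easy to verify numerically in the simplest case. The sketch below (ours, not from the text; the coefficient a = 0.7 is an arbitrary choice) checks φt ◦ φs = φt+s for the scalar linear equation x ′ = ax, whose flow is known in closed form:

```python
from math import exp, isclose

a = 0.7                      # an arbitrary coefficient (our choice)

def phi(t, x0):
    """Flow of x'(t) = a x(t): phi_t(x0) = exp(a t) * x0."""
    return exp(a * t) * x0

x0, s, t = 1.3, 0.4, 1.1
assert isclose(phi(t, phi(s, x0)), phi(t + s, x0))   # phi_t o phi_s = phi_{t+s}
assert phi(0, x0) == x0                              # phi_0 is the identity
```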

Since flows refer to group actions with the group (R, +), the reader should understand that the
intuition provided by the above example, which motivated the term “orbit,” is limiting for what can
happen in group actions in general. Indeed, we regularly consider group actions for finite
groups, nonabelian groups, or groups that do not have a natural total order on them.

Definition 8.2.4
Suppose that a group G acts on a set X. We say that G fixes an element x ∈ X if g · x = x
for all g ∈ G. Furthermore, the action is called
(1) free if whenever g · x = h · x for some x ∈ X, then g = h;
(2) transitive if for any two x, y ∈ X, there exists g ∈ G such that y = g · x;
(3) regular if it is both free and transitive;

(4) r-transitive if for every two r-tuples (x1 , x2 , . . . , xr ) and (y1 , y2 , . . . , yr ) of distinct
elements in X, there exists an element g ∈ G such that

(y1 , y2 , . . . , yr ) = (g · x1 , g · x2 , . . . , g · xr ).

Note that an element x is fixed by G if and only if {x} is an orbit of G. From the opposite
perspective, the action of G on X is transitive if and only if there is only one orbit, namely all of X.
A group action is free if and only if the only element in G that fixes any element in X is the group
identity.
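The three properties can be tested mechanically for a finite action. The sketch below (our illustration, not from the text) checks them for Z4 acting on itself by translation, which is regular:

```python
from itertools import product

def is_transitive(G, X):
    return all(any(g[x] == y for g in G) for x in X for y in X)

def is_free(G, X):
    # free: g . x = h . x for even one x forces g = h
    return all(g == h for g, h in product(G, G) if any(g[x] == h[x] for x in X))

def is_regular(G, X):
    return is_transitive(G, X) and is_free(G, X)

# Z4 acting on itself by translation is both free and transitive, hence regular:
r = {0: 1, 1: 2, 2: 3, 3: 0}
r2 = {k: r[r[k]] for k in r}
G = [{i: i for i in range(4)}, r, r2, {k: r[r2[k]] for k in r}]
assert is_regular(G, list(range(4)))
```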

Example 8.2.5. Consider the action of a group G on its set of subgroups Sub(G) by conjugation.
(See Exercise 8.4.10.) There exists a natural bijection between H and gHg −1 . In particular, if
G is a finite group, then the orbits of this action stay within subgroups of G of fixed cardinality.
A subgroup H is fixed by this action if and only if gHg −1 = H for all g ∈ G. Hence, the fixed
subgroups are precisely the normal subgroups of G.
As a specific example, consider the group D6 , whose lattice of subgroups is given in Example 3.6.8.
The orbits of the action of D6 on Sub(D6 ) are

{D6 }, {hs, r2 i}, {hri}, {hsr, r2 i}, {hs, r3 i, hsr, r3 i, hsr2 , r3 i},
{hr2 i}, {hsi, hsr2 i, hsr4 i}, {hr3 i}, {hsri, hsr3 i, hsr5 i}, {h1i}. 4
Figure 8.4: Orbits of points in R2 under D6 action
Example 8.2.6. Consider the action of D6 as linear transformations on R2 , where r acts as a
rotation around the origin O by 60◦ and s acts as a reflection through the x-axis. The orbit of O
is just the singleton set {O} and this is the only fixed point. The orbit of any point P ≠ O whose
polar coordinates (r, θ) satisfy θ = kπ/6 for some integer k, that is, any point on one of the six
reflection axes, has exactly six points. The orbits of all other points in the plane have 12 points.
See Figure 8.4.

Example 8.2.7. Continuing with examples associated to group actions of D6 on various sets, con-
sider the standard action of D6 on the vertices of a regular hexagon, labeled as in Example 8.1.5.
This action is transitive. In particular, if a, b ∈ {1, 2, 3, 4, 5, 6} then the group element rb−a maps a
to b. However, the action is not 2-transitive (or r-transitive for r ≥ 2). Indeed, if g · a = b with
a < 6, then g · (a + 1) can only be one of the two vertices on either side of b. 4

Example 8.2.8. Let F be a field and consider the action of GLn (F ) on the vector space V = F n
by matrix-vector multiplication. See Example 8.1.7. This action has exactly two orbits. One orbit
corresponds to the fixed point ~0. On the other hand, consider any two nonzero vectors ~a and ~b. Let
M1 , M2 ∈ GLn (F ) be matrices such that the first column of M1 is ~a and the first column of M2 is
~b. Then M = M2 M −1 ∈ GLn (F ) and M~a = ~b. Thus, V − {~0} is an orbit of this action.
1
We point out that the action of GLn (F ) on V − {~0}, though transitive, is not 2-transitive. We
can see this by taking ~a and ~b to be collinear vectors and ~u and ~v to be linearly independent vectors.
There is no invertible matrix g ∈ GLn (F ) such that g~a = ~u and g~b = ~v . 4

Example 8.2.9. Consider the set X = Mm×n (R) of real m×n matrices. The group G = GLm (R)×
GLn (R) acts on Mm×n (R) by
(A, B) · M = AM B −1 . (8.4)
Let T : Rn → Rm be a linear transformation and suppose that M is the matrix of T with respect to
a basis of Rn and a basis of Rm . Then the action described above corresponds to effecting a change
of basis on Rm with basis change matrix A and a change of basis on Rn with basis change matrix
B.
If N = AM B −1 , then AM = N B. Since A and B are invertible, Im B = Rn and Im A = Rm .
Hence, rank AM = rank N B implies that rank M = rank N . We proceed to prove the converse.
Suppose that M has rank r. By definition, the linear transformation T (~x) = M~x, expressed
with respect to the standard bases on Rn and Rm , has dim Im T = r. By the Rank-Nullity Theorem
dim Ker T + dim Im T = n so dim Ker T = n − r. Let {~ur+1 , ~ur+2 , . . . , ~un } be a basis of Ker T as
a subspace of Rn . Complement this basis with a set of vectors {~u1 , ~u2 , . . . , ~ur } to make an ordered
basis B = (~u1 , ~u2 , . . . , ~un ) of Rn .
Define ~vi = T (~ui ) for 1 ≤ i ≤ r as vectors in Rm . Suppose that some linear combination vanishes:

c1~v1 + c2~v2 + · · · + cr ~vr = ~0.
Then, T (c1 ~u1 + c2 ~u2 + · · · + cr ~ur ) = ~0. However, Span(~u1 , ~u2 , . . . , ~ur ) ∩ Ker T = {~0}. So we deduce
that c1 ~u1 + c2 ~u2 + · · · + cr ~ur = ~0 in Rn . Since the ~ui are linearly independent, we deduce that ci = 0
for 1 ≤ i ≤ r. This shows that {~v1 , ~v2 , . . . , ~vr } is a linearly independent set. We now complement
this set with m − r vectors to make an ordered basis B 0 = (~v1 , ~v2 , . . . , ~vm ) on Rm .
By construction, the matrix of T with respect to the basis B on Rn and B 0 on Rm is the matrix
( Ir 0 )
( 0 0 ) .
Consequently, every matrix M of rank r has the above matrix in its orbit. This example has
shown that the orbits of GLm (R) × GLn (R) acting on Mm×n (R) by (8.4) are the sets

Or = {M ∈ Mm×n (R) | rank M = r},
where r is an integer satisfying 0 ≤ r ≤ min(m, n). 4

8.2.2 – Stabilizers
Let G act on a set X. An element g ∈ G is said to fix an element x ∈ X if g · x = x. The axioms of
group actions lead to strong interactions between groups and subsets of X, especially in relation to
elements in G that fix an element x ∈ X or conversely all the elements in X that are fixed by some
element g ∈ G.

Proposition 8.2.10
Suppose that a group G acts on a set X. For any element x ∈ X, the subset Gx = {g ∈
G | g · x = x} is a subgroup of G.

Proof. Obviously, 1 · x = x, so 1 ∈ Gx and in particular Gx is nonempty. Suppose that g, h ∈ Gx .
Then g · (h · x) = g · x = x but by the compatibility axiom x = g · (h · x) = (gh) · x, so gh ∈ Gx and
Gx is closed under the group operation. Finally, if g ∈ Gx , then
g −1 · (g · x) = g −1 · x =⇒ (g −1 · g) · x = g −1 · x =⇒ x = g −1 · x.

Therefore, Gx is closed under taking inverses. □
Definition 8.2.11
Given x ∈ X, the subgroup Gx = {g ∈ G | g · x = x} is called the stabilizer of x in G.

Group properties lead to the following important theorem and the subsequent Orbit Equation.

Theorem 8.2.12 (Orbit-Stabilizer Theorem)
Let G be a group acting on a set X. The size of the orbit G · x satisfies |G · x| = |G : Gx |.
Proof. We need to show a bijection between the elements in the equivalence class G · x of x and left
cosets of Gx .
Consider the function f from the orbit G · x to the cosets of Gx in G defined by

f : y ↦ gGx where y = g · x.
We first verify that this association is even a function. Suppose that g1 ·x = g2 ·x. Then (g2−1 g1 )·x = x
so g2−1 g1 ∈ Gx and hence the cosets g1 Gx and g2 Gx are equal. This shows that any image of f is
independent of the orbit representative so f is a function from G · x to the set of left cosets of Gx .
Now suppose that f (y1 ) = f (y2 ) for two elements in G · x, with y1 = g1 · x and y2 = g2 · x. Then
g1 Gx = g2 Gx so g2−1 g1 ∈ Gx . Hence, (g2−1 g1 ) · x = x and, by acting on both sides by g2 , we get
g1 · x = g2 · x. This proves that f is injective.
Finally, to prove that f is also surjective, let hGx be a left coset of Gx in G. Then hGx = f (h·x).
The element h · x is in the orbit G · x so f is surjective. □

Interestingly enough, the proof of the Orbit-Stabilizer Theorem does not assume that the group
or the set X is finite. The equality of cardinality holds even if the cardinalities are infinite.
We notice as a special case that if G acts transitively on a set X, then |G : Gx | = |X| for all
elements x ∈ X. In particular, if X and G are both finite, then |G| = |X||Gx |, which implies that
|G| is a multiple of |X|. Furthermore, a group action can only be regular if |G| = |X|.
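The theorem is easy to confirm numerically. The following sketch (ours, not from the text) checks |G · x| · |Gx | = |G| for the natural action of S3 on {0, 1, 2}:

```python
from itertools import permutations

# S3 as all six permutation dicts on {0, 1, 2}:
G = [dict(enumerate(p)) for p in permutations(range(3))]

for x in range(3):
    orb = {g[x] for g in G}                  # the orbit G . x
    stab = [g for g in G if g[x] == x]       # the stabilizer G_x
    assert len(orb) * len(stab) == len(G)    # |G . x| * |G_x| = |G| = 6
```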
The Orbit-Stabilizer Theorem leads immediately to the following important corollary. The Orbit
Equation is a generic statement, applicable to any such action, that follows from the fact that the
orbits of G partition X. Moreover, the Orbit Equation often gives rise to interesting combinatorial
formulas.

Corollary 8.2.13 (Orbit Equation)
Let G be a group acting on a finite set X. Let T be a complete set of distinct representatives
of the orbits. Then

|X| = ∑_{x∈T} |G · x| = ∑_{x∈T} |G : Gx |.

Example 8.2.14. As a simple illustration of the Orbit-Stabilizer Theorem, we prove that a group
G of order 15 acting on a set X of size 7 has a fixed point. By the Orbit-Stabilizer Theorem, the
orbit of an element x has order |G : Gx |, where Gx is the stabilizer of x. Hence, |G : Gx | can be
equal to 1, 3, 5, or 15. If |G : Gx | = 1, then G = Gx so all of G fixes x and hence x is a fixed point
of the action. Obviously, there can be no orbit of size 15 in a set of size 7. Assume there is no fixed
point. Then the orbits must have size 3 or 5. Hence, if there are r orbits of size 3 and s orbits of
size 5, then 3r + 5s = 7 for r, s ∈ N. If s = 0 then r = 7/3 ∈ / N; if s = 1, then r = 2/3 ∈/ N; and
if s ≥ 2, then r < 0. We have shown that there exist no solutions in nonnegative integers to the
equation 3r + 5s = 7. We conclude by contradiction that there must be a fixed point. 4

Example 8.2.15. Consider the action of G = Sn on P({1, 2, . . . , n}) by

σ · {x1 , x2 , . . . , xk } = {σ(x1 ), σ(x2 ), . . . , σ(xk )}.

We can interpret the result of Exercise 8.1.6 by saying that the orbits of this action consist of the set
of subsets of a given cardinality k, for k ranging from 0 to n. For a fixed k, define A = {1, 2, . . . , k}.
The stabilizer GA is the set of permutations σ ∈ Sn that map {1, 2, . . . , k} to itself. Hence,

|GA | = k!(n − k)!

because there are k! ways σ can permute {1, 2, . . . , k} and (n−k)! ways σ can permute the remaining
elements of {1, 2, . . . , n}. We recover the fact that there are
|G : GA | = |G|/|GA | = n!/(k!(n − k)!) = (n choose k)
subsets of X of size k. Furthermore, since |P({1, 2, . . . , n})| = 2^n , the orbit equation for this action
is the well-known combinatorial formula

2^n = ∑_{k=0}^{n} (n choose k).    △
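The orbit count in this example can be confirmed by brute force for a small n. The sketch below (ours, not from the text) verifies for n = 4 that the orbits of Sn on the power set are exactly the collections of subsets of each cardinality:

```python
from itertools import combinations, permutations
from math import comb

n = 4
subsets = [frozenset(c) for k in range(n + 1)
           for c in combinations(range(1, n + 1), k)]

def act(p, S):
    """p is a tuple with p[i-1] = sigma(i); apply sigma elementwise to S."""
    return frozenset(p[i - 1] for i in S)

# Each orbit, stored as a frozenset of subsets:
orbits = {frozenset(act(p, S) for p in permutations(range(1, n + 1)))
          for S in subsets}

assert len(orbits) == n + 1                       # one orbit per cardinality
assert sum(comb(n, k) for k in range(n + 1)) == 2 ** n == len(subsets)
```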

8.2.3 – The Lemma That Is Not Burnside’s
The Orbit-Stabilizer Theorem establishes an interesting connection between the size of the set X
and orders of stabilizers. Similarly, by considering the set of points in X fixed by any given group
element g, one arrives at another interesting relationship, called the Cauchy-Frobenius Lemma. In
the literature, this result is more often called the Burnside Lemma. Though Burnside presented it in
his landmark text [12, Theorem VII, Chapter X], he was not the first to prove it. In [49], Neumann
chronicles how this incorrect naming arose, not an uncommon occurrence in mathematics.

Definition 8.2.16
Let G be a group acting on a set X. For any g ∈ G, define the fixed subset
X^g = {x ∈ X | g · x = x}.
We note that the fixed subset X^g plays a role parallel to that of the stabilizer Gx of an element x.

Theorem 8.2.17 (Cauchy-Frobenius Lemma)
Let G be a finite group acting on a finite set X. Suppose that G has m orbits on X. Then

m|G| = ∑_{g∈G} |X^g|.

Proof. Consider the set of elements S = {(x, g) ∈ X × G | g · x = x}. We count the number of
elements of S in two different ways, by summing first through elements in G and then by summing
first through X. By summing first through G, we get
|S| = ∑_{g∈G} |X^g|.
By summing first through elements in X, we get

|S| = ∑_{x∈X} |Gx |.
Now let O1 , O2 , . . . , Om be the orbits of G on X and let x1 , x2 , . . . , xm be a complete set of distinct
orbit representatives. By the Orbit-Stabilizer Theorem, |Oi | = |G : Gxi |. Thus, |Gxi | = |G|/|Oi |.
Hence,
|S| = ∑_{x∈X} |Gx | = ∑_{i=1}^{m} ∑_{x∈Oi} |Gx | = ∑_{i=1}^{m} ∑_{x∈Oi} |G|/|Oi | = ∑_{i=1}^{m} |Oi | · (|G|/|Oi |) = ∑_{i=1}^{m} |G| = m|G|

and the result follows by identifying the two ways of counting |S|. □
The Cauchy-Frobenius Lemma has many interesting applications in counting problems and com-
binatorics. In particular, if a counting problem can be phrased in a manner to count orbits of a
group acting on a set, then the lemma provides a strategy to compute the number of orbits m.
Example 8.2.18. Suppose that we wish to design a bracelet with 8 beads using beads of 3 different
colors, such as the one in Figure 8.5. We consider two bracelets equivalent if the arrangement of
bead colors of one can be obtained from the other simply by rotating it. We propose to determine
how many inequivalent bracelets of 8 beads can be made using 3 colors.
The rotation of a bracelet with 8 beads corresponds to the action of Z8 . There are 3^8 different
ways of putting beads on the bracelet without considering the equivalence. Any group elements in Z8
of the same order d will fix the same number of bracelet colorings. Hence, instead of summing over
elements in Z8 , we can sum over the divisors of 8, corresponding to the order of various elements in

Figure 8.5: A colored-bracelet counting problem

Z8 . Note that for any d|8, there are φ(d) elements of order d. Finally, note that a bracelet coloring
will be fixed by an element of order d if and only if the color pattern repeats every 8/d beads. But
then a contiguous run of 8/d beads can be colored in any way. Hence, the number of inequivalent bracelets
is m, where
8m = ∑_{d|8} φ(d) 3^{8/d} =⇒ m = (1/8)(3^8 + 3^4 + 2 · 3^2 + 4 · 3^1 ) = 834.

To connect this bead-coloring problem to the more theoretical language of group actions, we can view
the set of possible bracelets as the set of functions X = Fun({1, 2, . . . , 8}, {1, 2, 3}). The domain
corresponds to the bead position and the codomain is the bead color. The action of rotating the
bracelet corresponds to the action of Z8 on X by

(σ · f )(x) = f (σ −1 · x),

where σ ∈ Z8 acts in the natural way on {1, 2, . . . , 8}, as Z8 = h(1 2 3 4 5 6 7 8)i. 4
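The count of 834 can be checked by computer in two ways, mirroring the two computations above. The sketch below (ours, not from the text) sums |X^g| over all eight rotations directly, then confirms the φ(d) grouping gives the same total:

```python
from itertools import product
from math import gcd

n, c = 8, 3                                   # 8 beads, 3 colors

# Brute-force Cauchy-Frobenius: sum |X^g| over the 8 rotations.
colorings = list(product(range(c), repeat=n))
total = sum(1 for k in range(n) for x in colorings
            if all(x[i] == x[(i + k) % n] for i in range(n)))
m = total // n
assert m == 834

# Shortcut from the text: group the rotations by their order d | 8.
def phi(d):
    return sum(1 for i in range(1, d + 1) if gcd(i, d) == 1)

assert sum(phi(d) * c ** (n // d) for d in range(1, n + 1) if n % d == 0) == total
```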

Exercises for Section 8.2
1. Let X = {1, 2, 3}3 = {(i, j, k) | 1 ≤ i, j, k ≤ 3}.
(a) Consider the action of S3 on X by σ · (a1 , a2 , a3 ) = (σ(a1 ), σ(a2 ), σ(a3 )). Explicitly list all the
orbits of this action.
(b) Consider the action of S3 on X by σ · (a1 , a2 , a3 ) = (aσ−1 (1) , aσ−1 (2) , aσ−1 (3) ). Explicitly list all
the orbits of this action.
2. Let X be the set of bit strings of length 6, namely X = {0, 1}6 .
(a) Consider the action of S6 on X by σ · (b1 , b2 , . . . , b6 ) = (bσ−1 (1) , bσ−1 (2) , . . . , bσ−1 (6) ). Give a
unique representative for each orbit and determine how many elements are in each orbit.
(b) View Z6 as a subgroup of S6 via Z6 ≅ h(1 2 3 4 5 6)i. Determine all the orbits of order 1, 2, 3,
and 6.
3. Consider the group G of rigid motions of a cube and consider the action of G on the set of points C
that make up the surface of the cube. (See Example 8.1.11.) For each point of C, describe its
orbit. In particular, for all divisors d of |G| = 24, determine which points of C have orbits of size d.
4. Let G be a group. Prove that the action of G on itself by left multiplication is a regular group action.
5. Consider the group G of invertible affine transformations of the plane acting on R2 . Recall that in
coordinates, an invertible affine transformation on R2 has the form
(x, y)ᵀ ↦ A (x, y)ᵀ + (e, f )ᵀ,
where A ∈ GL2 (R) and e, f ∈ R. Show that the natural action of G on R2 is 2-transitive.

6. Let G act on a nonempty set X. Suppose that x, y ∈ X and that y = g · x for some g ∈ G. Prove that
Gy = gGx g −1 . Deduce that if the action is transitive then the kernel of the action is
∩_{g∈G} gGx g −1 .

7. Let G be a group acting on a set X. Prove that a subset S of X is a G-invariant subset of X if and
only if S is a union of orbits.
8. Suppose that a group G acts on a set X and also on a set Y . Prove that a G-set homomorphism
f : X → Y maps orbits of G in X to orbits of G in Y .
9. Show that a group of order 55 acting on a set of size 34 must have a fixed point.
10. Suppose G is a group of order 21. Determine with proof the positive integers n such that an action of
G on a set X of size n must have a fixed point.
11. Let X = R[x1 , x2 , x3 , x4 ]. Consider the action of G = S4 on X by

σ · p(x1 , x2 , x3 , x4 ) = p(xσ(1) , xσ(2) , xσ(3) , xσ(4) ).

(a) Find the stabilizer of the polynomial x1 + x2 and give its isomorphism type.
(b) Find a polynomial q(x1 , x2 , x3 , x4 ) whose stabilizer is isomorphic to D4 .
(c) Explicitly list the elements in the orbit of x1 x2 + 5x3 .
(d) Explicitly list the elements in the orbit of x1 x22 x33 .
12. Let G be a group acting on a set X. Let H be a subgroup of G. It acts on X with the action of G
restricted to H. Let O be an orbit of H in X.
(a) Prove that for all g ∈ G, the set gO is an orbit of the conjugate subgroup gHg −1 .
(b) Deduce that if G is transitive on X and if H ⊴ G, then all the orbits of H are of the form gO.
13. Let A = {1, 2, . . . , n} and consider the set X = {(a, S) ∈ A × P(A) | a ∈ S}. Consider the action of
Sn on X by σ · (a, S) = (σ(a), σ · S), where σ · S is the power set action. (See Exercise 8.1.4.) Prove
that the Orbit Equation for this action gives the combinatorial formula
∑_{k=0}^{n} k (n choose k) = n 2^{n−1} .

14. Let A be a finite set of size k and consider the action of Sn on X = An via

σ · (a1 , a2 , . . . , an ) = (aσ−1 (1) , aσ−1 (2) , . . . , aσ−1 (n) ).

(See Example 8.1.13.) Prove that the Orbit Equation for this action is
k^n = ∑_{s1 +···+sk =n} n!/(s1 ! s2 ! · · · sk !),

where the summation is taken over all (s1 , s2 , . . . , sk ) ∈ Nk that add up to n.
15. Let A and B be finite disjoint sets of cardinality a and b, respectively. Let X = Pk (A ∪ B) be the
set of subsets of A ∪ B of cardinality k. Let SA and SB be the groups of permutations on A and B,
respectively. Consider the action of G = SA ⊕ SB on A ∪ B by
(σ, τ ) · c = σ(c) if c ∈ A, and (σ, τ ) · c = τ (c) if c ∈ B.

Define the action of G = SA ⊕ SB on Pk (A ∪ B) as the standard set of subsets action. Prove that the
Orbit Equation of G on X gives the Vandermonde Identity
(a + b choose k) = ∑_{i+j=k} (a choose i) (b choose j).

16. Let X be the set of functions from {1, 2, . . . , n} into itself and let G = Sn ⊕ Sn . Define the pairing
G × X → X by
((σ, τ ) · f )(a) = σ · f (τ −1 · a) for all a ∈ {1, 2, . . . , n}.

(a) Prove that this pairing is an action of G on X.
(b) Prove that the set of orbits is in bijection with the partitions of n, where for any partition
λ = (λ1 , λ2 , . . . , λs ), the corresponding orbit Oλ consists of functions in which the orders of the
distinct fibers are λ1 , λ2 , . . . , λs .
(c) Determine the number of functions in the orbit Oλ .
(d) Supposing that n = 5, find the stabilizer of the function f such that f (1) = f (2) = 1, f (3) =
f (4) = 2 and f (5) = 3.
17. Consider the action of G = S3 ⊕ S3 on X = {(i, j, k) | 1 ≤ i, j, k ≤ 3} by

(σ, τ ) · (a1 , a2 , a3 ) = (τ (aσ−1 (1) ), τ (aσ−1 (2) ), τ (aσ−1 (3) )).

This action has the effect of rearranging the elements in the triple according to σ and then permuting
the outcome by τ . Explicitly calculate X g for all g ∈ G and verify the Cauchy-Frobenius Lemma.
18. Recall that the group of rigid motions of a tetrahedron is A4 . Suppose that we color the faces of a
tetrahedron with colors red, green, or blue. We consider two colorings equivalent if one coloring can be
obtained from another by rotating the tetrahedron. How many inequivalent such colorings are there?
19. We consider colorings of the vertices of a square as equivalent if one coloring can be obtained from
the other by any D4 action on the square. How many different colorings are there with (a) 3 colors;
(b) 4 colors; (c) 5 colors?
20. We consider colorings of the vertices of an equilateral triangle as equivalent if one coloring can be
obtained from the other by any D3 action on the triangle. How many different colorings are there
using p colors?
21. Repeat Example 8.2.18 but consider bracelet colorings equivalent if one is obtained from another under
the action of some D8 element. (Note that the bracelet in Figure 8.5 is only fixed by 1 under the Z8
action but by {1, sr} in D8 .)

8.3 Transitive Group Actions
The previous section discussed primarily the orbits of a group action on a set. From the emphasis
of the section, the reader might get the impression that transitive group actions are not interesting
since in that case there is only one orbit, namely the whole set. This could not be further from the
truth.
There still exists considerable structure within a transitive group action. In fact, a group acts
transitively on each orbit, so we may view any group action as a union of transitive group actions.
In order to look deeper into the analysis of group actions, we must address properties of transitive
group actions.

8.3.1 – Blocks and Primitivity
Consider Example 8.1.11, which discussed the group G of solid rotations of a cube. The example
pointed out how G acts, among other things, on the faces of the cube, and on the long diagonals of
the cube. There is a qualitative difference between the action of G on the set of faces and on the
set of long diagonals. In particular, the intersection of any two distinct long diagonals is ∅ whereas
some distinct faces intersect along an edge.

Definition 8.3.1
Let G be a group that acts transitively on a set X. A block is a nonempty subset B ⊆ X
such that for all g ∈ G, either g · B = B or (g · B) ∩ B = ∅.

For every transitive action of a group G on a set X, the singleton sets {x} in X are blocks as is
the whole set X. Since these subsets are always blocks, we call them trivial. However, as the group
of rigid motions of the cube illustrates, some group actions may possess other blocks. Any of the
long diagonals of a cube is a block whereas a face is not a block. Some group actions do not possess
any nontrivial blocks. We give these a specific name.

Definition 8.3.2
A transitive action of a group G on a set X is called primitive if the only blocks of the
action are the trivial ones.

Example 8.3.3. Consider the group Z8 and its action on X = {1, 2, 3, 4, 5, 6, 7, 8} where Z8 is given
as h(1 2 3 4 5 6 7 8)i in S8 . This action is obviously transitive. Besides the trivial blocks, the
subsets
{1, 3, 5, 7}, {2, 4, 6, 8}, {1, 5}, {2, 6}, {3, 7}, {4, 8}
are blocks as well and the action of Z8 on X has no other blocks besides these. 4
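The blocks of this action can be found exhaustively. The sketch below (ours, not from the text) tests every nonempty subset of {1, . . . , 8} against the definition and recovers exactly the six nontrivial blocks listed in the example:

```python
from itertools import combinations

def shift(B, k):
    """Image of the subset B under the rotation i -> i + k (labels 1..8)."""
    return frozenset((b - 1 + k) % 8 + 1 for b in B)

def is_block(B):
    """B is a block iff every shift of B equals B or is disjoint from B."""
    return all(shift(B, k) == B or not (shift(B, k) & B) for k in range(8))

X = range(1, 9)
blocks = [set(B) for r in range(1, 9)
          for B in map(frozenset, combinations(X, r)) if is_block(B)]

nontrivial = [B for B in blocks if 1 < len(B) < 8]
assert sorted(map(sorted, nontrivial)) == [
    [1, 3, 5, 7], [1, 5], [2, 4, 6, 8], [2, 6], [3, 7], [4, 8]]
```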

Example 8.3.4. For n ≥ 3, the action of Sn on {1, 2, . . . , n} is primitive. Let {a1 , a2 , . . . , ak }
be any subset of {1, 2, . . . , n} with 2 ≤ k ≤ n − 1. There exists ` ∈ {1, 2, . . . , n} − {a1 , a2 , . . . , ak };
let σ = (a1 a2 . . . ak `). Then

(σ · {a1 , a2 , . . . , ak }) ∩ {a1 , a2 , . . . , ak } = {a2 , . . . , ak },

which is neither empty, since k ≥ 2, nor {a1 , a2 , . . . , ak }. 4

If B is a block of a transitive group action of G on X, then for every y ∈ X, there exists
some g ∈ G with y ∈ g · B. Furthermore, since B ∩ (g · B) = B or ∅, then the set of subsets
Σ = {g · B | g ∈ G} is a partition of X. A set of subsets of X defined as {g · B | g ∈ G}, where B is
a block, is called a system of blocks on X.
For all g ∈ G, if B is a finite block, then |g·B| = |B|. Consequently, in the partition on X induced
from the system of blocks associated to B, all the other blocks (equivalence classes associated to the
partition) have the same cardinality. Since the union of all the blocks in Σ is precisely X, we have
proven the following result.

Proposition 8.3.5
If B is a block in the action of a group G on a finite set X, then |B| divides |X|.

If Σ is a system of blocks, then the group G acts transitively in the obvious way on Σ, thereby
inducing another group action but on a smaller set. Comparing the action of G on X with the action
of G on a system of blocks Σ, we introduce the following two types of stabilizers.

Definition 8.3.6
Let G act on X and let B ⊆ X, not necessarily a block. We define the pointwise stabilizer
of B as
G(B) = {g ∈ G | g · x = x for all x ∈ B}
and the setwise stabilizer as

G{B} = {g ∈ G | g · B = B}.

Note that if B is a block in the action of G on X, then G{B} is the usual stabilizer of B in the
action of G on Σ. In contrast, for the pointwise stabilizer
G(B) = ∩_{x∈B} Gx .

Figure 8.6: Visualizing a block structure

It is not hard to see that G(B) and G{B} are both subgroups and that G(B) ⊴ G{B} . (See
Exercise 8.3.9.) Obviously, if B is a singleton B = {b}, then G(B) = G{B} = Gb .

Example 8.3.7. Consider the standard action of S9 on X = {1, 2, 3, 4, 5, 6, 7, 8, 9}. We propose to
find a subgroup H ≤ S9 that acts transitively on X and has {1, 2, 3} as a block.
Because of transitivity, the subgroup H must contain a permutation that maps 1 to 2 and a
permutation that maps 1 to 3. The subgroup h(1 2 3)i is transitive on the block B1 = {1, 2, 3}. Since
H is transitive, it must contain a g such that g · 1 = 4. But then the element g must map the block
{1, 2, 3} to another block of size 3. Without loss of generality, let us assume that this second block
is B2 = {4, 5, 6}. In that case, the third block is B3 = {7, 8, 9}.
The permutation σ = (1 4 7)(2 5 8)(3 6 9) cycles through blocks B1 , B2 , and B3 . Consequently, a
subgroup of S9 that is transitive on X and has {1, 2, 3} as a block is

H = h(1 2 3), (1 4 7)(2 5 8)(3 6 9)i.

The subgroup H has the effect of independently cycling within each of the blocks and of cycling
through the blocks as a whole.
Figure 8.6 gives an intuitive picture of the action of H on {1, 2, 3, 4, 5, 6, 7, 8, 9}. The permutation
(1 2 3) corresponds to a clockwise rotation of the triangle {1, 2, 3}; the permutation σ corresponds
to a counterclockwise rotation of 120◦ of the whole figure; and the permutations

σ(1 2 3)σ −1 = (4 5 6) and σ 2 (1 2 3)σ −2 = (7 8 9)

correspond to clockwise rotations by 120◦ individually in the triangles {4, 5, 6} and in {7, 8, 9}.
The setwise stabilizer of B1 is

H{B1 } = h(1 2 3), (4 5 6), (7 8 9)i

and this is the same setwise stabilizer for the other two blocks B2 and B3 . The pointwise stabilizer
is
H(B1 ) = h(4 5 6), (7 8 9)i.
We point out that H is not the only subgroup of S9 that has {1, 2, 3} as a block. The cyclic
subgroup ⟨(1 4 7)(2 5 8)(3 6 9)⟩ of order 3 simply cycles through the blocks and has the same system of
blocks as H. The cyclic subgroup ⟨(1 4 7 2 5 8 3 6 9)⟩ of order 9 also has the same system of blocks. △
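Computations at this scale are easy to confirm by brute force. The sketch below (plain Python; the helper routines and the order computation are our own, not from the text, and permutations are written as 0-indexed tuples) generates the subgroup H of this example and verifies both the transitivity and the block structure; the closure also reveals that |H| = 81.

```python
def mul(a, b):
    """Compose permutations given as tuples: (a∘b)(i) = a[b(i)]."""
    return tuple(a[i] for i in b)

def generate(gens):
    """Close a set of generating permutations into the subgroup they generate."""
    e = tuple(range(len(gens[0])))
    group, frontier = {e}, [e]
    while frontier:
        nxt = []
        for g in frontier:
            for s in gens:
                h = mul(s, g)
                if h not in group:
                    group.add(h)
                    nxt.append(h)
        frontier = nxt
    return group

# 0-indexed forms of (1 2 3) and (1 4 7)(2 5 8)(3 6 9)
a = (1, 2, 0, 3, 4, 5, 6, 7, 8)
sigma = (3, 4, 5, 6, 7, 8, 0, 1, 2)
H = generate([a, sigma])

blocks = [frozenset({0, 1, 2}), frozenset({3, 4, 5}), frozenset({6, 7, 8})]
assert {g[0] for g in H} == set(range(9))            # transitive: orbit of 1 is all of X
for g in H:
    for B in blocks:
        assert frozenset(g[i] for i in B) in blocks  # every block maps onto a block
assert len(H) == 81                                  # |H| = 27 · 3
```

The setwise stabilizer of the first block inside H comes out to the 27 elements that fix each block setwise, matching H{B1} above.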

The following proposition gives a characterization of primitive groups. It relies on the notion of
a maximal subgroup. We call a subgroup H of a group G a maximal subgroup if for all subgroups
K with H ≤ K ≤ G, either K = H or K = G.
8.3. TRANSITIVE GROUP ACTIONS 399

Proposition 8.3.8
Let G act transitively on a set X. Then the action is primitive if and only if Gx is maximal
for all x ∈ X.

Proof. (⇐=) First, suppose that G has a nontrivial block B. Let x ∈ B. If g ∈ Gx, then gB ∩ B ≠ ∅
so gB = B and thus g ∈ G{B}. Consequently, Gx ≤ G{B} ≤ G. Since B ≠ X, we know that
G{B} ≠ G. Furthermore, |B| ≠ 1 so there exists another element y ≠ x in the block B. Again, since
G acts transitively, there exists h ∈ G such that hx = y. Since y ∈ B and B is a block, hB ∩ B ≠ ∅
so hB = B. Therefore, h ∈ G{B} but h ∉ Gx. Therefore, Gx is a strict subgroup of G{B}. We have
shown that if the action is not primitive then there exists some x such that Gx is not a maximal
subgroup, and the contrapositive gives us the desired implication.
(=⇒) Conversely, suppose that Gx is not maximal for some x ∈ X. Then let H be some subgroup
such that Gx < H < G. Let B be the orbit B = Hx. Since H − Gx ≠ ∅, the orbit B contains
more than one element, so |B| > 1. Assume that B = X. Then H would be transitive and hence
|X| = |G : Gx| = |H : Gx|, which is a contradiction since H < G. Thus, B ⊊ X. Now suppose that
gB ∩ B ≠ ∅ for some g ∈ G. Let b ∈ gB ∩ B. Then there exist h1, h2 ∈ H such that b = h1x = gh2x.
Therefore, h1⁻¹gh2x = x so h1⁻¹gh2 ∈ Gx ≤ H. Consequently, we deduce that if gB ∩ B ≠ ∅ then
g ∈ H so gB = B. Hence, B is a strict subset of X that is not a singleton set such that gB = B or
gB ∩ B = ∅. Hence, B is a nontrivial block. Again, we have proven the contrapositive of the desired
implication. □

Recall that if g ∈ G and x ∈ X, then the stabilizer of the element gx is

Ggx = gGxg⁻¹. (8.5)

Consequently, we can rephrase the above proposition with the stronger result.

Corollary 8.3.9
A transitive group action (G, X, ρ) is primitive if and only if Gx is maximal for some x ∈ X.

Proof. Suppose that for some x the stabilizer Gx is not maximal and that Gx < H < G for some
subgroup H. Since the action is transitive, for all y ∈ X, there exists g ∈ G with y = gx. Then
Gy = Ggx = gGxg⁻¹ so Gy < gHg⁻¹ < G and Gy is not maximal. Hence, there exists x such that
Gx is maximal if and only if Gx is maximal for all x ∈ X. □

8.3.2 – Blocks and Normal Subgroups


If a group G acts transitively on a set X, then any subgroup H ≤ G also acts on X. The action of
H on X might or might not be transitive. If H does not act transitively, the orbits of H partition
X. However, more can be said if the subgroup is normal.

Proposition 8.3.10
Let (G, X, ρ) be a transitive group action and let N E G. If N fixes a point x, then
N ≤ Ker ρ.

Proof. If x is fixed by N , then Nx = N . Let y ∈ X be arbitrary and let g ∈ G with y = gx. Then
by (8.5), Ny = gNxg⁻¹ = gNg⁻¹ = N. The result follows. □

Proposition 8.3.11
Let (G, X, ρ) be a transitive group action and let N E G. Then
(1) the orbits of N form a system of blocks for G;

(2) if G acts primitively, then N acts transitively on X or N ≤ Ker ρ;


(3) if O and O′ are two orbits of N, then they are permutation equivalent (but not
necessarily isomorphic as N-sets).

Proof. (1) Let O be an orbit of the action of N on X and define the set Σ = {gO | g ∈ G}.
By Exercise 8.2.12, all the orbits of N acting on X are of the form gO. Hence, Σ is the set of orbits
of N and, since G is transitive, Σ is a partition of X. Thus, Σ is a system of blocks for the action
of G on X.
(2) If G acts primitively, then the action has two systems of blocks, namely {{x} | x ∈ X} and
{X}. By part (1), the set of orbits of N must be one of these two options. In the first case, N
stabilizes all x ∈ X so N ≤ Ker ρ. In the second case, N is transitive.
(3) By part (1), if O and O′ are two orbits of N, then O′ = gO for some g ∈ G. Since N is a
normal subgroup, conjugation on N given by ψg(n) = gng⁻¹ is an automorphism. Furthermore, the
function between N-orbits fg : O → O′ defined by fg(x) = gx is a bijection. Then for all n ∈ N and
all x ∈ O,

ψg(n) · fg(x) = (gng⁻¹) · (gx) = (gn)x = g(nx) = fg(nx).

Therefore, (ψg, fg) is a group action isomorphism. □

In Example 8.3.3, we saw that the natural action of Z8 = ⟨z | z⁸ = 1⟩ on {1, 2, . . . , 8} has two
nontrivial systems of blocks. They correspond directly to the proper nontrivial (normal) subgroups of Z8,
namely ⟨z²⟩ and ⟨z⁴⟩.
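Assuming z acts as the cycle i ↦ i + 1 (written 0-indexed below; an assumption about Example 8.3.3, which is not reproduced here), the two block systems can be recovered directly as the orbits of ⟨z²⟩ and ⟨z⁴⟩:

```python
def cycle_shift(k, n=8):
    """The permutation z^k, where z acts as i -> i + 1 (mod n), 0-indexed."""
    return tuple((i + k) % n for i in range(n))

def orbits(perms, n=8):
    """Orbits of a subgroup given as a list of its elements."""
    seen, out = set(), []
    for i in range(n):
        if i not in seen:
            orb = frozenset(p[i] for p in perms)
            seen |= orb
            out.append(orb)
    return out

sub2 = [cycle_shift(k) for k in (0, 2, 4, 6)]   # the subgroup generated by z^2
sub4 = [cycle_shift(k) for k in (0, 4)]         # the subgroup generated by z^4

blocks2 = orbits(sub2)   # two blocks of size 4: odds and evens (1-indexed)
blocks4 = orbits(sub4)   # four blocks of size 2: the antipodal pairs

# every power of z maps each orbit onto an orbit: both are systems of blocks
for k in range(8):
    g = cycle_shift(k)
    for system in (blocks2, blocks4):
        for B in system:
            assert frozenset(g[i] for i in B) in system
```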
Proposition 8.3.11 generalizes the defining property of normal subgroups. Consider the action of
G on itself by left multiplication and let H ≤ G be any subgroup. Left multiplication is a transitive
action. The orbits of the action of H on G by left multiplication are precisely the right cosets Hg.
The right cosets form a system of blocks if and only if for all g, x ∈ G, we have x(Hg) = Hg or
x(Hg) ∩ Hg = ∅. If H E G, then x(Hg) = x(gH) = (xg)H = H(xg) so x(Hg) is another right
coset, thereby satisfying the requirements for a block. Conversely, if H is not normal in G, then there exists g ∈ G
such that gH ≠ Hg. However, gH ∩ Hg is never the empty set since it contains g, so the image g · H of the
block H meets the block Hg without equaling it, and the right cosets fail to form a system of blocks.
Therefore, a subgroup H is normal if and only if its right cosets form a system of blocks
under the action of left multiplication of G on itself.
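This equivalence can be checked exhaustively for a small group. A minimal sketch in plain Python (our own encoding: S3 as 0-indexed tuples, and a system of blocks tested as a translation-invariant partition):

```python
from itertools import permutations

def mul(a, b):
    """(a∘b)(i) = a[b(i)]: apply b first, then a."""
    return tuple(a[i] for i in b)

G = list(permutations(range(3)))                  # S3 acting on itself

def right_cosets(H):
    """The partition of G into right cosets Hg."""
    return {frozenset(mul(h, g) for h in H) for g in G}

def is_block_system(parts):
    """A partition is a system of blocks iff left translation permutes its parts."""
    return all(frozenset(mul(x, c) for c in C) in parts
               for x in G for C in parts)

H_swap = [(0, 1, 2), (1, 0, 2)]                   # generated by (1 2), not normal in S3
A3 = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]            # A3, normal in S3

assert not is_block_system(right_cosets(H_swap))  # cosets of a nonnormal subgroup fail
assert is_block_system(right_cosets(A3))          # cosets of a normal subgroup are blocks
```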

Example 8.3.12 (Rigid Motions of the Cube). Consider again the group G of rigid motions
of a cube and its action on the set of vertices. (See Example 8.1.11 and Figure 8.2.) We know that
G ≅ S4. Furthermore, S4 has only two proper nontrivial normal subgroups, namely A4 and K = ⟨(1 2)(3 4), (1 3)(2 4)⟩.
As rigid motions, A4 is generated by the rotations by 120° around the axes along long
diagonals. Note that these rotations all have order 3 and correspond to the 3-cycles in S4. Expressed
as rigid motions, the subgroup K consists of the identity and the three rotations by 180° around the
three different axes that join centers of opposite faces.
It is easy to check (using the labeling as in Figure 8.2) that the orbits of A4 are {1, 3, 6, 8} and
{2, 4, 5, 7}. Interestingly enough, the orbits of K are also {1, 3, 6, 8} and {2, 4, 5, 7}. Geometrically,
these orbits correspond to two regular tetrahedra of vertices that are separated from each other by
a diagonal edge in a face. We also observe that no normal subgroup of G has as its orbits the system
of blocks formed by the long diagonals through the cube. △

The above example shows that the converse to part (1) of Proposition 8.3.11 is not true. In other
words, given a transitive group action with a system of blocks Σ, there does not necessarily exist a
normal subgroup N whose orbits are the blocks in Σ.

When N is a normal subgroup of G, it is always natural to consider the quotient group G/N.
Let Σ be the set of orbits of N in X. For all n ∈ N and all g ∈ G,

(gn) · (Nx) = g(Nx),

so the pairing on G/N × Σ defined by

(gN) · (Nx) = (gN)x

is a well-defined, transitive action of the quotient group G/N on the set of orbits of N (which is the quotient set of
X by the equivalence relation induced by the action of N on X). This action might not be faithful, but we are led to
the following proposition.

Proposition 8.3.13
Let (G, X, ρ) be a transitive group action and let N E G. Then N has at most |G : N |
orbits and if |G : N | is finite, then the number of orbits of N divides |G : N |.

Proof. Since the action of G/N on Σ is transitive, the number of orbits of N, namely |Σ|, is at most
|G/N|. Let Nx be one orbit. By the Orbit-Stabilizer Theorem,

|Σ| = |(G/N) : (G/N)Nx| = |G : N| / |(G/N)Nx|.

Hence, |Σ| divides |G : N|. □
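The last two propositions can be spot-checked on a small example. The sketch below (plain Python, our own helper code) takes G = D4, the dihedral group of order 8 acting on the four vertices of a square, and N = ⟨r²⟩, its center: the orbits of N form a system of blocks, and the number of orbits divides |G : N| = 4.

```python
def mul(a, b):
    """Compose permutations given as tuples: (a∘b)(i) = a[b(i)]."""
    return tuple(a[i] for i in b)

def generate(gens):
    """Close a generating set of permutations into a subgroup."""
    e = tuple(range(len(gens[0])))
    group, frontier = {e}, [e]
    while frontier:
        nxt = []
        for g in frontier:
            for s in gens:
                h = mul(s, g)
                if h not in group:
                    group.add(h)
                    nxt.append(h)
        frontier = nxt
    return group

r = (1, 2, 3, 0)            # rotation of the square, 0-indexed (1 2 3 4)
s = (0, 3, 2, 1)            # a reflection
G = generate([r, s])        # D4, order 8
N = generate([mul(r, r)])   # the subgroup generated by r^2, normal (it is the center)

# orbits of N on the four vertices: the two antipodal pairs
N_orbits = {frozenset(n[i] for n in N) for i in range(4)}

# (1) the orbits form a system of blocks for G
for g in G:
    for B in N_orbits:
        assert frozenset(g[i] for i in B) in N_orbits
# (2) the number of orbits divides |G : N|
assert (len(G) // len(N)) % len(N_orbits) == 0
```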

Proposition 8.3.11(2) gives a strategy for finding simple groups.

Corollary 8.3.14
If (G, X, ρ) is a faithful, transitive, primitive group action such that no proper nontrivial
normal subgroup H E G acts transitively on X, then G is simple.

8.3.3 – Multiple Transitivity


A transitive group action is called multiply transitive if it is r-transitive for some r ≥ 2. We commonly
say that an action is doubly transitive if it is 2-transitive and triply transitive if it is 3-transitive.
Note that if a group action is r-transitive it is also k-transitive for all k with 1 ≤ k ≤ r.
Example 8.3.15. In Exercise 8.2.5 the reader was invited to show that the group G of affine
transformations of R2 acts doubly transitively on R2 . However, invertible affine transformations
cannot map a set of three collinear points into a nondegenerate triangle. Hence, the group does not
act triply transitively on R2. Though this example is over the field R, the same result holds over
any field. △

Example 8.3.16. We have already seen that the standard action of Sn on X = {1, 2, . . . , n} is
n-transitive. However, consider the standard action of An on X. We can show that it is (n − 2)-
transitive but not (n−1)-transitive. Let (a1 , a2 , . . . , an−2 ) and (b1 , b2 , . . . , bn−2 ) be two (n−2)-tuples
of distinct elements in X. Complete both to ordered n-tuples (a1 , a2 , . . . , an ) and (b1 , b2 , . . . , bn ) of
distinct elements. Consider the permutations

bi
 if 1 ≤ i ≤ n − 2
σ(ai ) = bi and τ (ai ) = bn if i = n − 1

bn−1 if i = n.

They both map the (n − 2)-tuple (a1 , a2 , . . . , an−2 ) to (b1 , b2 , . . . , bn−2 ). However, τ = (bn−1 bn )σ so
either one or the other is even. Hence, An is (n − 2)-transitive on {1, 2, . . . , n}. On the other hand,

any subgroup of Sn that acts (n − 1)-transitively on {1, 2, . . . , n} also acts n-transitively. However,
An does not act n-transitively because the only permutation that maps the n-tuple (1, 2, . . . , n) into
(2, 1, 3, 4, . . . , n) is odd. Hence, An does not act (n − 1)-transitively. △
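The case n = 4 is small enough to verify exhaustively. A plain-Python sketch (0-indexed points; the parity test and helpers are our own):

```python
from itertools import permutations

def parity(p):
    """0 for even permutations, 1 for odd (count inversions)."""
    n = len(p)
    return sum(p[i] > p[j] for i in range(n) for j in range(i + 1, n)) % 2

A4 = [p for p in permutations(range(4)) if parity(p) == 0]  # 12 even permutations

def maps(src, dst):
    """Is there some g in A4 sending the tuple src pointwise onto dst?"""
    return any(all(g[a] == b for a, b in zip(src, dst)) for g in A4)

pairs = list(permutations(range(4), 2))                # ordered pairs of distinct points
assert all(maps(u, v) for u in pairs for v in pairs)   # A4 is (n-2) = 2-transitive
assert not maps((0, 1, 2), (1, 0, 2))                  # ... but not 3-transitive
```

The failing triple corresponds to the observation above: the only permutation realizing it is the odd transposition (1 2).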

Proposition 8.3.17
Let G act transitively on a set X. If r ≥ 2, then G acts r-transitively if and only if for all
x ∈ X, the stabilizer Gx acts (r − 1)-transitively on X − {x}.

Proof. First, suppose that G is r-transitive on X. Fix x ∈ X and let (a1, a2, . . . , ar−1) and (b1, b2, . . . , br−1) be two
ordered (r − 1)-tuples of distinct elements in X − {x}. Since G is r-transitive, there exists g ∈ G such that

g · (x, a1, a2, . . . , ar−1) = (x, b1, b2, . . . , br−1).

Obviously, g ∈ Gx and this g maps (a1, a2, . . . , ar−1) to (b1, b2, . . . , br−1).


Conversely, suppose that Gx acts (r − 1)-transitively on X − {x} for every x ∈ X. Let (a1, a2, . . . , ar) and
(b1, b2, . . . , br) be two ordered r-tuples of distinct elements in X. Since G acts transitively on X, there
exists g such that b1 = ga1 . Since Gb1 acts (r − 1)-transitively on X − {b1 }, there exists h ∈ Gb1
such that bi = h(gai ) for 2 ≤ i ≤ r. Then the element (hg) satisfies

(hg) · (a1 , a2 , . . . , ar ) = (b1 , b2 , . . . , br ).

Hence, G acts r-transitively on all of X. 

Exercise 8.3.13 asks the reader to prove that every 2-transitive group action is primitive. Con-
sequently, every group action that is r-transitive with r ≥ 2 is primitive. Again, coupled with
Proposition 8.3.11, this result gives a strategy to prove that a group is simple. We will use this
strategy in Section 9.2.4 to prove the simplicity of an important family of groups.

Exercises for Section 8.3


1. Let G be a group acting transitively on a set X with |X| prime. Prove that the action is primitive.
2. Show that if a group G acts transitively on a finite set X with |X| ≥ 2, then there exists some g ∈ G such that X g = ∅.
3. We showed that the group G of rigid motions of a cube acting on its set of vertices has the long
diagonals for blocks. We claim that this group action has more nontrivial blocks. Find all of them
and prove you have found them all.
4. Find the largest subgroup of S6 that acts transitively on {1, 2, 3, 4, 5, 6} and has {1, 2, 3} as a block.
5. Let K = ⟨(1 2 3)(4 5 6)(7 8 9), (1 4 7)(2 5 8)(3 6 9)⟩ be a subgroup of S9 and consider its natural action on
X = {1, 2, . . . , 9}. Prove that K acts transitively and that it has the same system of blocks as the
group H in Example 8.3.7.
6. The group described in Example 8.3.7 is not the largest subgroup of S9 that acts on X = {1, 2, . . . , 9}
with {1, 2, 3} as a block. Find this largest subgroup.
7. Consider the group G of rigid motions on a cube. Show that G has a normal subgroup of order 4 and
describe the associated system of blocks of vertices described in Proposition 8.3.11.
8. Show that no subgroup G of S5 acts transitively on X = {1, 2, 3, 4, 5} in such a way that Gx is an
elementary abelian 2-group for any x ∈ X.
9. Let B be a nontrivial block of a transitive action of G on a set X. Prove that the pointwise stabilizer
G(B) and the setwise stabilizer G{B} are subgroups of G and that G(B) E G{B} .
10. Let G be a group acting transitively on a set X and let α ∈ X. Let B be the set of all blocks B in the
group action such that α ∈ B, and let S = {H ∈ Sub(G) | Gα ≤ H}. Show that the function Ψ : B → S defined
by Ψ(B) = G{B} is a poset isomorphism between the posets (B, ⊆) and (S, ≤).
11. Let G act transitively on a set X and let x ∈ X. Prove that the fixed set X Gx is a block of X. Deduce
that if G acts primitively, then X Gx = {x} or else Gx = {1} and X is a finite set with |X| prime.
12. Prove that if G acts r-transitively on a set X with |X| = n, then |G| is divisible by n!/r!.

Figure 8.7: The Fano plane

13. Prove that a 2-transitive group action is primitive.


14. Suppose that (G, X, ρ) is an r-transitive group action. Prove that if N E G then N is (r − 1)-transitive.
15. Let G act transitively on a finite set X. Suppose that for some element x ∈ X, the stabilizer Gx has
s orbits on X. Prove that

Σ_{g∈G} |X^g|² = s|G|.

Deduce that G acts 2-transitively if and only if

Σ_{g∈G} |X^g|² = 2|G|.

16. Let F be a finite field of order pm for some prime p. Let G be the set of functions in Fun(F, F ) of the
form f (x) = αx + β such that α ∈ F − {0} and β ∈ F .
(a) Prove that G is a nonabelian group of order pm (pm − 1).
(b) Prove that the action of G on F via f · u = f (u) is faithful and transitive.
(c) Prove that G acts 2-transitively on F .
(d) Prove that G contains a normal subgroup of order pm that is abelian.
(e) Determine all the maximal subgroups of G.
17. Consider the elements in X = {1, 2, . . . , 7} and the diagram shown in Figure 8.7. The diagram is called
the Fano plane and arises in the study of finite geometries. Consider the subset L (called lines) of
P(X) whose elements are the subsets of size 3 depicted in the Fano plane diagram either as a straight
line or the circle. So for example {1, 2, 5} and {5, 6, 7} are in L. Let G be the largest subgroup of S7
that maps lines to lines, i.e., acts on the set L.
(a) Prove that the action of G on X is 2-transitive.
(b) Prove that |G| = 168.
(c) Prove that G is simple.

8.4
Groups Acting on Themselves
A fruitful area of investigation comes from considering ways in which groups can act on themselves
or act on their own internal structure. Properties of group actions combined with considering actions
of groups on themselves lead to new results about the internal structure of groups. Two natural
actions of a group G on itself are the action of left multiplication and the action of conjugation.

8.4.1 – Groups Acting on Themselves by Left Multiplication


A group acts on itself by left multiplication by the pairing G × G → G given by g · h = gh. In other
words, every g ∈ G corresponds to a function σg : G → G with σg (x) = gx. It is quite obvious that
this is a group action since for all g, h, x ∈ G,

g · (h · x) = ghx = (gh) · x

and 1x = x for all x ∈ G. Furthermore, the action of left multiplication is faithful because σg(x) =
gx = x for all x ∈ G forces g = 1.
If G is a finite group with |G| = n, we can label all the group elements as G = {g1 , g2 , . . . , gn }.
Then the left multiplication action corresponds to an injective homomorphism ρ : G → Sn , via
ρ(g) = τ where ggi = gτ (i) .
Example 8.4.1. Example 8.1.5 presented the standard action of Dn on the labeled set of vertices of
the regular n-gon. This action is always different from the action of Dn on itself by left multiplication.
To compare with that example, take n = 6 and label the elements in D6 with integers {1, 2, . . . , 12}
listed in the same order as
1, r, r2 , . . . , r5 , s, sr, . . . , sr5 .
Then the permutation representation of D6 has

ρ(r) = (1 2 3 4 5 6)(7 12 11 10 9 8)
ρ(s) = (1 7)(2 8)(3 9)(4 10)(5 11)(6 12). △
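The two permutations above can be recomputed mechanically. The sketch below (our own encoding, not from the text) represents each element of D6 as a pair (a, b) standing for s^a r^b, multiplies using the dihedral relation r^b s = s r^(−b), and builds the left-multiplication permutation on the twelve labels:

```python
# Elements of D6 encoded as (a, b) meaning s^a r^b, with a in {0, 1} and b in Z6.
def mul(x, y):
    a, b = x
    c, d = y
    # (s^a r^b)(s^c r^d) = s^(a+c) r^((-1)^c b + d), using r^b s = s r^(-b)
    return ((a + c) % 2, ((b if c == 0 else -b) + d) % 6)

elements = [(a, b) for a in (0, 1) for b in range(6)]

def label(x):
    """Labels 1..6 for 1, r, ..., r^5 and 7..12 for s, sr, ..., sr^5."""
    a, b = x
    return 6 * a + b + 1

def rho(g):
    """One-line permutation of {1, ..., 12} induced by left multiplication by g."""
    out = [0] * 12
    for x in elements:
        out[label(x) - 1] = label(mul(g, x))
    return tuple(out)

r, s = (0, 1), (1, 0)
# matches the cycles (1 2 3 4 5 6)(7 12 11 10 9 8) and (1 7)(2 8)...(6 12)
assert rho(r) == (2, 3, 4, 5, 6, 1, 12, 7, 8, 9, 10, 11)
assert rho(s) == (7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6)
```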

The group action by left multiplication on itself leads immediately to a powerful result about
symmetric groups.

Theorem 8.4.2 (Cayley’s Theorem)


Every group is isomorphic to a subgroup of some symmetric group. If |G| = n, then G is
isomorphic to a subgroup of Sn .

Proof. Since the permutation representation ρ : G → SG is injective, by the First Isomorphism
Theorem, G ≅ G/Ker ρ ≅ Im ρ. □

Because of this important theorem, the action of G on itself by left multiplication is also called
the Cayley representation.
Cayley’s Theorem is valuable for computational reasons. It is difficult to devise algorithms that
perform group operations for an arbitrary group. However, it is easy to devise algorithms to perform
the group operation in Sn . Cayley’s Theorem guarantees that a group G can be embedded into some
Sn , which reduces computations in G to computations in some corresponding Sn .
Since the action of G on itself by left multiplication is transitive, we are inspired to consider the
possibility of blocks in this action. It is not hard to see that for any subgroup H ≤ G, the set of left
cosets forms a system of blocks for this action. This is because for any left coset xH, the product
g(xH) = (gx)H is another left coset. The set of left cosets of H forms a partition of G, so the set
of left cosets of H forms a system of blocks in this action. Moreover, the converse also holds.

Proposition 8.4.3
Let a group G act on itself by left multiplication. A set Σ of subsets of G is a system of blocks
for this action if and only if Σ is the set of left cosets of some subgroup H ≤ G.

Proof. We have already shown one direction. We now assume that Σ is a system of blocks for this
action. Let H ∈ Σ be the block that contains the identity 1. Note that the set g · H contains the
element g. Hence, if g · H = H then g ∈ H since 1 ∈ H. Conversely, suppose that g ∈ H. Then
g · H is a block that contains the element g · 1 = g. Since g · H ∩ H 6= ∅, we deduce that g · H = H.

Let x, y ∈ H. Then (xy) · H = x · (y · H) = x · H = H. Hence, xy ∈ H. Finally, let x ∈ H. Then


since x · H = H, by multiplying on the left by x−1 , we get H = x−1 · H. Thus, x−1 ∈ H. Therefore,
H is a subgroup of G and the system of blocks is the set of left cosets of H. 

8.4.2 – Groups Acting on Themselves by Conjugation


Another type of action of a group on itself is by conjugation. In other words, let X = G and consider
the pairing G × X → X defined by g · x = gxg⁻¹. It is easy to see that

g · (h · x) = g · (hxh⁻¹) = ghxh⁻¹g⁻¹ = (gh)x(gh)⁻¹ = (gh) · x.

Furthermore, 1 · x = 1x1⁻¹ = x. This shows that conjugation satisfies the axioms of a group action.
The permutation representation is a homomorphism ρ : G → SG . We have already seen that for
each g ∈ G, the function ψg(x) = gxg⁻¹ is an automorphism of G, so ρ(G) ≤ Aut(G) ≤ SG . In
fact, we call the image subgroup ρ(G) the group of inner automorphisms of G and denote it by
Inn(G). (See Exercise 3.7.38.)
This action is not faithful in general. The kernel of the action is

Ker ρ = {g ∈ G | gxg⁻¹ = x for all x ∈ G} = Z(G).

Hence, the action of G on itself by conjugation is faithful if and only if Z(G) = {1}.
Note that if A is a subset of a group G, then G might not necessarily act on A by conjugation.
Indeed, gag⁻¹ might not be in A for some a ∈ A. However, the normalizer NG (A) is the largest
subgroup of G that acts on A by conjugation.
If G is not the trivial group, then the action of G on itself by conjugation is not a transitive
action. The orbits are the conjugacy classes of G. The orbit equation for this action turns out to
lead to another identity pertaining to the internal structure of a group that we could not get without
the formalism of group actions.
The fixed elements (singleton orbits) in the action by conjugation are precisely the elements in
the center Z(G). Now suppose that x ∈ / Z(G). The stabilizer of x is

Gx = {g ∈ G | gxg⁻¹ = x} = CG (x)

so by the Orbit-Stabilizer Theorem, the conjugacy class of x has order |G : CG (x)|. This shows the
surprising result that the cardinality of every conjugacy class must divide |G|. Furthermore, by grouping
the subset of fixed elements as a single term, the Orbit Equation immediately gives the following
result.

Proposition 8.4.4 (Class Equation)


Let G be a finite group and let K be a complete set of distinct representatives of conjugacy
classes that are of size 2 or greater. Then
|G| = |Z(G)| + Σ_{x∈K} |G : CG (x)|.
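The Class Equation is easy to observe concretely. A plain-Python sketch computing the conjugacy classes of S4 (our own brute-force helpers; the class sizes 1, 3, 6, 6, 8 are the standard cycle-type counts):

```python
from itertools import permutations

def mul(a, b):
    """Compose permutations as tuples: (a∘b)(i) = a[b(i)]."""
    return tuple(a[i] for i in b)

def inv(p):
    """Inverse of a permutation tuple."""
    q = [0] * len(p)
    for i, v in enumerate(p):
        q[v] = i
    return tuple(q)

G = list(permutations(range(4)))        # S4, |G| = 24

def conj_class(x):
    return frozenset(mul(mul(g, x), inv(g)) for g in G)

classes = {conj_class(x) for x in G}
sizes = sorted(len(c) for c in classes)
center = [x for x in G if all(mul(g, x) == mul(x, g) for g in G)]

assert sizes == [1, 3, 6, 6, 8]              # 24 = 1 + 3 + 6 + 6 + 8
assert len(center) == 1                      # Z(S4) is trivial
assert all(len(G) % k == 0 for k in sizes)   # each class size divides |G|
```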

As another example of a group action relevant to group theory, let G be a group and consider
the associated group of automorphisms, Aut(G). Of course, Aut(G) acts on G by ψ · g = ψ(g) but
Aut(G) also acts on the set of subgroups Sub(G) with the pairing Aut(G) × Sub(G) → Sub(G)
defined by ψ · H = ψ(H). Recall the concept of a characteristic subgroup of a group G: a subgroup
H such that ψ(H) = H for all automorphisms ψ ∈ Aut(G). So a characteristic subgroup of G is a
subgroup that remains unchanged by the action of Aut(G) on Sub(G) as we just described.
We can contrast the notion of characteristic subgroup to a normal subgroup by the fact that
normal subgroups are the subgroups that remain unaffected by the action of Inn(G), the subgroup
of Aut(G) that consists of automorphisms of the form ψg , where ψg(x) = gxg⁻¹.

8.4.3 – Cauchy’s Theorem


Another application of the Orbit-Stabilizer Theorem establishes Cauchy’s Theorem, an important
result in the classification of groups. We provide a clever proof given by James McKay [45]. The
proof involves a group acting on a set closely related to itself.

Theorem 8.4.5 (Cauchy’s Theorem)


If p is a prime number dividing the order of a finite group G, then G has an element of
order p.

Proof. Consider the set X = {(g1, g2, . . . , gp) ∈ Gᵖ | g1g2 · · · gp = 1}. In the p-tuple (g1, g2, . . . , gp) ∈
X, the group elements g1, g2, . . . , gp−1 can be arbitrary and gp = (g1g2 · · · gp−1)⁻¹. Hence, |X| =
|G|ᵖ⁻¹.
Consider the action of the cyclic group Zp = ⟨z | zᵖ = 1⟩ on X defined by

z · (g1 , g2 , . . . , gp ) = (g2 , g3 , . . . , gp , g1 ).

An element in X that is fixed by the action of H = Zp on X has the form (g, g, . . . , g). Such elements
have the property that g p = 1. Let us suppose that there are r such fixed elements in X. If an
element x ∈ X is not fixed by the action of Zp , then the stabilizer Hx is a proper subgroup of H,
which implies that the stabilizer is trivial and that the orbit Hx has cardinality |Hx| = |H : Hx | = p.
Let us suppose that there are s such nontrivial orbits in X. The Orbit-Stabilizer Theorem implies
that
r + sp = |G|ᵖ⁻¹.

Since p divides |G|, then p divides r. We know that (1, 1, . . . , 1) is a fixed point of the Zp action
on X so r ≥ 1. Since r is a nontrivial multiple of p, there are at least p − 1 more elements g ∈ G
satisfying g p = 1. All such elements have order exactly p. 
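McKay's counting argument can be traced numerically on a small case. The sketch below (plain Python, our own encoding) takes G = S3 and p = 3, builds the set X, and recovers r = 3 shift-fixed tuples, hence the two elements of order 3:

```python
from itertools import permutations, product

def mul(a, b):
    """Compose permutations as tuples: (a∘b)(i) = a[b(i)]."""
    return tuple(a[i] for i in b)

G = list(permutations(range(3)))        # S3, |G| = 6; the prime p = 3 divides 6
e = tuple(range(3))
p = 3

# X = {(g1, g2, g3) : g1 g2 g3 = 1}; the first p - 1 coordinates are free
X = [t for t in product(G, repeat=p) if mul(mul(t[0], t[1]), t[2]) == e]
assert len(X) == len(G) ** (p - 1)      # |X| = |G|^(p-1) = 36

# fixed points of the cyclic shift are the constant tuples (g, g, g) with g^3 = 1
r = sum(1 for t in X if t[0] == t[1] == t[2])
assert r % p == 0 and r > 1             # p divides r, and (1, 1, 1) is fixed

order_p = [g for g in G if g != e and mul(mul(g, g), g) == e]
assert len(order_p) == r - 1            # here: the two 3-cycles, of order exactly 3
```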

We have seen that Lagrange's Theorem does not have a full converse, in the sense that if d
divides |G|, there does not necessarily exist a subgroup H ≤ G such that |H| = d. However, Cauchy's
Theorem gives a partial converse: if d is a prime number dividing |G|, then there
exists a subgroup H ≤ G such that |H| = d.

8.4.4 – Useful CAS Commands


Various CASs have group theory packages. However, many of them rely on describing a group G by
a permutation representation, i.e., an embedding of G into some Sn using a faithful action of G on
a set of size n. Cayley's Theorem affirms that this is always possible. Cayley's Theorem uses the
action of G on itself by left multiplication, which gives an embedding of G into Sn with n ≤ |G|.
For particular groups, there might be an n smaller than |G| such that G embeds into Sn .
The following Maple 16 packages and commands support group theory calculations.

Maple 16 Function
with(group); Loads the group theory package. (Many commands.)
permgroup(n,gens); Defines a subgroup of Sn using the list gens of generators.
grouporder(G); Calculates the order of a permutation group G.
groupmember(s,G): Tests whether the permutation s is in the permutation group G.

Starting with version 17, Maple included a much larger group theory package. We only give a
few elementary commands and invite the reader to explore the package further.

Maple 17 Function
with(GroupTheory); Loads the (new) group theory package. (Many commands.)
Perm(list); Defines a permutation given a list that represents the cycle type.
PermProduct(a,b); Calculates the product of the permutations.
Group(list); Given a few permutations as created from the previous command,
creates the subgroup generated by that list of permutations.
GroupOrder(G); Calculates the order of any group, defined as a permutation group
or some other way.
IsTransitive(G); Returns true or false depending on whether G, as a subgroup of
Sn , acts transitively on {1, 2, . . . , n}.

Exercises for Section 8.4


1. Consider the action of Q8 on itself by left multiplication. Consider the induced permutation represen-
tation ρ : Q8 → S8 . Determine ρ(−1), ρ(i), ρ(j), and ρ(k).
2. Consider the action of D4 on itself by conjugation. After labeling the elements in D4 , define the
induced permutation representation ρ : D4 → S8 , and write down ρ(g) for all g ∈ D4 .
3. Label the S3 elements 1, (1 2), (1 3), (2 3), (1 2 3), and (1 3 2) with the integers 1, 2, 3, 4, 5, and 6
respectively. Consider the action of S3 on itself by left multiplication and write the image of each
element under the induced permutation representation as an element in S6 .
4. Let G be a group and let X = G. Show that the pairing G × X → X given by (g, x) 7→ xg is not
generally a (left) group action. Show, however, that the pairing G × X → X given by (g, x) 7→ xg −1
does give a group action of G on itself.
5. Prove that the action of a group on itself by left multiplication is primitive if and only if G ≅ Zp,
where p is prime.
6. Prove that the action of a group on itself by conjugation is trivial if and only if G is abelian.
7. Let ρ : G → SG be the homomorphism induced from G acting on itself by left multiplication. Suppose
x ∈ G with |x| = n and that |G| = mn. Prove that ρ(x) is a product of m disjoint n-cycles.
8. Let G be a group and let H be a subgroup of G.
(a) Prove that the mapping g · (xH) = (gx)H defines an action of G on the set of left cosets of H.
(b) Denote by ρH the homomorphism from G to the group of permutations of the left cosets of H. Prove
that the kernel of this action is

Ker ρH = ⋂_{x∈G} xHx⁻¹.

(c) Prove that Ker ρH is the largest normal subgroup contained in H.


9. Use Exercise 8.4.8 to prove the following theorem. Suppose that p is the smallest prime dividing the
order of G. If H ≤ G with |G : H| = p, then H E G. [Hint: By contradiction. Assume Ker ρH is
a strict subgroup of H and then show that |H : Ker ρH | must divide (p − 1)!. Explain why this implies
a contradiction.]
10. Let G be a group and let Sub(G) be the set of subgroups of G. For all H ∈ Sub(G) and all g ∈ G,
define the pairing g · H = gHg −1 . Prove that this pairing defines a group action of G on Sub(G).
11. Consider the action of G on itself by conjugation. Prove that the invariant subsets of G are unions of
conjugacy classes. Conclude that a subgroup of G is invariant under the action if and only if it is a
normal subgroup.
12. Let G be a group. The automorphism group Aut(G) acts on G by ψ · g = ψ(g) for all ψ ∈ Aut(G)
and all g ∈ G.
(a) Find the orbits of this action if G = Z12 .
(b) Find the orbits of this action if G = Z7 .
(c) Find the orbits of this action if G = D4 . [Hint: First determine Aut(D4 ).]
13. Let p < q be primes and let G be a group of order pq. Prove that G has a nonnormal subgroup of
index q. Deduce that there exists an injective homomorphism of G into Sq .
14. Let p be a prime. Use the Class Equation to prove that every p-group, i.e., a group of order pk for
some integer k, has a nontrivial center.

15. Let p be prime. Use Exercise 8.4.14 to prove that every group of order p2 is abelian. In particular, if
|G| = p2 , then G is isomorphic to Zp2 or Zp ⊕ Zp .
16. If G is a p-group and H is a proper subgroup, show that the normalizer NG (H) properly contains H.
[Hint: Use Exercise 8.4.14.]
17. Let G be a group. Show that the pairing (G ⊕ G) × G → G defined by (g, h) · x = gxh−1 is an action
of G ⊕ G on G. Show that the action is transitive. Also determine the stabilizer of the identity 1.
18. (Cauchy’s Theorem) The original proof to Cauchy’s Theorem did not use the group action described
in the proof we gave, but it relied instead on the Class Equation. Let p be a prime that divides the
order of a finite group G.
(a) Prove Cauchy’s Theorem for finite abelian groups. [Hint: Use induction on |G|.]
(b) Prove Cauchy’s Theorem for finite nonabelian groups by induction on |G| and using the Class
Equation.
19. Suppose that G is a finite group with m conjugacy classes. Show that the number of ordered pairs
(x, y) ∈ G × G such that yx = xy is equal to m|G|.

8.5
Sylow’s Theorem
Sylow’s Theorem is a partial converse to Lagrange’s Theorem in that it states that a group has a
subgroup of a certain order. Sylow’s Theorem leads to a variety of profound consequences for the
internal structure of a group simply based on its order. Therefore, it also provides vital information
in the classification problems (Section 9.4)—theorems that decide what groups exist of a given order.
We present Sylow’s Theorem in this section because it follows from a clever application of a
group action on certain sets of subgroups within the group.

Example 8.5.1. Before presenting the necessary group action and proving the theorem, we illus-
trate Sylow's Theorem with an example. Consider the group G = S6. Obviously, |G| = 720 =
2⁴ · 3² · 5. Sylow's Theorem guarantees that G has subgroups of order 16, 9, and 5. The
theorem also gives us a condition on how many such subgroups G has. That such subgroups exist
is not immediately obvious.

• Finding a subgroup of order 5 is easy. Indeed, ⟨(1 2 3 4 5)⟩ works. There are (6 choose 5) · 4!/4 = 36
different subgroups of order 5.

• Finding a subgroup of order 9 = 3² is not hard either. H = ⟨(1 2 3), (4 5 6)⟩ works. In fact, since
there are no 9-cycles in S6, every subgroup of order 9 must be isomorphic to Z3 ⊕ Z3. Such
subgroups must be generated by two nonoverlapping 3-cycles. It is easy to show that there
are (1/2)(6 choose 3) = 10 such subgroups.

• Finding a subgroup of order 16 = 2⁴ is more challenging. Such a subgroup will be nonabelian
and it cannot have an element of order 8. We can obtain a subgroup H of order 16 by
embedding D4 into the subgroup of S6 that fixes 5 and 6 and then tacking on the remaining
transposition (5 6). In other words,

H = ⟨(1 2 3 4), (2 4), (5 6)⟩ ≅ D4 ⊕ Z2.

It is not hard to show that there are 45 different subgroups of this form. △
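The count of 36 subgroups of order 5 from the first bullet can be confirmed by brute force over all 720 elements of S6 (plain Python, our own helpers; subgroups of order 5 are cyclic, and each contains four generators):

```python
from itertools import permutations

def mul(a, b):
    """Compose permutations as tuples: (a∘b)(i) = a[b(i)]."""
    return tuple(a[i] for i in b)

def order(p):
    """Order of a permutation: smallest k with p^k = identity."""
    e = tuple(range(len(p)))
    k, q = 1, p
    while q != e:
        q, k = mul(p, q), k + 1
    return k

S6 = list(permutations(range(6)))
assert len(S6) == 720                   # 720 = 2^4 · 3^2 · 5

fives = [p for p in S6 if order(p) == 5]
assert len(fives) == 144                # the 5-cycles

def cyclic(p):
    """The cyclic subgroup generated by p, as a frozenset of its elements."""
    e = tuple(range(len(p)))
    out, q = {e}, p
    while q != e:
        out.add(q)
        q = mul(p, q)
    return frozenset(out)

syl5 = {cyclic(p) for p in fives}
assert len(syl5) == 36                  # 144 / 4 = 36 subgroups of order 5
```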

8.5.1 – Sylow’s Theorem

Definition 8.5.2
Let G be a group and p a prime.
• A group of order pᵏ for some k ∈ N∗ is called a p-group. Subgroups of G that are
p-groups are called p-subgroups.

• If G is a group of order pᵏm, where p ∤ m, then a subgroup of order pᵏ is called a


Sylow p-subgroup of G.
• The set of Sylow p-subgroups of G is denoted by Sylp (G) and the number of Sylow
p-subgroups of G is denoted by np (G).

If p is a prime that does not divide |G|, then the notion of a p-subgroup is not interesting.
However, to be consistent with notation, if p does not divide |G| then trivially Sylp (G) = {⟨1⟩} and
np (G) = 1.

Theorem 8.5.3 (Sylow’s Theorem, Part 1)


For all groups G and all primes p that divide |G|, Sylow p-subgroups of G exist, i.e.,
Sylp (G) ≠ ∅.

Proof. We use (strong) induction on the size of G. If |G| = 1, there is nothing to do, and the
theorem is satisfied trivially. Assume that the theorem holds for all groups of size strictly less than
n. We prove that the theorem holds for all groups G with |G| = n.
Let p be a prime and assume that |G| = p^k m with p ∤ m. If p divides |Z(G)|, then by Cauchy’s
Theorem Z(G) contains an element of order p and hence a subgroup N of order p. Since N ≤ Z(G),
the subgroup N is normal in G, and the group G/N , of order p^{k−1} m, is smaller than G. Hence, by
the induction hypothesis, G/N contains a Sylow p-subgroup P̄ of order p^{k−1} . By the Fourth
Isomorphism Theorem, there exists a subgroup P of G such that P̄ = P/N . Then |P | = p^k and
hence G contains a Sylow p-subgroup.
We are reduced now to the case where p ∤ |Z(G)|. Consider the Class Equation (Proposi-
tion 8.4.4),

    |G| = |Z(G)| + Σ_{i=1}^{r} |G : CG (gi )|,
where {g1 , g2 , . . . , gr } is a complete list of distinct representatives of the nontrivial conjugacy classes.
Since p divides |G| but not |Z(G)|, there exists some gi0 such that p does not divide |G : CG (gi0 )|.
Then CG (gi0 ) has order p^k ℓ where p ∤ ℓ. Since gi0 is not central, CG (gi0 ) is a proper subgroup of G, so
again by strong induction, CG (gi0 ) has a Sylow p-subgroup of order p^k , which is a subgroup of G.
Before establishing the rest of Sylow’s Theorem, we require two lemmas. The first gives a property
of Sylow-p subgroups concerning intersections with other p-subgroups.

Lemma 8.5.4
Let P ∈ Sylp (G). If Q is any p-subgroup of G, then Q ∩ NG (P ) = Q ∩ P .

Proof. Since P ≤ NG (P ), then Q ∩ P ≤ Q ∩ NG (P ) and so we need to show the opposite inclusion.
Call H = Q ∩ NG (P ). Since H ≤ NG (P ), the subset P H is a subgroup of G and

    |P H| = (|P | · |H|) / |P ∩ H|.

Since H ≤ Q, the order |H| is a power of p and hence all the orders on the right-hand side are powers
of p. Hence, P H is a p-group. Furthermore, P H contains P , and since |P | is the maximal power of p
dividing |G|, then P = P H. Thus, P ∩ H = H so H ≤ P . Since H ≤ Q, we conclude that
H = Q ∩ NG (P ) ≤ Q ∩ P . This proves the reverse inclusion so the result follows.
The second part of Sylow’s Theorem follows from considering the action of G on Sylp (G) by
subgroup conjugation. Theorem 8.5.3 established the key step that Sylp (G) is nonempty. Suppose
that P ∈ Sylp (G). The orbit of P under the conjugation action is

SP = {gP g −1 | g ∈ G} = {P1 = P, P2 , . . . , Pr }.
By definition of orbits, G acts transitively on SP . Let H be any subgroup of G. It also acts on
SP by conjugation but perhaps not transitively. Then under the action of H, the set SP may
get partitioned into s(H) distinct orbits {O1 , O2 , . . . , Os(H) }, where s(H) is a positive integer that
depends on H. Obviously, r = |O1 | + |O2 | + · · · + |Os(H) |. The Orbit-Stabilizer Theorem applied to
the action of H on SP states that if Pi is any element in the orbit Oi , then
|Oi | = |H : NH (Pi )|. (8.6)
If H happens to be another p-subgroup, this formula simplifies.

Lemma 8.5.5
Let P ∈ Sylp (G) and let Q be any p-subgroup of G. Suppose that Q acts on the orbit SP
by subgroup conjugation. If the orbit of some Pi ∈ SP is Oi , then

|Oi | = |Q : Q ∩ Pi |.

Proof. By (8.6) and Lemma 8.5.4,


|Oi | = |Q : NQ (Pi )| = |Q : NG (Pi ) ∩ Q| = |Q : Pi ∩ Q|
for 1 ≤ i ≤ s(Q). 
We can now establish the second part of Sylow’s Theorem.

Theorem 8.5.6 (Sylow’s Theorem, Part 2)


Let G be a group of order p^k m, with k ≥ 1 and where p is a prime not dividing m.
(1) If P is a Sylow p-subgroup, then any p-subgroup Q is a subgroup of some conjugate
of P . (In other words, G acts transitively on Sylp (G) and every p-subgroup is a
subgroup of some Sylow p-subgroup.)
(2) The number of Sylow p-subgroups satisfies

np ≡ 1 (mod p).

(3) Furthermore, np = |G : NG (P )|, for any Sylow p-subgroup P , so np divides m.

Proof. By Theorem 8.5.3, we know that Sylp (G) is nonempty. Let P be a Sylow p-subgroup of G
and let SP be the orbit of P in the action of G acting on the set of subgroups of G by conjugation.
Let r = |SP |.
We first show that r ≡ 1 (mod p) as follows. Apply Lemma 8.5.5 with Q = P itself. Then
O1 = {P } so |O1 | = 1. Then for all integers i with 1 < i ≤ s(P ), the orbit Oi satisfies
|Oi | = |P : Pi ∩ P |,
which is divisible by p because Pi ∩ P is a proper subgroup of P for i > 1. Thus,
r = |O1 | + |O2 | + · · · + |Os(P ) | ≡ 1 (mod p).

We prove by contradiction that the action of G by conjugation on Sylp (G) is transitive. As above,
let P be an arbitrary Sylow p-subgroup. Suppose that there exists a Sylow p-subgroup P ′ that is
not conjugate to P . Now consider the action of P ′ on SP by conjugation and apply Lemma 8.5.5
with Q = P ′ . Then for 1 ≤ i ≤ s(P ′ ), the p-group P ′ ∩ Pi is a strict subgroup of P ′ (since P ′ is not
conjugate to P , we have P ′ ≠ Pi for all i), so by Lemma 8.5.5,
|Oi | = |P ′ : P ′ ∩ Pi | > 1.
Thus, p divides each |P ′ : P ′ ∩ Pi |, which implies that p divides r. Since r ≡ 1 (mod p), we have a
contradiction. Thus, we conclude that there does not exist a Sylow p-subgroup that is not conjugate
to P . A similar argument shows that every p-subgroup Q is contained in some conjugate of P :
otherwise Q ∩ Pi would be a proper subgroup of Q for all i, so p would divide every |Oi | = |Q : Q ∩ Pi |
and hence divide r, again a contradiction. These results prove (1) and (2).
For part (3), notice that since the action of G on Sylp (G) by conjugation is transitive, then
r = np so np ≡ 1 (mod p). Also, the Orbit-Stabilizer Theorem tells us that np = |G : NG (P )|. Then
by the chain of subgroups
P ≤ NG (P ) ≤ G,
we deduce that
    p^k m = |G| = |G : NG (P )| · |NG (P ) : P | · |P | = np · |NG (P ) : P | · p^k ,
so np divides |G : P | = m. 
By part (3), np = 1 means that the one Sylow p-subgroup P has NG (P ) = G so it is a normal
subgroup. (Also, in Exercise 4.2.12, we saw that if there is only one subgroup of a given order,
then that subgroup is normal.) In particular, if np = 1 for some prime p that divides |G|, then we
immediately conclude that G is not simple. This result often gives a quick way to determine that
no group of a certain order is simple. The following example illustrates this.
Part 1 of Theorem 8.5.6 implies that for a given prime p, all Sylow p-subgroups are conjugate to
each other. This implies that they are all isomorphic to each other.
Example 8.5.7. We revisit Example 8.5.1, the discussion of S6 provided as motivation at the
beginning of this section. Sylow’s Theorem affirms that there exist subgroups of order 16, 9, and 5. Since
for a given p, all Sylow p-subgroups are isomorphic, then every Sylow p-subgroup is isomorphic to
the Sylow p-subgroups that we illustrated for p = 2, p = 3, and p = 5. We had determined that
(1) there are 36 Sylow 5-subgroups, which conforms to n5 ≡ 1 (mod 5); (2) there are 10 Sylow
3-subgroups (each of order 9), which conforms to n3 ≡ 1 (mod 3); (3) there are 45 subgroups of order 16, which
conforms to n2 ≡ 1 (mod 2). 4
Example 8.5.8. Consider groups of order 385. Note that 385 = 5 · 7 · 11. By part (2), n11 ≡ 1
(mod 11), while by part (3) we also have n11 | 35. The divisors of 35 are 1, 5, 7, and 35. The
only divisor of 35 that satisfies both conditions is n11 = 1. Hence, every group of order 385 has a
normal subgroup of order 11.
If we continue similar analysis with the other prime factors of 385, we notice that n7 ≡ 1 (mod 7)
and n7 | 55. Again, the only possibility is n7 = 1 so groups of order 385 must also possess a normal
subgroup of order 7. However, for the prime p = 5, the conditions give n5 ≡ 1 (mod 5) and n5 | 77.
Here, we have two possibilities, namely that n5 = 1 or 11. 4
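The divisibility sieve in this example is mechanical, so it can be scripted. Below is a small sketch (the function name `possible_np` is our own) that returns every value of np compatible with parts (2) and (3) of Sylow's Theorem:

```python
def possible_np(order, p):
    """Divisors d of m (the part of `order` prime to p) with d ≡ 1 (mod p)."""
    m = order
    while m % p == 0:
        m //= p
    return [d for d in range(1, m + 1) if m % d == 0 and d % p == 1]

print(possible_np(385, 11))  # [1]      -> the Sylow 11-subgroup is normal
print(possible_np(385, 7))   # [1]      -> the Sylow 7-subgroup is normal
print(possible_np(385, 5))   # [1, 11]  -> two possibilities remain
```

The same helper settles several of the exercises below; for instance, for order 418 = 2 · 11 · 19, both possible_np(418, 11) and possible_np(418, 19) return [1].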
The situation in which np (G) = 1 is particularly important for determining the structure of the
group G. We already commented that np (G) = 1 implies that G has a normal Sylow p-subgroup.
However, the converse is also true.

Proposition 8.5.9
Let P be a Sylow p-subgroup of a group G. The following are equivalent:
(1) np (G) = 1;

(2) P ⊴ G;
(3) P is a characteristic subgroup of G.

Proof. (1) =⇒ (3): Since np (G) = 1, there is only one Sylow p-subgroup P of G. Every auto-
morphism ψ ∈ Aut(G) maps subgroups of G back into subgroups of the same cardinality. Hence,
ψ(P ) = P so P is characteristic.
(3) =⇒ (2): Follows from the fact that conjugation by any g ∈ G is an automorphism of G.
(2) =⇒ (1): By part 1 of Sylow’s Theorem, the action of G on Sylp (G) is transitive. Since
gP g −1 = P for all g ∈ G, then Sylp (G) = {P } and np (G) = 1. 

8.5.2 – Applications of Sylow’s Theorem


Sylow’s Theorem implies many profound consequences for what we can determine about the structure
of a group simply from the order.
As mentioned earlier, one of the simplest applications of Sylow’s Theorem involves determining
that a group is not simple by showing that np (G) = 1 for some prime p that divides |G|. Certain
numerical situations allow us to conclude much more than whether or not a group has a normal
subgroup. For example, it may be possible to prove that a group of a certain order must be abelian
and then the Fundamental Theorem of Finitely Generated Abelian Groups gives us a classification
of all groups of that order. By amassing more and more results, it becomes possible to narrow down
all the possibilities for groups of a given order.
Chapter 9 discusses the program to classify groups. There, many results from group theory, some
of which this text presented in the exercises, combine to give more and more results in aid of
the classification effort. We will see that Sylow’s Theorem plays a crucial role. Here, we begin to
show how Sylow’s Theorem can provide considerable information about what can happen for groups
of a given order.
We start with a challenging example, both because it illustrates the value of remembering as many
theorems of group theory as possible and because the result is very helpful in further examples.

Example 8.5.10 (Groups of Order pq). Let G be a group of order pq where p and q are primes.
As an application of the Class Equation, we saw in Exercise 8.4.15 that groups of order p^2 are
isomorphic either to Zp2 or to Zp ⊕ Zp . We assume from now on that p < q.
Consider the center of the group Z(G) and, in particular, its order |Z(G)|. If |Z(G)| = pq, then
the group is abelian. By the FTFGAG, we know that G ≅ Zpq .
In Exercise 4.3.21, we saw that if G/Z(G) is cyclic, then G is abelian. If |G| = pq with p ≠ q,
then we cannot have |Z(G)| = p or q: otherwise G/Z(G) would be cyclic, isomorphic to Zq or Zp
respectively, making G abelian with |Z(G)| = pq, contradicting |Z(G)| = p or q.
Now assume that |Z(G)| = 1. By Sylow’s Theorem, nq = 1 + kq (with k ≥ 0) and nq divides
|G|/q = p. However, if k > 0, then nq > q > p which contradicts nq | p, so we must have nq = 1.
Therefore, G contains one subgroup Q ≤ G of order q and it is normal. Similarly, np ≡ 1 (mod p)
and np must divide q. This leads to two cases.
Let us first suppose that p ∤ (q − 1). Then we must have np = 1 and so G has a normal subgroup
P of order p. Then by the Direct Sum Decomposition Theorem (Theorem 4.3.12), G ≅ P ⊕ Q, so
G ≅ Zp ⊕ Zq ≅ Zpq . Hence, G is abelian again, contradicting Z(G) = {1}.
Now suppose that p | (q − 1). Then, a priori, it is possible for np > 1. We now provide a
constructive proof of the existence of a nonabelian group of order pq. Let x be a generator of Q,
so of order q. Also, by Cauchy’s Theorem, G has an element y of order p. Then ⟨y⟩ is a Sylow
p-subgroup and all Sylow p-subgroups are conjugate to P = ⟨y⟩. By Corollary 4.2.10, P Q ≤ G and
hence P Q = G by Lagrange’s Theorem since |P Q| > q. The subgroup P acts by conjugation on Q
and this action defines a homomorphism of P into Aut(Q). Note that Aut(Q) ≅ Aut(Zq ) ≅ U (q),
the multiplicative group of units in Z/qZ. (See Exercise 3.7.40.)
Proposition 7.5.2 establishes that U (q) is a cyclic group and hence has a generator of order q − 1.
Since Q is cyclic, with generator x, automorphisms on Q are determined by where they map the
generator: ψk (x) = x^k , where gcd(k, q) = 1. Let a be a positive integer such that ψa (x) = x^a
has order q − 1 in Aut(Q). If d = (q − 1)/p, then (ψa )^d = ψ_{a^d} has order p. Then the action of
conjugation of P on Q determined by the homomorphism P → Aut(Q) given by y ↦ ψ_{a^d} is a
nontrivial action. This
gives a nonabelian group, which can be presented as

    ⟨x, y | x^q = y^p = 1, yxy^{−1} = x^α ⟩

where α ≡ a^d (mod q).


Finally, we wish to show that all nonabelian groups of order pq are isomorphic. We have seen
that any two nonabelian groups of order pq have presentations of the form

    G1 = ⟨x, y | x^q = y^p = 1, yxy^{−1} = x^α ⟩ and G2 = ⟨g, h | g^q = h^p = 1, hgh^{−1} = g^β ⟩,

where α and β both are elements of order p in the multiplicative group U (q). Now a finite cyclic
group has a unique subgroup of each order dividing the group order. Thus, ⟨β⟩ is the unique subgroup
of order p in U (q) and α ∈ ⟨β⟩, so α = β^c for some integer 1 ≤ c ≤ p − 1. Consider a function
ϕ : G1 → G2 that maps ϕ(x) = g and ϕ(y) = h^c . Obviously, g^q = 1 and (h^c )^p = 1, but also

    (h^c )g(h^c )^{−1} = h^c g h^{−c} = h^{c−1} g^β h^{−(c−1)} = h^{c−2} (g^β )^β h^{−(c−2)}
                       = h^{c−2} g^{β^2} h^{−(c−2)} = · · · = g^{β^c} = g^α .

By the Extension Theorem on Generators, we deduce that ϕ extends to a homomorphism from G1
to G2 . Moreover, it is easy to check that ϕ is both surjective and injective, so it is an isomorphism.
To recap, if G is a group of order pq, then

(1) if p = q, then G is isomorphic to Zp2 or Zp ⊕ Zp ;

(2) if p ≠ q and p ∤ (q − 1), then G ≅ Zpq ;

(3) if p ≠ q and p | (q − 1), then G is isomorphic to Zpq or the unique nonabelian group of order
pq. 4

Example 8.5.11 (Groups of Order 39). As a specific illustration of the previous classification
result, consider n = 39 = 3 · 13. The group Z39 is the only abelian group of order 39. Note that
3 | (13 − 1) so by the previous example there also exists a nonabelian group of order 39. The cyclic
group U (13) is generated by 2 because, in arithmetic modulo 13, the successive powers of 2
are 2, 4, 8, 3, 6, 12, 11, 9, 5, 10, 7, 1, . . . We have d = (13 − 1)/3 = 4, so, using a = 2, we have α = a^d
mod 13 = 2^4 mod 13 = 3. The sequence of powers of 3 modulo 13 is 1, 3, 9, 1, 3, 9, . . ., so 3 has
order 3 in U (13). As a presentation, the
group
    G = ⟨x, y | x^13 = y^3 = 1, yxy^{−1} = x^3 ⟩
is a nonabelian group of order 39. It is possible to construct this example even more explicitly as
a subgroup of S13 . Let σ = (1 2 3 . . . 13) and let τ be a permutation such that τ στ^{−1} = σ^α = σ^3 .
Since σ^3 = (1 4 7 10 13 3 6 9 12 2 5 8 11), by Example 4.2.13, we find that an appropriate permutation
τ is τ = (2 4 10)(3 7 6)(5 13 11)(8 9 12). Then ⟨σ, τ ⟩ is isomorphic to this nonabelian group of order
39. 4
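These claims are easy to confirm by machine. The sketch below (helper names ours) represents permutations in 0-based one-line notation, checks the conjugation relation τστ⁻¹ = σ³, and then closes up the subgroup generated by σ and τ:

```python
def compose(a, b):
    """(a ∘ b)(i) = a(b(i)) for permutations as tuples on {0, ..., n-1}."""
    return tuple(a[b[i]] for i in range(len(a)))

def from_cycles(n, cycles):
    """Build a permutation from 1-based cycles."""
    p = list(range(n))
    for c in cycles:
        for i in range(len(c)):
            p[c[i] - 1] = c[(i + 1) % len(c)] - 1
    return tuple(p)

sigma = from_cycles(13, [tuple(range(1, 14))])  # (1 2 3 ... 13)
tau = from_cycles(13, [(2, 4, 10), (3, 7, 6), (5, 13, 11), (8, 9, 12)])

tau_inv = tuple(sorted(range(13), key=lambda i: tau[i]))
sigma3 = compose(sigma, compose(sigma, sigma))
assert compose(tau, compose(sigma, tau_inv)) == sigma3  # tau sigma tau^{-1} = sigma^3

# Close <sigma, tau> under composition.
group = {tuple(range(13))}
frontier = [sigma, tau]
while frontier:
    g = frontier.pop()
    if g not in group:
        group.add(g)
        frontier.extend(compose(g, h) for h in (sigma, tau))
print(len(group))  # 39
assert compose(sigma, tau) != compose(tau, sigma)  # nonabelian
```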

Example 8.5.12 (Groups of Order 30). We now prove that, by virtue of |H| = 30 alone, a group H must have
a normal (and hence unique) Sylow 5-subgroup and Sylow 3-subgroup. Let Q1 ∈ Syl3 (H) and let
Q2 ∈ Syl5 (H). If either Q1 or Q2 is normal in H, then Q1 Q2 is a subgroup of H of order 15. Since
15 is half of 30, then Q1 Q2 ⊴ H. Since Q1 and Q2 are characteristic subgroups of Q1 Q2 by the
Corollary to Sylow’s Theorem, then Q1 and Q2 are both normal subgroups of H. Therefore, we have
proven that either both Q1 and Q2 are normal in H or neither are. If neither are, then n3 (H) = 10
and n5 (H) = 6. But this would lead to 10 · 2 + 6 · 4 = 44 elements of order 3 or 5 whereas the group
H has only 30 elements. This is a contradiction. Hence, both Q1 and Q2 are normal in H. 4

The last example illustrates how counting elements of a given order may allow us to gain more
information beyond that given immediately from Sylow’s Theorem.

Example 8.5.13 (Groups of Order 105). Let G be a group of order 105 = 3 · 5 · 7. Using the
criteria of Sylow’s Theorem, we find that n3 (G) = 1 or 7, that n5 (G) = 1 or 21 and that n7 (G) = 1
or 15. So by divisibility considerations, it would appear that it would be possible for G not to
have any normal Sylow p-subgroups. However, that is not the case. Assume that n3 (G) = 7, that
n5 (G) = 21, and that n7 (G) = 15. Each Sylow 5-subgroup would contain 4 elements of order 5 and
these subgroups would intersect pairwise in the identity (since they are distinct cyclic subgroups of
prime order). Hence, n5 (G) = 21 accounts for 4 × 21 = 84 elements of order 5. By the same reasoning,
if n7 (G) = 15, then G contains 15 distinct cyclic subgroups of order 7, which accounts for 6 × 15 = 60
elements of order 7. However, this count gives 84 + 60 = 144 elements of order 5 or 7, which
is already greater than the order of the group, 105. Hence, every group of order 105
must contain a normal subgroup of order 5 or a normal subgroup of order 7. 4

Example 8.5.14 (Groups of Order 2115). Let G be a group of order 2115 = 3^2 · 5 · 47. It is
easy to see that the conditions n47 ≡ 1 (mod 47) and n47 | 45 imply that n47 = 1. Hence, G must
contain a normal subgroup N of order 47. However, because of the numerical relationships in this
case, more can be said about N . Consider the action of G on N by conjugation. Since N is normal,
this conjugation engenders a homomorphism ψ : G → Aut(N ). However, Aut(N ) ≅ Aut(Z47 ),
which has order 46. But gcd(46, 2115) = 1 so the only homomorphism ψ : G → Aut(N ) is the
trivial homomorphism. Thus, the action of conjugation of G on N is trivial and we conclude that
N commutes with all of G so N ≤ Z(G). 4

Exercises for Section 8.5


1. Determine the isomorphism type of the Sylow 2-subgroups of A6 .
2. Let p be a prime and suppose that 2 ≤ k ≤ p − 1. Find a Sylow p-subgroup of Skp by expressing
it using generators.
3. Let p be a prime. Find a Sylow p-subgroup of Sp2 by expressing it using generators. Show also that
it is a nonabelian group of order p^{p+1} .
4. Exhibit all Sylow 2-subgroups of S4 .
5. Exhibit a Sylow 2-subgroup of SL2 (F3 ) by expressing it using generators.
6. Suppose that G = GL2 (Fp ). Prove that np (G) = p + 1.
7. Exhibit a Sylow 3-subgroup of GL2 (F17 ) by expressing it using generators.
8. Let p be a prime number and consider the group S2p .
(a) Show that np = (1/2) C(2p, p) ((p − 2)!)^2 .
(b) Use Sylow’s Theorem to conclude that (1/2) C(2p, p) ((p − 2)!)^2 ≡ 1 (mod p).
9. Show that a group of order 418 has a normal subgroup of order 11 and a normal subgroup of order
19.
10. Prove that there is no simple group of order 225.
11. Prove that there is no simple group of order 825.
12. Prove that there is no simple group of order 2907.
13. Prove that there is no simple group of order 3124.
14. Prove that there is no simple group of order 4312.
15. Prove that there is no simple group of order 132.
16. Prove that there is no simple group of order 351.
17. Prove that a group of order 273 has a normal subgroup of order 91.
18. Prove that if |G| = 2015, then G contains a normal subgroup of order 31 and a subgroup of order 13 in
Z(G).
19. Prove that if |G| = 459, then G contains a Sylow 17-subgroup in Z(G).
20. Prove that every group of order 1001 is abelian.
21. Prove if |G| = 9163, then G has a Sylow 11-subgroup in Z(G).

22. How many elements of order 7 must exist in a simple group of order 168?
23. Prove that np (G) = 1 is equivalent to the property that all subgroups of G generated by elements of
order p are p-subgroups.
24. Let p be an odd prime. Show that every group of order 2p is isomorphic to Z2p or to Dp .
25. Suppose that |G| = pm where p is a prime and p ∤ m. Prove that gcd(p, m − 1) = gcd(p − 1, m) = 1 if
and only if G has a normal Sylow p-subgroup in Z(G).
26. Suppose that for every prime p dividing |G|, the Sylow p-subgroups are nonabelian. Prove that |G| is
divisible by a cube.
27. Suppose that |G| = p^2 q^2 with p and q distinct primes. Prove that if p ∤ (q^2 − 1) and q ∤ (p^2 − 1), then
G is abelian.
28. Suppose that H is a subgroup of G such that gcd(| Aut(H)|, |G|) = 1. Prove that NG (H) = CG (H).
29. Prove that if N ⊴ G, then np (G/N ) ≤ np (G).
30. Let P be a normal Sylow p-subgroup of a group G and let H ≤ G. Prove that P ∩ H is the unique
Sylow p-subgroup of H.
31. Let P ∈ Sylp (G) and let N ⊴ G. Prove that P ∩ N ∈ Sylp (N ). Prove also that P N/N is a Sylow
p-subgroup of G/N .
32. Let G1 and G2 be two groups, both of which have orders divisible by a prime p. Prove that all Sylow
p-subgroups of G1 ⊕ G2 are of the form P1 ⊕ P2 , where P1 ∈ Sylp (G1 ) and P2 ∈ Sylp (G2 ).
33. Let G be a finite group and let M be a subgroup such that NG (P ) ≤ M ≤ G for some Sylow p-subgroup
P . Prove that |G : M | ≡ 1 (mod p).
34. Let p be a prime dividing |G|. Prove that the intersection of all Sylow p-subgroups is the largest
normal p-subgroup in G.

8.6 A Brief Introduction to Representations of Groups
At first pass, representation theory of finite groups is the study of how to represent finite groups
using matrices in GLn (F ), where F is a field. This addresses the goal, mentioned in the book’s
preface, of conveniently describing groups. Indeed, we understand matrix multiplication so if every
group can be represented by a subgroup of some GLn (F ) then we can study properties of specific
group elements from this perspective. In this sense, representation theory is not unlike group actions,
in which faithful actions provide permutation representations of a group.
More precisely, in a representation of group G, the group acts on a vector space V (over a field F ),
but in which group elements act not just as bijections on V but as invertible linear transformations.
This requirement brings together the group structure with the structure of a vector space in a way
that uncovers interesting results for both theories.
Like group actions, the collection of representations of a group G has all the characteristics of an
algebraic structure. We sometimes generically call representations of a group an action structure,
in which one structure acts on another algebraic structure. As we will see in Chapter 10, there are
many fruitful and interesting action structures from rings, including representations of rings.
The representation theory of groups is a broad subfield of algebra and often stands as a course
in its own right. In this section, we introduce the notion of representations of groups, give some
examples, show how they provide another example of an algebraic structure, and then illustrate
some of the interesting interplay between group theory and linear algebra by establishing two key
reducibility results. In so doing, we hope to whet the reader’s appetite for more.

8.6.1 – Definitions and Examples


As a motivating example, consider the dihedral group Dn . The presentation of Dn described the
group by functions:
(1) rotations around a center O: 1, r, r^2 , . . . , r^{n−1} , where r^k is the rotation of angle 2πk/n;
(2) reflections through lines through O: s, sr, sr^2 , . . . , sr^{n−1} .
The standard presentation of Dn is ⟨r, s | r^n = s^2 = 1, rs = sr^{−1} ⟩. This presentation encodes all the
algebraic properties of compositions of these functions. However, the functions r and s and various
words we can make from them represent invertible transformations in the plane. In Exercise 3.8.16
we saw that there exists a homomorphism ϕ : Dn → GL2 (R) with
   
             ( cos(2π/n)   − sin(2π/n) )                 ( 1    0 )
    ϕ(r) =   ( sin(2π/n)     cos(2π/n) )    and ϕ(s) =   ( 0   −1 ) .

With respect to the standard basis of R2 , these matrices correspond respectively to the rotation of
angle 2π/n about the origin and the reflection through the x-axis. The homomorphism ϕ is what
we intend by a representation of Dn . We will call this the standard representation of Dn .
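For n = 4 the entries of these matrices are integers, and the dihedral relations can be checked by direct multiplication. A small sketch (R and S are our names for ϕ(r) and ϕ(s)); note that, given s^2 = 1, the relation rs = sr^{−1} is equivalent to (rs)^2 = 1:

```python
def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Standard representation of D4: cos(2*pi/4) = 0 and sin(2*pi/4) = 1.
R = [[0, -1], [1, 0]]   # rotation by 2*pi/4 about the origin
S = [[1, 0], [0, -1]]   # reflection through the x-axis
I2 = [[1, 0], [0, 1]]

R2 = matmul(R, R)
RS = matmul(R, S)
assert matmul(R2, R2) == I2   # r^4 = 1
assert matmul(S, S) == I2     # s^2 = 1
assert matmul(RS, RS) == I2   # equivalent to rs = s r^{-1}
```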

Definition 8.6.1
Let V be a vector space over a field F . The group of invertible linear transformations from
V to V is called the general linear group on V and is denoted by GL(V ).

If V is finite-dimensional with dim V = n, then V ≅ F^n . Furthermore, given an ordered basis B of V ,
associating to a linear transformation its matrix with respect to B gives an isomorphism between
GL(V ) and GLn (F ). Consequently, sometimes these groups are identified but it is only precise to
do so in reference to an ordered basis on V .

Definition 8.6.2
A representation of a group G is a homomorphism ρ : G → GL(V ) for some vector space
V over a field F . If dim V = n, then we sometimes call ρ a representation of G over F of
degree n.

Example 8.6.3. Consider the presentation of Q8 described in Exercise 3.8.5. Consider a function
ϕ : Q8 → GL2 (C) such that
   
             ( i    0 )                 ( 0   −1 )
    ϕ(i) =   ( 0   −i )    and ϕ(j) =  ( 1    0 ) .

It is not hard to show that this function satisfies the hypotheses of the Generator Extension Theorem
so ϕ extends to a homomorphism from all of Q8 . 4
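One can verify directly that these two matrices satisfy the quaternion relations i^2 = j^2 = (ij)^2 = −I; a short sketch using Python's built-in complex numbers:

```python
def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

neg_I2 = [[-1, 0], [0, -1]]
phi_i = [[1j, 0], [0, -1j]]   # the matrix assigned to i
phi_j = [[0, -1], [1, 0]]     # the matrix assigned to j

phi_k = matmul(phi_i, phi_j)  # plays the role of k = ij
assert matmul(phi_i, phi_i) == neg_I2
assert matmul(phi_j, phi_j) == neg_I2
assert matmul(phi_k, phi_k) == neg_I2
```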

By now, the reader should be able to define the kernel of a representation for him- or herself.

Definition 8.6.4
Let ρ : G → GL(V ) be a representation of the group G. The kernel of ρ is Ker ρ = {g ∈
G | ρ(g) = idV }. If Ker ρ = {1}, then the representation is called faithful .

Note that if a representation ρ is faithful, then by the First Isomorphism Theorem, G ≅ Im ρ.
Then ρ gives an embedding of G in GL(V ). If V is also equipped with an ordered basis, the
representation gives an embedding of G in the matrix group GLn (F ).
Note that by properties of group homomorphisms, ρ(1) = idV and ρ(g −1 ) = ρ(g)−1 .
A common abuse of language simplifies both the terminology and the notation. We often refer
to “a representation V of a group G.” By this, we mean that V is a vector space (over a field F )

equipped with a homomorphism ρ : G → GL(V ). Then, when the homomorphism ρ : G → GL(V )


is understood from context, it is common to write g · v or more simply gv instead of the more precise
expression ρ(g)(v) for how the representation makes a group element g act on a vector v ∈ V .

Example 8.6.5 (Trivial Representation). Let G be a group and let V be any vector space over
any field F . The trivial homomorphism ρ : G → GL(V ) that maps all group elements to the identity
linear transformation from V to V is called the trivial representation. This representation is the
opposite of faithful. 4

Example 8.6.6 (Regular Representation). Let F be a field and G a group. The group ring
(see Section 5.2.3) F [G] has the structure of a vector space over F with the following addition and
scalar multiplication operators:
     
    ( Σ_{g∈G} ag g ) + ( Σ_{g∈G} bg g ) = Σ_{g∈G} (ag + bg ) g,

    r ( Σ_{g∈G} ag g ) = Σ_{g∈G} (r ag ) g.

The regular representation of a group G involves the action of G on V = F [G] defined by


    !
X X X
x· ag g  =  ag (xg) = ax−1 h h .
g∈G g∈G h∈G

The degree of the regular representation of G is equal to the order |G|.


As a specific example, let G = S3 and consider the regular representation of S3 over R. Let us
order the group elements by 1, (1 2 3), (1 3 2), (1 2), (2 3), and (1 3), and use the standard basis on
R6 with this order on elements of S3 . The regular representation ρ has
 
                   ( 0 0 1 0 0 0 )
                   ( 1 0 0 0 0 0 )
                   ( 0 1 0 0 0 0 )
    ρ((1 2 3)) =   ( 0 0 0 0 1 0 )
                   ( 0 0 0 0 0 1 )
                   ( 0 0 0 1 0 0 ) .                                           4
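The matrix above can be generated mechanically: with the six group elements listed in the stated order, the (i, j) entry of ρ(x) is 1 exactly when x · g_j = g_i. A sketch in Python (permutations as 0-based one-line tuples; helper names ours):

```python
def compose(a, b):
    """(a ∘ b)(i) = a(b(i)) for permutations as tuples."""
    return tuple(a[b[i]] for i in range(len(a)))

# S3 in the order 1, (1 2 3), (1 3 2), (1 2), (2 3), (1 3),
# written in 0-based one-line notation.
g = [(0, 1, 2), (1, 2, 0), (2, 0, 1), (1, 0, 2), (0, 2, 1), (2, 1, 0)]
x = g[1]  # (1 2 3)

# Entry (i, j) is 1 exactly when x * g_j = g_i.
rho = [[1 if compose(x, g[j]) == g[i] else 0 for j in range(6)]
       for i in range(6)]
for row in rho:
    print(row)
```

The printed rows agree with the matrix displayed above; replacing x by any other element of the list gives the rest of the representation.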

Example 8.6.7 (Standard Representation of Sn ). Let F be a field. The standard represen-


tation of Sn over F is the degree n representation defined as follows. Let (e1 , e2 , . . . , en ) be the
standard basis on F^n . For any vector ~a with coordinates (a1 , a2 , . . . , an ), we define

    σ · ~a := (a_{σ^{−1}(1)} , a_{σ^{−1}(2)} , . . . , a_{σ^{−1}(n)} ).

This action of Sn on F^n has the effect of making the ith basis vector the σ(i)th basis vector. For
example, if ϕ is the standard representation of S5 on R5 , then with respect to the standard basis,

                          ( 0 0 0 1 0 )
                          ( 1 0 0 0 0 )
    ϕ((1 2 4)(3 5)) =     ( 0 0 0 0 1 )
                          ( 0 1 0 0 0 )
                          ( 0 0 1 0 0 ) .                                      4
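The defining formula (σ · ~a)_i = a_{σ^{−1}(i)} translates directly into a permutation-matrix builder, and the homomorphism property, namely that the matrix of a composite is the product of the matrices, can then be spot-checked. A sketch with our own helper names:

```python
def perm_matrix(sigma):
    """Matrix with a 1 in entry (i, sigma^{-1}(i)), from (sigma·a)_i = a_{sigma^{-1}(i)}."""
    n = len(sigma)
    inv = [0] * n
    for i, s in enumerate(sigma):
        inv[s] = i
    return [[1 if j == inv[i] else 0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def compose(a, b):
    return tuple(a[b[i]] for i in range(len(a)))

# (1 2 4)(3 5) and (1 2) in 0-based one-line notation.
sigma = (1, 3, 4, 0, 2)
pi = (1, 0, 2, 3, 4)

# The matrix of a composite equals the product of the matrices.
assert perm_matrix(compose(sigma, pi)) == matmul(perm_matrix(sigma), perm_matrix(pi))
```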

Example 8.6.8 (Determinant Representation). Let ρ : G → GL(V ) be a representation of G


into a vector space over a field F . Then det ρ : G → F ∗ is a representation of degree 1. Indeed, for
all g1 , g2 ∈ G,
det(ρ(g1 g2 )) = det(ρ(g1 )ρ(g2 )) = det(ρ(g1 )) det(ρ(g2 )).
As a specific example, consider f : Dn → R∗ the determinant of the standard representation of Dn .
This is a representation with f (ra ) = 1 for all rotations ra and f (srb ) = −1 for all reflections. Note
that this representation is not faithful. 4

Observe that if ϕ : G → H is a homomorphism between groups and if ρ : H → GL(V ) is


a representation of H, then the composition ρ ◦ ϕ : G → GL(V ) is a homomorphism, so is a
representation of G. This observation gives rise to many more examples of representations.

8.6.2 – Subrepresentations and Morphisms


For a given group G, representations of G form an algebraic structure. Following the guiding
principles laid out in the preface, the above paragraphs discussed a few motivations for this structure,
presented the definition, and explored a few examples. We consider now subobjects of representations
and morphisms between representations.
Before giving a definition, let us consider what makes sense intuitively. A representation of a
group G involves a vector space V along with an action of G on V in which group elements act as
invertible linear transformations. The set of elements that anchors this structure is the vector space
V . So a subrepresentation should be a subset of V that is a representation of G in its own right.
The following definition makes this precise.

Definition 8.6.9
Let V be a representation of a group G. A subrepresentation of V is a subspace W such
that for all g ∈ G and all w ∈ W , we have gw ∈ W . A subrepresentation of V is also called
a G-invariant subspace of V .

Definition 8.6.10
A representation V of a group G is called irreducible if V ≠ {0} and if V has no
subrepresentations besides the subspaces {0} and itself.

In the more precise notation, a representation is given by a homomorphism ρ : G → GL(V ).


A subrepresentation involves a subspace W such that for each g ∈ G, the linear transformation
ρ(g) maps W back into itself. We define the restriction of ρ(g) to W as ρ(g)|W . By definition,
ρ(g)|W : W → W is a linear transformation. Furthermore, ρ(g)−1 = ρ(g −1 ) and by definition of
a subrepresentation ρ(g)−1 maps W back into itself. Therefore, ρ(g)|W is invertible with inverse
ρ(g −1 ) so ρ(g)|W ∈ GL(W ). Hence, the function ρW : G → GL(W ) defined by ρW (g) = ρ(g)|W is a
homomorphism for the representation of G on W .
Example 8.6.11. Let G be a finite group with G = {g1 , g2 , . . . , gn } and let F be a field. Consider
the regular representation of G acting on F [G]. Consider the subspace

    W = { a1 g1 + a2 g2 + · · · + an gn ∈ F [G] | a1 + a2 + · · · + an = 0 }.

In the regular representation, for all x ∈ G, the action of x on α = a1 g1 + a2 g2 + · · · + an gn simply


permutes the coefficients ai among the elements gi of the group. Consequently, the sum
of all the coefficients does not change under the action of any x ∈ G. Hence, W is an invariant
subspace; in other words, W is a subrepresentation. 4

Example 8.6.12. Following up the previous example, consider the regular representation of S3
described in Example 8.6.6. Let W be the invariant subspace

    W = {~v ∈ R6 | v1 + v2 + · · · + v6 = 0}

described in the previous example. Consider the ordered basis B for W with vectors
     
1 0 0
−1 1 0
     
0 −1 0
~u1 =   ,
  ~u2 =   ,
  ···  0 .
~u5 =  
0 0  
0 0 1
0 0 −1
It is not hard to show that ρ(1 2 3) maps ~u1 , ~u2 , . . . , ~u5 respectively to
         
0 −1 1 0 0
1 0 0 0 0
         
−1 1 0 0 0
 ,  ,  ,  ,  ,
0 0 0 −1 1
         
0 0 0 0 −1
0 0 −1 1 0
which are respectively
~u2 , −~u1 − ~u2 , ~u1 + ~u2 + ~u3 + ~u4 + ~u5 , −~u4 − ~u5 , ~u4 .
Therefore, with respect to the ordered basis B on W , the matrix of ρW ((1 2 3)) is

[ρW ((1 2 3))]B = ( 0 −1 1 0 0 ; 1 −1 1 0 0 ; 0 0 1 0 0 ; 0 0 1 −1 1 ; 0 0 1 −1 0 ).       4
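The linear algebra in this example can be checked mechanically. The following sketch (illustrative, not from the text) verifies with exact integer arithmetic that each listed image of a basis vector of W equals the claimed combination of ~u1 , . . . , ~u5 , i.e., that the columns of [ρW ((1 2 3))]B are correct.

```python
# Ordered basis B = (u1, ..., u5) of W = {v in R^6 : v1 + ... + v6 = 0}.
U = [
    [1, -1, 0, 0, 0, 0],
    [0, 1, -1, 0, 0, 0],
    [0, 0, 1, -1, 0, 0],
    [0, 0, 0, 1, -1, 0],
    [0, 0, 0, 0, 1, -1],
]

def combine(coeffs):
    """Return sum_i coeffs[i] * u_{i+1} as a vector in R^6."""
    return [sum(c * u[k] for c, u in zip(coeffs, U)) for k in range(6)]

# Each image of u1, ..., u5 under rho((1 2 3)), as listed in the example,
# paired with the claimed coordinate column of the 5x5 matrix.
checks = [
    ([0, 1, -1, 0, 0, 0],  [0, 1, 0, 0, 0]),    # u2
    ([-1, 0, 1, 0, 0, 0],  [-1, -1, 0, 0, 0]),  # -u1 - u2
    ([1, 0, 0, 0, 0, -1],  [1, 1, 1, 1, 1]),    # u1 + u2 + u3 + u4 + u5
    ([0, 0, 0, -1, 0, 1],  [0, 0, 0, -1, -1]),  # -u4 - u5
    ([0, 0, 0, 1, -1, 0],  [0, 0, 0, 1, 0]),    # u4
]
for image, column in checks:
    assert combine(column) == image
```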
We can give a characterization of subrepresentations in terms of matrices. Let V be a representation of finite dimension n with homomorphism ρ : G → GL(V ). A subspace W of V with dim W = m
is a subrepresentation of V if and only if V has an ordered basis B = (v1 , v2 , . . . , vn ) such that
B1 = (v1 , v2 , . . . , vm ) is an ordered basis of W and, for all g ∈ G, with respect to the basis B,

[ρ(g)]B = ( [ρW (g)]B1  A(g) ; 0  B(g) ),                    (8.7)

where A(g) and B(g) are m × (n − m) and (n − m) × (n − m) matrices, respectively, and
( P  Q ; R  S ) denotes the block matrix with rows (P, Q) and (R, S).
Another common topic in any algebraic structure is the notion of a homomorphism between
two objects. A homomorphism, or a function that preserves the structure, should retain the linear
algebraic properties of V and W and should commute with respect to the action of G on V and on
W . Hence, we set the following definition.
Definition 8.6.13
Let G be a group and let V and W be two representations of G. A (representation)
homomorphism from V to W is a linear transformation T : V → W such that
T (g · v) = g · T (v) for all g ∈ G and all v ∈ V .
In this definition, the action g ·T (v) involves the action of G on W . More precisely, this definition
states that if ρV : G → GL(V ) is a representation of G and if ρW : G → GL(W ) is another
representation of G, then for all g ∈ G,
T ◦ ρV (g)(v) = ρW (g) ◦ T (v).
We depict this function relationship with the following diagram.
420 CHAPTER 8. GROUP ACTIONS
            ρV (g)
     V ------------> V
     |               |
   T |               | T
     ↓               ↓
     W ------------> W
            ρW (g)
A function diagram of sets (resp. groups, rings, vector spaces, etc.) is a directed graph in which
every vertex corresponds to a set (resp. group, ring, vector space, etc.) and every directed
edge corresponds to a function (homomorphism, ring homomorphism, linear transformation, etc.).
A directed path of arrows corresponds to the composition of the functions associated to the
directed edges involved. We call a function diagram commutative if any two directed paths
from a domain to a codomain correspond to equal compositions. In particular, the above diagram
is commutative because T ◦ ρV (g) = ρW (g) ◦ T .
Example 8.6.14. Consider the regular representation of S3 into R6 as described in Example 8.6.6.
Consider also the function ϕ : R6 → R defined by
ϕ(~x) = x1 + x2 + · · · + x6 .
This is a linear transformation. As pointed out in Example 8.6.12, in the regular representation,
the sum of the coordinates of a vector does not change under the action of S3 on R6 . Thus, for all
σ ∈ S3 ,
ϕ(σ~x) = ϕ(~x) = σϕ(~x),
assuming that R is equipped with the trivial representation structure. Thus, if V is the regular
representation of S3 over the field R and if U is the trivial representation of dimension 1 over R,
then ϕ : V → U is a group representation homomorphism. 4
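The invariance used in this example is easy to test exhaustively for a small group. The sketch below (illustrative, not from the text) checks that the coordinate-sum map ϕ is unchanged by every permutation of the six coordinates, which is exactly the property that makes ϕ a homomorphism of representations.

```python
from itertools import permutations

def phi(x):
    # The coordinate-sum linear functional phi(x) = x1 + x2 + ... + xn.
    return sum(x)

x = [3, -1, 4, 1, -5, 9]
for p in permutations(range(6)):
    sigma_x = [x[p[i]] for i in range(6)]   # permute the coordinates of x
    assert phi(sigma_x) == phi(x)           # the sum is permutation-invariant
```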
The reader may notice that the subspace W described in Example 8.6.12 is the kernel of the group
representation homomorphism described in Example 8.6.14. That the kernel of a homomorphism is
a subrepresentation should not come as a surprise. Indeed, a similar result holds with vector spaces,
groups, rings, and other algebraic structures.
Proposition 8.6.15
Let G be a group and let ϕ : V → W be a homomorphism of representations of G.
(1) The kernel Ker ϕ is a subrepresentation of V .
(2) The image Im ϕ is a subrepresentation of W .
Proof. (Left as an exercise for the reader. See Exercise 8.6.13.) □
In all algebraic structures, two given objects are considered the same if there exists an invertible
homomorphism between them. Again, we define this notion for representations of a group G.
Definition 8.6.16
Let G be a group. An isomorphism between two G-representations V and W is a homo-
morphism of representations that is also a bijection between V and W . If there exists an
isomorphism of representations between V and W , then V and W are called isomorphic or
equivalent representations and we write V ≅ W .
If T : V → W is an isomorphism between G-representations, then T is an invertible linear
transformation. It is easy to see from Definition 8.6.13 that T −1 is also a homomorphism between
representations. Thus, ρV : G → GL(V ) and ρW : G → GL(W ) satisfy ρW (g) = T ◦ ρV (g) ◦ T −1 .
With respect to bases on V and W , the matrices of ρW (g) and ρV (g) are similar, using the same
similarity matrix M of T for all g ∈ G.
Example 8.6.17. Consider a function ρ : D8 → GL2 (R) that has

ρ(r) = ( 2√2  −(5/2)√2 ; √2  −√2 )   and   ρ(s) = ( 3  −4 ; 2  −3 ).
It is not hard to show that these matrices satisfy the same relations as r and s in D8 . Therefore,
by the Generator Extension Theorem, the mapping ρ extends to a unique homomorphism, thereby
defining a representation of D8 . It turns out that this representation is equivalent to the standard
representation of D8 on R2 because
( 2√2  −(5/2)√2 ; √2  −√2 ) = ( 2  1 ; 1  1 ) ( √2/2  −√2/2 ; √2/2  √2/2 ) ( 2  1 ; 1  1 )^{−1}

and

( 3  −4 ; 2  −3 ) = ( 2  1 ; 1  1 ) ( 1  0 ; 0  −1 ) ( 2  1 ; 1  1 )^{−1} .       4
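A numerical sanity check (illustrative, not from the text) of this example: under the book's convention D8 has 16 elements, so ρ(r) should have order 8 and ρ(s) order 2, they should satisfy the dihedral relation s r s −1 = r −1 , and ρ(s) should be exactly conjugate to the standard reflection by M = ( 2 1 ; 1 1 ).

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def close(A, B, tol=1e-9):
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(2) for j in range(2))

s2 = math.sqrt(2)
R = [[2 * s2, -2.5 * s2], [s2, -s2]]     # rho(r)
S = [[3, -4], [2, -3]]                   # rho(s), exact integer entries
I = [[1, 0], [0, 1]]

P = R
for _ in range(7):
    P = matmul(P, R)
assert close(P, I)                       # rho(r)^8 = identity
assert matmul(S, S) == I                 # rho(s)^2 = identity

# s r s^{-1} = r^{-1} is equivalent to (s r)^2 = identity since s = s^{-1}
assert close(matmul(matmul(S, R), matmul(S, R)), I)

# rho(s) = M (1 0 ; 0 -1) M^{-1} with M = (2 1 ; 1 1), which has det M = 1
M, Minv = [[2, 1], [1, 1]], [[1, -1], [-1, 2]]
assert matmul(matmul(M, [[1, 0], [0, -1]]), Minv) == S
```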
Example 8.6.18. As another example, consider the standard representation ρ of D4 on R2 that has

ρ(r) = ( 0  −1 ; 1  0 )   and   ρ(s) = ( 1  0 ; 0  −1 ).
Changing the base field from R to C does not change the algebraic relationships between these
matrices so this ρ can be thought of as a representation of D4 on C2 .
We obtain another representation of D4 on C2 by the following association
   
ρ2 (r) = ( i  0 ; 0  −i )   and   ρ2 (s) = ( 0  i ; −i  0 ).
Indeed, ρ2 (r) has order 4, ρ2 (s) has order 2, and

( i  0 ; 0  −i )( 0  i ; −i  0 ) = ( 0  −1 ; −1  0 ) = ( 0  i ; −i  0 )( i  0 ; 0  −i )^{−1} .
Consequently, ρ2 (r) and ρ2 (s) satisfy the relations of r and s and hence they determine a homomor-
phism from D4 to GL2 (C).
The two representations ρ and ρ2 both map D4 into the same general linear group GL2 (C), and
they are equivalent because
( 0  −1 ; 1  0 ) = ( i−1  −i−1 ; i+1  −i+1 ) ( i  0 ; 0  −i ) ( i−1  −i−1 ; i+1  −i+1 )^{−1}

and

( 1  0 ; 0  −1 ) = ( i−1  −i−1 ; i+1  −i+1 ) ( 0  i ; −i  0 ) ( i−1  −i−1 ; i+1  −i+1 )^{−1} .

Consequently, for all g ∈ D4 ,

ρ(g) = ( i−1  −i−1 ; i+1  −i+1 ) ρ2 (g) ( i−1  −i−1 ; i+1  −i+1 )^{−1} .       4
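These equalities can be verified exactly using Python's complex arithmetic, since all entries are Gaussian integers. The sketch below (illustrative, not from the text) checks the D4 relations for ρ2 and the intertwining identity ρ(g)P = P ρ2 (g) for the generators, with P the change-of-basis matrix above.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

i = 1j
rho_r, rho_s = [[0, -1], [1, 0]], [[1, 0], [0, -1]]
rho2_r, rho2_s = [[i, 0], [0, -i]], [[0, i], [-i, 0]]
P = [[i - 1, -i - 1], [i + 1, -i + 1]]
I2 = [[1, 0], [0, 1]]

# D4 relations for rho2: r has order 4, s has order 2, and r s = s r^{-1}.
assert matmul(matmul(rho2_r, rho2_r), matmul(rho2_r, rho2_r)) == I2
assert matmul(rho2_s, rho2_s) == I2
rho2_r_inv = [[-i, 0], [0, i]]
assert matmul(rho2_r, rho2_s) == matmul(rho2_s, rho2_r_inv)

# Equivalence: rho(g) P = P rho2(g) for the generators g = r and g = s.
assert matmul(rho_r, P) == matmul(P, rho2_r)
assert matmul(rho_s, P) == matmul(P, rho2_s)
```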
8.6.3 – Complete Reducibility
We conclude this brief introduction to the representation theory of groups by presenting Maschke’s
Theorem, a profound result about the structure of representations of groups.
Maschke’s Theorem pertains to the projection function of a vector space onto a subspace. Con-
sequently, we review the concept of projection from linear algebra.
Let V be a vector space over a field F . Recall that subspaces W and U of V are called comple-
mentary subspaces if W ∩ U = {0} and if for all v ∈ V we can write v = w + u for some w ∈ W
and u ∈ U . We write V = W ⊕ U . It is a simple linear algebra exercise to show that v = w + u for
unique w ∈ W and u ∈ U .
Let W be a subspace of a finite dimensional vector space V and let U be a complementary
subspace to W in V . The projection function projW,U : V → V has projW,U (v) = w, where w is the
unique vector in W when v is written as v = w + u with w ∈ W and u in the complementary subspace
U.
Lemma 8.6.19
The projection function projW,U : V → V is a linear transformation with image W .
Proof. Consider the projection function projW,U . Let v1 , v2 ∈ V be two vectors and suppose that
they can be written (uniquely) as

v1 = w1 + u1   and   v2 = w2 + u2

with w1 , w2 ∈ W and u1 , u2 ∈ U . Then

v1 + v2 = (w1 + w2 ) + (u1 + u2 ),

so projW,U (v1 + v2 ) = w1 + w2 = projW,U (v1 ) + projW,U (v2 ). Similarly, if λ ∈ F and v ∈ V , then writing
v uniquely as v = w + u with w ∈ W and u ∈ U , we have a unique expression for λv, namely
λv = (λw) + (λu), so projW,U (λv) = λ projW,U (v). Together these prove that the projection function
is a linear transformation. Since projW,U (v) ∈ W for all v ∈ V and projW,U (w) = w for all w ∈ W ,
the image of projW,U is W . □
It is important to note that we used the cumbersome notation projW,U because this projection
depends not only on the subspace W but also on the choice of complementary subspace U . Notice from
the construction that projW,U (w) = w for all w ∈ W . Hence, the composition of projW,U with itself is
again projW,U . This motivates the following general definition of a projection, applicable
even if V is not necessarily finite dimensional.
Definition 8.6.20
Let V be a vector space over a field F . A projection is a linear transformation π : V → V
such that π is idempotent, i.e., π ◦ π = π. We say that the linear transformation π projects
from V onto W = Im π.
Proposition 8.6.21
Let π : V → V be a projection. Then V = Im π ⊕ Ker π.
Proof. Let v ∈ V . Obviously, v = π(v) + (v − π(v)). Note that π(v) ∈ Im π and

π(v − π(v)) = π(v) − π(π(v)) = π(v) − π(v) = 0,

so v − π(v) ∈ Ker π. Furthermore, if u ∈ Im π ∩ Ker π, then π(u) = u since u ∈ Im π and π(u) = 0
since u ∈ Ker π. Thus, u = 0 and Im π ∩ Ker π = {0}. This shows that Im π and Ker π are
complementary subspaces. □
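The decomposition in this proposition is easy to see in a small concrete case. The following sketch (illustrative, not from the text) uses the idempotent map π(x, y) = (x + y, 0) on R2 and checks that every vector splits as an image part plus a kernel part.

```python
def pi(v):
    # An idempotent linear map on R^2: pi(x, y) = (x + y, 0).
    x, y = v
    return (x + y, 0)

def decompose(v):
    # v = pi(v) + (v - pi(v)), mirroring the proof of Proposition 8.6.21.
    image_part = pi(v)
    kernel_part = (v[0] - image_part[0], v[1] - image_part[1])
    return image_part, kernel_part

v = (5, -2)
w, u = decompose(v)
assert pi(w) == w                        # pi fixes its image pointwise
assert pi(u) == (0, 0)                   # the remainder lies in Ker pi
assert (w[0] + u[0], w[1] + u[1]) == v   # v = w + u
```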
We apply the concept of projections to representations of a group G and subrepresentations. Let
G be a group, V a representation of G, and W a subrepresentation.
Theorem 8.6.22 (Maschke's Theorem)
Let G be a finite group and let F be a field of characteristic 0 or such that char F does not
divide |G|. Let V be a representation of G over F . For every subrepresentation W of V ,
there exists another subrepresentation U of V such that V = W ⊕ U .
Before proving Maschke's Theorem, we discuss why it is not obvious.
Let U be a complementary subspace to W in V . Let v ∈ V and suppose that v decomposes
uniquely into v = w + u with w ∈ W and u ∈ U . Then for all g ∈ G, we also have gv = g(w + u) =
(gw) + (gu). Since W is a subrepresentation, gw ∈ W . However, U need not be invariant under
the action of G, so we do not know whether gu ∈ U , i.e., whether U is a subrepresentation as well.
From another perspective, consider some projection π of V onto the subspace W . If π were a
G-representation homomorphism, then by Proposition 8.6.21 the decomposition V = Im π ⊕ Ker π
would express V as a direct sum of two subrepresentations. However, π need not be (and in general
is not) a G-representation homomorphism.
The strategy behind the proof of Maschke’s Theorem involves finding a G-representation homo-
morphism ψ : V → V with Im ψ = W .
Proof (Theorem 8.6.22). Let π : V → V be any projection of V onto W . Define the function
ψ : V → V by

ψ(v) = (1/|G|) ∑_{g∈G} g −1 π(gv).
For all g ∈ G, the mapping v ↦ g −1 π(gv) is a composition of linear transformations, so it is a
linear transformation, and hence ψ is itself a linear transformation.
Next, we show that ψ is another projection onto W . Since π maps into W and W is invariant
under the action of G, each term g −1 π(gv) lies in W , so Im ψ ⊆ W . Moreover, for all w ∈ W , we
have gw ∈ W so π(gw) = gw since π is a projection onto W , and then g −1 π(gw) = g −1 (gw) = w.
Thus,

ψ(w) = (1/|G|) ∑_{g∈G} w = (|G|/|G|) w = w.

Since Im ψ ⊆ W and ψ fixes W pointwise, ψ(ψ(v)) = ψ(v) for all v ∈ V , so ψ ◦ ψ = ψ.
Finally, we show that ψ is also a G-representation homomorphism. Let x ∈ G and let v ∈ V .
Then

ψ(xv) = (1/|G|) ∑_{g∈G} g −1 π(gxv).

Note that as g runs through all the elements of G, the elements gx also run through all the elements
of G. So we can change the summation variable via h = gx, so that g = hx −1 and g −1 = x(gx) −1 .
Thus,

ψ(xv) = (1/|G|) ∑_{g∈G} x(gx) −1 π(gxv) = x ( (1/|G|) ∑_{g∈G} (gx) −1 π(gxv) )
      = x ( (1/|G|) ∑_{h∈G} h −1 π(hv) ) = x ψ(v).
So ψ is a projection onto W that is also a G-representation homomorphism. Thus, by
Proposition 8.6.15, Im ψ and Ker ψ are subrepresentations of V and so

V = Im ψ ⊕ Ker ψ = W ⊕ Ker ψ.

The theorem follows. □
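The averaging construction in this proof can be run directly on a small example. The sketch below (illustrative, not from the text) takes G = S3 permuting the coordinates of Q3 and W the sum-zero subrepresentation, starts from a projection π onto W that is not equivariant, and checks that the average ψ is an equivariant projection onto W.

```python
from fractions import Fraction
from itertools import permutations

def act(p, v):
    """Left action of a permutation p: the entry v[i] is sent to slot p[i]."""
    out = [None] * 3
    for i in range(3):
        out[p[i]] = v[i]
    return out

def inv(p):
    q = [None] * 3
    for i in range(3):
        q[p[i]] = i
    return tuple(q)

def pi(v):
    # A projection onto W = {v : v1 + v2 + v3 = 0}; NOT equivariant.
    s = sum(v)
    return [v[0] - s, v[1], v[2]]

G = list(permutations(range(3)))

def psi(v):
    # psi(v) = (1/|G|) * sum over g of g^{-1} pi(g v), as in the proof.
    total = [Fraction(0)] * 3
    for g in G:
        term = act(inv(g), pi(act(g, v)))
        total = [t + Fraction(x) for t, x in zip(total, term)]
    return [t / len(G) for t in total]

v = [Fraction(4), Fraction(1), Fraction(-2)]
assert sum(psi(v)) == 0                        # psi maps into W
assert psi(psi(v)) == psi(v)                   # psi is a projection
for x in G:                                    # psi is G-equivariant ...
    assert psi(act(x, v)) == act(x, psi(v))
assert any(pi(act(x, v)) != act(x, pi(v)) for x in G)   # ... while pi is not
```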
Suppose that V is a finite dimensional representation of a group G with homomorphism ρ : G →
GL(V ). In terms of the matrices of representations, Maschke's Theorem implies a stronger result
than expressed by (8.7). If W is a subrepresentation of V , then there exists another subrepresentation
U along with an ordered basis B = (v1 , v2 , . . . , vn ) such that B1 = (v1 , v2 , . . . , vm ) is an ordered basis
of W and B2 = (vm+1 , vm+2 , . . . , vn ) is an ordered basis of U . Then with respect to this basis, the
matrix of ρ is

[ρ(g)]B = ( [ρW (g)]B1  0 ; 0  [ρU (g)]B2 ).

With respect to B, the matrix of ρ(g) is not just block upper triangular but block diagonal.
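A minimal illustration of this block-diagonal form (not from the text): for S3 permuting the coordinates of R3 , the basis u1 = (1, −1, 0), u2 = (0, 1, −1) of the sum-zero subrepresentation W, together with u3 = (1, 1, 1), puts the 3-cycle into block-diagonal shape.

```python
def act3(p, v):
    # Permutation action on R^3: entry v[i] is sent to slot p[i].
    out = [None] * 3
    for i in range(3):
        out[p[i]] = v[i]
    return out

c = (1, 2, 0)            # the 3-cycle sending slot 0 -> 1 -> 2 -> 0
u1, u2, u3 = [1, -1, 0], [0, 1, -1], [1, 1, 1]

# Images of the basis vectors under the 3-cycle:
assert act3(c, u1) == u2                                # column (0, 1, 0)
assert act3(c, u2) == [-a - b for a, b in zip(u1, u2)]  # column (-1, -1, 0)
assert act3(c, u3) == u3                                # column (0, 0, 1)

# Hence the matrix in the basis (u1, u2, u3) is ( 0 -1 0 ; 1 -1 0 ; 0 0 1 ),
# block diagonal with a 2x2 block for W and a 1x1 block for Span(u3).
```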
Definition 8.6.23
A representation V of a group G is called completely reducible if

V = W1 ⊕ W2 ⊕ · · · ⊕ Ws

with Wi irreducible for each 1 ≤ i ≤ s.

Maschke's Theorem leads to the following important result in representation theory.
Corollary 8.6.24
Let G be a finite group and let F be a field of characteristic 0 or such that char F does not
divide |G|. Let V be a finite dimensional representation of G over F . Then V is completely
reducible.
Proof. Let s be the largest integer such that V can be written as a direct sum of subrepresentations
of dimension 1 or greater. Since dim V is finite, by the well-ordering principle, such an integer exists.
Let
V = W1 ⊕ W2 ⊕ · · · ⊕ Ws                    (8.8)
be an expression of V as a direct sum of s subrepresentations. Assume that one of the Wi is not
irreducible. Suppose, without loss of generality, that Ws has a proper nontrivial subrepresentation.
Then by Maschke's Theorem, Ws = U1 ⊕ U2 , where U1 and U2 are proper nontrivial subrepresentations
of Ws , and thus are in turn subrepresentations of V . Hence,
V = W1 ⊕ W2 ⊕ · · · ⊕ Ws−1 ⊕ U1 ⊕ U2 ,
contradicting the maximality of s. Hence, by contradiction, we conclude that in (8.8), all the Wi
subrepresentations are irreducible. □
We reiterate that this section merely offered a glimpse into the study of representations of groups.
Indeed, the theory of representations of finite groups offers many interesting surprises about the
internal structure of a group.
Exercises for Section 8.6
1. Consider the cyclic group Z6 = ⟨z | z 6 = 1⟩. Prove that the mapping

   z ↦ ( 0  −1 ; 1  1 )

   defines a representation of Z6 on R2 .
2. Consider the dihedral group D4 . Prove that the mapping

   r ↦ ( i  0 ; 0  −i )   and   s ↦ ( 1  0 ; 0  −1 )

   does not define a representation of D4 on C2 .
3. Find four nonisomorphic one-dimensional representations of D4 .
4. Prove that the standard representation of Dn in R2 is an irreducible representation over the field R.
5. Prove that a function ϕ : Q8 → GL4 (R) such that

   ϕ(i) = ( 0 −1 0 0 ; 1 0 0 0 ; 0 0 0 1 ; 0 0 −1 0 )   and   ϕ(j) = ( 0 0 −1 0 ; 0 0 0 −1 ; 1 0 0 0 ; 0 1 0 0 )

   extends to a homomorphism and thereby defines a representation of Q8 on R4 .
6. Prove that a function ϕ : Q8 → GL2 (C) such that

   ϕ(i) = ( i  0 ; 0  −i )   and   ϕ(j) = ( 0  −1 ; 1  0 )

   extends to a homomorphism and thereby defines a representation of Q8 on C2 .
7. Let G = Z3 = ⟨z | z 3 = 1⟩. Show that the mapping

   ρ(z) = ( 7  −3 ; 19  −8 )

   defines a representation of Z3 of degree 2 over C. Also find a nontrivial subrepresentation.
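As a quick numerical hint for this exercise (not part of the exercise statement), one can verify mechanically that ρ(z)3 is the identity matrix, so the mapping is consistent with the relation z 3 = 1; the subrepresentation part is left untouched.

```python
def matmul2(A, B):
    # Product of two 2x2 integer matrices.
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[7, -3], [19, -8]]
A3 = matmul2(matmul2(A, A), A)
assert A3 == [[1, 0], [0, 1]]   # rho(z) has order 3
```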
8. Prove that the determinant of the standard representation of Sn corresponds to the sign homomor-
phism.
9. Let G be the group G = ⟨x, y | x7 = y 3 = 1, yxy −1 = x2 ⟩. Let F be the finite field F7 . Prove that the mapping

   x ↦ ( 1  1 ; 0  1 )   and   y ↦ ( 2  0 ; 0  1 )
defines a degree-2 representation of G over the field F7 .
10. Let V be the n-dimensional standard representation of G = Sn over C. Show that

    W = {~x ∈ V | x1 + x2 + · · · + xn = 0}

    is a subrepresentation of V .
11. Let G = GLn (R) and let V = Mn×n (R) be the vector space of n × n matrices of real coefficients.
(a) Prove that the action of G on V defined by g · A = gAg −1 is a representation ρ of GLn (R).
(b) Identifying M2×2 (R) with R4 via the standard basis on M2×2 (R), express ρ( ( a  b ; c  d ) )
    for all invertible matrices ( a  b ; c  d ) in GL2 (R).
12. Let V be a representation of a group G. Prove that the subset

    V G = {v ∈ V | gv = v for all g ∈ G}

    is a subrepresentation of V .
13. Prove Proposition 8.6.15.
14. Let G be a finite group and let V be a finite-dimensional representation of G over C with homomorphism ρ. Prove that ρ(g) is diagonalizable for every g ∈ G and that all the eigenvalues of ρ(g) are roots of unity.
15. Let V be a representation of G and let v ∈ V . Prove that Span(gv | g ∈ G) is a subrepresentation of
    V . Conclude that every representation V of a finite group G has a subrepresentation of dimension
    less than or equal to |G|.
16. Consider the abelian group G = (Z, +) and the function ρ : G → GL2 (C) given by

    ρ(n) = ( 1  n ; 0  1 ).

    Show that ρ is a representation that is not completely reducible. (This gives a counterexample to
    Maschke's Theorem when G is infinite.)
17. Let ρ : G → GL(V ) be a representation of a group G. Prove that ρ induces a faithful representation of
    the group G/ Ker ρ.
18. (Schur’s Lemma) Let G be a group and let V and W be representations of G over the field C.
(a) Prove that a G-representation homomorphism ϕ : V → W is either trivial (maps to 0) or is an
isomorphism.
(b) Prove that all isomorphisms ϕ : V → V are of the form λ idV . [Hint: Let λ be an eigenvalue of ϕ and
consider the linear transformation ϕ − λidV .]
19. Use Schur’s Lemma (Exercise 8.6.18) to prove the following result. Let V be a representation of G
over a field C and suppose that V = W1 ⊕ W2 ⊕ · · · ⊕ Ws with Wi irreducible submodules. Prove that
every irreducible U of V is isomorphic to Wi for some i.
20. Use Schur’s Lemma (Exercise 8.6.18) to prove that every irreducible representation over C of a finite
abelian group is one dimensional.
8.7
Projects

Project I. Ebbs and Flows. As a first example, consider the differential equation dx/dt = x² with x(t) a
real-valued function of a real variable. Show how the solutions of this differential equation form a flow on
R, thereby producing an action of the group (R, +) on R. (See Example 8.2.3.) Repeat the
same discussion, emphasizing the flow and action of (R, +), with other (more complicated)
differential equations, including some systems of differential equations.
Project II. Sudoku and Group Actions. Project IV in Chapter 3 discussed the group of
symmetries within the set of permissible Sudoku fillings. Again, let S be the set of all possible
solutions to a Sudoku puzzle, i.e., all possible ways of filling out the grid according to the rules,
and G the group of transformations described in that project. By construction, the group G acts
on S. With the formalism of group actions, we can attempt to address this problem more
effectively. For example, we now call two Sudoku fillings equivalent if they are in the same G
orbit.
Try to answer some of the following questions and any others you can imagine. Is the action
of G on S faithful or transitive or free? If it is not transitive, are the orbits all the same size?
If not, is there a range on what the sizes of the orbits can be? How many orbits are there?
Project III. Young’s Geometry. The Fano plane described in Exercise 8.3.17 is an example
of a finite geometry. In the study of finite geometries, one starts with a set of axioms and
deduces as many theorems as possible. In a finite geometry, the axioms do not explicitly refer
to a finite number of points or lines but lead to there only being a finite number. In Young’s
geometry, it is possible to prove that there are 9 points and 12 lines. Furthermore, a possible
configuration representing the subset structure of points to lines is as follows.
A  B  C
D  E  F
G  H  I
Study the subgroup of the permutation group S9 acting on the points that preserves the line
structure in Young’s geometry.
Project IV. Invariants in Polynomial Rings. Let X = C[x1 , x2 , . . . , xn ] be the ring of multi-
variable polynomials and consider the action of Sn defined by

σ · p(x1 , x2 , . . . , xn ) = p(xσ(1) , xσ(2) , . . . , xσ(n) ).

Find X σ for various permutations σ. Can you deduce any conclusions about these fixed
subsets? For various subgroups H ≤ Sn , can you determine or say anything about the subset
of X fixed by all of H?
Project V. Quotient G-Sets. Discuss the possibility of taking quotient objects (similar to quotient groups or quotient sets) in the context of G-sets. Give interesting examples. Try to
prove as many interesting propositions as you can about quotient G-sets that illustrate the
connection between this quotient process and the group action.
Project VI. Coloring Tilings. Consider one or more of M. C. Escher's paintings in the symmetry category. (See the official website http://www.mcescher.com/gallery/symmetry/.) Discuss different strategies of coloring the copies of the fundamental region so that the number
of possible colorings for a given strategy is finite. Discuss the subgroup or quotient group
corresponding to periodicity of a coloring.
Project VII. Coloring the Soccer Ball. Use the Cauchy-Frobenius Lemma to discuss the
number of ways to color the soccer ball using 2, 3, 4, or more colors, where we view two
colorings as the same if one can be obtained from the other by some rigid motion of the ball.
The standard black-and-white coloring of a soccer ball is invariant under the group of rigid
motions of the ball. Discuss the number of colorings that are invariant under some subgroups
of the group of rigid motions.
Project VIII. Sylow p-Subgroup Conjecture. Let G be a finite group. We saw in Section 8.5
that if np = 1 for some prime p that divides |G|, then that unique Sylow p-subgroup is normal.
Therefore, if G is simple then np > 1 for all p dividing |G|. Discuss the conjecture of the
converse: If np (G) > 1 for all primes p dividing |G| then G is simple.
Project IX. Simple Isn't So Simple. Write a computer program that uses Sylow's Theorem to eliminate
all orders between 1 and say 1000 (or more) for which groups of that order cannot be simple.
For any orders that could have a simple group, list the np for all p dividing the order.
Project X. Sylow p-Subgroups Action. Sylow's Theorem affirms that a group G acts transitively by conjugation on Sylp (G). Explore whether this action is multiply transitive. If it
is not, explore whether the action of G on Sylp (G) has a nontrivial system of blocks. Discuss
theoretically or with examples.
Project XI. The S4 Tetrahedron. In (R+ )3 (the first octant in Euclidean 3-space), there are
6 permutations of the inequalities x1 ≤ x2 ≤ x3 , each one corresponding to the action of an
element in S3 on 0 ≤ x1 ≤ x2 ≤ x3 . Each of the inequalities corresponds to a region of (R+ )3 .
Intersecting these regions with the plane x1 + x2 + x3 = 1 produces 6 regions in an equilateral triangle.
In this project, consider the action of S4 on (R+ )4 that permutes the coordinates. There are 24
permutations of the inequalities x1 ≤ x2 ≤ x3 ≤ x4 , each one corresponding to a cone-shaped
region in (R+ )4 . The intersection of (R+ )4 with the plane x1 + x2 + x3 + x4 = 1 is a regular
tetrahedron. The regions 0 ≤ xi ≤ xj ≤ xk ≤ x` for different indices cut this tetrahedron into
24 regions, each one corresponding to the action of an element in S4 .
Here are just a few ideas to explore: Physically construct a regular tetrahedron. Trace (at
least on the surface) these 24 regions and show which ones correspond to which element in S4 .
Discuss orbits of subgroups of S4 or quotient groups of S4 in reference to the tetrahedron. Use
this tetrahedron to provide a faithful representation of S4 in R3 .
Project XII. Combinatorial Identities. Example 8.2.15 and Exercises 8.2.13, 8.2.14, and
8.2.15 established known combinatorial formulas from the Orbit Equation. Find a number
of other combinatorial identities in a discrete mathematics or combinatorics textbook and try
to recover these combinatorial identities as the orbit equations of some group actions.
9. Classification of Groups
On a finite set, there can only be a finite number of distinct binary operations. (If the set S has
|S| = n, there are only n^(n^2) binary operations.) Imposing the three axioms for a group restricts
the possibilities considerably, especially when we consider two groups as the same if there exists an
isomorphism between them.
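This restriction can be seen by brute force for the smallest nontrivial case. The sketch below (illustrative, not from the text) enumerates all 2^4 = 16 binary operations on a 2-element set and finds that exactly 2 satisfy the group axioms, one for each choice of identity element; both are isomorphic to Z2.

```python
from itertools import product

S = (0, 1)

def is_group(table):
    # table[a][b] is the value of a * b.
    op = lambda a, b: table[a][b]
    # associativity
    if any(op(op(a, b), c) != op(a, op(b, c)) for a in S for b in S for c in S):
        return False
    # a two-sided identity
    ids = [e for e in S if all(op(e, a) == a == op(a, e) for a in S)]
    if not ids:
        return False
    e = ids[0]
    # inverses
    return all(any(op(a, b) == e == op(b, a) for b in S) for a in S)

tables = [((a, b), (c, d)) for a, b, c, d in product(S, repeat=4)]
assert len(tables) == 16                 # n^(n^2) operations with n = 2
assert sum(is_group(t) for t in tables) == 2
```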
For its usefulness in further study and simply as a research goal, it is a common theme in algebra
to find all objects with a given structure. Such problems are called classification problems. Though
we did not specifically refer to this direction of study in the preface, classification problems arise
naturally in the context of different algebraic structures. This is particularly true when there exists
a finite number of objects or a finite number of infinite families of objects with a certain property.
Some theorems and exercises already addressed some classification results concerning finite
groups. Early on, we presented the classification of groups of order 8 (Example 3.3.11: 3 abelian
groups, D4 and Q8 ) and of order 6 (Exercise 3.3.31: Z6 and D3 ). With Lagrange’s Theorem, we
know immediately that every group of prime order p must be isomorphic to the cyclic group Zp . In
Exercise 4.1.35, we showed that if p is prime then every group of order 2p is isomorphic to Z2p or to
Dp . The Fundamental Theorem of Finitely Generated Abelian Groups is a profound result in group
theory that completely answers the classification problem, but only for finite abelian groups.
In this chapter we present ideas involved in the project of finding all groups of a given finite order. Section 9.1
introduces the Jordan-Hölder Program, a general strategy to find all groups of a given size. From the
perspective of the program, we also introduce the notion of solvable groups. The name “solvable”
groups foreshadows the importance of these groups in the application of Galois Theory (Chapter 11).
The following two sections expand on the program to classify groups. In Section 9.2 on finite
simple groups, we prove the simplicity of a few families of groups and state the classification theorem
of finite simple groups, one of the crowning achievements of group theory in the 20th century. The
second part of the Jordan-Hölder Program involves finding ways to combine small groups into bigger
ones. The direct product is an example of this process of combining smaller groups into a bigger one.
Section 9.3 introduces the operation of a semidirect product—a more general method to combine
small groups of a given size into a single larger group.
Using methods of the semidirect product, we spend Section 9.4 presenting various classification
results about groups. Finally, in Section 9.5 we consider nilpotent groups and solvable groups, two
classes of groups that arise naturally in the Jordan-Hölder Program.
9.1
Composition Series and Solvable Groups
The Lattice or Fourth Isomorphism Theorem of groups states that if N is a normal subgroup of
G, then the quotient group G/N is the group whose structure is the same as G but above N in
the subgroup lattice of G. Similarly, the lattice of G below N is simply the lattice of the group
N . Intuitively speaking, much of the structure of G is contained in the structure of G/N and N .
Furthermore, if G has a normal subgroup N , then it is somehow “made up” from the smaller groups
G/N and N . This intuitive perspective gave rise to the Jordan-Hölder Program, whose ultimate
goal is a method to find all finite groups of a given order.
9.1.1 – Composition Series
Suppose that G is a group with a normal subgroup N . If G/N or N is not simple, then they can be
“broken down” further by the same process. This process reminds us of a factor tree and the process

429
430 CHAPTER 9. CLASSIFICATION OF GROUPS

of finding the prime factorization of a positive integer. In this metaphor, simple groups are similar
to the prime factors because simple groups have no normal subgroups besides the trivial subgroup
and the group itself.
We need to move away from metaphor to precise notions relevant to group theory.
Definition 9.1.1
Let G be a group and consider a sequence of subgroups

1 = N0 ≤ N1 ≤ N2 ≤ · · · ≤ Nk = G.

This sequence is called a composition series if Ni−1 ⊴ Ni and if Ni /Ni−1 is a nontrivial
simple group for all 1 ≤ i ≤ k. The number k is called the length of the composition series
and the quotient groups Ni /Ni−1 are called the composition factors.
We will often denote a composition series of G by 1 = N0 ⊴ N1 ⊴ N2 ⊴ · · · ⊴ Nk = G, but we recall
that the normal subgroup relation ⊴ is not transitive. When discussing the composition factors of a
group, we usually refer to their isomorphism class. For example, if one of the factors Ni /Ni−1 ≅ Z5 ,
then we say that Z5 is a composition factor as opposed to referring explicitly to the specific quotient
group.
There are two ways in which composition series differ from the usual factorization of positive
integers into primes. First, one can find different groups with isomorphic composition factors. For example,
Z2 ⊕ Z2 and Z4 both have two copies of Z2 as composition factors: each has a composition series
of length 2 involving only the composition factor Z2 . Second, a group may have an alternative
composition series, as the following example illustrates.
Example 9.1.2. Consider the dihedral group D6 with the following two composition series:

1 ⊴ ⟨r3 ⟩ ⊴ ⟨r⟩ ⊴ D6   and   1 ⊴ ⟨r2 ⟩ ⊴ ⟨r⟩ ⊴ D6 .

The composition factors corresponding to each series are

D6 /⟨r⟩ ≅ Z2 with generator s⟨r⟩,       D6 /⟨r⟩ ≅ Z2 with generator s⟨r⟩,
⟨r⟩/⟨r3 ⟩ ≅ Z3 with generator r⟨r3 ⟩,   ⟨r⟩/⟨r2 ⟩ ≅ Z2 with generator r⟨r2 ⟩,
⟨r3 ⟩ ≅ Z2 with generator r3 ,          ⟨r2 ⟩ ≅ Z3 with generator r2 .       4
However, in this last example there does appear to be some structure, namely that the list of
composition factors is the same after reordering. The following theorem formalizes what can happen
among different composition series of a given group.
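The factor-tree metaphor is completely concrete for cyclic groups. The following sketch (illustrative, not from the text) reads the composition factors of Zn off a prime factorization of n; for instance, 0 ⊴ ⟨6⟩ ⊴ ⟨3⟩ ⊴ Z12 is a composition series of Z12 whose factors are Z2 , Z2 , Z3 , and reordering the primes yields the other composition series of Z12 .

```python
def prime_factors(n):
    """Primes dividing n, with multiplicity; these are the orders of the
    composition factors of the cyclic group Z_n."""
    factors, d = [], 2
    while n > 1:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    return factors

assert prime_factors(12) == [2, 2, 3]      # factors Z_2, Z_2, Z_3
assert prime_factors(60) == [2, 2, 3, 5]   # factors Z_2, Z_2, Z_3, Z_5
```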
Theorem 9.1.3 (Jordan-Hölder Theorem)
Let G be a nontrivial finite group. Then G has a composition series. Furthermore, if

1 = M0 ⊴ M1 ⊴ M2 ⊴ · · · ⊴ Mr = G   and   1 = N0 ⊴ N1 ⊴ N2 ⊴ · · · ⊴ Ns = G     (9.1)

are two composition series of G, then r = s and there is some permutation σ ∈ Sr such
that

Mi /Mi−1 ≅ Nσ(i) /Nσ(i)−1

for all 1 ≤ i ≤ r.
Before proving this theorem, we will need the Butterfly Lemma (also called the Zassenhaus
Lemma), evocatively named because of the diagram of subgroups involved.
                  A                   K

        B(A ∩ K)         (A ∩ K)L

                    A ∩ K

        B(A ∩ L)         (B ∩ K)L

    B          (B ∩ K)(A ∩ L)          L

          B ∩ K              A ∩ L

Figure 9.1: Diagram for the Butterfly Lemma
Lemma 9.1.4 (Butterfly Lemma)
Let A and K be subgroups of a group G and let B ⊴ A and L ⊴ K. Then

B(A ∩ L) ⊴ B(A ∩ K)   and   (B ∩ K)L ⊴ (A ∩ K)L

and

B(A ∩ K)/B(A ∩ L) ≅ (A ∩ K)L/(B ∩ K)L.
Proof. Throughout this proof, we use the fact that for subgroups H, K ≤ G, we have HK ≤ G if
and only if HK = KH as sets. (See Exercise 4.1.36.)
See Figure 9.1 for the relative configuration of the subgroups involved. In this diagram, if H3 is the
immediate successor of H1 and H2 , then H3 = H1 H2 , and if H3 is the immediate predecessor of
H1 and H2 , then H3 = H1 ∩ H2 . It is not hard to check that all the shown subsets are subgroups
because of B ⊴ A and L ⊴ K. The only tricky verification occurs at the subgroup where the butterfly
head is, namely (B ∩ K)(A ∩ L). It has a few equivalent expressions:

(B ∩ K)(A ∩ L) = B(A ∩ L) ∩ (B ∩ K)L = B(A ∩ L) ∩ (A ∩ K) = (B ∩ K)L ∩ (A ∩ K).

The first equality is obvious. The second equality holds because if y = bx ∈ A ∩ K
with b ∈ B and x ∈ A ∩ L, then y ∈ K and x ∈ L ≤ K force b = yx −1 ∈ K. Thus,
B(A ∩ L) ∩ (A ∩ K) ⊆ (B ∩ K)(A ∩ L), and the reverse inclusion is obvious. The third equality
holds via the same reasoning as the second.
Since L ⊴ K, we have A ∩ L ⊴ A ∩ K. Let g ∈ A ∩ K and let bx ∈ B(A ∩ L). Then

g(bx)g −1 = (gbg −1 )(gxg −1 ).

But gbg −1 ∈ B because g ∈ A and B ⊴ A, and gxg −1 ∈ A ∩ L because A ∩ L ⊴ A ∩ K. Hence,
A ∩ K ≤ NG (B(A ∩ L)) and so we can apply the Second Isomorphism Theorem to conclude that
B(A ∩ L)(A ∩ K)/B(A ∩ L) = B(A ∩ K)/B(A ∩ L) ≅ (A ∩ K)/(B ∩ K)(A ∩ L).
These quotient groups correspond to the wing on the left side of Figure 9.1. Applying the same
reasoning to the right side, we deduce that
B(A ∩ K)/B(A ∩ L) ≅ (A ∩ K)/(B ∩ K)(A ∩ L) ≅ (A ∩ K)L/(B ∩ K)L,

and the lemma follows. □
Proof (Theorem 9.1.3). We first prove that every nontrivial finite group G has a composition series
by induction on |G|. First, suppose that |G| = 2. Then G ≅ Z2 and G is simple, so it has a
composition series of length 1. Now suppose that |G| = n and that all groups H of order |H| < n
have a composition series. If G is simple, then again it has a composition series {1} ⊴ G that is of
length 1. On the other hand, if G is not simple, then it has a nontrivial proper normal subgroup N .
By the induction hypothesis, both N and G/N have composition series

{1} = K0 E K1 E K2 E · · · E Kr = N and
{1} = M̄0 E M̄1 E M̄2 E · · · E M̄s = G/N.

By the Fourth Isomorphism Theorem, there exist subgroups Mi with 0 ≤ i ≤ s such that N = M0 ≤ Mi ≤
G, with Mi−1 E Mi for 1 ≤ i ≤ s and M̄i = Mi /N . Hence, G has a chain of successively normal subgroups

{1} = K0 E K1 E K2 E · · · E Kr = M0 E M1 E M2 E · · · E Ms = G.

By definition, Ki /Ki−1 is simple for 1 ≤ i ≤ r. By the Third Isomorphism Theorem, Mi /Mi−1 ≅ M̄i /M̄i−1
so Mi /Mi−1 is simple. Thus, G has a composition series.
Now suppose that G has two composition series as expressed in the statement of Theorem 9.1.3.
The proof strategy involves combining the two composition series to create two longer chains of
subgroups (refinements).
For each i with 1 ≤ i ≤ r, define Mi,j = Mi−1 (Mi ∩ Nj ) for 0 ≤ j ≤ s. Similarly, for each
j with 0 ≤ j ≤ s, define Nj,i = Nj−1 (Nj ∩ Mi ). Note that

Mi−1 = Mi,0 E Mi,1 E · · · E Mi,s = Mi for all 1 ≤ i ≤ r,


Nj−1 = Nj,0 E Nj,1 E · · · E Nj,r = Nj for all 1 ≤ j ≤ s.

Insert these chains into the composition series given in (9.1) respectively to create two new chains,
each of length rs. We temporarily call these chains expanded composition series because they are
composition series except that the quotients between successive groups are allowed to be trivial;
every nontrivial quotient is simple.
Using the Butterfly Lemma with A = Mi , B = Mi−1 , K = Nj , and L = Nj−1 , we deduce that

Mi−1 (Mi ∩ Nj )/Mi−1 (Mi ∩ Nj−1 ) ≅ Nj−1 (Nj ∩ Mi )/Nj−1 (Nj ∩ Mi−1 )

or, in other words, that Mi,j /Mi,j−1 ≅ Nj,i /Nj,i−1 . Notice that by the Fourth Isomorphism Theorem,
{1} = Mi,0 /Mi−1 E Mi,1 /Mi−1 E · · · E Mi,s /Mi−1 = Mi /Mi−1
is an expanded composition series of Mi /Mi−1 . Since Mi /Mi−1 is simple, then for each i, there exists
only one index j = σ(i) such that Mi,j /Mi,j−1 ≠ {1} and for that index we have Mi,j /Mi,j−1 ≅
Mi /Mi−1 . The same observation is also true of the Nj,i expanded composition series. Then

Mi /Mi−1 ≅ Mi,σ(i) /Mi,σ(i)−1 ≅ Nσ(i),i /Nσ(i),i−1 ≅ Nσ(i) /Nσ(i)−1 .

Since σ is injective and the symmetric argument gives an injection in the other direction, r = s
immediately follows, and σ matches the composition factors up to isomorphism. 

9.1.2 – The Hölder Program


The Jordan-Hölder Theorem shows that the set (counted with multiplicity) of composition factors of
a group is the same for all composition series. The theorem leads naturally to the Hölder Program,
which involves a two-pronged effort.

(1) Classify all finite simple groups.

(2) Classify all groups with a given set of composition factors.



Note that if G is a group with a given list of composition factors K1 , K2 , . . . , Kn , then the order
|G| is the product of the orders of Ki . Consequently, knowing all the finite simple groups of a given
order and knowing |G| we can deduce the possible composition factors of G. The second part of
the Hölder Program would then allow us to find all groups with those possible composition factors.
Hence, this two-part strategy would allow us to find all groups of any given order.
This program drove much of the research in group theory during the 20th century. One of the
greatest collaborative achievements of group theory is the complete solution of the first part of the
program. We will discuss this in the following section. The second part of the Hölder Program
is much more complicated. However, under some circumstances, there are a few strategies for this
second part, in particular using the semidirect product, discussed in Section 9.3. By virtue of Sylow’s
Theorem and the semidirect product, for many integers n especially if n is not too large, it is possible
to classify all groups of order n.

9.1.3 – Solvable Groups

Definition 9.1.5
A group G is solvable if there is a chain of subgroups

{1} = M0 E M1 E M2 E · · · E Mr = G

such that Mi /Mi−1 is abelian for all 1 ≤ i ≤ r.

For finite groups, this definition has a simpler equivalent. In the metaphor that compared a
composition series of a group to a prime factorization of a positive integer, prime numbers correspond
to simple groups. However, among the simple groups there is a family of groups closely connected
to prime numbers, namely the cyclic groups of prime order Zp .

Proposition 9.1.6
A finite group G is solvable if and only if each of its composition factors is a cyclic group
of prime order.

Proof. (Left as an exercise for the reader. See Exercise 9.1.7.) 

By using Sylow’s Theorem for low values of a positive integer n, it is possible to show that n = 60
is the first integer for which there may exist a simple group that is not isomorphic to Zp for some
prime p. Furthermore, we can show that the only simple group of order 60 is A5 . (Exercise 9.2.10.)
Hence, every group of order less than 60 is solvable.
As we will see in Chapter 11, solvable groups are important in the study of solutions of polynomial
equations. Consequently, classifying solvable groups is a key ingredient in the advanced study of
roots of polynomials. However, the effort to classify all solvable groups involves challenging group
theory. For example, using representation theory of groups, it is possible to prove (Burnside’s
Theorem) that if |G| = pa q b for some primes p and q, then G is solvable. A far more challenging
result is that (Feit-Thompson Theorem) if |G| is odd, then G is solvable. When this latter theorem
was first proved, it occupied 255 pages [23].
We conclude this section with a characterization of solvable groups involving commutators.

Definition 9.1.7
Let G be a group.
(1) The commutator of two group elements x, y ∈ G is [x, y] = x−1 y −1 xy.
(2) If H, K ≤ G, we denote by [H, K] the subgroup generated by commutators of elements
in H and elements in K, namely [H, K] = h[h, k] | h ∈ H and k ∈ Ki.
(3) The commutator subgroup of G is G′ = [G, G].

If x and y commute, then x, y, x−1 and y −1 all commute so [x, y] = x−1 xy −1 y = 1. Consequently,
if G is abelian, then G′ = {1}. The converse is also true, so G is abelian if and only if
G′ = {1}. We observe two simple relations among commutators:

[y, x] = y −1 x−1 yx = (x−1 y −1 xy)−1 = [x, y]−1

and
[x, y] = [x, xy] = [yx, y].
Note that from the form of the commutator, if x, y, a, b ∈ G, there is no reason why [x, y][a, b]
should be the commutator of two other group elements. In fact, by taking the extreme case of a free
group on 4 elements hx, y, a, bi, we see that

[x, y][a, b] = x−1 y −1 xya−1 b−1 ab

does not simplify and therefore cannot be put into the form of a commutator of two elements.
Therefore, the subset of commutators from elements in H and K might not be a subgroup and so
we must consider the subgroup generated by the subset of commutators. This observation holds for
[G, G] as well.
Example 9.1.8. Consider the dihedral group Dn of the regular n-gon. We calculate

[ra , rb ] = 1, [sra , rb ] = r2b , [ra , srb ] = r−2a , [sra , srb ] = r2b−2a .

This covers all possibilities of elements in Dn so we deduce that Dn′ = hr2 i. We note that if n is
odd then hr2 i = hri. So the index of Dn′ in Dn is 4 if n is even and only 2 if n is odd. ♦
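The commutator computations above can be spot-checked by machine. The following is a minimal sketch (all helper names are ours, not the book's): we realize Dn as permutations of the n vertices of the n-gon, generate the subgroup of all commutators, and confirm that it equals hr2 i, with index 4 for even n and 2 for odd n.

```python
# Sketch: verify D_n' = <r^2> and its index for n = 4 and n = 5.

def compose(p, q):                      # permutations as tuples: (p*q)(i) = p(q(i))
    return tuple(p[j] for j in q)

def inverse(p):
    q = [0] * len(p)
    for i, pi in enumerate(p):
        q[pi] = i
    return tuple(q)

def generate(gens, n):                  # closure of the generators (finite group)
    elems, frontier = {tuple(range(n))}, list(gens)
    while frontier:
        g = frontier.pop()
        if g not in elems:
            elems.add(g)
            frontier.extend(compose(g, h) for h in gens)
    return elems

def commutator_subgroup(G, n):          # generated by [x, y] = x^-1 y^-1 x y
    comms = [compose(compose(inverse(x), inverse(y)), compose(x, y))
             for x in G for y in G]
    return generate(comms, n)

results = {}
for n in (4, 5):
    r = tuple((i + 1) % n for i in range(n))    # rotation of the n-gon
    s = tuple((-i) % n for i in range(n))       # a reflection
    G = generate([r, s], n)
    C = commutator_subgroup(G, n)
    results[n] = (len(G), len(G) // len(C), C == generate([compose(r, r)], n))

print(results)   # {4: (8, 4, True), 5: (10, 2, True)}
```

The third entry of each tuple confirms that the commutator subgroup equals the subgroup generated by r2.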

Proposition 9.1.9
Let G be a group. Then G′ is a characteristic subgroup of G. Furthermore, G′ is the
smallest (by inclusion) normal subgroup of G such that G/G′ is abelian.

Proof. Let ψ ∈ Aut(G). Then for all x, y ∈ G,

ψ([x, y]) = ψ(x−1 )ψ(y −1 )ψ(x)ψ(y) = ψ(x)−1 ψ(y)−1 ψ(x)ψ(y) = [ψ(x), ψ(y)].

Since G′ is the subgroup generated by all commutators, then ψ applied to any generator
(commutator) is again another generator of G′ so ψ(G′ ) = G′ and G′ is characteristic.
Since G′ is characteristic, it is also normal. Let xG′ and yG′ be in the quotient G/G′ . Then

(xG′ )(yG′ ) = (xy)G′ = (xy[y, x])G′ = (xyy −1 x−1 yx)G′ = (yx)G′ = (yG′ )(xG′ ),

so G/G′ is abelian.
Now suppose that N is any normal subgroup such that G/N is abelian. The criterion that
(xN )(yN ) = (yN )(xN ) is equivalent to

N = (xN )−1 (yN )−1 (xN )(yN ) = x−1 y −1 xyN = [x, y]N,

which is equivalent to [x, y] ∈ N . Thus, the commutator of every pair of elements of G is in N so G′ ≤ N .



The characterizing property of G′ can be restated to say that G/G′ is the largest abelian quotient
group of G in the sense that if N E G with G/N abelian, then G′ ≤ N .

Definition 9.1.10
Let G be a group. The commutator series of G is the chain of subgroups

G = G(0) ≥ G(1) ≥ G(2) ≥ · · ·

where G(i) is defined inductively by G = G(0) and G(i) = [G(i−1) , G(i−1) ].

Note that if G is nonabelian and simple, then G′ E G and G′ ≠ {1}. Hence, G′ = G.


Example 9.1.11. Consider Example 9.1.8. Since hr2 i is abelian, its own commutator subgroup is
trivial. Hence, the commutator series of Dn is
{1} ≤ hr2 i ≤ Dn . ♦

Example 9.1.12. Consider the group S4 . This group contains only two proper nontrivial normal
subgroups, namely A4 and K = h(1 2)(3 4), (1 3)(2 4)i. However, S4 /K is not abelian so S4′ ≰ K. On
the other hand, S4 /A4 ≅ Z2 , which is abelian, so S4′ ≤ A4 . Since the normal subgroups of S4 are
{1}, K, A4 , and S4 , we conclude that A4 = S4′ . In Exercise 9.1.13, we prove that the commutator
subgroup of A4 is precisely K. Finally, K ≅ Z2 ⊕ Z2 so it is abelian. This shows that the commutator
series of S4 is
{1} ≤ K ≤ A4 ≤ S4 . ♦

Example 9.1.13. Let G = S5 . The symmetric group S5 has only one proper nontrivial normal
subgroup, namely A5 . By Exercise 4.2.27, A5 is simple. Since S5 /A5 ≅ Z2 is abelian, S5′ ≤ A5 .
Indeed, from the definition it is obvious that the commutator of any two elements in S5 can be
written using an even number of transpositions, which gives S5′ ≤ A5 directly. Since S5 is nonabelian,
S5′ ≠ {1}, and the only normal subgroups of S5 contained in A5 are {1} and A5 , so S5′ = A5 . Then,
since A5 is simple and nonabelian, its commutator subgroup is itself. Hence, the commutator series
of S5 is
A5 ≤ S5 ,
without the subgroup {1} in the chain. Then G(i) = A5 for all i ≥ 1. ♦
If we compare properties of commutator series to those of composition series, it may seem strange
that a commutator series need not terminate at {1}. As the following proposition shows, the
commutator series terminates at {1} precisely when the group is solvable.

Proposition 9.1.14
A group G is solvable if and only if G(s) = {1} for some positive integer s.

Proof. Suppose that G is solvable with a chain of successively normal subgroups


{1} = M0 E M1 E M2 E · · · E Ms = G
such that Mi /Mi−1 is abelian. By Proposition 9.1.9, since Ms /Ms−1 is abelian, G(1) = G′ ≤ Ms−1 .
Assume that G(i) ≤ Ms−i for some i ≥ 1. Then,
G(i+1) = [G(i) , G(i) ] ≤ [Ms−i , Ms−i ].
However, since Ms−i /Ms−(i+1) is abelian then by Proposition 9.1.9, [Ms−i , Ms−i ] ≤ Ms−(i+1) .
Therefore, G(i+1) ≤ Ms−(i+1) and by induction, this holds true for all 0 ≤ i ≤ s. Consequently,
G(s) = M0 = {1}.
Conversely, if G(s) = {1} for some s, since G(i) /G(i+1) is abelian, then the commutator series of
G is a chain of subgroups that satisfies the conditions for a solvable group. 
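Proposition 9.1.14 suggests a direct solvability test: iterate G ↦ G′ and see whether the series reaches {1}. The sketch below (all helper names are ours) runs this on S4, which is solvable, and on A5, whose commutator series stalls at A5 itself.

```python
# Sketch: the commutator series of S4 descends 24 -> 12 -> 4 -> 1 (solvable),
# while the series of A5 never moves past A5 (not solvable).
from itertools import permutations

def compose(p, q):
    return tuple(p[j] for j in q)

def inverse(p):
    q = [0] * len(p)
    for i, pi in enumerate(p):
        q[pi] = i
    return tuple(q)

def generate(gens, n):
    elems, frontier = {tuple(range(n))}, list(gens)
    while frontier:
        g = frontier.pop()
        if g not in elems:
            elems.add(g)
            frontier.extend(compose(g, h) for h in gens)
    return elems

def derived(G, n):                       # G' = <[x, y] : x, y in G>
    return generate([compose(compose(inverse(x), inverse(y)), compose(x, y))
                     for x in G for y in G], n)

def series_orders(G, n):                 # |G|, |G'|, |G''|, ... until it stabilizes
    orders = [len(G)]
    while True:
        H = derived(G, n)
        if len(H) == len(G):
            break
        G, orders = H, orders + [len(H)]
    return orders

def parity(p):                           # number of inversions mod 2
    return sum(p[i] > p[j] for i in range(len(p)) for j in range(i + 1, len(p))) % 2

S4 = set(permutations(range(4)))
A5 = {p for p in permutations(range(5)) if parity(p) == 0}
print(series_orders(S4, 4), series_orders(A5, 5))   # [24, 12, 4, 1] [60]
```

The orders 24, 12, 4, 1 correspond to S4, A4, K, and {1} from Example 9.1.12.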

Exercises for Section 9.1


1. Exhibit all the composition series of Z20 and determine the composition factors associated to each
series.
2. Exhibit all the composition series of A4 and determine the composition factors associated to each
series.
3. Exhibit all the composition series of D6 and determine the composition factors associated to each
series.
4. Exhibit all the composition series of F20 (Exercise 4.3.16) and determine the composition factors
associated to each series.
5. Prove that if N E G, then G has a composition series with N as a term.
6. Suppose that G and H are groups with composition series of length r and s, respectively. Show that
G ⊕ H has a composition series of length r + s and that the composition factors of G ⊕ H are the
composition factors of G together with those of H.
7. Prove that if G is finite, then it is solvable if and only if each of its composition factors is a finite
cyclic group of prime order.
8. Prove that subgroups and quotient groups of solvable groups are solvable.
9. Suppose that G is a group with a normal subgroup N . Prove that if N and G/N are both solvable,
then G is solvable.
10. Let G be a solvable group and let ϕ : G → H be a surjective homomorphism. Prove that H is solvable.
11. Prove that every p-group is solvable.
12. Prove that if H E G, then [H, G] ≤ H.
13. Prove that the commutator subgroup of A4 is (A4 )′ = h(1 2)(3 4), (1 3)(2 4)i.
14. Let F be a field and let n ≥ 2. Prove that the commutator subgroup G′ of G = GLn (F ) is a nontrivial
strict subgroup of G.
15. Prove that G(i) is a characteristic subgroup of G for all i ≥ 1.

9.2
Finite Simple Groups
The classification of finite simple groups “is generally regarded as a milestone of twentieth-century
mathematics” [31]. How the theorem came about exemplifies the collaborative nature of mathemat-
ical investigation. Hundreds of mathematicians contributed to the effort.
Because of the extreme length and the number of disparate results necessary for a full classifica-
tion, the realization that the classification of finite simple groups was within reach arose slowly. In
1972, when the classification felt close, Gorenstein laid out a 16-step program to break the project
down into cases covering all possibilities [30]. In 1986, Gorenstein wrote a summary article declaring
at long last that all finite simple groups had been found [31]. It was estimated at the time that the
work spanned 15,000 pages of articles both published and unpublished.
However, as group theorists labored to synthesize the work, it became apparent that a gap
remained in some unpublished material related to so-called quasithin groups. Aschbacher and Smith
began working to rectify this gap and completed their work in 2004 in a pair of monographs [4, 5].
The form itself of the classification theorem is surprising. In retrospect, it is almost more sur-
prising that so much work can be summarized in so brief a statement. Of course, to understand all
parts of the theorem requires considerable effort and advanced study. As with any fruitful problem,
the project to classify all finite simple groups drove investigations in many areas, which in turn
produced many theorems not directly related to the classification theorem.

In this section, we state the Classification Theorem and offer some explanation of the terms. (A
complete treatment is outside the scope of this book. Whole books have been written about finite
simple groups and we encourage the reader to consult [14, 61].) Subsequently, we remind the reader
of a few necessary conditions for a group to be simple. Then we give proofs that two of the families
of groups mentioned in the theorem are simple.

9.2.1 – The Classification of Finite Simple Groups

Theorem 9.2.1 (Classification Theorem)


Every finite simple group is isomorphic to one of the following:

(1) A cyclic group of prime order;


(2) An alternating group;
(3) A member of one of 16 infinite families of groups of Lie type over a finite field; or
(4) One of 26 sporadic groups not isomorphic to any of the above groups.

Some explanation is in order. We already know that cyclic groups of the form Zp with p prime
are simple. Indeed, by Lagrange’s Theorem, the groups Zp have no proper nontrivial subgroups, let
alone normal subgroups. We are also familiar with alternating groups. In Exercise 4.2.27, we proved
that A5 is simple. We will show below that An is simple for all n ≥ 5. (A3 ≅ Z3 is simple as well.)
Without going into detail, a Lie group is a group that has both the structure of a group and
the differential geometric structure of a manifold. Lie groups can often be viewed as certain groups
of matrices, subgroups of GLn (F ), where F is R or C. If F is replaced with a finite field, then the
resulting groups are no longer Lie groups but are called groups of Lie type.
Finally, the Classification Theorem states that there are precisely 26 other finite simple groups.
Since they are not in one of the 18 families, they are given the name sporadic groups. The discovery of
these groups spanned over a century from the Mathieu groups denoted by M11 , M12 , M22 , M23 , and
M24 , discovered in 1861, to the Fischer-Griess Monster Group M whose existence was conjectured
in 1973 but only proven in 1989. The sizes of sporadic groups start with the Mathieu group M11 ,
with order 7920, and rise quickly, ending with the Baby Monster Group B of order
|B| = 4,154,781,481,226,426,191,177,580,544,000,000
and the Monster Group, which tops the scales with order
|M | = 808,017,424,794,512,875,886,459,904,961,710,757,005,754,368,000,000,000.
The monster group is so complicated that, using representation theory to describe it, the minimal
degree of a faithful representation over the field C is 196,883. In other words, if we faithfully represent
M as a group of invertible linear transformations on some Cn , we would need the dimension n to
be 196,883 (or greater)!
Classification theorems are common in algebra. We encountered a few simple classification
theorems in group theory. The Fundamental Theorem of Finitely Generated Abelian Groups is
an important classification theorem. And yet, the classification of finite simple groups is indeed a
milestone; in intuitive language, it provides a complete list of all finite irreducible (simple) patterns
of symmetry.

9.2.2 – Necessary Conditions of Simplicity


In our presentation of the theory of groups, we encountered theorems at various points that provided
a condition that guaranteed that a group has a normal subgroup. Consequently, the negation of
that condition is a necessary (though not sufficient) condition for a group to be simple. We restate
and remind the reader of a few of these results.

Proposition 9.2.2
Let G be a group with conjugacy classes K1 , K2 , . . . , Kr with K1 = {1}. If the only subsets
S ⊆ {1, 2, . . . , r} containing 1 such that the sum
Σi∈S |Ki |
divides |G| are S = {1} and S = {1, 2, . . . , r}, then G is simple.

Proof. Follows from Proposition 4.2.14. 
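As an illustration of Proposition 9.2.2, the sketch below (helper names are ours) computes the conjugacy class sizes of A5 and checks that the only unions of classes that contain the identity and whose total size divides 60 are the trivial ones, so A5 is simple. Exercise 12 below asks for the same strategy applied to A6.

```python
# Sketch: the class-equation test for A5.  Its class sizes are 1, 15, 20, 12, 12.
from itertools import combinations, permutations

def compose(p, q):
    return tuple(p[j] for j in q)

def inverse(p):
    q = [0] * len(p)
    for i, pi in enumerate(p):
        q[pi] = i
    return tuple(q)

def parity(p):
    return sum(p[i] > p[j] for i in range(len(p)) for j in range(i + 1, len(p))) % 2

A5 = [p for p in permutations(range(5)) if parity(p) == 0]

class_sizes, seen = [], set()
for g in A5:
    if g not in seen:
        cls = {compose(compose(x, g), inverse(x)) for x in A5}   # x g x^-1
        seen |= cls
        class_sizes.append(len(cls))
class_sizes.sort()

# candidate orders of a normal subgroup: unions of classes that contain {1}
others = class_sizes[1:]                 # sizes of the nonidentity classes
candidates = sorted({1 + sum(c)
                     for k in range(len(others) + 1)
                     for c in combinations(others, k)
                     if 60 % (1 + sum(c)) == 0})
print(class_sizes, candidates)   # [1, 12, 12, 15, 20] [1, 60]
```

Only the orders 1 and 60 survive, so a normal subgroup of A5 is trivial or all of A5.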

Proposition 9.2.3 (Sylow’s Test for Nonsimplicity)


Suppose that n = pk m and p is a prime that does not divide m. If the only divisor d of m
such that d ≡ 1 (mod p) is d = 1, then there is no simple group of order n.

Proof. By Sylow’s Theorem (Theorem 8.5.6), the number of Sylow p-subgroups np divides m and
satisfies np ≡ 1 (mod p). So if the only divisor d of m satisfying d ≡ 1 (mod p) is d = 1, then
np = 1. Hence, there is only one Sylow p-subgroup. Being the only subgroup of a given order, that
Sylow p-subgroup is normal. 
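Sylow's Test is easy to automate. The following sketch (the function name is ours) returns, for a given n, the primes p for which the congruence and divisibility conditions force np = 1; any such prime, provided n is not a power of p, shows there is no simple group of order n.

```python
# Sketch implementation of Sylow's Test for Nonsimplicity (Proposition 9.2.3).

def sylow_witnesses(n):
    witnesses = []
    for p in range(2, n + 1):
        if n % p != 0 or any(p % d == 0 for d in range(2, int(p ** 0.5) + 1)):
            continue                     # p must be a prime divisor of n
        m = n
        while m % p == 0:
            m //= p                      # n = p^k m with p not dividing m
        if m == 1:
            continue                     # n is a power of p; the test does not apply
        if all(d % p != 1 or d == 1 for d in range(1, m + 1) if m % d == 0):
            witnesses.append(p)          # every divisor d of m with d = 1 mod p is 1
    return witnesses

# 84 = 2^2 * 3 * 7: the only divisor of 12 congruent to 1 mod 7 is 1,
# so any group of order 84 has a normal Sylow 7-subgroup.
print(sylow_witnesses(84))    # [7]
print(sylow_witnesses(60))    # []  (the test is inconclusive; A5 is in fact simple)
```

The empty answer for 60 shows the test gives a necessary condition only: it can fail to detect simplicity or nonsimplicity.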

Proposition 9.2.4
Let p be the smallest prime dividing the order of a group G. If G has a subgroup H
satisfying |G : H| = p, then H E G so G is not simple.

Proof. (See Exercise 8.4.9.) 

Proposition 9.2.5 (Index Theorem)


If a finite group G has a proper subgroup H such that |G| does not divide |G : H|!, then G
is not simple.

Proof. Consider the action of G on the set X of left cosets of H by left multiplication and let
ρ : G → SX be the associated homomorphism. Since gH = H if and only if g ∈ H, then Ker ρ ≤ H
and in particular Ker ρ ( G. The order of the permutation group SX is |G : H|!. By the First
Isomorphism Theorem, G/ Ker ρ is isomorphic to a subgroup of SX so |G|/| Ker ρ| divides |G : H|!.
Since |G| does not divide |G : H|!, then we must have | Ker ρ| > 1. Hence, Ker ρ is a nontrivial,
proper, normal subgroup of G. 

Proposition 9.2.6
If (G, X, ρ) is a faithful, transitive, primitive group action such that no proper nontrivial
normal subgroup of G acts transitively on X, then G is simple.

Proof. Follows from Corollary 8.3.14. 

9.2.3 – The Simplicity of An for n ≥ 5

Theorem 9.2.7
For all n ≥ 5, the alternating group An is a simple group.

Proof. By Theorem 3.4.16, An consists of all permutations that can be expressed as a product of an
even number of transpositions. A simple calculation gives

(a b)(c d) = (a c b)(a c d) and (a b)(b c) = (a b c).

Hence, as a subgroup of Sn , An is generated by the set of 3-cycles.


Suppose that N is a nontrivial normal subgroup of An . By considering four cases based on the
smallest cycle in the standard cycle notation of permutations, we prove that N contains a 3-cycle.

Case 1. Suppose that N contains a permutation that can be written with disjoint cycles as σ =
(a1 a2 · · · ar )τ with r ≥ 4. Since N is normal, it contains the element (a1 a2 a3 )−1 σ(a1 a2 a3 ).
Hence, N also contains the element

σ −1 (a1 a2 a3 )−1 σ(a1 a2 a3 )


= τ −1 (ar · · · a2 a1 )(a1 a2 a3 )−1 (a1 a2 · · · ar )τ (a1 a2 a3 )
= (ar · · · a2 a1 )(a3 a2 a1 )(a1 a2 · · · ar )(a1 a2 a3 )
= (a2 a3 ar ).

Case 2. Suppose that N contains a permutation that can be written with disjoint cycles as σ =
(a1 a2 a3 )(a4 a5 a6 )τ . Since N is normal, it contains the element (a1 a2 a4 )−1 σ(a1 a2 a4 ). Hence,
N also contains the element

σ −1 (a1 a2 a4 )−1 σ(a1 a2 a4 )


= τ −1 (a4 a5 a6 )−1 (a1 a2 a3 )−1 (a4 a2 a1 )(a1 a2 a3 )(a4 a5 a6 )τ (a1 a2 a4 )
= (a6 a5 a4 )(a3 a2 a1 )(a4 a2 a1 )(a1 a2 a3 )(a4 a5 a6 )(a1 a2 a4 )
= (a1 a2 a4 a3 a6 ).

Since N contains a 5-cycle, then by the previous case, N also contains a 3-cycle.
Case 3. Suppose that N contains a permutation that can be written with disjoint cycles as σ =
(a1 a2 a3 )τ , where τ is a product of transpositions. Then σ 2 = (a1 a3 a2 ).
Case 4. Suppose that N contains a permutation that can be written with disjoint cycles as σ =
(a1 a2 )(a3 a4 )τ , where τ is a product of transpositions. Since N is normal it also contains the
element (a1 a2 a3 )−1 σ(a1 a2 a3 ). Hence, it also contains

σ −1 (a1 a2 a3 )−1 σ(a1 a2 a3 )


= τ −1 (a3 a4 )(a1 a2 )(a1 a2 a3 )−1 (a1 a2 )(a3 a4 )τ (a1 a2 a3 )
= (a3 a4 )(a1 a2 )(a3 a2 a1 )(a1 a2 )(a3 a4 )(a1 a2 a3 )
= (a1 a4 )(a2 a3 ).

Since n ≥ 5, then N contains (a1 a2 a5 )−1 (a1 a4 )(a2 a3 )(a1 a2 a5 ) and also

(a1 a4 )(a2 a3 )(a1 a2 a5 )−1 (a1 a4 )(a2 a3 )(a1 a2 a5 ) = (a1 a2 a3 a4 a5 ).

By Case 1, N also contains a 3-cycle.

From the above four cases, we conclude that N contains a 3-cycle, say (a b c). Then for d distinct
from a, b, or c, N contains the conjugate

(a b)(c d)(a c b)(a b)(c d) = (a b d).

In a similar fashion, we find that N contains all 3-cycles that include exactly 2 of the three integers
a, b, or c. Furthermore, if {d, e} ∩ {a, b, c} = ∅, then N contains

(b d)(c e)(a b c)(b d)(c e) = (a d e).



By the same reasoning, N contains all 3-cycles that include exactly 1 of the three integers a, b, or
c. Finally, if {d, e, f } ∩ {a, b, c} = ∅, then N contains (a d e) and also

(a f )(b c)(a d e)(a f )(b c) = (f d e) = (d e f ).

In conclusion, we have determined that N contains all 3-cycles in An . Consequently, since An is


generated by the set of 3-cycles, N = An . Therefore, An has no nontrivial proper normal subgroup
and hence An is a simple group. 
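The two pillars of this proof, that the 3-cycles generate An and that a normal subgroup containing a single 3-cycle contains them all, can be verified computationally for n = 5. A minimal sketch (all helper names are ours):

```python
# Sketch: (i) the 3-cycles generate A5; (ii) the normal closure of one 3-cycle is A5.
from itertools import permutations

def compose(p, q):
    return tuple(p[j] for j in q)

def inverse(p):
    q = [0] * len(p)
    for i, pi in enumerate(p):
        q[pi] = i
    return tuple(q)

def generate(gens, n):
    elems, frontier = {tuple(range(n))}, list(gens)
    while frontier:
        g = frontier.pop()
        if g not in elems:
            elems.add(g)
            frontier.extend(compose(g, h) for h in gens)
    return elems

def three_cycle(n, a, b, c):             # the cycle (a b c) on {0, ..., n-1}
    p = list(range(n))
    p[a], p[b], p[c] = b, c, a
    return tuple(p)

three_cycles = [three_cycle(5, a, b, c) for a, b, c in permutations(range(5), 3)]
A5 = generate(three_cycles, 5)
print(len(A5))                           # 60: the 3-cycles generate A5

g = three_cycle(5, 0, 1, 2)
closure = generate([compose(compose(x, g), inverse(x)) for x in A5], 5)
print(len(closure))                      # 60: the normal closure of (0 1 2) is A5
```

The second computation mirrors the proof's conclusion: once a normal subgroup holds one 3-cycle, conjugation forces it to hold every 3-cycle, hence all of A5.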

9.2.4 – The Simplicity of PSLn (F )


We finish the section by discussing one of the families of simple groups of Lie type.
The general linear group GLn (F ), where F is a field and n ≥ 2, is not simple for two reasons:
SLn (F ) E GLn (F ) as the kernel of the determinant homomorphism, and Z(GLn (F )) E
GLn (F ), which is nontrivial if F ≠ F2 . As we will prove, intuitively speaking, if we quotient out by
these two groups, the remaining group is simple. However, we need to make this precise.
Example 1.3.9 introduced the notion of projective space. Projective space arises as an important
construction in geometry and algebraic geometry. We give a construction over any field. Let F be
a field and consider the equivalence relation ∼ on F n − {(0, 0, . . . , 0)} defined by

(a1 , a2 , . . . , an ) ∼ (b1 , b2 , . . . , bn ) ⇐⇒ (b1 , b2 , . . . , bn ) = (λa1 , λa2 , . . . , λan ) for some λ ∈ F ∗ .

Intuitively, we remove the zero vector from F n and then consider any two vectors equivalent if they are
multiples of each other. The (n − 1)-dimensional projective space over F , denoted P(F n ), is the
set of ∼-equivalence classes. Without going into the geometry details, the dimensional difference
arises because the parameter λ removes a degree of freedom. We denote the ∼-equivalence class of
(a1 , a2 , . . . , an ) by (a1 : a2 : · · · : an ). If F = R, then P(Rn ) classifies the concept of direction or
n-dimensional slope.
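For a finite field Fq (q prime below), the points of P(F n ) can be enumerated by normalizing each nonzero vector so that its first nonzero coordinate is 1, which yields (q^n − 1)/(q − 1) points. A sketch (the function name is ours):

```python
# Sketch: enumerate the points of P(F_q^n) by choosing a canonical representative
# of each equivalence class.  For q = 2, n = 3 we get the 7 points of the Fano plane.
from itertools import product

def projective_points(q, n):             # q prime, so F_q is the integers mod q
    points = set()
    for v in product(range(q), repeat=n):
        if any(v):
            i = next(k for k, x in enumerate(v) if x)
            inv = pow(v[i], -1, q)       # scale the first nonzero slot to 1 (Python 3.8+)
            points.add(tuple(inv * x % q for x in v))
    return points

for q, n in [(2, 3), (3, 3), (5, 2)]:
    assert len(projective_points(q, n)) == (q ** n - 1) // (q - 1)
print(len(projective_points(2, 3)))      # 7
```

Over F2 the scaling step is vacuous, so the 7 nonzero vectors of F2³ are themselves the points of the Fano plane (compare Exercise 15 of this section).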

Definition 9.2.8
Let F be a field and n ≥ 2 an integer. The projective general linear group of order n,
denoted by PGLn (F ), is

PGLn (F ) = GLn (F )/Z(GLn (F )).

Similarly, the projective special linear group of order n, denoted by PSLn (F ), is

PSLn (F ) = SLn (F )/Z(SLn (F )).

The first family of Lie type listed in the Classification Theorem of Finite Simple Groups is the
projective special linear group.

Theorem 9.2.9
Let F be a finite field of order q and n ≥ 2. The group PSLn (F ) is simple, except for
PSL2 (F2 ) and PSL2 (F3 ).

A few lemmas are needed first.

Lemma 9.2.10
If i 6= j, define the matrix Xij (λ) as 1 on the diagonal, λ in the (i, j)th entry, and 0
elsewhere. The group SLn (F ) is generated by matrices Xij (λ) with λ ∈ F and 1 ≤ i, j ≤ n
and i 6= j.

Proof. We will temporarily call the matrices Xij (λ) elementary matrices. We will also interpret
SL1 (F ) as the trivial group consisting of the 1 × 1 matrix with entry 1.

Note first that Xij (λ)−1 = Xij (−λ). It suffices to show that for all A ∈ SLn (F ), there are
sequences X1 , X2 , . . . , Xr and Y1 , Y2 , . . . , Ys of elementary matrices such that

X1 X2 · · · Xr AY1 Y2 · · · Ys = I (9.2)

since then
A = Xr−1 · · · X2−1 X1−1 Ys−1 Ys−1−1 · · · Y1−1 .
Let A ∈ SLn (F ). Suppose first that a21 ≠ 0. Then X12 ((1 − a11 )/a21 )A is a matrix with 1 in the
(1, 1)th entry. Suppose that a21 = 0. Then there is some row i with ai1 ≠ 0. Then X2i (1)A has
ai1 in the (2, 1)th entry. Then, repeating the previous step, we can multiply by another elementary
matrix and get 1 in the (1, 1)th entry.
Suppose now that A has a11 = 1. By multiplying on the left by Xi1 (−ai1 ) and on the right by
X1j (−a1j ) we obtain a matrix of the form
 
[ 1 0 ]
[ 0 B ] ,

where B ∈ SLn−1 (F ). By induction on n, we obtain every matrix A ∈ SLn (F ) as (9.2) and the
lemma follows. 
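As a finite sanity check of this lemma (a sketch; all names are ours), we can generate the closure of the elementary matrices X12 (λ) and X21 (λ) over F3 and confirm that it is all of SL2 (F3 ), which has 3(3² − 1) = 24 elements:

```python
# Sketch: the elementary matrices X_12(t), X_21(t) over F_3 generate SL_2(F_3).

P = 3
I2 = ((1, 0), (0, 1))

def mmul(A, B):                          # 2x2 matrix product mod P
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2)) % P
                       for j in range(2)) for i in range(2))

gens = [((1, t), (0, 1)) for t in (1, 2)] + [((1, 0), (t, 1)) for t in (1, 2)]

elems, frontier = {I2}, list(gens)
while frontier:
    g = frontier.pop()
    if g not in elems:
        elems.add(g)
        frontier.extend(mmul(g, h) for h in gens)

print(len(elems))                        # 24
# every product of the generators indeed has determinant 1
assert all((A[0][0] * A[1][1] - A[0][1] * A[1][0]) % P == 1 for A in elems)
```

Since the group is finite, closing under multiplication by the generators automatically captures inverses as well.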

Lemma 9.2.11
If n ≥ 3 or if n = 2 and |F | ≥ 4, then the matrices Xij (λ) are commutators of elements in
SLn (F ).

Proof. If n ≥ 3, then there are three distinct indices i, j, and k and

[Xik (λ), Xkj (1)] = Xij (λ).

If n = 2, let D(α) denote the diagonal matrix with diagonal entries α and α−1 , and let U (β) denote
the upper triangular matrix with 1s on the diagonal and β in the (1, 2) entry. Consider the commutator
relation
[ D(α), U (β) ] = U ((α2 − 1)β).

For any λ ∈ F , the equation λ = β(α2 − 1) can be solved for β as long as there exists α ∈ U (F )
such that α ≠ ±1. This is possible for all fields F that contain more than 3 elements. 
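The first commutator relation in the proof can be verified numerically. The sketch below (helper names ours) checks [Xik (λ), Xkj (1)] = Xij (λ) for 3 × 3 matrices over F7, taking (i, k, j) = (1, 2, 3) and using the convention [x, y] = x−1 y −1 xy of Definition 9.1.7:

```python
# Sketch: verify [X_12(t), X_23(1)] = X_13(t) over F_7 for every t.

P, N = 7, 3

def X(i, j, lam):                        # elementary matrix with lam in entry (i, j)
    M = [[int(a == b) for b in range(N)] for a in range(N)]
    M[i][j] = lam % P
    return tuple(map(tuple, M))

def mmul(A, B):                          # NxN matrix product mod P
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(N)) % P
                       for j in range(N)) for i in range(N))

ok = True
for lam in range(P):
    A, B = X(0, 1, lam), X(1, 2, 1)
    Ainv, Binv = X(0, 1, -lam), X(1, 2, -1)     # X_ij(t)^-1 = X_ij(-t)
    comm = mmul(mmul(Ainv, Binv), mmul(A, B))   # [A, B] = A^-1 B^-1 A B
    ok = ok and comm == X(0, 2, lam)
print(ok)                                # True
```

(Indices are zero-based in the code, so X(0, 1, ·) plays the role of X12.)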

Corollary 9.2.12
Let F be a field and n ≥ 2. The following are equivalent:
(1) SLn (F ) is solvable;
(2) the commutator subgroup SLn (F )′ is a proper subgroup of SLn (F );
(3) (n, F ) is (2, F2 ) or (2, F3 ).

Proof. By Lemmas 9.2.11 and 9.2.10, if n ≥ 3 or if n = 2 and |F | ≥ 4, then all the generators
of SLn (F ) are in the commutator subgroup of SLn (F ). Hence, SLn (F )′ = SLn (F ). In particular,
the commutator series has SLn (F )(s) = SLn (F ) ≠ {1} for all s, so by Proposition 9.1.14, SLn (F ) is
not solvable.
In the remaining two cases, SL2 (F2 ) ≅ S3 , which is solvable. For SL2 (F3 ), note that Z(SL2 (F3 )) has
order 2. Then PSL2 (F3 ) has order 12 and it is easy to check by cases that all groups of order 12 are
solvable. By Exercise 9.1.9, SL2 (F3 ) is solvable. 

With this result about SLn (F ), we are in a position to prove the theorem.

Proof (Theorem 9.2.9). In Example 8.2.8 we saw that the group GLn (F ) acts transitively on F n −
{(0, 0, . . . 0)}. By multiplying the second column of M2 in Example 8.2.8 by an appropriate factor,
we can select M2 M1−1 ∈ SLn (F ). Hence, SLn (F ) acts transitively on F n − {(0, 0, . . . 0)}. This action
does have a system of blocks, namely the linear subspaces Span(v) − {0} = {λv | λ ∈ F ∗ }. These
blocks are precisely the elements in the projective space P(F n ).
We claim that the action of G = SLn (F ) on P(F n ) is 2-transitive. Let (a1 , a2 ) and (b1 , b2 ) be
two pairs of nonparallel vectors in F n − {(0, 0, . . . , 0)}. Each pair of vectors represents a pair of
distinct points in P(F n ). Let M2 ∈ GLn (F ) be any matrix whose first two columns are a1 and a2 ,
respectively, and let M1 ∈ GLn (F ) be any matrix whose first two columns are b1 and b2 . Note that
in P(F n ), two elements are equivalent if they are multiples of each other. Hence, if necessary, we
may replace b2 with a scalar multiple of itself so that M2 M1−1 ∈ SLn (F ). The matrix M1−1 maps b1
and b2 to the standard basis vectors e1 = (1, 0, 0, . . . , 0) and e2 = (0, 1, 0, . . . , 0), respectively,
and M2 maps these to a1 and a2 respectively, thereby proving the claim.


Since the action is 2-transitive, the action of SLn (F ) on P(F n ) is primitive.
Let P be the stabilizer of the element (1 : 0 : · · · : 0), i.e., the stabilizer of the block of the vector e1 .
Elements in P have the block form

[ a w ]
[ 0 B ]

where a ∈ F ∗ , w is a row vector in F n−1 , and B ∈ GLn−1 (F ) with det(B) = a−1 . By Proposition 8.3.8,
P is a maximal subgroup of SLn (F ). Furthermore, if v ∈ F n − {(0, 0, . . . , 0)} with v = ge1 for some
g ∈ SLn (F ), then the stabilizer of v is the conjugate group gP g −1 .
We now claim that every proper normal subgroup N E SLn (F ) is contained in the center Z(SLn (F )).
We consider two cases.

Case 1. Suppose that N ≤ P . Then N = gN g −1 ≤ gP g −1 for all g ∈ SLn (F ) so N stabilizes all


linear subspaces Span(v). Thus, every element in N is a scalar multiple of the identity matrix
In so N ≤ Z(SLn (F )).

Case 2. Suppose that N ≰ P . Then since N is normal, P N is a subgroup strictly larger than the
maximal subgroup P so P N = SLn (F ). Take K to be the subgroup of P of matrices of the block form

[ 1 w    ]
[ 0 In−1 ]

where w is a row vector in F n−1 . We calculate that

[ a v ] [ 1 w    ] [ a v ]−1     [ a aw + v ] [ a−1 −a−1 vB −1 ]     [ 1 awB −1 ]
[ 0 B ] [ 0 In−1 ] [ 0 B ]    =  [ 0 B      ] [ 0   B −1       ]  =  [ 0 In−1   ] .

Hence, K E P . Then P ≤ NG (KN ) since P ≤ NG (K) and N E G. Also, since N is normal,
KN = N K, so for n, n2 ∈ N and k ∈ K we can write nk = k ′ n′ with k ′ ∈ K and n′ ∈ N , giving
n(kn2 )n−1 = k ′ n′ n2 n−1 ∈ KN . Hence, N normalizes KN as well. Thus, KN E SLn (F ).
We now claim that all the elementary matrices Xij (λ) are in KN . Let σ ∈ Sn be a permutation
such that σ(1) = i and σ(2) = j. Recall that the permutation matrix Mσ satisfies Mσ (ek ) =
eσ(k) for all standard basis vectors ek with 1 ≤ k ≤ n, and that det Mσ = sign σ. Write
ε = sign σ. Let g ∈ SLn (F ) be the matrix with ge1 = εei and gek = eσ(k) for k ≥ 2, so that
det g = ε det Mσ = ε2 = 1. Writing matrices column by column,

gX12 (λ) = g [ e1 | λe1 + e2 | e3 | · · · | en ]
= [ εei | λεei + ej | eσ(3) | · · · | eσ(n) ]
= Xij (ελ) [ εei | ej | eσ(3) | · · · | eσ(n) ]
= Xij (ελ)g.

Hence, Xij (ελ) = gX12 (λ)g −1 . However, X12 (λ) ∈ K ⊆ KN and since KN E SLn (F ), then
Xij (ελ) ∈ KN . As λ ranges over F , so does ελ, and therefore every Xij (λ) ∈ KN . Since these
elementary matrices generate SLn (F ), we deduce that KN = SLn (F ).
By the Second Isomorphism Theorem, SLn (F )/N ≅ K/(K ∩ N ). But K/(K ∩ N ) is abelian
since K ≅ F n−1 is abelian. By Proposition 9.1.9, the commutator subgroup SLn (F )′ ≤ N
is a proper subgroup. By Corollary 9.2.12, we conclude that the assumption that N is not
contained in P can only occur if n = 2 and F = F2 or F3 .

Suppose now that (n, F ) is neither (2, F2 ) nor (2, F3 ). Then only Case 1 can occur, so every
proper normal subgroup of SLn (F ) is contained in the center. By the Fourth Isomorphism Theorem,
the quotient group
PSLn (F ) = SLn (F )/Z(SLn (F ))
has no nontrivial proper normal subgroups. Hence, PSLn (F ) is a simple group. On the other hand,
if (n, F ) is either (2, F2 ) or (2, F3 ), then by Corollary 9.2.12, SLn (F ) is nonabelian and solvable;
its quotient PSLn (F ) is then also solvable and nonabelian (Exercise 9.1.10), and therefore not
simple. 

Exercises for Section 9.2


1. Prove that there is no simple group of order 300.
2. Prove that there is no simple group of order 936.
3. Prove that there is no simple group of order 315.
4. Prove that there is no simple group of order 2784.
5. Prove that if G is a group of order pqr with primes p < q < r, then G has a normal subgroup of order
p, q, or r.

6. Suppose that an integer n is divisible by a prime number p such that p ≥ √n. Prove that a group G
with |G| = n is not simple.
7. 2×Odd Test. Prove that if n > 1 is an odd integer, then there is no simple group of order 2n.
8. Prove that if G has a nontrivial conjugacy class K such that |G| does not divide |K|!, then G is not
simple.
9. Suppose that G is a finite group and that H is a proper normal subgroup of maximal order. Prove that G/H
is a simple group.
10. Prove that A5 is the only simple group of order 60.
11. Prove that | PSL2 (F5 )| = 60. Without using the previous exercise, show that PSL2 (F5 ) ≅ A5 .
12. Prove that A6 is simple using the strategy given in Proposition 9.2.2.
13. Let G be a finite simple group. Suppose that H, K ≤ G such that |G : H| = p and |G : K| = q are
prime numbers. Prove that p = q.
14. If F is a finite field of order |F | = q, determine a formula for | PSLn (F )|.
15. In this exercise, we prove that the group of symmetries of the Fano plane is isomorphic to PSL3 (F2 ).
(See Exercise 8.3.17.)
(a) Show that there are 7 points in the projective space P(F2^3 ).
(b) Given that “lines” in P(F2^3 ) arise from subspaces in F2^3 , prove that the lines in the projective space
P(F2^3 ) correspond to triples (a1 : a2 : a3 ), (b1 : b2 : b3 ), and (c1 : c2 : c3 ) such that ai + bi + ci = 0
in F2 .

(c) Show that the space P(F2^3 ) along with its set of lines has the geometry of the Fano plane.
(d) The group of symmetries is the subgroup of S7 (acting on the points of the Fano plane) that preserves
collinearity. Prove that this group of symmetries is PGL3 (F2 ) and observe that this group is
equal to PSL3 (F2 ).

9.3
Semidirect Product
The Classification Theorem of Finite Simple Groups completely solves the first part of the Hölder
Program. The second part involves classifying all the groups that have a given set of composition
factors. Taking the direct sum of all the composition factors is one way to find a group with those
given factors. However, it is not the only way, even if the group is abelian.
The second part of the Hölder Program turns out to be very challenging. In this section, we
introduce the semidirect product, a construction that creates a larger group from smaller ones that
generalizes the direct sum construction. Furthermore, we discuss under what circumstances this
construction is sufficient to find all the groups with a given set of composition factors.
We begin this section with a brief comment on terminology.

9.3.1 – Direct Sum; Direct Product


Early on in group theory (resp. ring theory), we encountered the concept of the direct sum of a finite
set of groups (resp. rings). The group operation of the direct sum is the component-wise operation.
Some authors retain the terminology of Cartesian product and refer to this same construction as
the direct product of a finite set of groups and denote it by G1 × G2 × · · · × Gn .
This divergence in common terminology arises because of two variations in construction when it
comes to an infinite collection of groups.
Let {Gi }i∈I be a collection of groups, where I is not necessarily finite (or even countable). By
the Axiom of Choice there exist choice functions f such that f (i) ∈ Gi for all i ∈ I.

Definition 9.3.1
The set of all choice functions is called the direct product of the collection {Gi }i∈I , and it
is denoted by

∏i∈I Gi .

Proposition 9.3.2
The direct product of a collection of groups is a group under the operation ·, where f1 · f2
is the choice function defined by

(f1 · f2 )(i) = f1 (i)f2 (i) in Gi for all i ∈ I.

Furthermore, the direct product is abelian if and only if Gi is abelian for all i ∈ I.

Proof. Let f1 , f2 , f3 ∈ ∏i∈I Gi . Then for all i ∈ I,

(f1 · (f2 · f3 ))(i) = f1 (i) (f2 (i)f3 (i)) = (f1 (i)f2 (i)) f3 (i) = ((f1 · f2 ) · f3 )(i).

Hence, the operation · is associative. The choice function f (i) = 1i ∈ Gi for all i ∈ I serves as the
identity. The inverse of an element f in the direct product is f −1 with f −1 (i) = f (i)−1 taken in
each Gi .

It is easy to see that if each Gi is abelian, then the operation · on the direct product is abelian.
Conversely, suppose that · is abelian. Then f g = gf for all choice functions in the direct product.
Fix an index i0 ∈ I and let x, y ∈ Gi0 . Consider the choice functions f and g such that f (i0 ) = x,
g(i0 ) = y and f (i) = g(i) = 1 for all i ∈ I − {i0 }. Then since f g = gf , we deduce that xy = yx
in Gi0 . Since the choices of index and elements were arbitrary, we deduce that every group Gi is
abelian. 
If I is finite with |I| = n ≥ 1, then I is in bijection with {1, 2, . . . , n}. A choice function from
I = {1, 2, . . . , n} is tantamount to an n-tuple (f1 , f2 , . . . , fn ) ∈ G1 × G2 × · · · × Gn with fi = f (i)
for 1 ≤ i ≤ n.

Definition 9.3.3
The direct sum of a collection {Gi }i∈I of groups is the subset of the direct product consisting
of choice functions f such that f (i) = 1 ∈ Gi for all but a finite number of indices i ∈ I.
The direct sum is denoted by

⊕i∈I Gi .

If I is a finite set, then the condition in Definition 9.3.3 is trivially satisfied for all choice
functions. Hence, the direct sum and the direct product are the same whenever I is a finite set. On
the other hand, if I is infinite, then the direct sum is a strict subgroup of the direct product.
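The finite-support condition can be made concrete in a short computational sketch. The following Python fragment (an illustrative encoding of my own, not from the text) stores an element of the direct sum ⊕ of countably many copies of Zm as a dictionary recording only its finitely many nonidentity coordinates, so the defining condition of Definition 9.3.3 is built into the data structure.

```python
def ds_mult(f, g, m):
    """Componentwise sum mod m of two finite-support choice functions."""
    h = dict(f)
    for i, v in g.items():
        h[i] = (h.get(i, 0) + v) % m
        if h[i] == 0:
            del h[i]              # keep the stored support finite and canonical
    return h

def ds_inverse(f, m):
    """The inverse is taken coordinatewise, as in Proposition 9.3.2."""
    return {i: (-v) % m for i, v in f.items()}

f = {0: 1, 5: 2}                  # nonidentity only at indices 0 and 5
g = {5: 2, 90: 3}                 # nonidentity only at indices 5 and 90
fg = ds_mult(f, g, 4)             # supports can cancel: index 5 drops out
```

An element of the full direct product would instead require an arbitrary choice function i ↦ Gi, since its support need not be finite; the dictionary trick only models the direct sum.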

9.3.2 – Semidirect Product


Suppose a group G contains a normal subgroup H. Suppose also that K is another subgroup of
G with H ∩ K = 1. Then K acts on H by conjugation. This action defines a homomorphism
ϕ : K → Aut(H). Furthermore, HK is a subgroup of G such that
khk −1 = ϕ(k)(h) ⇐⇒ kh = ϕ(k)(h)k.
This remark sets up the construction of the semidirect product.

Proposition 9.3.4
Let H and K be groups and let ϕ : K → Aut(H) be a homomorphism. The Cartesian
product H × K, equipped with the operation

(h1 , k1 ) · (h2 , k2 ) = (h1 ϕ(k1 )(h2 ), k1 k2 ), (9.3)

is a group. Furthermore, if H and K are finite, then this group has order |H||K|.

Proof. Let (h1 , k1 ), (h2 , k2 ), and (h3 , k3 ) be three elements in the Cartesian product H × K. Then
(h1 , k1 ) · ((h2 , k2 ) · (h3 , k3 ))
= (h1 , k1 ) · (h2 ϕ(k2 )(h3 ), k2 k3 )
= (h1 ϕ(k1 )(h2 ϕ(k2 )(h3 )), k1 (k2 k3 ))
= (h1 ϕ(k1 )(h2 )ϕ(k1 )(ϕ(k2 )(h3 )), (k1 k2 )k3 ) because ϕ(k1 ) is a homomorphism
= (h1 ϕ(k1 )(h2 )ϕ(k1 k2 )(h3 ), (k1 k2 )k3 ) because ϕ is a homomorphism
= (h1 ϕ(k1 )(h2 ), k1 k2 ) · (h3 , k3 )
= ((h1 , k1 ) · (h2 , k2 )) · (h3 , k3 ).
This proves associativity.
The element (1, 1) serves as the identity because
(1, 1) · (h, k) = (1ϕ(1)(h), k) = (h, k) and
(h, k) · (1, 1) = (hϕ(k)(1), k) = (h, k),

where ϕ(1)(h) = h because ϕ(1) is the identity function and because ϕ(k)(1) = 1 since any homo-
morphism maps 1 to 1.
Let (h, k) ∈ H × K. We prove that (h, k)−1 = (ϕ(k −1 )(h−1 ), k −1 ). Indeed

(h, k) · (ϕ(k −1 )(h−1 ), k −1 ) = hϕ(k) ϕ(k −1 )(h−1 ) , kk −1 = (hϕ(1)(h−1 ), 1) = (1, 1)


 

and

(ϕ(k −1 )(h−1 ), k −1 ) · (h, k) = ϕ(k −1 )(h−1 )ϕ(k −1 )(h), k −1 k = (ϕ(k −1 )(h−1 h), 1) = (1, 1).


Hence, the operation defined in (9.3) has inverses in the set.


Finally, note that the order of the group is the cardinality of |H × K|, namely |H||K|. 

Definition 9.3.5
Suppose that H and K are groups such that there exists a homomorphism ϕ : K → Aut(H).
The Cartesian product H × K, equipped with the operation defined in (9.3), is called the
semidirect product of H and K with respect to ϕ and is denoted by H oϕ K.
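Operation (9.3) is easy to experiment with. The sketch below (my own encoding; the names sdp_op, h_op, and k_op are not from the text) represents elements of H ⋊ϕ K as Python pairs and exhaustively checks associativity for H = Z7 and K = Z3, both written additively, with ϕ(k) the automorphism h ↦ 2^k h.

```python
def sdp_op(p1, p2, phi, h_op, k_op):
    """Operation (9.3): (h1, k1)·(h2, k2) = (h1 * phi(k1)(h2), k1 * k2)."""
    (h1, k1), (h2, k2) = p1, p2
    return (h_op(h1, phi(k1)(h2)), k_op(k1, k2))

# Example: H = Z7, K = Z3 (additive); phi(k)(h) = 2^k h mod 7 is a
# homomorphism Z3 -> Aut(Z7) because 2 has order 3 modulo 7.
phi = lambda k: (lambda h: (h * pow(2, k, 7)) % 7)
h_op = lambda a, b: (a + b) % 7
k_op = lambda a, b: (a + b) % 3
op = lambda p, q: sdp_op(p, q, phi, h_op, k_op)

elems = [(h, k) for h in range(7) for k in range(3)]
# Exhaustive associativity check over all 21^3 triples:
assoc = all(op(op(a, b), c) == op(a, op(b, c)) for a in elems for b in elems for c in elems)
```

The same check with a nontrivial ϕ also exposes noncommutativity: op((1, 0), (0, 1)) and op((0, 1), (1, 0)) disagree, so this semidirect product is nonabelian.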

Proposition 9.3.6
Suppose that G = H oϕ K. Then H̃ = {(h, 1) | h ∈ H} and K̃ = {(1, k) | k ∈ K} are
subgroups with H̃ ≅ H and K̃ ≅ K. Furthermore, H̃ E H oϕ K and G/H̃ ≅ K.

Proof. (Left as an exercise for the reader. See Exercise 9.3.6.) 

Because of this proposition, we will often abuse notation and write H EH oϕ K and K ≤ H oϕ K
instead of H̃ E H oϕ K and K̃ ≤ H oϕ K.

Example 9.3.7. Let H = Z7 and K = Z3 . Write Z3 = hx | x3 = 1i and Z7 = hy | y 7 = 1i. We


know that Aut(Z7 ) ≅ U (Z/7Z) ≅ Z6 . Explicitly, Aut(Z7 ) contains the automorphisms ψa (g) = g a
for a ∈ U (Z/7Z). The homomorphisms ϕ : Z3 → Aut(Z7 ) are determined by ϕ(x), which must be
an element of order 1 or 3. Hence, ϕ(x) can be ψ1 (the identity function), ψ2 , or ψ4 . We can see
directly that the automorphism ψ2 has order 3 because

ψ2²(g) = ψ2 (ψ2 (g)) = ψ2 (g²) = (g²)² = g⁴ and ψ2³(g) = ψ2 (g⁴) = (g⁴)² = g⁸ = g,

and so ψ2³ is the identity function.


If ϕ(x) = ψ1 , then ϕ(xa ) = ψ1 for all a and from (9.3), we see that H oϕ K ≅ Z7 ⊕ Z3 . △
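Since ψa ∘ ψb = ψab, the order of ψa in Aut(Z7) equals the multiplicative order of a modulo 7. A quick computation (an illustrative Python sketch, not from the text) recovers the three candidates ψ1, ψ2, ψ4 named in the example:

```python
def mult_order(a, n):
    """Multiplicative order of a modulo n (assumes gcd(a, n) = 1)."""
    k, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        k += 1
    return k

orders = {a: mult_order(a, 7) for a in range(1, 7)}
# psi_a is a possible image of the generator x exactly when a has order 1 or 3:
candidates = sorted(a for a, d in orders.items() if d in (1, 3))
```

Running this gives candidates = [1, 2, 4], matching the three homomorphisms ϕ of the example.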

Before developing this example further and presenting more examples, it is useful to explore
the relationship of H and K inside H oϕ K. However, implicit in Example 9.3.7 is that there always
exists a homomorphism K → Aut(H), namely the trivial homomorphism, which maps all elements
in K to the identity automorphism. The following proposition describes this situation.

Proposition 9.3.8
The following are equivalent.

(1) ϕ is the trivial homomorphism into Aut(H);


(2) H oϕ K ≅ H ⊕ K with the isomorphism being the set identity function;
(3) K E H oϕ K.

Proof. (1) =⇒ (2) If ϕ is trivial then ϕ(k) : H → H is the identity function. Hence,

(h1 , k1 ) · (h2 , k2 ) = (h1 ϕ(k1 )(h2 ), k1 k2 ) = (h1 h2 , k1 k2 ).

Thus, H oϕ K ≅ H ⊕ K.
(2) =⇒ (3) We know that K E H ⊕ K.
(3) =⇒ (1) Suppose that K E H oϕ K. Then for all h2 ∈ H and all k1 , k2 ∈ K, the following
element is in K:

(h2 , k2 ) · (1, k1 ) · (h2 , k2 )−1 = (h2 ϕ(k2 )(1), k2 k1 ) · (ϕ(k2 −1 )(h2 −1 ), k2 −1 )
                                   = (h2 ϕ(k2 k1 )(ϕ(k2 −1 )(h2 −1 )), k2 k1 k2 −1 ).

Thus, h2 ϕ(k2 k1 k2 −1 )(h2 −1 ) = 1 for all h2 , so ϕ(k2 k1 k2 −1 ) is the identity function for all k1 , k2 . Setting
k2 = 1 shows that ϕ(k1 ) is the identity function for all k1 ∈ K, so ϕ is trivial. □

By virtue of this proposition, if we know that ϕ : K → Aut(H) is the trivial homomorphism,


then we always write H ⊕ K instead of H oϕ K. If the homomorphism ϕ is understood by context
or if H oϕ K ≅ H oψ K for any two nontrivial homomorphisms ϕ, ψ : K → Aut(H), then we simply
write H o K instead of H oϕ K.
From the identification of H with H × {1} and K with {1} × K, we see that H oϕ K = HK.
Furthermore,

(1, k) · (h, 1) · (1, k)−1 = (ϕ(k)(h), k) · (ϕ(k −1 )(1), k −1 ) = (ϕ(k)(h), k) · (1, k −1 )
                           = (ϕ(k)(h)ϕ(k)(1), kk −1 ) = (ϕ(k)(h), 1).

This calculation shows that ϕ(k)(h) corresponds to conjugation of the subgroup K on the normal
subgroup H.
This inspires us to provide a characterization of groups that arise as semidirect products.

Proposition 9.3.9
Suppose that a group G contains a normal subgroup H and a subgroup K such that
G = HK and H ∩ K = {1}. Then G ∼ = H oϕ K, where ϕ : K → Aut(H) is the
homomorphism defined by conjugation ϕ(k)(h) = khk −1 .

Proof. Consider the function f : H oϕ K → G given by f (h, k) = hk. Since G = HK, this function
is a surjection. If f (h1 , k1 ) = f (h2 , k2 ), then h1 k1 = h2 k2 , so h2 −1 h1 = k2 k1 −1 . Since H ∩ K = {1},
then h2 −1 h1 = 1 = k2 k1 −1 , so (h1 , k1 ) = (h2 , k2 ). Thus, this function is injective and therefore bijective.
Let (h1 , k1 ), (h2 , k2 ) ∈ H o K. Then

f ((h1 , k1 ) · (h2 , k2 )) = f (h1 ϕ(k1 )(h2 ), k1 k2 ) = h1 ϕ(k1 )(h2 )k1 k2

and
f (h1 , k1 )f (h2 , k2 ) = h1 k1 h2 k2 = h1 (k1 h2 k1−1 )k1 k2 = h1 ϕ(k1 )(h2 )k1 k2 .
We conclude that f is a homomorphism and thus an isomorphism. 

Example 9.3.10. Let us revisit Example 9.3.7 in light of these propositions. We now give presen-
tations for all three of the semidirect products.

Case 1. If ϕ1 is such that ϕ1 (x) = ψ1 , then a presentation for H oϕ1 K is

hx, y | x3 = y 7 = 1, xyx−1 = yi = hx, y | x3 = y 7 = 1, xy = yxi = Z7 ⊕ Z3 ≅ Z21 .

Case 2. If ϕ2 is such that ϕ2 (x) = ψ2 , then a presentation for the semidirect product is

Z7 oϕ2 Z3 = hx, y | x3 = y 7 = 1, xyx−1 = y 2 i.

This is a nonabelian group so in particular it is not isomorphic to Z21 .



Case 3. If ϕ3 is such that ϕ3 (x) = ψ4 , then a presentation for the semidirect product is
Z7 oϕ3 Z3 = hu, v | u3 = v 7 = 1, uvu−1 = v 4 i.
Again, this is a nonabelian group.
Now consider the mapping f : Z7 oϕ2 Z3 → Z7 oϕ3 Z3 defined by f (x) = u2 and f (y) = v. It is
easy to check that (u2 )3 = v 7 = 1 but we also have
u2 vu−2 = u(uvu−1 )u−1 = uv 4 u−1 = (uvu−1 )4 = (v 4 )4 = v 16 = v 2 .
Hence, u2 and v satisfy the same relations as x and y so f defines a homomorphism by the Generator
Extension Theorem. It is not hard to see that f is bijective and we deduce that the groups obtained
from Case 2 and Case 3 are isomorphic. Consequently, we refer to the group with the simplified
notation of Z7 o Z3 because it is the only nondirect semidirect product.
Proposition 9.3.9 allows us to take this example one step further to a classification result. Let
G be a group of order 21. By Sylow’s Theorem, we deduce that n7 (G) = 1 so G contains a normal
subgroup H of order 7. G must also contain a subgroup K of order 3. The action of K on H by
conjugation corresponds to a homomorphism ϕ : K → Aut(H). We have seen that there are only
two ϕ that lead to nonisomorphic groups, which are Z21 and Z7 o Z3 . △
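The relation u²vu⁻² = v² used in Case 3 can also be checked numerically. In the sketch below (my own additive encoding, not from the text), an element of Z7 ⋊ϕ3 Z3 is a pair (h, k) with h ∈ Z7 and k ∈ Z3, and ϕ3(k) acts by h ↦ 4^k h:

```python
def op(p, q):
    """Multiplication in Z7 semidirect Z3 where phi3(k) sends h to 4^k h mod 7."""
    (h1, k1), (h2, k2) = p, q
    return ((h1 + pow(4, k1, 7) * h2) % 7, (k1 + k2) % 3)

def inv(p):
    """Inverse of (h, k), using 4^3 = 64 = 1 mod 7."""
    h, k = p
    ki = (-k) % 3
    return ((-pow(4, ki, 7) * h) % 7, ki)

v = (1, 0)                       # generator of the normal Z7 factor
u = (0, 1)                       # generator of the Z3 factor
u2 = op(u, u)
conj = op(op(u2, v), inv(u2))    # compute u^2 v u^{-2}
```

Evaluating conj gives (2, 0), which is exactly v², in agreement with the computation u²vu⁻² = v² in the text; the group is also visibly nonabelian since uv ≠ vu.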
Example 9.3.11. A group H is abelian if and only if the inversion function h 7→ h−1 is an automor-
phism. As an automorphism, the inversion has order 2. Consider the cyclic group Z2 = hx | x2 = 1i
and the map ϕ : Z2 → Aut(H) such that ϕ(x)(h) = h−1 , and of course, ϕ(1)(h) = h. This allows
us to construct the semidirect product H oϕ Z2 . However, this ϕ might not give the only nondirect
semidirect product. △
Example 9.3.12 (Dihedral Groups). A particular example of the previous construction occurs
with dihedral groups. Every dihedral group Dn is defined as Dn = Zn oϕ Z2 where ϕ : Z2 → Aut(Zn )
is defined by ϕ(y)(x) = x−1 where x generates Zn and y generates Z2 . This leads to the presentation
Zn oϕ Z2 = hx, y | xn = y 2 = 1, yxy −1 = x−1 i = Dn .
However, if n is not prime, there may be other nontrivial homomorphisms ϕ : Z2 → Aut(Zn ).
For example, if n = 15, then Aut(Z15 ) = U (15). We need to determine U (15). By the Chinese
Remainder Theorem, Z/15Z = Z/3Z ⊕ Z/5Z so
Aut(Z15 ) = U (Z/3Z ⊕ Z/5Z) = U (Z/3Z) ⊕ U (Z/5Z) = U (3) ⊕ U (5) ≅ Z2 ⊕ Z4 .
Hence, U (15) contains three elements of order 2, namely 4, 11, and 14 = −1. Writing Z15 as Z3 ⊕Z5 ,
we see that these elements of order 2 in Aut(Z15 ) correspond to inversion on the Z5 component alone,
inversion on the Z3 component alone, or inversion on both. The three resulting nondirect semidirect
products of Z15 with Z2 are Z3 ⊕ D5 , Z5 ⊕ D3 , and D15 . △
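The three elements of order 2 in U(15) can be found by direct search; the following sketch (illustrative, not from the text) confirms the claim that they are 4, 11, and 14:

```python
from math import gcd

# U(15) consists of the residues coprime to 15; |U(15)| = 8.
units = [a for a in range(1, 15) if gcd(a, 15) == 1]
# Elements of order exactly 2 square to 1 but are not 1 themselves:
involutions = sorted(a for a in units if a != 1 and (a * a) % 15 == 1)
```

Here involutions evaluates to [4, 11, 14], matching the three nondirect semidirect products listed above.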
Remark 9.3.13. It is not always a simple task to determine if H oϕ1 K and H oϕ2 K are isomorphic,
given two different homomorphisms ϕ1 , ϕ2 : K → Aut(H). Exercises 9.3.3 and 9.3.4 show two general
situations in which there does exist an isomorphism between semidirect products. △
Implicit in the second part of the Hölder program is the ability to classify all groups G that
have a known normal subgroup N along with a known quotient group G/N . We reiterate that
the semidirect product of two groups captures situations in which G contains a subgroup K that
is isomorphic to G/N and G = N K. It is possible that a group G does not contain a subgroup
isomorphic to G/N . The simplest example occurs with Q8 . Every subgroup of Q8 is normal.
Consider first the normal subgroup N = h−1i with quotient group Q8 /h−1i ≅ Z2 ⊕ Z2 . However, Q8 has no
subgroup isomorphic to Z2 ⊕ Z2 , so Q8 does not arise as a semidirect product over N . Consider
now the normal subgroup N ′ = hii with quotient group Q8 /hii ≅ Z2 . The only element of order 2
in Q8 is in N ′ , so there is no subgroup K such that N ′ K = G and N ′ ∩ K = {1}, because such a K
would need to contain an element of order 2. The same reasoning holds for N ′ = hji and hki. Hence,
Q8 does not arise as a semidirect product of its subgroups. In this case, in order to find such groups
G given N and G/N , we need group cohomology, a theory that is beyond the scope of this textbook.

9.3.3 – Some Automorphism Groups


In order to construct semidirect products H oϕ K between two groups H and K, it is essential to
know the automorphism group Aut(H). The only automorphism group we have encountered so far is
Aut(Zn ) ≅ U (n), where U (n) is the group of units of Z/nZ. (See Exercise 3.7.40.) We need to
explore the structure of U (n) and other automorphism groups.

Proposition 9.3.14
If n = pk with p an odd prime and k ∈ N∗ , then Aut(Zpk ) ≅ U (pk ) is a cyclic group of
order p^(k−1) (p − 1).

Proof. (Left as a guided exercise for the reader. See Exercise 9.3.10.) 

Proposition 9.3.15
The group U (2) is trivial. For k ≥ 2, the automorphism group of the cyclic group Z2k is
U (2k ) ≅ Z2 ⊕ Z2^(k−2) .

Proof. The proposition is obvious for U (2k ) with k = 1, 2. We will suppose henceforth that k ≥ 3.
We show that U (2k ) = h5i ⊕ h−1i.
We first claim that 5^(2^(k−3)) ≡ 1 + 2^(k−1) (mod 2^k ). This is obvious for k = 3. Suppose that it is
true for some k. Note that

(1 + 2^(k−1) + c·2^k )² = (1 + 2^(k−1) )² + 2(1 + 2^(k−1) )c·2^k + c²·2^(2k) ≡ (1 + 2^(k−1) )² (mod 2^(k+1) ).

Therefore,
5^(2^(k−2)) ≡ (1 + 2^(k−1) )² ≡ 1 + 2^k + 2^(2k−2) ≡ 1 + 2^k (mod 2^(k+1) ),
where the last congruence holds because 2k − 2 ≥ k + 1 for all k ≥ 3. This establishes the claim by
induction.
From the claim, 5^(2^(k−2)) ≡ 1 (mod 2^k ) while 5^(2^(k−3)) ≡ 1 + 2^(k−1) ≢ 1 (mod 2^k ). Hence, the
order |5| divides 2^(k−2) but it does not divide 2^(k−3) . So |5| = 2^(k−2) .
Assume that −1 ≡ 5^b (mod 2^k ) for some b. This would imply that −1 ≡ 1 (mod 4), which is a
contradiction. Hence, −1 ∉ h5i. By the Direct Sum Decomposition Theorem for groups, we deduce
that U (2^k ) ≅ h−1i ⊕ h5i. Knowing the order of 5, the result follows. □
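The two claims of this proof — that 5 has order 2^(k−2) in U(2^k) and that −1 is not a power of 5 — can be spot-checked for small k with a short sketch (illustrative, not from the text):

```python
def mult_order(a, n):
    """Multiplicative order of a modulo n (assumes gcd(a, n) = 1)."""
    k, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        k += 1
    return k

checks = []
for k in range(3, 11):
    n = 2 ** k
    # All powers of 5 in U(2^k); -1 corresponds to the residue n - 1.
    powers_of_5 = {pow(5, e, n) for e in range(2 ** (k - 2))}
    checks.append(mult_order(5, n) == 2 ** (k - 2) and (n - 1) not in powers_of_5)
```

Every entry of checks comes out True, so ⟨−1⟩ ∩ ⟨5⟩ = {1} and |⟨5⟩| = 2^(k−2) for these k, as the proof asserts in general.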

The previous two propositions, coupled with the Chinese Remainder Theorem, give a complete
description of the automorphism groups of cyclic groups. However, if a group is not cyclic, even if
it is abelian, the automorphism groups can become rather complicated. The following proposition
begins to show this.

Proposition 9.3.16
Let p be any prime and let Zp^n be the elementary abelian group Zp ⊕ Zp ⊕ · · · ⊕ Zp with
n copies of Zp . Then Aut(Zp^n ) ≅ GLn (Fp ).

Proof. Let V be the vector space of dimension n over the finite field Fp . The group with addition
(V, +) is isomorphic to Zpn . An automorphism ϕ of (V, +) is an invertible homomorphism with
ϕ(a + b) = ϕ(a) + ϕ(b) for all vectors a and b in (V, +). Furthermore, for any positive integer k,
ϕ(k · a) = ϕ(a) + ϕ(a) + · · · + ϕ(a) (k times) = k · ϕ(a).

Since this holds for all 1 ≤ k ≤ p, we see that ϕ is an invertible linear transformation on
V = Fp^n . Hence, Aut(Zp^n ) = Aut(Fp^n , +) ≅ GLn (Fp ). □

Example 9.3.17. From the previous proposition, Aut(Z5 ⊕ Z5 ) ≅ GL2 (F5 ). Note that | GL2 (F5 )| =
(25 − 5)(25 − 1) = 480. By Cauchy’s Theorem, GL2 (F5 ) contains an element of order 3. One such
element is

        [ 1 2 ]
    g = [ 1 3 ] .

If we write Z3 = hz | z 3 = 1i, then the homomorphism ϕ : Z3 → GL2 (F5 ) defined by ϕ(z) = g


produces a nonabelian semidirect product G = (Z5 ⊕ Z5 ) oϕ Z3 .
It is possible to give a presentation of this semidirect product. Note that for Z5 ⊕Z5 a presentation
is
Z5 ⊕ Z5 = hx, y | x5 = y 5 = 1, xy = yxi.

Now Z5 ⊕ Z5 is isomorphic to the additive group (F5², +) under the isomorphism x^a y^b ↔ (a, b). Since

g · (a, b) = (a + 2b, a + 3b),

a presentation for (Z5 ⊕ Z5 ) oϕ Z3 is

hx, y, z | x5 = y 5 = z 3 = 1, xy = yx, zxz −1 = xy, zyz −1 = x2 y 3 i. △
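It is straightforward to confirm that the matrix g of this example really has order 3 in GL2(F5); the sketch below is illustrative and the helper matmul2 is my own:

```python
def matmul2(x, y, p):
    """2x2 matrix product with entries reduced mod p."""
    return tuple(tuple(sum(x[i][k] * y[k][j] for k in range(2)) % p for j in range(2))
                 for i in range(2))

I2 = ((1, 0), (0, 1))
g = ((1, 2), (1, 3))
g2 = matmul2(g, g, 5)    # g squared is not the identity
g3 = matmul2(g2, g, 5)   # but g cubed is
```

Here g2 works out to ((3, 3), (4, 1)) and g3 to the identity, so g has order 3 and the homomorphism ϕ(z) = g is well defined.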

As a last example of an automorphism group, we prove in the exercises (Exercise 9.3.13) that if
n ≠ 6, then Aut(Sn ) = Sn .

9.3.4 – Wreath Product (Optional)


Consider the group described in Example 8.3.7. As a subgroup of S9 it has generators

G = h(1 2 3), (1 4 7)(2 5 8)(3 6 9)i.

It consists of permutations that cycle within the blocks {1, 2, 3}, {4, 5, 6}, and {7, 8, 9} and permu-
tations that cycle through the three blocks. The permutations that stay within the blocks form the
subgroup
H = h(1 2 3), (4 5 6), (7 8 9)i

and is isomorphic to Z3 × Z3 × Z3 . The generating permutation σ = (1 4 7)(2 5 8)(3 6 9) satisfies

σ(1 2 3)σ −1 = (4 5 6), σ(4 5 6)σ −1 = (7 8 9), and σ(7 8 9)σ −1 = (1 2 3).

Hence, H E G. Setting K = hσi, the subgroups also satisfy G = HK. By Proposition 9.3.9,
G = H oϕ K where ϕ corresponds to K acting on H by conjugation. If x is a generator of Z3 , we
can describe G as
(Z3 ⊕ Z3 ⊕ Z3 ) oϕ Z3

where ϕ : Z3 → Aut(Z3 ⊕ Z3 ⊕ Z3 ) is defined by ϕ(x)(g1 , g2 , g3 ) = (g3 , g1 , g2 ).


This is an example of a more general construction.
Let K and L be groups and let ρ : K → Sn be a homomorphism. Consider the action of Sn on

n times
z }| {
L ⊕ L ⊕ ··· ⊕ L

by
σ · (x1 , x2 , . . . , xn ) = (xσ−1 (1) , xσ−1 (2) , . . . , xσ−1 (n) ),

which corresponds to moving the ith entry to the σ(i)th location.



Definition 9.3.18
The wreath product of L by K via the homomorphism ρ : K → Sn is the semidirect product

L ≀ρ K = (L ⊕ L ⊕ · · · ⊕ L) oϕ K

where ϕ : K → Aut(L^n ) is the homomorphism ϕ(k)(x1 , x2 , . . . , xn ) = ρ(k)·(x1 , x2 , . . . , xn ).

We point out that the order of the wreath product is |L ≀ρ K| = |L|^n |K| and that elements of a
wreath product are (n + 1)-tuples in the set L^n × K.
It is possible to give an alternative approach to the wreath product. Set Γ = {1, 2, . . . , n}.
Consider the isomorphism between Fun(Γ, L) and Ln defined by f 7→ (f (1), f (2), . . . , f (n)) and
where the group operation on functions f, g ∈ Fun(Γ, L) is

(f · g)(i) = f (i)g(i),

where the latter operation is in the group L. Then elements of the wreath product L ≀ρ K are pairs
(f, k) ∈ Fun(Γ, L) × K. The operation between elements in the wreath product is

(f1 , k1 ) · (f2 , k2 ) = (i ↦ f1 (i)f2 (ρ(k1 )−1 (i)), k1 k2 ).

This is called the functional form of the wreath product.


In the scenario when n = |K| and ρ : K → SK corresponds to the action of K on itself by left
multiplication, the wreath product is called the standard wreath product of L by K and is denoted
L ≀ K.

Example 9.3.19. The motivating example for this subsection is a wreath product of Z3 by Z3 .
The homomorphism ρ : Z3 → S3 sends the generator w of Z3 to the 3-cycle (1 2 3). So

ρ(w) · (x1 , x2 , x3 ) = (x3 , x1 , x2 ).

We leave it as an exercise for the reader to prove that up to isomorphism there is only one nonabelian
wreath product of Z3 on Z3 and that it has a presentation of

Z3 o Z3 = hx, y, z, w |x3 = y 3 = z 3 = w3 = 1, xy = yx, xz = zx, yz = zy,


wxw−1 = y, wyw−1 = z, wzw−1 = xi. 4
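A brute-force closure computation (my own sketch, not from the text) confirms that the motivating subgroup of S9, generated by (1 2 3) and (1 4 7)(2 5 8)(3 6 9), has order 3³ · 3 = 81, matching the order of the wreath product above, and is nonabelian. Permutations on {1, ..., 9} are stored 0-indexed as tuples:

```python
def compose(s, t):
    """Permutation product s∘t on {0,...,8}: apply t first, then s."""
    return tuple(s[t[i]] for i in range(9))

a = (1, 2, 0, 3, 4, 5, 6, 7, 8)       # (1 2 3), written 0-indexed
b = (3, 4, 5, 6, 7, 8, 0, 1, 2)       # (1 4 7)(2 5 8)(3 6 9), 0-indexed

group = {tuple(range(9))}
frontier = {a, b}
while frontier:                       # breadth-first closure under the generators
    new = {compose(g, s) for g in frontier for s in (a, b)} - group - frontier
    group |= frontier
    frontier = new
```

Since the group is finite, every element is a positive word in the two generators, so right-multiplying frontier elements by the generators until nothing new appears enumerates the whole subgroup.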

We saw earlier that given a group N and a group K, the semidirect product does not give us all
groups G that have a normal subgroup N with an associated quotient group K = G/N . However,
the following theorem gives a structural upper bound on groups that have a known normal subgroup
with a known associated quotient group.

Theorem 9.3.20 (Universal Embedding Theorem)


Let G be a group with a normal subgroup N such that G/N = K for some known group
K. Then there is an embedding Ψ : G → N ≀ K into the standard wreath product of N by
K.

Proof. Let π : G → G/N = K be the canonical projection homomorphism π(g) = gN . Choose in G


a complete set of distinct coset representatives {tk | k ∈ K} so that π(tk ) = k for each k ∈ K. For
each g ∈ G, define the function fg : K → N by

fg (k) = tk −1 g tπ(g)−1 k .

We define this function in this way for two reasons. First,

π(fg (k)) = π(tk )−1 π(g)π(tπ(g)−1 k ) = k −1 π(g)π(g)−1 k = 1,



so for all g ∈ G and all k ∈ G/N , the element fg (k) is in ker π = N . Second, expressing the wreath
product N ≀ K in its functional form, the function Ψ : G → N ≀ K defined by

Ψ(g) = (fg , π(g))

is a homomorphism. For any x, y ∈ G, we have Ψ(xy) = (fxy , π(xy)) while

Ψ(x)Ψ(y) = (fx , π(x))(fy , π(y)) = (f, π(x)π(y)) = (f, π(xy)),

where f : K → N is the function satisfying

f (k) = fx (k) fy (π(x)−1 k)
      = (tk −1 x tπ(x)−1 k )(tπ(x)−1 k −1 y tπ(y)−1 π(x)−1 k )
      = tk −1 xy tπ(xy)−1 k
      = fxy (k).

Thus, Ψ(x)Ψ(y) = Ψ(xy).


Finally, suppose that Ψ(x) = Ψ(y) for two x, y ∈ G. Then π(x) = π(y). We also have fx = fy
so for all k ∈ K,

fx (k) = fy (k) =⇒ tk −1 x tπ(x)−1 k = tk −1 y tπ(y)−1 k
               =⇒ x tπ(x)−1 k = y tπ(x)−1 k
               =⇒ x = y.

The function Ψ is an injective homomorphism (embedding) of G into N ≀ K. □
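A toy instance makes this proof concrete. The sketch below (my own choice of example, not from the text) takes G = Z4 written additively, N = {0, 2}, and K = G/N of order 2 with coset representatives t0 = 0 and t1 = 1, and verifies that Ψ(g) = (fg, π(g)) is an injective homomorphism into the standard wreath product of N by K:

```python
t = {0: 0, 1: 1}                 # coset representatives t_k with pi(t_k) = k

def pi(g):
    """Canonical projection G -> G/N for G = Z4, N = {0, 2}."""
    return g % 2

def f_g(g):
    """The function f_g : K -> N from the proof, written additively in Z4."""
    return tuple((-t[k] + g + t[(k - pi(g)) % 2]) % 4 for k in (0, 1))

def Psi(g):
    return (f_g(g), pi(g))

def wr_op(p, q):
    """Functional form of the wreath-product operation, written additively."""
    (f1, k1), (f2, k2) = p, q
    return (tuple((f1[k] + f2[(k - k1) % 2]) % 4 for k in (0, 1)), (k1 + k2) % 2)

images = [Psi(g) for g in range(4)]
is_hom = all(Psi((x + y) % 4) == wr_op(Psi(x), Psi(y)) for x in range(4) for y in range(4))
is_injective = len(set(images)) == 4
values_in_N = all(v in (0, 2) for f, _ in images for v in f)
```

All three flags come out True, so Z4 embeds in this wreath product of order 2² · 2 = 8, even though Z4 is not a semidirect product of N by K.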

Exercises for Section 9.3


1. Prove that Sn = An oϕ Z2 for some appropriate ϕ.
2. Find a nontrivial homomorphism ϕ : Z3 → Aut(Z2 ⊕ Z2 ). Prove that the resulting semidirect product
is (Z2 ⊕ Z2 ) o Z3 ∼
= A4 .
3. Let G be an arbitrary group and let n be a positive integer. Let ϕ1 and ϕ2 be homomorphisms
Zn → Aut(G) such that ϕ1 (Zn ) and ϕ2 (Zn ) are conjugate subgroups in Aut(G). Suppose that Zn is
generated by z.
(a) Prove that there exists an automorphism ψ ∈ Aut(G) and a ∈ U (n) such that ϕ2 (z)^a = ψ ◦
ϕ1 (z) ◦ ψ −1 .
(b) Prove that the function f : G oϕ1 Zn → G oϕ2 Zn defined by f (g, x) = (ψ(g), x^a ) is an isomor-
phism.
4. Let P and Q be groups. Let ϕ1 and ϕ2 be homomorphisms Q → Aut(P ) such that there exists an
automorphism ψ ∈ Aut Q such that ϕ1 ◦ ψ = ϕ2 . Show that the function Ψ : P oϕ1 Q → P oϕ2 Q
defined by Ψ(a, b) = (a, ψ −1 (b)) is an isomorphism.
5. Suppose that p < q are primes with p | (q − 1). Prove that all nontrivial homomorphisms ϕ : Zp →
Aut(Zq ) lead to isomorphic semidirect products Zq oϕ Zp . [That is why the unique nonabelian group
of order pq is written Zq o Zp .]
6. Prove Proposition 9.3.6.
7. Fix a positive integer n and let F be a field. Let T ≤ GLn (F ) be the subgroup of invertible upper triangular matrices. Let D ≤ T be the
subgroup of diagonal matrices and let U = {g ∈ T | gii = 1 for all i}. Prove that T is a semidirect
product U o D. Explicitly describe the relevant homomorphism ϕ : D → Aut(U ) for n = 2 and n = 3.
8. Prove that the group of symmetries of the cube (including reflections) is a group of the form S4 o Z2 .
9. Prove that there are 4 distinct homomorphisms from Z2 into Aut(Z8 ). Show that the resulting
semidirect products are Z8 ⊕ Z2 , D8 , the quasidihedral group QD16 (Exercise 3.8.9) and the modular
group (Exercise 4.3.17).
10. In this exercise, we prove that if p is an odd prime, then U (pk ) is a cyclic group of order pk−1 (p − 1).

(a) Prove that if k ≥ 2 and a ∈ Z with p ∤ a, then (1 + ap)^(p^(k−2)) ≡ 1 + ap^(k−1) (mod p^k ).
(b) Deduce that for any a with p ∤ a, the element 1 + ap has order p^(k−1) in U (p^k ).
(c) By Proposition 7.5.2, U (Fp ) = U (p) is a cyclic group. Show that there exists g ∈ Z that is a
generator of U (p) and such that g^(p−1) ≢ 1 (mod p² ).
(d) Prove that a g found in the previous part generates U (pk ) to deduce that U (pk ) is cyclic.
[Hint: Recall that p divides the binomial coefficient (p choose j) for all j with 1 ≤ j ≤ p − 1.]


11. Determine the isomorphism type of Aut(Z40 ) and express the result in invariant factors form.
12. Determine the isomorphism type of Aut(Z210 ) and express the result in invariant factors form.
13. This exercise guides a proof that Aut(Sn ) = Sn for all n 6= 6.
(a) Prove that for all ψ ∈ Aut(Sn ) and all conjugacy classes K of Sn , the subset ψ(K) is another
conjugacy class.
(b) Let K be the conjugacy class of transpositions and let K0 be another conjugacy class of elements
of order 2 (e.g., cycle type like (a b)(c d)). Prove that |K| 6= |K0 |, unless possibly if n = 6.
(c) Prove that for each ψ ∈ Aut(Sn ) and for all k with 2 ≤ k ≤ n, we have ψ((1 k)) = (a bk ) for
some distinct integers a, b2 , b3 , . . . , bn in {1, 2, . . . , n}.
(d) Show that the transpositions (1 2), (1 3), . . . , (1 n) generate Sn .
(e) Deduce that Aut(Sn ) = Inn(Sn ) ≅ Sn .
14. Let G be a group. Consider the homomorphism ϕ : G → Aut(G) defined by ϕ(g)(x) = gxg −1 . Prove
that the resulting semidirect product G oϕ G is equal to G ⊕ G if and only if G is abelian. Find a
presentation for D3 oϕ D3 .
15. Give a presentation for a nonabelian semidirect product (Z7 ⊕ Z7 ) oϕ Z3 .
16. Let ϕ : Z4 → S5 be the homomorphism that sends the generator x of Z4 to ϕ(x) = (1 2 3 4). Exer-
cise 9.3.13 showed that Aut(S5 ) = Inn(S5 ) = S5 . Let G = S5 oϕ Z4 . Perform the following calculations
in G.
(a) ((1 4 3)(2 5), x2 ) · ((2 4 5 3), x).
(b) ((1 4 3)(2 5), x2 )−1 .
(c) ((1 3 5 2 4), x3 ) · ((1 3)(2 4), 1) · ((1 3 5 2 4), x3 )−1 .
17. Let G be any group. We define the holomorph of G as the group Hol(G) = G o Aut(G), where
the semidirect product is the natural one where ϕ : Aut(G) → Aut(G) is the identity (not trivial)
homomorphism.
(a) Prove that the holomorph of Zp is a nonabelian group and give a presentation of it.
(b) Prove that Hol(Z2 × Z2 ) ≅ S4 .
18. Let ρ be the standard permutation representation of S3 acting on {1, 2, 3} and let G = Z5 ≀ρ S3 . Use
the presentation of Z5 = hx | x5 = 1i.
(a) Calculate the product in G of (x, x2 , x, (1 2)) · (x3 , 1, x2 , (1 2 3)).
(b) Calculate the inverse (x, x2 , x4 , (1 3))−1 .
(c) Calculate the general conjugate (xa , xb , xc , σ) · (xp , xq , xr , 1) · (xa , xb , xc , σ)−1 .
19. Give a presentation for the group Z5 ≀ Z3 .
20. Let n be a positive integer and let d be a nontrivial divisor. Show that the largest subgroup of Sn
acting naturally on X = {1, 2, . . . , n} that has n/d blocks of size d is a wreath product Sd ≀ Sn/d .
Calculate the order of this group.
21. Prove that Zp ≀ Zp is a nonabelian group of order p^(p+1) that is isomorphic to the Sylow p-subgroup of
Sp² . (See Exercise 8.5.3.)

9.4
Classification Theorems
This section consists primarily of examples of classification of groups of a given order. For certain
integers n, Sylow’s Theorem may allow us to find a normal subgroup of a given order. Then, it may

be possible to use a semidirect product construction and determine all possible groups of a given
order.
Example 9.4.1 (Groups of Order pq). Example 8.5.10 and Exercise 9.3.5 already established
this classification. We restate the results here. Let |G| = pq.

Case 1. If p = q, then G is isomorphic to Zp2 or to Zp ⊕ Zp .


Case 2. If p < q and p ∤ (q − 1), then G is isomorphic to Zpq .
Case 3. If p < q and p | (q − 1) then G is isomorphic to Zpq or to the only (nondirect) semidirect
product Zq o Zp . △

Example 9.4.2 (Groups of Order 12). Let G be a group of order 12.


If G is abelian, G is isomorphic to Z12 or Z6 ⊕ Z2 .
Suppose that G is not abelian. By Cauchy’s Theorem, G contains elements of order 2 and 3. By
Sylow’s Theorem, n3 is equal to 1 or 4 and n2 is equal to 1 or 3. If n3 = 4, then G must contain 8
elements of order 3 and the identity. A single Sylow 2-subgroup contributes 3 more elements that
would account for all elements of G. Therefore, if n3 = 4, we cannot have n2 > 1. Hence, G cannot
have both n3 (G) = 4 and n2 (G) = 3.
Note that if n3 = 1 and n2 = 1, then if P is a Sylow 2-subgroup and Q a Sylow 3-subgroup,
then P, Q E G and P ∩ Q = 1. By the Direct Sum Decomposition Theorem, G = P Q ≅ P ⊕ Q.
However, P is isomorphic to Z4 or Z2 ⊕ Z2 , while Q ≅ Z3 . Therefore, if n3 = n2 = 1, we obtain
the abelian cases. The nonabelian cases arise when: (1) n3 = 4 and n2 = 1; and (2) n3 = 1 and
n2 = 3. However, in these nonabelian cases, the group can be expressed as G = P Q, with P a Sylow
2-subgroup and Q a Sylow 3-subgroup and where either P or Q is normal. Since P ∩ Q = {1}, then
by Proposition 9.3.9, G is a semidirect product P oϕ Q or Q oϕ P .

Case 1. Let P be the normal Sylow 2-subgroup of G. P can be isomorphic to Z4 or Z2 ⊕ Z2 . All


the elements not in P must have order 3 since all elements of order 2 and 4 must be in the
unique Sylow 2-subgroup of G. Note that Aut(Z4 ) ≅ U (4) ≅ Z2 , so there is no nontrivial
homomorphism from Z3 into Aut(Z4 ). The only group of order 12 with a normal Sylow 2-
subgroup that is isomorphic to Z4 is in fact abelian, Z12 . On the other hand, Aut(Z2 ⊕ Z2 ) ≅
GL2 (F2 ). This is nonabelian of order 6 so contains two elements of order 3, in particular
   
[ 0 1 ]     [ 1 1 ]
[ 1 1 ] and [ 1 0 ] .

So there exists a nontrivial homomorphism ϕ of Z3 = hxi into Aut(Z2 ⊕ Z2 ) that sends the
generator x to one of the above automorphisms, described by the matrices. With the first
matrix, we have

ϕ(x)(1, 0) = (0, 1) ϕ(x)(0, 1) = (1, 1) ϕ(x)(1, 1) = (1, 0).

So we can create the group (Z2 ⊕ Z2 ) oϕ Z3 . However, this is isomorphic to a group we already
know, namely A4 . Furthermore, the two different homomorphisms into Aut(P ) given by the
two different matrices both produce semidirect products that are isomorphic to A4 .
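As a quick sanity check (the helper functions below are ours, not the book's), one can verify directly that the first matrix has order 3 in GL2 (F2 ) and cycles the three nonzero vectors of Z2 ⊕ Z2 exactly as described:

```python
def mat_mul(A, B, p):
    """Multiply two 2x2 matrices modulo p."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) % p
             for j in range(2)] for i in range(2)]

def mat_vec(A, v, p):
    """Apply a 2x2 matrix to a column vector modulo p."""
    return tuple(sum(A[i][k] * v[k] for k in range(2)) % p for i in range(2))

A = [[0, 1], [1, 1]]
I = [[1, 0], [0, 1]]

# A has order 3 in GL_2(F_2): A^2 != I but A^3 = I.
A2 = mat_mul(A, A, 2)
A3 = mat_mul(A2, A, 2)
print(A2 != I and A3 == I)      # True

# The induced automorphism of Z_2 + Z_2 cycles the three nonzero elements.
print(mat_vec(A, (1, 0), 2))    # (0, 1)
print(mat_vec(A, (0, 1), 2))    # (1, 1)
print(mat_vec(A, (1, 1), 2))    # (1, 0)
```

Note also that A² is exactly the second matrix above, so the two order-3 elements generate the same subgroup of Aut(Z2 ⊕ Z2 ).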
Case 2. Suppose now that Q is a normal subgroup of G of order 3. The quotient group G/Q is
isomorphic to Z4 or Z2 ⊕ Z2 and we look for nontrivial homomorphisms of each of these groups
into Aut(Z3 ) ≅ U (3) ≅ Z2 . The only nontrivial element of Aut(Z3 ) is inversion, which we will
call λ.
If G/Q ≅ Z4 = ⟨x⟩, then the only nontrivial homomorphism into Aut(Z3 ) has ϕ(x) = λ, or in
other words ϕ(x)(h) = h⁻¹ for all h ∈ Q. This is a new semidirect product with a presentation
of
Z3 ⋊ Z4 = ⟨x, y | x⁴ = y³ = 1, xyx⁻¹ = y⁻¹⟩.
9.4. CLASSIFICATION THEOREMS 455

On the other hand, if G/Q ≅ Z2 ⊕ Z2 , then we have three choices for ϕ depending on which
two (nontrivial) elements of Z2 ⊕ Z2 get sent to λ. One can easily check that the resulting
semidirect products are all isomorphic to S3 ⊕ Z2 ≅ D6 .

In conclusion, if |G| = 12, then G is isomorphic to one of the following (nonisomorphic) groups:

Z12 , Z6 ⊕ Z2 , D6 , A4 , Z3 o Z4 . 4

Example 9.4.3 (Groups of Order 1225). Let G be a group of order |G| = 1225 = 5² · 7². We
prove that G is abelian. The number n5 of Sylow 5-subgroups must satisfy n5 | 49 and n5 ≡ 1 (mod 5). The divisors of
49 are 1, 7, and 49. Only 1 satisfies the second condition so n5 = 1. Hence, G has a normal Sylow
5-subgroup P . Let Q be any Sylow 7-subgroup of G. Then P Q ≤ G and

|P | |Q| 52 · 72
|P Q| = = = 1225,
|P ∩ Q| 1

so P Q = G. Since P ∩ Q = {1}, then by Proposition 9.3.9, G is a semidirect product P oϕ Q.


Since |P | = 5², we know that P ≅ Z25 or Z5 ⊕ Z5 . Suppose first that P ≅ Z25 . Consider the
action of Q on P by conjugation. This induces a homomorphism from Q into Aut(P ) ≅ U (25).
However, |U (25)| = 20, which is relatively prime to |Q| = 49, so the only homomorphism from Q
to Aut(P ) is trivial. Hence, in this case, Q commutes with all of P . Similarly, suppose P ≅ Z5 ⊕ Z5
and consider the action of Q on P by conjugation. This induces a homomorphism Q → Aut(Z5 ⊕ Z5 ).
However, Aut(Z5 ⊕ Z5 ) ≅ GL2 (F5 ), which has order (5² − 1)(5² − 5) = 480, again relatively prime
to |Q| = 49. Hence, the only homomorphism Q → Aut(P ) is trivial, so again Q commutes with all
of P . So we have shown that G = P ⊕ Q in both possibilities. We already know that if a group has
order p², where p is prime, then the group is isomorphic to Zp2 or Zp ⊕ Zp .
We conclude that if G has order 1225, then G is abelian and the possible groups of order 1225
are given by FTFGAG. 4
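The Sylow arithmetic in this example is easy to verify mechanically. The sketch below (variable names are ours) checks the candidate values of n5 and n7 and the coprimality facts, using φ(25) = 20 for |U (25)|:

```python
from math import gcd

assert 5**2 * 7**2 == 1225

# n_5 divides 49 and n_5 = 1 (mod 5); n_7 divides 25 and n_7 = 1 (mod 7).
n5 = [d for d in range(1, 50) if 49 % d == 0 and d % 5 == 1]
n7 = [d for d in range(1, 26) if 25 % d == 0 and d % 7 == 1]
print(n5, n7)    # [1] [1]

# |U(25)| = phi(25) = 20 and |GL_2(F_5)| = (5^2 - 1)(5^2 - 5) = 480,
# both relatively prime to |Q| = 49.
phi25 = len([a for a in range(1, 25) if gcd(a, 25) == 1])
assert phi25 == 20 and gcd(phi25, 49) == 1
assert (5**2 - 1) * (5**2 - 5) == 480 and gcd(480, 49) == 1
```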

Example 9.4.4 (Groups of Order 286). Let G be a group of order |G| = 286 = 2 · 11 · 13. By
Sylow’s Theorem, n11 (G) divides 26 and n11 (G) ≡ 1 (mod 11). This implies that n11 (G) = 1 so G
has a normal Sylow 11-subgroup P . Similarly, by Sylow’s Theorem, n13 (G) divides 22 and n13 (G) ≡ 1
(mod 13). This implies that n13 (G) = 1 so G has a normal Sylow 13-subgroup Q. The subgroup
P Q is a group of order 143. It is normal since |G : P Q| = 2. By Example 9.4.1, P Q ≅ Z143 .
Write P Q = hxi with |x| = 143. By Cauchy’s Theorem, G has an element of order 2, say the
element y. Obviously, P Qhyi = G so G is a semidirect product Z143 oϕ Z2 .
Since as rings Z/143Z = Z/11Z ⊕ Z/13Z, the automorphism group is

Aut(Z143 ) ≅ U (Z/11Z) ⊕ U (Z/13Z) = U (11) ⊕ U (13) ≅ Z10 ⊕ Z12 .

The group Z10 ⊕ Z12 has three elements of order 2. In U (143), these elements are 12, −1 = 142,
and −12 = 131. These elements lead to three homomorphisms ϕi : Z2 → Aut(Z143 ) with
ϕ1 (y)(x) = x12 , ϕ2 (y)(x) = x131 , and ϕ3 (y)(x) = x−1 . These give 3 semidirect products

G1 = Z143 ⋊1 Z2 = ⟨x, y | x^143 = y² = 1, yxy⁻¹ = x^12 ⟩,
G2 = Z143 ⋊2 Z2 = ⟨u, v | u^143 = v² = 1, vuv⁻¹ = u^131 ⟩,
G3 = Z143 ⋊3 Z2 = ⟨a, b | a^143 = b² = 1, bab⁻¹ = a⁻¹⟩.

We could have approached the classification somewhat differently and we do so now to show the
benefit. First note that Z143 ≅ Z11 ⊕ Z13 and that Aut(Z143 ) ≅ U (11) ⊕ U (13) = Aut(Z11 ) ⊕
Aut(Z13 ). Setting g1 and g2 as generators for Z11 and Z13 respectively, the homomorphisms ϕi
correspond to
ϕ1 (y) : g1 ↦ g1 , g2 ↦ g2⁻¹ ;    ϕ2 (y) : g1 ↦ g1⁻¹ , g2 ↦ g2 ;    ϕ3 (y) : g1 ↦ g1⁻¹ , g2 ↦ g2⁻¹ .

Consequently,

G1 ≅ D13 ⊕ Z11 ,    G2 ≅ D11 ⊕ Z13 ,    and    G3 ≅ D143 .
Furthermore, we easily determine that no two of these groups are isomorphic by counting the ele-
ments of order 2: 13 in G1 , 11 in G2 , and 143 in G3 . 4
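The count of square roots of 1 in U (143) and their images under the CRT isomorphism can be double-checked by brute force (the helper names are ours):

```python
# Elements of order at most 2 in U(143): solutions of x^2 = 1 (mod 143).
sqrt1 = [x for x in range(1, 143) if (x * x) % 143 == 1]
print(sqrt1)    # [1, 12, 131, 142]

# CRT images in Z/11Z x Z/13Z of the three nontrivial solutions.
crt = {x: (x % 11, x % 13) for x in (12, 131, 142)}
print(crt)
# 12  -> (1, 12):  fixes the Z_11 factor, inverts the Z_13 factor (G1 = D13 + Z11)
# 131 -> (10, 1):  inverts the Z_11 factor, fixes the Z_13 factor (G2 = D11 + Z13)
# 142 -> (10, 12): inverts both factors (G3 = D143)
```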

In some of the examples above, we encountered a few common situations that we delineate in
the following propositions.

Proposition 9.4.5
Suppose that G is a group with |G| = pa q b where p and q are distinct primes and a, b ≥ 1
are integers. Suppose also that np (G) = 1 or nq (G) = 1. Then G is a semidirect product
between a Sylow p-subgroup and a Sylow q-subgroup.

Proof. If np (G) = 1 or nq (G) = 1 then there is a normal Sylow p-subgroup or a normal Sylow
q-subgroup. Let P be a Sylow p-subgroup and let Q be a Sylow q-subgroup. Then P ∩ Q is a
subgroup of P and of Q so its order must divide gcd(pa , q b ) = 1. Hence, P ∩ Q = {1}. Also, since
P or Q is normal, P Q is a subgroup of G and it has order |P ||Q|/|P ∩ Q| = pa q b . Thus, P Q = G.
By Proposition 9.3.9, G is a semidirect product P oϕ Q or Q oϕ P . 

Example 9.4.6 (Groups of Order p²q, with p ≠ q). Let p and q be primes and let G be a
group of order p2 q. Let P ∈ Sylp (G) and let Q ∈ Sylq (G). We break this example into cases.

Case 1: p > q. Since np divides q and np ≡ 1 (mod p) then we must have np = 1. Thus, P E G
and G = P oϕ Q for some homomorphism ϕ : Q → Aut(P ). We now have two subcases:
• P ≅ Zp ⊕ Zp . Then Aut(P ) ≅ GL2 (Fp ) and |Aut(P )| = p(p − 1)²(p + 1). There exist
nontrivial homomorphisms ϕ when q | (p − 1) or q | (p + 1). Note that if q¹ is the highest
power of q dividing p + 1, then by Sylow’s Theorem applied to Aut(P ), all Sylow q-
subgroups of Aut(P ) are conjugate to each other. Hence, by Exercise 9.3.3, there exists
a unique (up to isomorphism) semidirect product P oϕ Q.
• P ∼= Zp2 . Then Aut(P ) = U (p2 ) ∼
= Zp(p−1) . There exist nontrivial homomorphisms ϕ
when q|(p − 1).
Case 2: p < q. Then nq = 1 + kq and since nq divides p2 then nq must be 1, p or p2 . If nq = 1,
then Q E G. Since q > p, if nq ≠ 1, we cannot have nq = 1 + kq = p. Thus, nq = p².
We then have kq = p² − 1 = (p − 1)(p + 1). Hence, q divides p − 1 or p + 1. Since q > p then
q = p + 1, which leads us to the case p = 2 and q = 3, so we are left with discussing the case
|G| = 12. This case was settled in Example 9.4.2. We suppose now that |G| ≠ 12.
We know from Proposition 9.4.5 that G = Q oϕ P for some homomorphism ϕ : P → Aut(Q).
We have two subcases.
• P ∼= Zp ⊕ Zp . Since Aut(Q) = U (q) ∼ = Zq−1 , then if p|(q − 1), there exist nontrivial
homomorphisms ϕ : P → Aut(Q).
• P =∼ Zp2 . Again, Aut(Q) = U (q) ∼ = Zq−1 . A homomorphism ϕ : Zp2 → Aut(Zq ) is
determined by where it maps the generator x of Zp2 . By Lagrange’s Theorem |ϕ(x)|
divides gcd(p2 , q − 1), which, depending on p and q might be 1, p, or p2 . 4
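The order formula |GL2 (Fp )| = (p² − 1)(p² − p) = p(p − 1)²(p + 1), used throughout this case analysis, can be confirmed by brute-force counting for small primes (the helper below is ours, not the book's):

```python
from itertools import product

def gl2_order(p):
    """Count invertible 2x2 matrices over F_p by brute force."""
    count = 0
    for a, b, c, d in product(range(p), repeat=4):
        if (a * d - b * c) % p != 0:
            count += 1
    return count

for p in (2, 3, 5):
    predicted = (p**2 - 1) * (p**2 - p)   # = p (p-1)^2 (p+1)
    print(p, gl2_order(p), predicted)
# 2 6 6
# 3 48 48
# 5 480 480
```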

Example 9.4.7 (Groups of Order p3 ). Let p be an odd prime and let G be a group of order
p3 . (If p = 2, it is easy to find all the groups of order 8 so we refer the reader to the table in
Section A.2.1.)
For this example, we cannot play Sylow p-subgroups off one another since the group itself is a
p-group. By FTFGAG, we know that there are three nonisomorphic abelian groups of order p³,
namely Zp3 , Zp2 ⊕ Zp , and Zp ⊕ Zp ⊕ Zp .

From now on assume G is nonabelian. By the Class Equation, it is possible to prove (Exer-
cise 8.4.14) that every p-group has a nontrivial center. Also, we know that if G/Z(G) is cyclic, then
G is abelian. (See Exercise 4.3.21.) So we must have Z(G) ≅ Zp . Then G/Z(G) is a p-group of
order p², which contains a normal subgroup N̄ of order p. By the Fourth Isomorphism Theorem, G
contains a normal subgroup N of order p² such that N/Z(G) = N̄ .
There are two cases for the isomorphism type of N .

Case 1. G has a normal subgroup N that is isomorphic to Zp2 . Let N = ⟨x⟩. Assume that G − N
does not contain an element of order p, so that all elements of order p are in N . In general, if
g is an element of order p², then ⟨g⟩ contains p(p − 1) generators (of order p²). Also, if
g1 and g2 are both of order p² and g1^k = g2^ℓ with k and ℓ both relatively prime to p², then
⟨g1 ⟩ = ⟨g2 ⟩. Hence, the assumption that N is the only subgroup that contains elements of order
p implies that, if there are k distinct subgroups of order p², then p³ = 1 + (p − 1) + kp(p − 1).
Thus, p² = k(p − 1). This is a contradiction. Hence, G − N contains an element y of order p.
Thus, by Proposition 9.3.9, G is a semidirect product of Zp2 by Zp .
We know that Aut(N ) = ∼ (Z/p2 Z)× = ∼ Zp(p−1) . Hence, again by Cauchy’s Theorem Aut(N )
contains an element of order p. By Exercise 9.3.10, the element 1 + p has order p modulo p2 .
A nontrivial homomorphism ϕ : ⟨y⟩ → Aut(N ) has ϕ(y)(x) = x^(1+p) . As a group presentation,

Zp2 ⋊ϕ Zp = ⟨x, y | x^(p²) = y^p = 1, yxy⁻¹ = x^(1+p) ⟩.

Again, though there are choices for the homomorphism ϕ : ⟨y⟩ → Aut(N ), because of the
options for generators of N , the different nontrivial semidirect products are all isomorphic.
Case 2. G does not contain a normal subgroup that is isomorphic to Zp2 . Note that if |x| = p², then
by Proposition 9.2.4, ⟨x⟩ E G. Hence, this case implies that all the nonidentity elements in
G have order p. So the normal subgroup N of order p2 has N ∼ = Zp ⊕ Zp and G − N contains
an element z of order p. So G = N hzi and by Proposition 9.3.9, G is a semidirect product
(Zp ⊕ Zp ) oϕ Zp .
By Proposition 9.3.16, Aut(N ) ≅ GL2 (Fp ), which has order (p² − p)(p² − 1). By Cauchy’s
Theorem, since p divides (p² − p)(p² − 1), Aut(N ) contains an automorphism ψ of order p.
So we define a homomorphism ϕ : ⟨z⟩ → Aut(N ) by ϕ(z) = ψ for some such ψ.


To understand this group more explicitly, suppose that N = ⟨x⟩ ⊕ ⟨y⟩. An element in GL2 (Fp )
of order p is

A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.
If ψ corresponds to this matrix then we have ψ(x) = x and ψ(y) = xy. As a group presentation,
we have with this specific ψ,

(Zp ⊕ Zp ) oϕ Zp = hx, y, z | xp = y p = z p = 1, xy = yx, zxz −1 = x, zyz −1 = xyi.

But p¹ is the highest power of p dividing | GL2 (Fp )| so, by Sylow’s Theorem, all subgroups of
order p in GL2 (Fp ) are conjugate. Consequently, by Exercise 9.3.3, all nontrivial homomorphisms
ϕ : Zp → Aut(Zp ⊕ Zp ) produce isomorphic semidirect products. 4
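Two computational facts used in this example — that 1 + p has order p modulo p², and that the unipotent matrix above has order p in GL2 (Fp ) — are easy to confirm for small odd primes (the helper below is ours, not the book's):

```python
def order_mod(a, n):
    """Multiplicative order of a modulo n (a must be a unit mod n)."""
    k, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        k += 1
    return k

for p in (3, 5, 7, 11):
    # 1 + p has order p in U(p^2), giving the automorphism x -> x^(1+p).
    assert order_mod(1 + p, p * p) == p

    # Powers of [[1,1],[0,1]] mod p: the k-th power is [[1,k],[0,1]],
    # so the matrix has order exactly p.
    B, k = (1, 1, 0, 1), 1
    while B != (1, 0, 0, 1):
        a, b, c, d = B
        B = (a, (a + b) % p, c, (c + d) % p)   # right-multiply by [[1,1],[0,1]]
        k += 1
    assert k == p

print("ok")
```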

We reiterate that the methods at our disposal up to this point to classify groups of a given order
n often involve using Sylow p-subgroups, proving that all groups of order n must be a semidirect
product of their Sylow p-subgroups (in some combination), and then determining how many such products are
distinct.
distinct. Remark 9.3.13 pointed out that it is not always easy to tell when certain semidirect
products are isomorphic. However, there are a few situations in which it is possible to tell, as seen
in Exercises 9.3.3 and 9.3.4. In particular, in Exercise 9.3.3, the desired condition when comparing
H oϕ1 K and H oϕ2 K is that Im ϕ1 and Im ϕ2 are conjugate subgroups in Aut(H). As an additional
strategy using the result of this exercise, if Im ϕ1 and Im ϕ2 happen to be Sylow subgroups of Aut(H),
then by Sylow’s Theorem, they are conjugate subgroups.

Exercises for Section 9.4


1. Prove that there is only one nonabelian group of order 1183 = 7 · 13². Give an explicit presentation
of it.
2. Prove that all groups of order 4225 are abelian.
3. Show that every group of order 30 has a normal subgroup of order 15. Then prove that there are
exactly 4 nonisomorphic groups of order 30.
4. Prove that all groups of order 14161 = 7² · 17² are abelian.
5. Consider groups G of order 351 = 3³ · 13. Prove that G has a normal Sylow 13-subgroup or a normal
Sylow 3-subgroup. Show that there exists a unique nonabelian group of order 351 with a normal
subgroup isomorphic to Z3 ⊕ Z3 ⊕ Z3 and give its presentation. [Hint: The Sylow 13-subgroups of
Aut(Z3 ⊕ Z3 ⊕ Z3 ) are conjugate.]
6. Classify the groups of order 105. [There are 2 nonisomorphic groups.]
7. Classify the groups of order 20. [There are 5 nonisomorphic groups.]
8. Classify the groups of order 154.
9. Classify the groups of order 333.
10. Let p be an odd prime. Classify the groups of order 4p. [Hint: The number of nonisomorphic groups
is different depending on whether p ≡ 1 (mod 4) or p ≡ 3 (mod 4).]
11. Let (p, p + 2) be a prime pair, i.e., both p and p + 2 are primes. Consider groups G of order 2p(p + 2).
(a) Prove that G contains a normal subgroup N of order p(p + 2).
(b) Prove that G must be a semidirect product of N with another subgroup.
(c) Prove that G must be isomorphic to one of the following four groups: Z2p(p+2) , Dp ⊕ Zp+2 ,
Dp+2 ⊕ Zp , or Dp(p+2) .
12. Let p be an odd prime. Prove that every element in GL2 (Fp ) of order 2 is conjugate to a diagonal
matrix with 1 or −1 on the diagonal. Use this result to classify the groups of order 2p2 .
13. Recall the Heisenberg group H(Fp ) introduced in Exercise 3.2.35. We observe that it is a group of
order p3 . Referring to Example 9.4.7, determine the isomorphism type of H(Fp ).
14. In this exercise, we classify groups G of order 56 = 2³ · 7. Let P be a Sylow 2-subgroup and let Q be
a Sylow 7-subgroup.
(a) Prove that G has a normal Sylow 2-subgroup or a normal Sylow 7-subgroup. Deduce that G is
P ⋊ϕ Q or Q ⋊ϕ P for an appropriate ϕ.
(b) List all (three) abelian groups of order 56. (In the remainder of the exercise, assume G is
nonabelian.)
(c) Suppose that Q is normal. Prove that there is: (i) one group when P ≅ Z2 ⊕ Z2 ⊕ Z2 ; (ii) two
nonisomorphic groups when P ≅ Z4 ⊕ Z2 ; (iii) one group when P ≅ Z8 ; (iv) three nonisomorphic
groups when P ≅ D4 ; and (v) two nonisomorphic groups when P ≅ Q8 .
(d) Suppose that Q is not a normal subgroup. Prove that the only nontrivial homomorphism ϕ :
Q → Aut(P ) occurs when P ≅ Z2 ⊕ Z2 ⊕ Z2 . Prove that there is only one nonisomorphic semidirect
product (Z2 ⊕ Z2 ⊕ Z2 ) ⋊ Z7 . Give a presentation of this group.

9.5
Nilpotent Groups
We finish this chapter with a brief section on a specific class of groups, called nilpotent groups.
Nilpotent groups have classifications that are essentially as easy as one can expect. In essence, the
only complexity of the group’s structure comes from the Sylow subgroups. Consequently, we first
explore in more detail properties of p-groups.

9.5.1 – Properties of p-Groups


Recall that a p-group is a group P of order pk where p is prime and k is a positive integer. From the
Class Equation, we deduce that every p-group has a nontrivial center. However, this result, along
with further uses of the class equation, implies still stronger results.

Proposition 9.5.1
Let P be a p-group. If N is a nontrivial normal subgroup of P , then N ∩ Z(P ) is nontrivial.

Proof. Suppose |P | = pk and let N be a nontrivial normal subgroup of P . By Proposition 4.2.14, N


is a union of conjugacy classes K1 , K2 , . . . , Kr . Suppose that these conjugacy classes are indexed so
that |Ki | = 1 for i ≤ s and |Ki | > 1 for s + 1 ≤ i ≤ r. By the Orbit-Stabilizer Theorem, if i ≥ s + 1,
then |Ki | = |P : CP (gi )| for each gi ∈ Ki . But |P : CP (gi )| divides pk . Since |P : CP (gi )| > 1 for
s + 1 ≤ i ≤ r, then p divides |P : CP (gi )|. Summing the sizes of the classes,

|N | = |K1 | + · · · + |Ks | + |Ks+1 | + · · · + |Kr |,

so |N | = s + pm for some integer m. Since |N | is a power of p and s = |N | − pm, we see that p | s
and in particular s ≠ 1.
Singleton conjugacy classes correspond to elements in the center. Thus, K1 ∪ K2 ∪ · · · ∪ Ks =
N ∩ Z(P ) and |N ∩ Z(P )| = s. Since s > 1, the intersection N ∩ Z(P ) is a nontrivial subgroup. 
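The counting in this proof can be made concrete on a small example. The sketch below (our own encoding of the dihedral group of order 8 as pairs (k, s) standing for r^k f^s) computes the class equation 8 = 1 + 1 + 2 + 2 + 2 and checks that every nontrivial normal subgroup meets the center:

```python
from itertools import combinations

n = 4
G = [(k, s) for k in range(n) for s in (0, 1)]

def mul(g, h):
    (k1, s1), (k2, s2) = g, h
    return ((k1 + (k2 if s1 == 0 else -k2)) % n, s1 ^ s2)

def inv(g):
    k, s = g
    return ((-k) % n, 0) if s == 0 else g   # reflections are involutions

# Conjugacy class sizes: the class equation of this p-group.
seen, sizes = set(), []
for g in G:
    if g not in seen:
        cls = {mul(mul(h, g), inv(h)) for h in G}
        seen |= cls
        sizes.append(len(cls))
print(sorted(sizes))    # [1, 1, 2, 2, 2]

center = [z for z in G if all(mul(z, g) == mul(g, z) for g in G)]

def is_subgroup(S):
    return (0, 0) in S and all(mul(a, b) in S for a in S for b in S)

def is_normal(S):
    return all(mul(mul(g, a), inv(g)) in S for a in S for g in G)

# Every nontrivial normal subgroup meets the center nontrivially.
for r in (2, 4, 8):
    for S in combinations(G, r):
        S = set(S)
        if is_subgroup(S) and is_normal(S):
            assert any(z in S for z in center if z != (0, 0))
print("ok")
```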

In the classification of groups of order p3 (Example 9.4.7), we repeatedly used the fact that
Z(P ) 6= {1} for all p-groups P . This generalizes to the following more profound property.

Proposition 9.5.2
Let P be a p-group of order pk . Then P contains a normal subgroup of order pa for all
0 ≤ a ≤ k.

Proof. We first claim that every p-group of order pk has subgroups of all orders pa for 0 ≤ a ≤ k. We
prove the claim by induction on k. The case k = 1 is trivial. Suppose that the claim is true for k; we
will prove the claim for k + 1. Let P be a p-group of order pk+1 . The center is a nontrivial p-subgroup,
so by Cauchy’s Theorem, Z(P ) contains an element x of order p. Then ⟨x⟩ E P since x ∈ Z(P ).
But P/⟨x⟩ is a p-group of order pk so by the induction hypothesis, it contains subgroups H̄i with
|H̄i | = pi for i = 1, 2, . . . , k. However, by the Fourth Isomorphism Theorem, P contains subgroups
Hi such that H̄i = Hi /⟨x⟩. But in P , the subgroups have order |Hi | = pi+1 for 1 ≤ i ≤ k, so
together with ⟨x⟩, we deduce that P has subgroups of order pj for 1 ≤ j ≤ k + 1.
We now prove that a p-group has normal subgroups for all orders pa . Again we prove this by
induction. The case k = 1 is trivial. Suppose the proposition is true for all ℓ < k; we prove it is true
for k. Let P be a p-group of order pk . Then |Z(P )| = pm for some m ≥ 1. Then P/Z(P ) has order
pk−m and k − m < k. By our first claim, Z(P ) has subgroups of all orders,

1 = K0 ≤ K1 ≤ K2 ≤ · · · ≤ Km = Z(P )

with |Ki | = pi . Since they are in Z(P ), they are all normal in P . The quotient group P/Z(P ) is
a p-group of order strictly less than pk . By the induction hypothesis, this group contains normal
subgroups H j of all orders pj with 0 ≤ j ≤ k − m. By the Fourth Isomorphism Theorem, P contains
normal subgroups Hj such that Hj /Z(P ) = H j . Furthermore, |Hj | = pj+m . Hence, the list

1 = K0 , K1 , K2 , . . . , Km = Z(P ) = H0 , H1 , H2 , . . . , Hk−m = P

is a list of normal subgroups of all orders pa with 0 ≤ a ≤ k. 

Cauchy’s Theorem and Sylow’s Theorem gave partial converses to Lagrange’s Theorem; this
proposition gives us yet another partial converse. By Sylow’s Theorem, every group G possesses a

Sylow p-subgroup P but by Proposition 9.5.2, every group possesses a subgroup of order pi for all
1 ≤ i ≤ k, where |G| = pk m and p ∤ m.
Combining the previous two propositions leads to a slightly stronger result about normality.

Proposition 9.5.3
Let P be a p-group and let N E P with |N | = pa . Then N contains subgroups, normal in P ,
of all orders pi with 0 ≤ i ≤ a.

Proof. We prove this by induction on k, where pk is the order of P . If k ≤ 1, the result is trivial.
Suppose that the proposition is true for k; we will show the proposition holds for k + 1. Let P have
order pk+1 and suppose that N is a nontrivial normal subgroup of P of order pa . By Proposition 9.5.1,
N ∩ Z(P ) is nontrivial and by Cauchy’s Theorem contains an element x of order p. By the Third
Isomorphism Theorem, N̄ = N/⟨x⟩ E P/⟨x⟩ = P̄ and N̄ has order pa−1 . The group P̄ has order pk so
by the induction hypothesis, N̄ contains subgroups N̄j normal in P̄ of orders pj for all 0 ≤ j ≤ a − 1.
By the Fourth Isomorphism Theorem, for each j, there is a normal subgroup Nj E P satisfying
⟨x⟩ ≤ Nj ≤ N
and |Nj | = pj+1 for all 0 ≤ j ≤ a − 1. This proves the claim for k + 1. By induction, the proposition
holds for all positive integers k. 
The point of this proposition is that not only does P have normal subgroups of all possible
orders (Proposition 9.5.2), but every normal subgroup N also contains subgroups, normal in P , of all
possible orders dividing |N |.
In the above proofs, we used Cauchy’s Theorem and the Fourth Isomorphism Theorem to prove
results by “working our way up” from an element of order p in Z(G). However, it is also possible to
deduce some properties about the subgroup lattice of a p-group from the top down.

Lemma 9.5.4
If H is a proper subgroup of a p-group P , then H ⊊ NP (H), i.e., H is strictly contained in
its normalizer.

Proof. The proof of this lemma is similar to the above proofs and uses induction on k = logp |P |.
For k = 1 or 2, the lemma holds trivially since P is abelian. Suppose the lemma is true for all ℓ < k;
we will prove it for k. Since Z(P ) commutes with all elements in P , it satisfies Z(P ) ≤ NP (H). But
H ≤ NP (H) so we have two cases. Case 1: If Z(P ) is not contained in H, then ⟨Z(P ), H⟩ ≤ NP (H)
and H is a strict subgroup of ⟨Z(P ), H⟩, thereby establishing the induction. Case 2: If Z(P ) ≤ H,
then by the Fourth Isomorphism Theorem, H corresponds uniquely with a subgroup H̄ in P̄ =
P/Z(P ), via H̄ = H/Z(P ). Since Z(P ) ≠ {1}, then |P/Z(P )| < pk so by the induction hypothesis
H̄ ⊊ NP̄ (H̄). It follows from the Fourth Isomorphism Theorem that NP̄ (H̄) = NP (H)/Z(P ) and
also that H ⊊ NP (H). 
A priori, maximal subgroups of a group are maximal by containment (not order). Hence, maximal
subgroups do not all have to be of the same order. The subgroup lattice of D6 in Example 3.6.8
shows that D6 has maximal subgroups of order 6 and of order 4. With p-groups the situation is
more limited.

Proposition 9.5.5
Every maximal subgroup of a p-group has index p and is a normal subgroup.

Proof. Let M be a maximal subgroup of a p-group P . By Lemma 9.5.4, M is a strict subgroup of


NP (M ). Since M is maximal, then NP (M ) = P , so M E P . By maximality of M and the Fourth
Isomorphism Theorem, P/M is a p-group with no proper nontrivial subgroups. By Proposition 9.5.2,
the only such p-group is Zp and the proposition follows. 
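Both Lemma 9.5.4 and Proposition 9.5.5 can be checked by brute force on the dihedral group of order 8 (the encoding and helpers below are ours, not the book's): every proper subgroup is strictly smaller than its normalizer, and all maximal subgroups have index 2 and are normal:

```python
from itertools import combinations

n = 4
G = [(k, s) for k in range(n) for s in (0, 1)]   # (k, s) means r^k f^s

def mul(g, h):
    (k1, s1), (k2, s2) = g, h
    return ((k1 + (k2 if s1 == 0 else -k2)) % n, s1 ^ s2)

def inv(g):
    k, s = g
    return ((-k) % n, 0) if s == 0 else g

def is_subgroup(S):
    return (0, 0) in S and all(mul(a, b) in S for a in S for b in S)

subgroups = [set(S) for r in (1, 2, 4, 8)
             for S in combinations(G, r) if is_subgroup(set(S))]

def normalizer(H):
    return {g for g in G
            if all(mul(mul(g, h), inv(g)) in H for h in H)}

# Lemma 9.5.4: every proper subgroup is strictly contained in its normalizer.
for H in subgroups:
    if len(H) < len(G):
        assert len(H) < len(normalizer(H))

# Proposition 9.5.5: maximal subgroups have index p = 2 and are normal.
maximal = [H for H in subgroups if len(H) < 8 and
           not any(H < K and len(K) < 8 for K in subgroups)]
for H in maximal:
    assert len(H) == 4 and normalizer(H) == set(G)
print("ok")
```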

9.5.2 – Nilpotent Groups


The proposition that Z(P ) ≠ {1} for p-groups P and generalizations thereof lead to many specific
properties. One of the key properties we used repeatedly was that for a p-group G, the quotient
G/Z(G), when nontrivial, again has a nontrivial center. Furthermore, the center of G/Z(G), being normal
in G/Z(G), pulls back to a normal subgroup in G. We extend this observation as follows.
For any finite group G, define the following subgroups inductively: Z0 (G) = {1}, and Z1 (G) =
Z(G), but for all i ≥ 2, define Zi (G) as the subgroup of G that satisfies Zi (G)/Zi−1 (G) =
Z(G/Zi−1 (G)). This subgroup exists again by virtue of the Fourth Isomorphism Theorem but
more importantly, Zi (G) E G because Z(G/Zi−1 (G)) E G/Zi−1 (G).

Definition 9.5.6
The chain of subgroups
Z0 (G) ≤ Z1 (G) ≤ Z2 (G) ≤ · · ·
is called the upper central series of G.

Definition 9.5.7
A group G is called nilpotent if Zc (G) = G for some index c. The minimum index c such
that Zc (G) = G is called the nilpotence class of G.

If G is abelian, then Z1 (G) = Z(G) = G and hence Zi (G) = G for all i ≥ 1. In fact, a group is
abelian if and only if it is nilpotent of nilpotence class 1. In contrast, Z(Sn ) = {1} whenever
n ≥ 3, so Zi (Sn ) = {1} for all i ≥ 0. If G is a finite group, then the upper
central series has a maximal element but, as the two extreme examples just mentioned show, it need
not terminate at G.

Proposition 9.5.8
A nilpotent group is solvable. In contrast, not every solvable group is nilpotent.

Proof. The chain of subgroups

{1} = Z0 (G) E Z1 (G) E · · · E Zc (G) = G

is a chain of successively normal subgroups of G in which Zi (G)/Zi−1 (G) is the center of G/Zi−1 (G).
The center of any group is abelian so the quotients are all abelian. Hence, G is solvable.
For the second part of the proposition, recall that the group S4 is solvable. (See the commutator
series in Example 9.1.12.) However, S4 is not nilpotent. Hence, though all nilpotent groups are
solvable, the reverse is not true. 

Example 9.5.9. We propose to calculate the upper central series of D12 . Recall that Z(Dn ) is
⟨r^(n/2)⟩ if n is even and {1} if n is odd.

Z1 (D12 ) = ⟨r⁶⟩  ⟹  D12 /Z1 (D12 ) ≅ D6
Z2 (D12 )/⟨r⁶⟩ = Z(D12 /Z1 (D12 )) ≅ Z(D6 ) = ⟨r³⟩  ⟹  Z2 (D12 ) = ⟨r³⟩  ⟹  D12 /Z2 (D12 ) ≅ D3
Z3 (D12 )/⟨r³⟩ = Z(D12 /Z2 (D12 )) ≅ Z(D3 ) = {1}  ⟹  Z3 (D12 ) = ⟨r³⟩

The upper central series now terminates. Note that Zi (D12 ) = ⟨r³⟩ for all i ≥ 2. Hence, D12 is not
a nilpotent group. 4
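This computation can be automated directly from the definition: g ∈ Zi+1 (G) exactly when every commutator ghg⁻¹h⁻¹ lies in Zi (G). The sketch below (our own encoding of Dn as pairs (k, s) meaning r^k f^s) reproduces the stalled series for D12 and, for contrast, the full series for D8 , which is a 2-group and hence nilpotent:

```python
def dihedral(n):
    G = [(k, s) for k in range(n) for s in (0, 1)]
    def mul(g, h):
        (k1, s1), (k2, s2) = g, h
        return ((k1 + (k2 if s1 == 0 else -k2)) % n, s1 ^ s2)
    def inv(g):
        k, s = g
        return ((-k) % n, 0) if s == 0 else g
    return G, mul, inv

def upper_central_series(G, mul, inv):
    """Z_0, Z_1, Z_2, ... until the series stabilizes."""
    series = [{(0, 0)}]
    while True:
        Z = series[-1]
        nxt = {g for g in G
               if all(mul(mul(g, h), mul(inv(g), inv(h))) in Z for h in G)}
        if nxt == Z:
            return series
        series.append(nxt)

# D_12 (order 24): the series stalls at <r^3> (4 elements), below the group.
print([len(Z) for Z in upper_central_series(*dihedral(12))])   # [1, 2, 4]

# D_8 (order 16) is a 2-group: the series reaches the whole group.
print([len(Z) for Z in upper_central_series(*dihedral(8))])    # [1, 2, 4, 16]
```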

The following proposition shows why nilpotent groups generalize p-groups. We leave the proof as
an exercise for the reader since it is similar to many of the proofs for the propositions on p-groups.

Proposition 9.5.10
Let p be a prime and let P be a p-group of order pk , with k ≥ 2. Then P is nilpotent of
nilpotence class at most k − 1. (If k = 1, then the nilpotence class is 1.)

The class of nilpotent groups is important because of the characterization provided in the following
theorem. Most importantly, part (4), along with knowledge of the possibilities for p-groups, gives a
classification of nilpotent groups.

Theorem 9.5.11
Let G be a finite group of order p1^(α1) p2^(α2) · · · ps^(αs) , with pi distinct primes. For all i = 1, 2, . . . , s,
let Pi ∈ Sylpi (G). The following are equivalent:
(1) G is nilpotent;
(2) Every proper subgroup of G is a proper subgroup of its normalizer;
(3) npi (G) = 1 for all pi ;
(4) G ≅ P1 ⊕ P2 ⊕ · · · ⊕ Ps .

Proof. (1) =⇒ (2): Similar to the proof of Lemma 9.5.4. (Left as an exercise for the reader. See
Exercise 9.5.4.)
(2) =⇒ (3): Let Hi = NG (Pi ) for all 1 ≤ i ≤ s. Since Pi E Hi , Pi is in fact characteristic in
Hi by Proposition 8.5.9. Since Hi E NG (Hi ), it follows that Pi is normal in the possibly larger group
NG (Hi ), so NG (Hi ) ≤ NG (Pi ) = Hi and hence Hi = NG (Hi ). The hypothesis (2) implies that Hi is
not a proper subgroup, so Hi = G for all i, which means that Pi E G.
(3) =⇒ (4): This implication follows immediately from the Direct Sum Decomposition Theorem.
(4) =⇒ (1): This follows from the property that the direct sum of nilpotent groups is again
nilpotent. We leave this as an exercise for the reader. (Exercise 9.5.5.) 
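Criterion (3) is the easiest to test mechanically. The sketch below (the brute-force helper is ours) contrasts two groups of order 12: D6 , for which n2 = 3, so D6 is not nilpotent, and Z12 , for which every Sylow subgroup is unique:

```python
from itertools import combinations

def subgroup_count(G, mul, size):
    """Count subgroups of the given order by brute force."""
    e = next(g for g in G if all(mul(g, h) == h for h in G))
    count = 0
    for S in combinations(G, size):
        S = set(S)
        if e in S and all(mul(a, b) in S for a in S for b in S):
            count += 1
    return count

# D_6 as pairs (k, s) meaning r^k f^s, k mod 6.
n = 6
D6 = [(k, s) for k in range(n) for s in (0, 1)]
def mul_d(g, h):
    (k1, s1), (k2, s2) = g, h
    return ((k1 + (k2 if s1 == 0 else -k2)) % n, s1 ^ s2)

# In a group of order 12, subgroups of order 4 and 3 are exactly the
# Sylow 2- and Sylow 3-subgroups.
print(subgroup_count(D6, mul_d, 4), subgroup_count(D6, mul_d, 3))    # 3 1

Z12 = list(range(12))
mul_z = lambda a, b: (a + b) % 12
print(subgroup_count(Z12, mul_z, 4), subgroup_count(Z12, mul_z, 3))  # 1 1
```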

Exercises for Section 9.5


1. Prove that Zi (G) is a characteristic subgroup of G for all i.
2. Prove that Dn is nilpotent if and only if n is a power of 2.
3. Prove Proposition 9.5.10.
4. Prove the implication (1) =⇒ (2) of Theorem 9.5.11.
5. Prove that if G1 and G2 are nilpotent groups, then G1 ⊕ G2 is also nilpotent.
6. Prove that subgroups and quotient groups of nilpotent groups are again nilpotent. Deduce also that
homomorphic images of a nilpotent group are again nilpotent.
7. Prove that if G/Z(G) is nilpotent, then G is nilpotent.
8. Prove that if |G| = n² with n = p1 p2 · · · pr with all pi distinct such that pi ∤ (pj² − 1) for all pairs
i ≠ j, then G is abelian.
9. Prove that a maximal subgroup of a nilpotent group has a prime index.
10. Let G be a nilpotent group. Prove that x, y ∈ G commute whenever gcd(|x|, |y|) = 1.
11. Let F be a finite field and let Un (F ) be the subgroup of GLn (F ) of upper triangular matrices with 1s
on the diagonal. Prove Un (F ) is a nilpotent group of nilpotence class n − 1.
12. Consider the Heisenberg group H(Fp ). (See Exercises 3.2.35 and 9.4.13.) Since H(Fp ) is a p-group, it
has a nontrivial center. Find it.

9.6
Projects
Project I. Automorphism Groups. Section 9.3 gave the isomorphism type of Aut(G) for a few
groups G. Can you come up with others? For example, can you determine the automorphism
group of Z3 o Z4 , Z5 o Z4 , Z7 o Z3 , (Z3 × Z3 ) o Z2 , or others?
Project II. Semidirect Products Zq o Zp in R3 . From a geometric perspective, the dihedral
group Dn = Zn oZ2 can be viewed as a group of transformations in R3 generated by a rotation
by π around one axis and a rotation by 2π/n around a perpendicular axis. Are there primes
p < q such that the nontrivial semidirect product Zq oZp can be viewed as a group of rotations
in R3 ? (In other words, is there an injective homomorphism f : Zq o Zp → GL3 (R)?) If so,
what is the angle between the generating axes? For a specific example, give rotation matrices
for the generators of Zq o Zp .

Project III. Abelian by Number. Explore conditions on integers n such that a group of order
n must be abelian.
Project IV. Groups of Order p4 . Explore the automorphism groups of groups of order p3 . Use
this information to try to classify groups (or find as many groups as possible) of order p4 .
Project V. Groups of Order p3 q. Revisit Exercise 9.4.14. Explore how much of the exercise
generalizes to groups of order 33 · 37. Explore how much of the exercise generalizes to groups
of order 8p, where p is an odd prime. Generalize the explorations as much as possible.
Project VI. Big & Nasty Groups. Be creative and use semidirect products to come up with
large groups whose centers are trivial. Give presentations for your groups.

Project VII. Unipotent Matrices. If F is a finite field, the group of unipotent triangular
matrices is the subgroup Un (F ) of GLn (F ) consisting of upper triangular matrices with 1s on
the diagonal. Obviously, Un (Fp ) is a p-group of order p^(n(n−1)/2) . Try to determine the upper
central series of Un (Fp ); also study the commutator series.
Project VIII. Exploring the Universal Embedding Theorem. Recall the Universal Embed-
ding Theorem (Theorem 9.3.20). Explore this theorem with some explicit examples of groups
G with a normal subgroup N such that G is not a semidirect product of N and G/N (i.e.,
such that G does not contain a subgroup isomorphic to G/N ). The quaternion group Q8 with
N = h−1i satisfies this property, so determine the appropriate wreath product and explicitly
describe the embedding of Q8 in this wreath product. Find and study the same questions with
a least one other group besides Q8 to which this criterion applies.
10. Modules and Algebras

Chapter 8 presented the concept of the action of a group on a set. Recall that if a group G acts on a
set X, the elements of G behave as functions on X in such a way that the group operation behaves
as the function composition on X.
The general concept of an action of one algebraic structure on another structure is similar. The
elements of a first structure behave as functions/morphisms on the second structure but in such
a way that operations in the first structure behave as operations on the relevant set of functions.
Section 8.6 began to explore properties of groups acting on vector spaces. There, we emphasized
that the resulting action becomes a new algebraic structure in its own right. Furthermore, we also
saw how fruitful the study of action structures is to the understanding of both structures separately.
This chapter introduces action structures of rings. These are called modules and algebras de-
pending on the structure upon which the ring acts. Sections 10.1 and 10.2 present specific examples.
Boolean algebras are important in their own right and could be studied independently of our theme
of action structures. In Section 10.2, we review concepts of vector spaces, emphasizing the points
listed in the preface.
Sections 10.3 through 10.5 introduce the structure of a module over a ring. This includes most
of the guiding themes in the Outline for each algebraic structure: modules, submodules, examples,
homomorphisms, quotient modules, and convenient ways of describing the internal structure.
As an application of the theory of modules and their decompositions, we discuss finitely generated
modules over a PID in Sections 10.6 and 10.7. This leads to a generalization of the Fundamental
Theorem of Finitely Generated Abelian Groups. In turn, applying theorems for decomposition of
modules to F [x]-modules leads us to classification results for linear transformations. Section 10.8
presents the method of applying the Fundamental Theorem for Finitely Generated Modules over a
PID to linear transformations and presents the rational canonical form. Section 10.9 introduces the
Jordan canonical form of a matrix, an important result for many applications of linear algebra.
Finally, Section 10.11 introduces path algebras and their modules. Though there are many other
important modules and algebras that arise in various branches of mathematics (e.g., Lie algebras,
differential graded algebras, etc.), path algebras and their modules have drawn some interest in
research recently and have interesting consequences for linear algebra.

10.1
Boolean Algebras
Logic underlies mathematical reasoning. The epistemological strength of mathematical theorems relies
on the strict rules of logic. It may come as a surprise to some readers (though not to philosophers)
that there exist various types of logic [51]. Mathematical reasoning follows Boolean logic.
Boolean logic presupposes incontrovertible notions of true and false and involves statements,
called propositions, that are decidably true or false. A philosopher would view the previous sentence
as riddled with deficiencies. The word “incontrovertible” is problematic; the linguistic construct of
“statement” would require a clear definition; the bond between a string of symbols and meaning
ascribed to it by a reader would need careful investigation; and the process of deciding whether a
proposition is true or false poses yet more challenges. These issues and many more are the purview
of the philosophy of mathematics. In fact, a few philosophers of mathematics propose the use of
alternate logics, perhaps most notably intuitionist logic.
Boolean logic holds many of the deep problems of philosophy at bay by proposing an algebraic
model for logical reasoning. This algebra provides a method to decide whether one proposition is
equivalent to another or whether one implies the other.

466 CHAPTER 10. MODULES AND ALGEBRAS

The propositional logic of Boolean logic is just one example of an algebraic structure called
Boolean algebra.

10.1.1 – Definition of a Boolean Algebra

Definition 10.1.1
A Boolean algebra is a quadruple (B, ∧, ∨, ¯ ) where B is a set, ∨ and ∧ are binary operations
B × B → B, and ¯ is a function B → B such that there exist elements 0, 1 ∈ B and for all
x, y, z ∈ B the following identities hold:

Identity laws:        x ∧ 1 = x
                      x ∨ 0 = x
Complement laws:      x ∧ x̄ = 0
                      x ∨ x̄ = 1
Associative laws:     (x ∧ y) ∧ z = x ∧ (y ∧ z)
                      (x ∨ y) ∨ z = x ∨ (y ∨ z)
Commutative laws:     x ∧ y = y ∧ x
                      x ∨ y = y ∨ x
Distributive laws:    (x ∧ y) ∨ z = (x ∨ z) ∧ (y ∨ z)
                      (x ∨ y) ∧ z = (x ∧ z) ∨ (y ∧ z)

We pronounce the operations ∧ as “wedge” and ∨ as “vee,” and we call ¯ the complement. Because
of applications to logic, which we will explain below, we also call the operation ∧ AND, the
operation ∨ OR, and ¯ NOT.
Boolean algebras satisfy a number of other “laws.” However, we do not list them in the definition
because these other laws can be proven from the given set of axioms.
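For readers who want a quick sanity check, the derived laws of the next proposition can also be verified exhaustively in the two-element Boolean algebra B = {0, 1} (introduced as Example 10.1.4 below). The following Python sketch is an illustration only, not a substitute for the axiomatic proofs:

```python
# Exhaustive check of the laws of Proposition 10.1.2 in the
# two-element Boolean algebra B = {0, 1}.
B = [0, 1]
AND = lambda x, y: x & y   # the wedge operation
OR = lambda x, y: x | y    # the vee operation
NOT = lambda x: 1 - x      # the complement

for x in B:
    assert NOT(NOT(x)) == x                          # double negation
    assert AND(x, x) == x and OR(x, x) == x          # idempotent laws
    assert AND(0, x) == 0 and OR(1, x) == 1          # dominance laws
    for y in B:
        assert NOT(AND(x, y)) == OR(NOT(x), NOT(y))  # DeMorgan law
        assert NOT(OR(x, y)) == AND(NOT(x), NOT(y))  # DeMorgan law
        assert AND(x, OR(x, y)) == x                 # absorption law
        assert OR(x, AND(x, y)) == x                 # absorption law
print("all laws hold in {0, 1}")
```

Of course, such a check only confirms the laws in one particular Boolean algebra; the proofs below establish them from the axioms alone.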

Proposition 10.1.2
Let (B, ∧, ∨, ¯ ) be a Boolean algebra. The following properties also hold for all x, y ∈ B.

Basic negations:   0̄ = 1 and 1̄ = 0
Double negative:   x̿ = x
DeMorgan laws:     (x ∧ y)‾ = x̄ ∨ ȳ and (x ∨ y)‾ = x̄ ∧ ȳ
Dominance laws:    0 ∧ x = 0 and 1 ∨ x = 1
Idempotent laws:   x ∧ x = x and x ∨ x = x
Absorption laws:   x ∧ (x ∨ y) = x and x ∨ (x ∧ y) = x

Proof. Let us first prove the basic negations. Applying the first complement law to 1 gives
1 ∧ 1̄ = 0. By the first identity law and commutativity, 1̄ = 1̄ ∧ 1 = 1 ∧ 1̄, so 1̄ = 0. The second
basic negation is proved similarly.
Next, we prove the idempotent laws. For any x ∈ B,

x = x ∨ 0                  Identity law
  = x ∨ (x ∧ x̄)            Complement law
  = (x ∨ x) ∧ (x ∨ x̄)      Distributivity
  = (x ∨ x) ∧ 1            Complement law
  = x ∨ x.                 Identity law

The other idempotent law is similar.



For the first dominance law, notice that for all x ∈ B,


0 ∧ x = (x ∧ x̄) ∧ x = x̄ ∧ (x ∧ x) = x̄ ∧ x = 0.
The third equality follows from the idempotent laws. Again, the second dominance law is similar.
Next we show the double complement. Write y = x̄; we must show that ȳ = x. We have
ȳ = ȳ ∧ 1 = ȳ ∧ (y ∨ x) = (ȳ ∧ y) ∨ (ȳ ∧ x) = 0 ∨ (ȳ ∧ x)
  = (x ∧ x̄) ∨ (x ∧ ȳ) = x ∧ (x̄ ∨ ȳ) = x ∧ (y ∨ ȳ) = x ∧ 1 = x.
The first absorption law follows from
x ∧ (x ∨ y) = (x ∨ 0) ∧ (x ∨ y) = (x ∨ (ȳ ∧ y)) ∧ (x ∨ y) = x ∨ (ȳ ∧ y ∧ y) = x ∨ 0 = x.
The second absorption law is similar.
The proof of the DeMorgan laws is the most challenging. We first claim that if a ∧ b = 0 and
a ∨ b = 1, then ā = b. We can see this as follows:
b = b ∧ 1 = b ∧ (a ∨ ā)
  = (b ∧ a) ∨ (b ∧ ā) = 0 ∨ (b ∧ ā) = (a ∧ ā) ∨ (b ∧ ā)
  = (a ∨ b) ∧ ā = 1 ∧ ā
  = ā.

To prove the first DeMorgan law, we prove a ∧ b = 0 and a ∨ b = 1 for a = x ∨ y and b = x̄ ∧ ȳ. First,
(x ∨ y) ∧ (x̄ ∧ ȳ) = (x ∧ x̄ ∧ ȳ) ∨ (y ∧ x̄ ∧ ȳ) = (0 ∧ ȳ) ∨ (0 ∧ x̄) = 0 ∨ 0 = 0.
Second,
(x ∨ y) ∨ (x̄ ∧ ȳ) = (x ∨ y ∨ x̄) ∧ (x ∨ y ∨ ȳ) = (1 ∨ y) ∧ (x ∨ 1) = 1 ∧ 1 = 1.
Using the claim, we conclude that
(x ∨ y)‾ = x̄ ∧ ȳ.
A similar argument establishes the second DeMorgan law. 
Example 10.1.3 (Boolean Logic). Boolean logic served as a motivating example for this section.
We can view Boolean logic as a Boolean algebra in the following sense.
Let S be the set of all logical propositions, statements that are decidably true or false. Note that
this set is enormous. The conjunction AND, the disjunction OR, and the negation NOT, give the
quadruple (S, AND, OR, NOT) the structure of a Boolean algebra. This is precisely the algebraic
structure of Boolean logic with one slightly unnatural adjustment. A Boolean algebra must contain
identities 0 and 1, which, for the purposes of Boolean logic, are statements that are respectively always
false and always true. It is not uncommon to denote these respectively by F and T.
Boolean logic involves a variety of other symbols and notational habits. As in algebra, a propo-
sition is denoted with letters, such as p and q. For example, we might say that p=“Springfield is the
capital of Illinois” and q =“2+2=5.” Then p ∧ q, read “p and q” is the proposition
“Springfield is the capital of Illinois and 2 + 2 = 5.”
Among some notational habits, in the propositional calculus of Boolean logic, we typically use
the symbol ≡ for logically equivalent even though in a generic Boolean algebra we retain the use
of =. In this context, logicians often write ¬p instead of p̄. Also in Boolean logic, the implication
operation p → q is important, as is the biconditional p ↔ q. These are defined algebraically by
p → q ≡ p̄ ∨ q, that is, ¬p ∨ q,
p ↔ q ≡ (p ∧ q) ∨ (p̄ ∧ q̄), that is, (p ∧ q) ∨ (¬p ∧ ¬q).
Recall that in propositional logic, a tautology is a propositional form that is always true, i.e., a
Boolean algebra expression that reduces to T. △

The definitions allow for a trivial Boolean algebra that consists of a single element 0 = 1. Then
the binary operations and the complement function are all trivial and all the axioms hold trivially.
The basic Boolean algebra is more important.
Example 10.1.4 (Basic Boolean Algebra). From the definition, a Boolean algebra must contain
the elements 0 and 1. Interestingly enough, there exists a Boolean algebra with exactly
two elements, and the operations on the elements are completely determined by the axioms of a
Boolean algebra:
0 ∧ 0 = 0, 0 ∧ 1 = 0, 1 ∧ 0 = 0, 1 ∧ 1 = 1,
0 ∨ 0 = 0, 0 ∨ 1 = 1, 1 ∨ 0 = 1, 1 ∨ 1 = 1,
0̄ = 1, 1̄ = 0.
Because of its application to Boolean logic, it is also common to consider the basic Boolean algebra
as the pair B = {F, T}, where we read F as the English “false,” the Boolean 0, and T as the English
“true,” the Boolean 1. Then the operators ∧, ∨, and ¯ correspond to AND, OR, and NOT applied
to these logical values. △

Example 10.1.5 (Set Theory). Another important application of Boolean algebra is the set of
subsets of a set and the standard operations on them. Though we did not dwell on the algebraic
properties of the operations on sets in Section 1.1, it is not hard to show that for any set S, the
quadruple (P(S), ∩, ∪, ¯ ), where Ā = S − A denotes the complement in S, is a Boolean algebra
in which the set S serves as the 1 and the empty set ∅ serves as the 0. △
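This example can be checked computationally for a small set S, modeling subsets as Python frozensets (an illustrative sketch; the choice of S is arbitrary):

```python
from itertools import combinations

# The Boolean algebra (P(S), ∩, ∪, complement) of Example 10.1.5 for a
# small set S, with subsets modeled as frozensets.  S itself plays the
# role of 1 and the empty set plays the role of 0.
S = frozenset({1, 2, 3})
P = [frozenset(c) for r in range(len(S) + 1) for c in combinations(S, r)]
comp = lambda A: S - A          # complement relative to S

for A in P:
    assert A & S == A and A | frozenset() == A     # identity laws
    assert A & comp(A) == frozenset()              # complement laws
    assert A | comp(A) == S
    for B2 in P:
        assert comp(A & B2) == comp(A) | comp(B2)  # DeMorgan law
print(len(P))  # 8
```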

Example 10.1.6 (Boolean Functions). Let B be any Boolean algebra and let X be any set.
Consider the set of functions from X to B, denoted by Fun(X, B).
Define binary operations ∧, ∨ and the complement function ¯ on Fun(X, B) by

(f ∧ g)(x) = f(x) ∧ g(x)   for all x ∈ X,
(f ∨ g)(x) = f(x) ∨ g(x)   for all x ∈ X,
f̄(x) = f(x)‾               for all x ∈ X.

These operations make the quadruple (Fun(X, B), ∧, ∨, ¯ ) into another Boolean algebra. △

Just as in high school algebra and elementary ring theory, a common exercise in Boolean algebras
involves simplifying an expression in a Boolean algebra as much as possible. Some of the section
exercises practice this skill but the reader can find examples of such simplification in the proofs of
various propositions and theorems in this section.

10.1.2 – Homomorphisms, Subalgebras, Direct Sums


Having defined the algebraic structure of a Boolean algebra, it is natural (for an algebraist) to
be aware of or provide definitions for morphisms between Boolean algebras, subalgebras, direct
sums, and other concepts described in the preface to this book. We outline some of these here for
completeness.

Definition 10.1.7
Let (B1, ∧1, ∨1, ¯ ) and (B2, ∧2, ∨2, ¯ ) be two Boolean algebras. A function ϕ : B1 → B2 is
called a Boolean algebra homomorphism if

ϕ(x ∧1 y) = ϕ(x) ∧2 ϕ(y)   for all x, y ∈ B1,
ϕ(x ∨1 y) = ϕ(x) ∨2 ϕ(y)   for all x, y ∈ B1,
ϕ(x̄) = ϕ(x)‾               for all x ∈ B1.

A homomorphism ϕ that is also a bijection is an isomorphism of Boolean algebras.



Example 10.1.8. Let P be the set of logical propositions and let B = {F, T} be the basic Boolean
algebra. The function ϕ : P → B that associates to each proposition in Boolean logic its truth value
is a Boolean algebra homomorphism. △

Example 10.1.9. Let S be a set and let B = {0, 1} be the basic Boolean algebra. Consider the
function ϕ : Fun(S, B) → P(S) by

ϕ(f ) = {s ∈ S | f (s) = 1}.

The function ϕ is a Boolean algebra isomorphism. We might suspect this by viewing functions in
Fun(S, B) as deciding whether a given element is in a subset or not. We first point out that ϕ is a
bijection with ϕ−1(A) = χA, where χA is the characteristic function defined by

χA(s) = 1 if s ∈ A,  and  χA(s) = 0 if s ∉ A.

(See Exercise 1.1.28.) We show that ϕ is a homomorphism directly. Observe first that in the basic
Boolean algebra, x ∧ y = 1 if and only if x = y = 1. Let f, g ∈ Fun(S, B). Then

ϕ(f ∧ g) = {s ∈ S | f (s) ∧ g(s) = 1}


= {s ∈ S | f (s) = 1 and g(s) = 1}
= {s ∈ S | f (s) = 1} ∩ {s ∈ S | g(s) = 1}
= ϕ(f ) ∩ ϕ(g).

The property that ϕ(f ∨ g) = ϕ(f ) ∪ ϕ(g) follows in an identical manner from the fact that in the
basic Boolean algebra x ∨ y = 1 if and only if x = 1 or y = 1. Finally,

ϕ(f̄) = {s ∈ S | f̄(s) = 1} = {s ∈ S | f(s) = 0} = S − ϕ(f) = ϕ(f)‾.

Since ϕ is a Boolean algebra homomorphism and a bijection, it is an isomorphism. △
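The bijection and homomorphism properties of this example can be verified exhaustively for a small set S. The Python sketch below (illustrative only) models functions S → {0, 1} as dictionaries:

```python
from itertools import product

# Example 10.1.9 in miniature: phi sends a function f : S -> {0, 1} to
# the subset {s | f(s) = 1}, and the characteristic function chi_A
# inverts it.  Functions are modeled as Python dictionaries.
S = ['a', 'b', 'c']

def phi(f):
    return frozenset(s for s in S if f[s] == 1)

def chi(A):
    return {s: (1 if s in A else 0) for s in S}

# phi is a bijection: chi undoes it on all 2^3 functions S -> {0, 1}
for values in product([0, 1], repeat=len(S)):
    f = dict(zip(S, values))
    assert chi(phi(f)) == f

# phi is a homomorphism: the wedge of functions maps to intersection
f = {'a': 1, 'b': 0, 'c': 1}
g = {'a': 1, 'b': 1, 'c': 0}
f_and_g = {s: f[s] & g[s] for s in S}
assert phi(f_and_g) == phi(f) & phi(g)
```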

By now, having seen and worked with subobjects in the contexts of groups and rings, the reader
should be able to define the concepts of a Boolean subalgebra and the direct sum of two Boolean
algebras. We provide it for completeness.

Definition 10.1.10
Let (B, ∧, ∨, ¯ ) be a Boolean algebra. A Boolean subalgebra of B is a nonempty subset B′
such that B′ is closed under ∧, under ∨, and under ¯. In other words, for all x, y ∈ B′, the
elements x ∧ y, x ∨ y, and x̄ are in B′.

Definition 10.1.11
Let (B1, ∧1, ∨1, ¯ ) and (B2, ∧2, ∨2, ¯ ) be two Boolean algebras. The direct sum of these
Boolean algebras, denoted by B1 ⊕ B2, is the quadruple (B1 × B2, ∧, ∨, ¯ ), where

(x1, x2) ∧ (y1, y2) = (x1 ∧1 y1, x2 ∧2 y2),
(x1, x2) ∨ (y1, y2) = (x1 ∨1 y1, x2 ∨2 y2),
(x1, x2)‾ = (x̄1, x̄2)

for all x1 , y1 ∈ B1 and all x2 , y2 ∈ B2 .

As with other structures, it is possible to define the n-tuple direct sum of Boolean algebras. Also,
we denote by B n the n-tuple direct sum of a Boolean algebra B with itself.

10.1.3 – Boolean Algebras as Boolean Rings


The first axioms of a Boolean algebra posit that 1 is the identity for ∧ and 0 is the identity for
∨. The dominance law imposes that 0 has no inverse with respect to ∧ and 1 has no inverse with
respect to ∨. Hence, neither (B, ∧, ∨) nor (B, ∨, ∧) is a ring. However, Boolean algebras do have a
ring structure.

Theorem 10.1.12
Let (B, ∧, ∨, ¯ ) be a Boolean algebra. Define the binary operation + : B × B → B by

x + y = (x ∧ ȳ) ∨ (x̄ ∧ y).

Then (B, +, ∧) is a commutative ring with identity 1 ≠ 0. The additive identity is 0 and
x + x = 0 for all x ∈ B.

Proof. The axioms of a Boolean algebra give us that ∧ is associative, commutative, and has the
identity 1. We need to prove the other ring axioms.
Let x, y, z ∈ B. To prove commutativity,
y + x = (y ∧ x̄) ∨ (ȳ ∧ x) = (x̄ ∧ y) ∨ (x ∧ ȳ)
      = (x ∧ ȳ) ∨ (x̄ ∧ y) = x + y.
For associativity, we first simplify
(x + y) + z = (((x ∧ ȳ) ∨ (x̄ ∧ y)) ∧ z̄) ∨ (((x ∧ ȳ) ∨ (x̄ ∧ y))‾ ∧ z)
            = (x ∧ ȳ ∧ z̄) ∨ (x̄ ∧ y ∧ z̄) ∨ (((x̄ ∨ y) ∧ (x ∨ ȳ)) ∧ z)
            = (x ∧ ȳ ∧ z̄) ∨ (x̄ ∧ y ∧ z̄) ∨ (((x̄ ∧ x) ∨ (x̄ ∧ ȳ) ∨ (y ∧ x) ∨ (y ∧ ȳ)) ∧ z)
            = (x ∧ ȳ ∧ z̄) ∨ (x̄ ∧ y ∧ z̄) ∨ ((0 ∨ (x̄ ∧ ȳ) ∨ (x ∧ y) ∨ 0) ∧ z)
            = (x ∧ ȳ ∧ z̄) ∨ (x̄ ∧ y ∧ z̄) ∨ (x̄ ∧ ȳ ∧ z) ∨ (x ∧ y ∧ z).
We see that this result is symmetric in x, y, and z, so we deduce that (x + y) + z = (y + z) + x =
x + (y + z), by commutativity of +. Hence, + is associative.
It is easy to see that for all x ∈ B,
x + 0 = (x ∧ 0̄) ∨ (x̄ ∧ 0) = (x ∧ 1) ∨ 0 = x,
so 0 is the identity for +. Furthermore, + has inverses since for all x ∈ B,
x + x = (x ∧ x̄) ∨ (x̄ ∧ x) = 0 ∨ 0 = 0.
Finally, we need to show that ∧ is distributive over +. Let x, y, z ∈ B. Then
(x + y) ∧ z = ((x ∧ ȳ) ∨ (x̄ ∧ y)) ∧ z = (x ∧ ȳ ∧ z) ∨ (x̄ ∧ y ∧ z),
whereas
(x ∧ z) + (y ∧ z) = ((x ∧ z) ∧ (y ∧ z)‾) ∨ ((x ∧ z)‾ ∧ (y ∧ z))
                  = ((x ∧ z) ∧ (ȳ ∨ z̄)) ∨ ((x̄ ∨ z̄) ∧ (y ∧ z))
                  = (x ∧ z ∧ ȳ) ∨ (x ∧ z ∧ z̄) ∨ (x̄ ∧ y ∧ z) ∨ (z̄ ∧ y ∧ z)
                  = (x ∧ z ∧ ȳ) ∨ (x ∧ 0) ∨ (x̄ ∧ y ∧ z) ∨ (y ∧ 0)
                  = (x ∧ z ∧ ȳ) ∨ 0 ∨ (x̄ ∧ y ∧ z) ∨ 0
                  = (x ∧ ȳ ∧ z) ∨ (x̄ ∧ y ∧ z).
This shows right-distributivity. Since ∧ is commutative, left-distributivity also holds. 

Exercise 5.1.35 presented the concept of a Boolean ring as a ring R in which r² = r for all r ∈ R.
In that exercise, one is asked to show that r + r = 0 for all r ∈ R and that R is commutative.
Consequently, the ring (B, +, ∧) in Theorem 10.1.12 is a Boolean ring. A converse is also true.
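In the power-set Boolean algebra of Example 10.1.5, the ring addition of Theorem 10.1.12 works out to be the familiar symmetric difference of sets. A brute-force Python check on a three-element set (an illustration, not part of the text):

```python
from itertools import combinations

# Ring addition x + y = (x AND NOT y) OR (NOT x AND y) from Theorem
# 10.1.12, computed in the power-set Boolean algebra of a 3-element set.
S = frozenset({1, 2, 3})
P = [frozenset(c) for r in range(len(S) + 1) for c in combinations(S, r)]

def add(A, B):
    # (A ∧ B̄) ∨ (Ā ∧ B), with complements taken relative to S
    return (A & (S - B)) | ((S - A) & B)

for A in P:
    assert add(A, A) == frozenset()      # every element is its own inverse
    assert add(A, frozenset()) == A      # 0 (the empty set) is the identity
    for B in P:
        assert add(A, B) == A ^ B        # + is the symmetric difference
        for C in P:
            assert add(A, B) & C == add(A & C, B & C)  # distributivity
```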

Theorem 10.1.13
Let (B, +, ∧) be a Boolean ring. Then defining the operations

x ∨ y = (x ∧ y) + (x + y)  and  x̄ = 1 + x

makes (B, ∧, ∨, ¯ ) into a Boolean algebra.

Proof. (Left as an exercise for the reader. See Exercise 10.1.14.) 

It is not uncommon to use abstract ring notation and not to write the symbol ∧, so that the
expression xy means x ∧ y, and xyz means x ∧ y ∧ z. In this notation, the operator ∧ takes precedence
over + so that xy ∨ z stands for (x ∧ y) ∨ z. Some authors carry this habit over to Boolean algebras
so that, for example, the expression xȳz ∨ x̄yz̄ stands for

(x ∧ ȳ ∧ z) ∨ (x̄ ∧ y ∧ z̄).

This convention is sometimes called multiplicative notation.

10.1.4 – Boolean Functions and Logic Gates


We end this section with a cursory application to Boolean functions and logic gates.
Let B = {0, 1} be the basic Boolean algebra. Define Fn = Fun(Bⁿ, B), namely the set of all
functions f : Bⁿ → B. In other words, any function f ∈ Fn has n input Boolean variables and one
output Boolean variable.
The size of Fn grows quickly with n. Since |Bⁿ| = 2ⁿ, we have |Fn| = 2^(2ⁿ).
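The count |Fn| = 2^(2ⁿ) can be confirmed by enumerating truth tables; a brief Python sketch (assuming nothing beyond the standard library):

```python
from itertools import product

# Counting F_n: a function {0,1}^n -> {0,1} is determined by its truth
# table, a tuple of 2^n output bits, so |F_n| = 2^(2^n).
for n in [1, 2, 3]:
    inputs = list(product([0, 1], repeat=n))            # 2^n input rows
    tables = list(product([0, 1], repeat=len(inputs)))  # one output bit per row
    assert len(tables) == 2 ** (2 ** n)
print([2 ** (2 ** n) for n in [1, 2, 3]])  # [4, 16, 256]
```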

Example 10.1.14. For example, as elements of F3, we could have

f1(x, y, z) = 1   or   f2(x, y, z) = x ∧ (y ∨ z)   or   f3(x, y, z) = (x̄ ∨ y) ∧ (x ∨ ȳ ∨ z).

As practice with Boolean algebra, consider the function

(f2 ∧ f3)(x, y, z) = (x ∧ (y ∨ z)) ∧ ((x̄ ∨ y) ∧ (x ∨ ȳ ∨ z)).

We proceed to simplify this expression, using multiplicative notation for brevity:

(f2 ∧ f3)(x, y, z) = (x(y ∨ z))(x̄ ∨ y)(x ∨ ȳ ∨ z)
  = (xy ∨ xz)(x̄x ∨ x̄ȳ ∨ x̄z ∨ yx ∨ yȳ ∨ yz)
  = (xy ∨ xz)(0 ∨ x̄ȳ ∨ x̄z ∨ xy ∨ 0 ∨ yz)
  = (xy ∨ xz)(x̄ȳ ∨ x̄z ∨ xy ∨ yz)
  = xyx̄ȳ ∨ xyx̄z ∨ xyxy ∨ xyyz ∨ xzx̄ȳ ∨ xzx̄z ∨ xzxy ∨ xzyz
  = 0 ∨ 0 ∨ xy ∨ xyz ∨ 0 ∨ 0 ∨ xyz ∨ xyz
  = xy ∨ xyz = xy1 ∨ xyz
  = xy(1 ∨ z) = xy1
  = xy. △
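Since f2 ∧ f3 and the simplified answer xy are both functions in F3, the computation above can be double-checked by comparing them on all eight inputs; a short illustrative Python verification:

```python
from itertools import product

# Check Example 10.1.14: (f2 AND f3)(x, y, z) equals x AND y on all
# eight inputs in {0, 1}^3.
f2 = lambda x, y, z: x & (y | z)
f3 = lambda x, y, z: ((1 - x) | y) & (x | (1 - y) | z)

for x, y, z in product([0, 1], repeat=3):
    assert f2(x, y, z) & f3(x, y, z) == x & y
print("f2 AND f3 = xy")
```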

It is an important property of multivariable Boolean functions that every element in Fn can
be expressed using the operations ∧, ∨, and ¯ of the basic Boolean algebra applied to the n input
variables of the function. See Exercise 10.1.16.

The Boolean algebra Fn is particularly important in digital electronics. In digital electronics, an
electric circuit has wires and circuit elements. When current is flowing along a wire, we think of it
as a Boolean 1, and when there is no current flowing we think of it as a Boolean 0. (In practice, there
might always be a current on the wires, but a wire will be considered as 0 if the current is below a
fixed value and as 1 if the current is above that same fixed value.)
A logic gate is a circuit element that takes any number of input currents, say n, and outputs a
current intensity (possibly 0) that behaves as a function in Fn depending on the input currents.
Some common logic gates are: (1) the negation NOT gate, (2) an AND gate with any number
of inputs, and (3) an OR gate with any number of inputs. Since every Boolean function can be
created by a combination of ∧, ∨, and ¯, these three logic gates are sufficient to design an electrical
circuit that implements any Boolean function. Consequently, at the lowest level of functionality, the
Boolean algebra Fn underlies all computing and digital electronics.
The electronic circuit diagram for the AND gate with two inputs and also with three inputs is

[Figure: AND gate symbols with outputs xy and xyz.]

The electronic circuit diagram for the OR gate with two inputs and also with three inputs is

[Figure: OR gate symbols with outputs x ∨ y and x ∨ y ∨ z.]

The electronic circuit diagram for the NOT gate (with a single input) is

[Figure: NOT gate symbol with output x̄.]

Example 10.1.15. Typically, to represent a Boolean function with logic gates, we draw the inputs
to the left with lines emanating from them representing wires from the inputs. The wires may split
in the circuit as necessary. For example, a logic gate diagram for the function F(x, y) = xȳ ∨ x̄y
is the following.

[Figure: logic gate diagram computing F(x, y) from inputs x and y using NOT, AND, and OR gates.]

The bump on one wire is used to indicate that the two crossing wires do not intersect. △
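A gate-level computation can also be simulated in software. The sketch below is purely illustrative, with NOT, AND, and OR gates modeled as Python functions; it evaluates the circuit of this example and tabulates F(x, y) = xȳ ∨ x̄y, which is the exclusive or of x and y:

```python
# A tiny gate-level simulation of the circuit in Example 10.1.15.
NOT = lambda x: 1 - x
AND = lambda *inputs: min(inputs)   # AND gate with any number of inputs
OR = lambda *inputs: max(inputs)    # OR gate with any number of inputs

def F(x, y):
    # F(x, y) = (x AND NOT y) OR (NOT x AND y)
    return OR(AND(x, NOT(y)), AND(NOT(x), y))

table = [(x, y, F(x, y)) for x in (0, 1) for y in (0, 1)]
print(table)  # [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
```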

Exercises for Section 10.1


1. Use rules of Boolean algebra to simplify the following expression as much as possible:

(x ∧ y) ∨ (x ∧ (y ∨ x)).

2. Use rules of Boolean algebra to simplify the following expression as much as possible:

((a ∨ b) ∧ (a ∨ b)) ∨ ((a ∨ b) ∧ b).

3. Prove that x̄y ∨ ȳz ∨ z̄x = ȳx ∨ z̄y ∨ x̄z for all x, y, z in a Boolean algebra B.

4. Consider the expression in a Boolean algebra (x → y) → ¬(y → z), where we are borrowing symbols
from propositional (Boolean) logic.
(a) Rewrite the expression only using ∧, ∨, and x̄ for negation.
(b) If we regard this expression as a Boolean function F (x, y, z), evaluate F (1, 0, 0).
5. Prove that the Boolean ring associated to the basic Boolean algebra B = {0, 1} is the ring (Z/2Z, +, ×).
6. On the interval of real numbers [0, 1], define the binary operators x ∧ y = min(x, y) and x ∨ y =
max(x, y), and the function x̄ = 1 − x, with 0 as the 0 and 1 as the 1. Prove that ([0, 1], ∧, ∨, ¯ ) satisfies
all the axioms of a Boolean algebra except the complement laws. Prove also that ([0, 1], ∧, ∨, ¯ ) satisfies
all the supplementary laws in Proposition 10.1.2.
7. Use the axioms of a Boolean algebra to prove that the following propositional forms from Boolean
logic are tautologies.
(a) p → (p ∨ q)
(b) ((p → q) ∧ ¬q) → ¬p
(c) ((p → q) ∧ (q → r)) → (p → r)
8. Prove that for all x, y in a Boolean algebra B, x ∧ y = y if and only if x ∨ y = x.
9. Let (B, ∧, ∨, ¯ ) be a Boolean algebra. Define the relation ≼ on B by x ≼ y when x ∧ y = x.
(a) Prove that ≼ defines a partial order on B.
(b) Show that this partial order ≼ on the Boolean algebra P(S), where S is a set, is precisely the
containment partial order ⊆.
10. Let S be a set and let A be a fixed subset of S. Consider the Boolean algebras of P(S) and P(A). Prove
that the function ϕ : P(S) → P(A) defined by ϕ(X) = X ∩ A is a Boolean algebra homomorphism.
[Hint: Distinguish the complement in S from the complement in A.]
11. Let (B, ∧, ∨, ¯ ) be a Boolean algebra.
(a) Show that (B, ∨, ∧, ¯ ) is also a Boolean algebra.
(b) Prove that the function ϕ : B → B defined by ϕ(x) = x̄ is an isomorphism from (B, ∧, ∨, ¯ ) to
(B, ∨, ∧, ¯ ).
12. Let f : S → T be a function between sets. Prove that the function ϕ : P(T ) → P(S) defined by
ϕ(X) = f −1 (X) is a Boolean algebra homomorphism.
13. Let (B1, ∧1, ∨1, ¯ ) and (B2, ∧2, ∨2, ¯ ) be two Boolean algebras and let ϕ : B1 → B2 be a Boolean
algebra homomorphism. Define the kernel and image of ϕ by

Ker ϕ = {x ∈ B1 | ϕ(x) = 0}  and  Im ϕ = {y ∈ B2 | ∃x ∈ B1, y = ϕ(x)}.

(a) Prove that Ker ϕ is not necessarily a Boolean subalgebra of B1 .


(b) Prove that Im ϕ is a Boolean subalgebra of B2 .
14. Prove Theorem 10.1.13.
15. Let f ∈ Fn. Prove that f(x1, x2, . . . , xn) = (f(1, x2, . . . , xn) ∧ x1) ∨ (f(0, x2, . . . , xn) ∧ x̄1) for all
(x1, x2, . . . , xn) ∈ {0, 1}ⁿ.
16. Disjunctive Normal Form. Use the previous exercise to prove that every function f ∈ Fn can be
written in the form
f (x1 , x2 , . . . , xn ) = p1 ∨ p2 ∨ · · · ∨ pr
where each pi is a function involving only ∧ (multiplication) of the inputs xi or their negations x̄i.
17. Find the disjunctive normal form (see Exercise 10.1.16) of the function f ∈ F4 that returns:
(a) 1 exactly when exactly one of the inputs xi = 1;
(b) 1 exactly when x1 or x2 is 1 and x3 or x4 is 1.
18. Let B be a Boolean algebra. Define the operator x ↓ y = (x ∨ y)‾. This is also called the NOR operator
since it takes the OR and then negates it.
(a) Show that x̄ can be obtained using just the ↓ operator (and perhaps combinations thereof).
(b) Show that x ∧ y can be obtained using just the ↓ operator (and perhaps combinations thereof).
(c) Show that x ∨ y can be obtained using just the ↓ operator (and perhaps combinations thereof).
(d) Deduce that every operation in a Boolean algebra can be expressed using a single operation.

19. Draw the logic gate diagrams corresponding to each of the following Boolean functions.
(a) x̄ ∨ y
(b) (x ∨ y) ∧ x
20. Draw the logic gate diagrams corresponding to each of the following Boolean functions.
(a) xyz ∨ x̄ȳz̄
(b) (x̄ ∨ z)(y ∨ z̄)

10.2
Vector Spaces
Many students of natural sciences and even some in social sciences must study linear algebra. Indeed,
any problem that involves some form of analysis of functions involving more than one variable—
whether they be prices, quantities of commodities, coordinates, populations of various species, or
any type of data—requires linear algebra. Just like any branch of mathematics that has many
applications in science and mathematics, linear algebra encompasses a variety of standard topics,
theorems, and algorithms that are specific to it. However, at the heart of linear algebra lies the
algebraic structure of vector spaces.
This textbook assumes linear algebra as a prerequisite. Vector spaces over R should be familiar
to the reader. Consequently, this section neither serves as an introduction to vector spaces nor
attempts to mention all the interesting applications. Instead, as another motivation for modules,
this section serves as a summary and reminder of various definitions and theorems of vector spaces,
with an emphasis on a structuralist view, following the Outline provided in the preface.

10.2.1 – Definition, Motivation, and Elementary Examples

Definition 10.2.1
Let F be a field. A vector space over F (or F -vector space) is a nonempty set V equipped
with a binary operation + and a function F × V → V , called scalar multiplication, denoted
(c, v) 7→ cv, such that (V, +) is an abelian group with identity 0 and such that the scalar
multiplication satisfies the following identities:

(1) ∀c ∈ F, ∀v, w ∈ V, c(v + w) = cv + cw;


(2) ∀c, d ∈ F, ∀v ∈ V, (c + d)v = cv + dv;
(3) ∀c, d ∈ F, ∀v ∈ V, c(dv) = (cd)v;
(4) ∀v ∈ V, 1v = v.

Elements of V are generically called vectors and the elements of F are called the scalars.

It is valuable to compare the axioms of this definition to those of a group action. The scalar
multiplication means that each scalar in F acts on the vectors (elements of the abelian group V )
as functions V → V. However, axiom (1) states that each scalar acts not just as a function but as
a group homomorphism on (V, +). Axiom (2) requires that the group (F, +) behave like a subgroup of
(Fun(V, V), +). Finally, axioms (3) and (4) precisely require that scalar multiplication is a group
action of (U (F ), ×) on V .
From a historical perspective it might have taken time for the axioms of a vector space to emerge
in this way. In retrospect they should seem natural. Vector spaces over a fixed field F are an action
structure in which the field F acts on the structure of an abelian group. As with group actions,
vector spaces over different fields are different algebraic structures.

Recall that the additive identity in V is denoted by 0. Also recall that one of the first propositions
in vector spaces shows that 0v = 0 for all vectors v ∈ V (where the 0 on the left is the additive
identity in (F, +) and the 0 on the right is the additive identity in V ). A corollary thereof is that
(−1)v = −v, where −1 ∈ F and −v is the additive inverse of the vector v.
Large portions of linear algebra courses usually serve as motivation for defining abstract vector
spaces. The ability to solve a system of linear equations (along with all the usual applications to
natural and social sciences), applications to transformation geometry, computer graphics, and many
others motivate the study of vectors and matrices. Introductions to real vector spaces show how not
only Rn satisfies the axioms of a vector space but so do polynomials of degree n or less with real
coefficients; all real polynomials, R[x]; n × m matrices; the set of complex numbers C; the set of real
functions Fun(R, R); the set C⁰([a, b], R) of continuous real-valued functions over the interval [a, b];
the set ℓ∞(R) of bounded real-valued sequences; and many more.

Example 10.2.2 (Trivial Vector Space). The simplest vector space consists of the abelian group
V of a single element, V = {0}. All the axioms of a vector space are trivially verified. △

Example 10.2.3. For any vector space over a field F , the first example is that V = F is itself
a vector space. Note that (F, +) is an abelian group. The scalar multiplication is simply the
usual multiplication in the field F . Axioms (1) and (2) correspond to left- and right-distributivity,
respectively. Axiom (3) is associativity of multiplication in F and axiom (4) is the identity axiom in
(F, ×). For the purposes of showing that F is itself an F -vector space, that the multiplication has
inverses is irrelevant. △

The following proposition leads to many more examples of vector spaces.

Proposition 10.2.4
Let F be a field and let V and W be vector spaces over F. The Cartesian product V × W, equipped
with the addition and scalar multiplication

(v1, w1) + (v2, w2) = (v1 + v2, w1 + w2)  and  c(v, w) = (cv, cw),

is a vector space over F.

Proof. (Left as an exercise for the reader. See Exercise 10.2.4.) 

Definition 10.2.5
The vector space defined in Proposition 10.2.4 is called the direct sum of V and W and is
denoted V ⊕ W .

It is equally possible to define the direct sum of n different vector spaces. The direct sum of a
vector space V with itself n times is denoted by V n . In particular, for all positive integers n, the
Cartesian product F n has the structure of a vector space over F in which the addition of n-tuples
and scalar multiplication on n-tuples is performed component-wise.
As with groups, the concept of direct sum generalizes to an arbitrary collection of vector spaces.

Definition 10.2.6
Let V = {Vi }i∈I be a collection of vector spaces over the field F , indexed by the set I.
The collection of all choice functions from I into the collection {Vi }i∈I is called the direct
product of this collection and is denoted
∏_{i∈I} Vi.

We remind the reader that the existence of choice functions is the content of the Axiom of Choice.
They are not functions in the usual sense, i.e., from a given set to another given set. It is convenient
to denote elements in ∏_{i∈I} Vi with letters such as f or g, to designate choice functions.

Definition 10.2.7
The direct sum of the collection {Vi }i∈I is the subset of the direct product consisting of
choice functions that map to the 0 element in Vi for all but a finite number of indices i ∈ I.
The direct sum is denoted
⊕_{i∈I} Vi.

If I is finite with I = {1, 2, . . . , n}, then the direct product and the direct sum are both V1 ⊕
V2 ⊕ · · · ⊕ Vn . If I is countable, then we can think of the direct product as sequences of elements,
with each one taken from a different Vi . However, if I is uncountable, the intuition of sequences no
longer applies.

Proposition 10.2.8
Let V = {Vi}i∈I be a collection of vector spaces over the field F. The direct product and
the direct sum of V are vector spaces under the operations f + g and cf, for f, g ∈ ∏_{i∈I} Vi
and c ∈ F, defined by

(f + g)(i) = f(i) + g(i)  and  (cf)(i) = c(f(i))  for all i ∈ I.

Proof. Proposition 9.3.2 tells us that ∏_{i∈I} Vi is an abelian group with respect to the addition of
choice functions. To prove axiom (1) in Definition 10.2.1, let c ∈ F and let f, g be two choice
functions. For all i ∈ I, the choice function c(f + g) satisfies

(c(f + g))(i) = c ((f + g)(i)) = c(f (i) + g(i)) = c(f (i)) + c(g(i)) = (cf )(i) + (cg)(i).

Hence, c(f + g) = (cf ) + (cg) and (1) holds. The proofs of the remaining axioms are similar. 

10.2.2 – Vector Subspaces


A subobject of an F-vector space V is what we called in linear algebra a vector subspace: a nonempty
subset W ⊆ V that is closed under addition and scalar multiplication. Note that since W is closed
under scalar multiplication, for all w ∈ W , we have −w = (−1)w ∈ W so W is closed under additive
inverses, ensuring that (W, +) is a subgroup of (V, +).
In group theory and ring theory, we encountered the idea of using a generating set to describe a
subgroup or ideal. The concept arises in vector spaces though with different terms.

Definition 10.2.9
Let V be a vector space over a field F and let S be a subset of V . A linear combination of
elements in S is any vector of the form

c1 v1 + c2 v2 + · · · + cr vr

where ci ∈ F and vi ∈ S for i = 1, 2, . . . , r and where r is a positive integer. The set of all
linear combinations of elements in S is called the span of S and is denoted Span(S).

It is important to note that linear combinations only involve a finite number of vectors from S,
even if S is an infinite set. This requirement avoids concerns over convergence.
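For a concrete finite illustration (not taken from the text), the span of a finite set can be computed explicitly when F is a finite field. The Python sketch below does this for F = Z/2Z, where every coefficient is 0 or 1, so Span(S) is simply the set of sums (mod 2) over subsets of S:

```python
from itertools import chain, combinations

# Span over F = Z/2Z: since the only scalars are 0 and 1, Span(S) is
# the set of sums (mod 2) over all subsets of S.
def span_gf2(S):
    vecs = set()
    subsets = chain.from_iterable(
        combinations(S, r) for r in range(len(S) + 1))
    for subset in subsets:
        v = tuple(0 for _ in S[0])       # start from the zero vector
        for w in subset:
            v = tuple((a + b) % 2 for a, b in zip(v, w))
        vecs.add(v)
    return vecs

S = [(1, 0, 0), (0, 1, 0)]
W = span_gf2(S)
assert len(W) == 4          # a 2-dimensional span over Z/2Z has 2^2 vectors
assert (1, 1, 0) in W and (0, 0, 1) not in W
```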

Proposition 10.2.10
Let S be any subset of a vector space V over a field F . The span Span(S) is a vector
subspace of V .

Proof. Let v, w ∈ Span(S). Then there exist scalars c1, c2, . . . , cr, d1, d2, . . . , ds ∈ F and vectors
v1, v2, . . . , vr, w1, w2, . . . , ws ∈ S such that

v = c1 v1 + c2 v2 + · · · + cr vr and w = d1 w1 + d2 w2 + · · · + ds ws .

Then
v + w = c1 v1 + c2 v2 + · · · + cr vr + d1 w1 + d2 w2 + · · · + ds ws ,
which is again a linear combination of elements in S. Also, if λ ∈ F , then

λv = λ(c1 v1 + c2 v2 + · · · + cr vr ) = λc1 v1 + λc2 v2 + · · · + λcr vr ,

which is a linear combination of elements in S because λci ∈ F for all i = 1, 2, . . . , r. 

10.2.3 – Linear Transformations


The morphisms in the algebraic structure of vector spaces over F are precisely linear transformations.

Definition 10.2.11
Let V and W be vector spaces over a field F . A function T : V → W is called a linear
transformation from V to W if
(1) T(v1 + v2) = T(v1) + T(v2) for all v1, v2 ∈ V;
(2) T (cv) = cT (v) for all v ∈ V and all c ∈ F .

Definition 10.2.12
If a linear transformation T : V → W is also a bijective function, then T is called an
isomorphism. If an isomorphism exists between two vector spaces V and W , then we say
they are isomorphic and we write V ≅ W.

Some linear algebra books define an isomorphism between vector spaces V and W as a bijection
T : V → W such that T and T −1 are both linear transformations. This is indeed correct. In fact,
an equivalent definition is valid for groups and rings as well. However, as with groups and rings, it
is easy to show that if T : V → W is a bijection and a linear transformation, then T −1 : W → V is
also a linear transformation. Consequently, Definition 10.2.12 suffices.

10.2.4 – Subclasses
The Outline given in the preface mentions subclasses, generally described as interesting objects
obtained by adding additional structure. For the algebraic structure of vector spaces, there are
many interesting and useful subclasses: vector spaces with bilinear forms, topological vector spaces,
Hilbert spaces, Banach spaces, and Lie algebras just to name a few. Many of these subclasses become
new algebraic structures of their own and deserve full books for a proper introduction.

10.2.5 – General Properties: Basis and Dimension


There are many general properties of vector spaces but we focus on two in particular: basis and
dimension.
478 CHAPTER 10. MODULES AND ALGEBRAS

Definition 10.2.13
Let V be a vector space over a field F . A nonempty subset S ⊆ V is called linearly
independent if for any finite subset {v1 , v2 , . . . , vr } ⊆ S

c1 v1 + c2 v2 + · · · + cr vr = 0 implies that ci = 0 for all i.

Otherwise, S is called linearly dependent.

This is the most general definition of linear independence and it applies to subsets S that are
not necessarily finite. Note again that every linear combination is assumed to be finite.

Proposition 10.2.14
Let V be a vector space over a field F and let S be a linearly independent set. Then for
any strict subset S′ ⊊ S, the span Span(S′) is a strict subspace of Span(S).

Proof. Let v ∈ S − S′. Assume that v ∈ Span(S′). Then

v = c1 v1 + c2 v2 + · · · + cr vr

for some ci ∈ F and some vi ∈ S′. But then

v − c1 v1 − c2 v2 − · · · − cr vr = 0

with v, v1 , v2 , . . . , vr ∈ S. Since the coefficient in front of v is nonzero, this contradicts the assumption
that S is linearly independent. By contradiction, v ∉ Span(S′). □

Definition 10.2.15
Let V be a vector space over a field F . A subset B of V is called a basis of V if B is linearly
independent and if Span(B) = V .

The condition that Span(B) = V means that V is generated by the elements of B. In contrast, by
Proposition 10.2.14, the requirement that B be linearly independent means that no strict subset
of B generates all of V .
There are two crucial but sticky points concerning bases that, in introductory courses on linear
algebra, are treated cursorily. This is because to prove them in their full generality requires Zorn’s
Lemma, which is equivalent to the Axiom of Choice. Proofs for the finite-dimensional cases are
easier. These crucial properties are the content of the next two theorems.

Theorem 10.2.16
Let V be a nontrivial vector space over a field F . Let Γ ⊆ V with V = Span(Γ) and let
S ⊆ Γ be a linearly independent set. Then there exists a basis B of V such that S ⊆ B ⊆ Γ.

Proof. Let I be the set of subsets A such that S ⊆ A ⊆ Γ and A is linearly independent. The
collection I is not empty since it contains S. Subset containment ⊆ is a partial order on I. Let
{Aj }j∈J be a chain (a totally ordered subset of I).
We claim that

A = ⋃_{j∈J} Aj

is linearly independent. Let {v1 , v2 , . . . , vr } be a finite set of vectors in A. For each i = 1, 2, . . . , r,


there exists a ji ∈ J such that vi ∈ Aji . Since {Aj } is a chain, then

Aj1 ∪ Aj2 ∪ · · · ∪ Ajr ⊆ Ak



where k = max(j1 , j2 , . . . , jr ). Hence, v1 , v2 , . . . , vr ∈ Ak . Since Ak is a linearly independent set,


we know that c1 v1 + c2 v2 + · · · + cr vr = 0 implies that c1 = c2 = · · · = cr = 0. Hence, A is linearly
independent. It is also obvious that A contains S. Thus, A ∈ I and contains all Aj so A is an upper
bound. By Zorn’s Lemma, since every chain in I has an upper bound in I, there exists a maximal
element B in I.
We claim that B is a basis of V . As an element of I, the set B is linearly independent. Now
assume that W = Span(B) ≠ V . Then there exists some vector v ∈ Γ such that v ∉ W . Call
B′ = B ∪ {v}. Let {v1 , v2 , . . . , vr } be a finite subset of B′. Suppose first that none of the vi = v.
Then {v1 , v2 , . . . , vr } ⊆ B and c1 v1 + c2 v2 + · · · + cr vr = 0 implies that c1 = c2 = · · · = cr = 0. Now
suppose that one of the vi = v, say v1 = v without loss of generality. Assume that

c1 v1 + c2 v2 + · · · + cr vr = 0

with c1 ≠ 0. Then

v = −(c1⁻¹ c2 v2 + · · · + c1⁻¹ cr vr ),

which contradicts the assumption that v ∉ Span(B). Hence, c1 v1 + c2 v2 + · · · + cr vr = 0 implies


that c1 = 0. But then since all v2 , . . . , vr ∈ B and B is linearly independent, we also know that
c2 = · · · = cr = 0. Then B′ is linearly independent, contains S, is a subset of Γ, and is strictly
larger than B. This contradicts the maximality of B in I. Consequently, the assumption that
Span(B) ≠ V is false and we conclude that B also spans V and is therefore a basis of V . □

Corollary 10.2.17
Every nontrivial vector space V over a field F has a basis.

Proof. Let v be a nonzero vector in V . The corollary follows from Theorem 10.2.16 with S = {v}
and Γ = V . □

It is important to underscore that Theorem 10.2.16 establishes two useful results: (1) it is always
possible to complete a linearly independent set of vectors S in V to a basis B of V ; (2) it is always
possible to select a basis from any generating set Γ. Both of these arise often in proofs.

Proposition 10.2.18
Let V be a vector space over a field F and let B be a basis of V . Every element v ∈ V can
be written in a unique way as a linear combination of elements in B.

Proof. That v ∈ V is a linear combination of elements in B follows simply from the fact that B spans
V . Now suppose that

v = c1 u1 + c2 u2 + · · · + cr ur , (10.1)
v = d1 u′1 + d2 u′2 + · · · + ds u′s (10.2)

are two linear combinations of v with u1 , u2 , . . . , ur , u′1 , u′2 , . . . , u′s ∈ B. The sets {u1 , u2 , . . . , ur } and
{u′1 , u′2 , . . . , u′s } may or may not be disjoint so write

{u1 , u2 , . . . , ur } ∪ {u′1 , u′2 , . . . , u′s } = {w1 , w2 , . . . , wt }

with all the wi distinct. Note that t ≤ r + s so t is finite. Let us now rewrite the linear combinations
(10.1) and (10.2) as

v = c1 w1 + c2 w2 + · · · + ct wt ,
v = d1 w1 + d2 w2 + · · · + dt wt

allowing for some of the ci and some di to be 0 so that these expressions are identical to those in
(10.1) and (10.2).
Then

0 = c1 w1 + c2 w2 + · · · + ct wt − d1 w1 − d2 w2 − · · · − dt wt
= (c1 − d1 )w1 + (c2 − d2 )w2 + · · · + (ct − dt )wt .

Since {w1 , w2 , . . . , wt } ⊆ B, it is a linearly independent set, so ci − di = 0 for all i = 1, 2, . . . , t. The


uniqueness of the linear combination expressing v follows. □

Anyone who has studied linear algebra knows that bases are not at all unique for a given vector
space. The following theorem establishes an important invariant property about bases.

Theorem 10.2.19
Let V be a vector space over a field F and let B and B′ be two bases of V . Then B and B′
have the same cardinality.

Proof. The case when one of the bases is finite is proved in most introductory linear algebra courses.
(A more general proof for free modules is also given in Theorem 10.5.11.) Consequently, we prove
the situation when B and B′ are both infinite.
Since B′ is a basis, for each v ∈ B,

v = c1 w1 + c2 w2 + · · · + cr wr

for some positive integer r, for some ci ∈ F − {0}, and some wi ∈ B′. Furthermore, by Proposi-
tion 10.2.18, this expression is unique. As v varies through B, each vector w ∈ B′ appears in at least
one of the linear combinations associated to elements in B. Otherwise, B′ − {w} would span all of V
and, by Proposition 10.2.14, B′ would be linearly dependent.
Let f : B′ → B be a function that satisfies f (w) = v, where v is one of the vectors in B for which
w is used to represent it. If v ∈ f (B′), then since the expression of v as a linear combination of
elements in B′ is unique and finite, the fiber f −1 (v) is finite. Note that

B′ = ⋃_{v∈f (B′)} f −1 (v),

so since each fiber is finite and B′ is infinite, f (B′) must be infinite. Moreover, f (B′) and B′ have
the same cardinality. Consequently, since f (B′) ⊆ B,

|B′| = |f (B′)| ≤ |B|.

Interchanging the roles of B and B′, we also deduce that |B| ≤ |B′|. By the Schröder–Bernstein
Theorem, |B′| = |B|. □

Theorem 10.2.19 leads to the notion of dimension.

Definition 10.2.20
Let V be a vector space over a field F . If B is a basis of V , then the dimension of V ,
denoted by dimF V , is the cardinality |B|.

If a vector space V over a field F has a finite basis of n vectors, then V is said to be finite
dimensional with dimension n. We denote this briefly by dimF V = n. If the field F is understood
by context, then this is abbreviated to dim V . If V does not have a finite basis, we say that V is
infinite dimensional. Though we do not discuss the notion of a basis for the trivial vector space
V = {0}, for completeness, we say that dim{0} = 0.

Example 10.2.21. Let F be a field and let V = F n . The standard basis of V consists of vectors
{e1 , e2 , . . . , en }, where ei is the n-tuple in V that is 0 in all components except for 1 in the ith
component. Obviously, dim F n = n. △

Example 10.2.22. Recall that the set Mm×n (R) of m×n matrices with entries in R is a real vector
space and the standard basis consists of the matrices Eij with 1 ≤ i ≤ m and 1 ≤ j ≤ n defined as 0
in all entries except for a 1 in the (i, j)th entry. Since a basis of Mm×n (R) consists of mn matrices,
dim Mm×n (R) = mn. △

Example 10.2.23. Let F be a field. The polynomial ring F [x] is a vector space with basis B =
{1, x, x2 , x3 , . . .}. One of the key points in this example is that every polynomial has a term of
highest degree. Therefore, every polynomial consists of a finite number of terms. Similarly, elements
in Span(B) consist of finite linear combinations of the xi . Thus, B spans F [x]. The vector space
F [x] is infinite dimensional but we have presented a basis that is countable. By Theorem 10.2.19,
every basis of F [x] is countable and we can write dimF F [x] = |N|. △

Example 10.2.24. Contrast the previous example with V = ℓ∞ (R), the vector space of real
sequences. For each i, consider the sequence ei that consists of 1 in the ith output and 0 for all other
outputs. The set {ei | i ∈ N} is not a basis of ℓ∞ (R) because not every sequence in R is a (finite)
linear combination of the ei sequences. Indeed, any finite linear combination of the ei sequences
would be 0 for all but a finite number of terms in the sequence. The vector space V is infinite
dimensional, but we have not presented a basis. △

10.2.6 – Description: Coordinates


The Outline mentions the topic of how to conveniently describe objects in an algebraic structure.
With vector spaces, this turns out to be simple (relative to other structures) and amenable to
computations.
The reader may have noticed (perhaps with some dismay) that heretofore, we never referred to
coordinates or column vectors. This is as it should be. Elements of a vector space exist without any
reference to coordinates. Coordinates of vectors only have meaning in reference to an ordered basis.
Some vector spaces do have natural standard bases (see Examples 10.2.21 and 10.2.22) but these
should never be taken as unique or special in some way. Though it is possible to discuss coordinates
for infinite-dimensional vector spaces, we focus on finite-dimensional vector spaces. The key theorem
behind coordinates is Proposition 10.2.18.

Definition 10.2.25
Let V be a vector space over a field F with dim V = n. Let B = (u1 , u2 , . . . , un ) be an
ordered basis of V . (By ordered basis we mean an n-tuple of distinct vectors, the set of
whose entries forms a basis of V .) The coordinates of a vector v ∈ V with respect to B
consist of the unique ci ∈ F such that

v = c1 u1 + c2 u2 + · · · + cn un .

We denote this relationship as

[v]B = (c1 , c2 , . . . , cn )⊤ ,

written as a column vector.

The concept of coordinates defines an isomorphism T : V → F n via T (v) = [v]B . This implies
the important classification theorem about vector spaces that if dim V = dim W , then V ≅ W .
Consequently, unlike groups or rings, there is a simple property that classifies all vector spaces: the
dimension.
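The coordinate isomorphism is easy to exercise numerically over F = R. The sketch below (a made-up basis and vector, assuming numpy is available) finds [v]B as the unique solution of the linear system whose coefficient matrix has the basis vectors as its columns.

```python
import numpy as np

# Ordered basis B = (u1, u2) of R^2 (any linearly independent pair works).
u1 = np.array([1.0, 1.0])
u2 = np.array([1.0, -1.0])
B = np.column_stack([u1, u2])   # basis vectors as columns

v = np.array([3.0, 1.0])

# [v]_B is the unique c with c[0]*u1 + c[1]*u2 = v (Proposition 10.2.18),
# i.e., the solution of the linear system B c = v.
c = np.linalg.solve(B, v)
print(c)        # the coordinates of v with respect to B
print(B @ c)    # reconstructs v from its coordinates
```

The map v ↦ [v]B computed here is precisely the isomorphism T : V → F n described above.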

Coordinates also lead to the concept of matrices. Most linear algebra courses introduce matrices
as rectangular arrays of numbers and first define multiplication of matrices in a manner that seems
somewhat arbitrary. However, the purpose of defining matrix multiplication as it is done comes from
the theorem that every linear transformation T : F m → F n can be expressed as

T (v) = A (v1 , v2 , . . . , vm )⊤

for some n × m matrix A and where we have expressed a vector v ∈ F m as the m-tuple v =
(v1 , v2 , . . . , vm ). This theorem, along with the isomorphism between a vector space and its coordinates
with respect to a basis, leads to the following useful notion.

Definition 10.2.26
Let V and W be vector spaces over a field F and let T : V → W be a linear transformation.
Suppose that dim V = m and that dim W = n. Let B be a basis of V and let B′ be a basis
of W . The (B, B′)-matrix of T is the unique n × m matrix [T ]^B_{B′} such that for all v ∈ V ,

[T (v)]_{B′} = [T ]^B_{B′} [v]_B .

This matrix representing T with respect to the bases makes it possible to perform all calculations
pertaining to linear transformations between finite-dimensional vector spaces using matrix algebra
and operations.
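One standard way to compute such a matrix: column j of [T ]^B_{B′} is the B′-coordinate vector of T (uj ). Below is a small sketch (the map T and both bases are hypothetical choices for illustration, assuming numpy) that builds the matrix this way and checks the defining identity.

```python
import numpy as np

# T : R^2 -> R^2, T(x, y) = (x + y, x - y), in standard coordinates.
def T(v):
    return np.array([v[0] + v[1], v[0] - v[1]])

# Ordered bases: B for the domain, B' (columns of Bp) for the codomain.
B = [np.array([1.0, 0.0]), np.array([1.0, 1.0])]
Bp = np.column_stack([[2.0, 0.0], [0.0, 1.0]])

# Column j of the matrix of T holds the B'-coordinates of T(u_j).
M = np.column_stack([np.linalg.solve(Bp, T(u)) for u in B])

# Check [T(v)]_{B'} = M [v]_B for a sample v with B-coordinates c.
c = np.array([3.0, -2.0])
v = c[0] * B[0] + c[1] * B[1]
assert np.allclose(np.linalg.solve(Bp, T(v)), M @ c)
print(M)
```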

10.2.7 – Quotient Vector Spaces


Following the Outline, we would like to consider quotient vector spaces. We saw the value of
studying quotient groups and quotient rings; however, most introductory courses on linear algebra
do not introduce quotient vector spaces. These are valuable, but for the sake of conciseness we
delay the discussion of quotient spaces until Section 10.4.4, when it will be subsumed by a treatment
of quotient modules.

10.2.8 – Hom Spaces


We end this section with a few useful constructions on vector spaces.
Let V and W be vector spaces over a field F . Denote by HomF (V, W ) the set of all linear transfor-
mations from V to W . Define the addition of two linear transformations T1 , T2 ∈ HomF (V, W )
and scalar multiplication by c ∈ F on a linear transformation T by

(T1 + T2 )(v) = T1 (v) + T2 (v) and (cT )(v) = c(T (v)). (10.3)

If the field F is understood by context or remains unchanged through a discussion, we sometimes


abbreviate HomF (V, W ) to Hom(V, W ).

Proposition 10.2.27
The set of linear transformations HomF (V, W ), equipped with the operations as defined
in (10.3), is a vector space over F . Furthermore, if dim V = m and dim W = n, then
dim Hom(V, W ) = mn.

Proof. We must first show that the addition and scalar multiplication defined in (10.3) are actually a
binary operation and a scalar multiplication, in other words that (T1 + T2 ) is a linear transformation

and that (cT ) is also a linear transformation. For all v1 , v2 ∈ V ,

(T1 + T2 )(v1 + v2 ) = T1 (v1 + v2 ) + T2 (v1 + v2 ) by (10.3)


= T1 (v1 ) + T1 (v2 ) + T2 (v1 ) + T2 (v2 ) T1 and T2 are linear
= T1 (v1 ) + T2 (v1 ) + T1 (v2 ) + T2 (v2 )
= (T1 + T2 )(v1 ) + (T1 + T2 )(v2 ).

Hence, T1 + T2 ∈ Hom(V, W ). The proof for cT is similar.


That (Hom(V, W ), +) is an abelian group follows from (10.3) and the fact that W is an abelian
group. The remaining four axioms of a vector space also follow from W being a vector space and
the definition of scalar multiplication in (10.3). □
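For finite-dimensional spaces, the operations in (10.3) can be seen through matrices: once linear maps are represented by matrices, pointwise addition and scaling of maps become entrywise operations on matrices, which is one way to see dim Hom(V, W ) = mn. A brief sketch (hypothetical maps, assuming numpy):

```python
import numpy as np

# Hom(V, W) for V = R^3, W = R^2: linear maps correspond to 2x3 matrices.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])
B = np.array([[2.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])
T1 = lambda v: A @ v
T2 = lambda v: B @ v

v = np.array([1.0, -1.0, 2.0])
c = 3.0

# The pointwise operations of (10.3) match entrywise matrix operations:
assert np.allclose(T1(v) + T2(v), (A + B) @ v)
assert np.allclose(c * T1(v), (c * A) @ v)
print((A + B) @ v)
```

Since a 2 × 3 matrix has 6 free entries, this viewpoint recovers dim Hom(V, W ) = mn = 6 here.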

Because of Proposition 10.2.27, we often refer to Hom(V, W ) briefly as the hom-space between
V and W (where “hom” is reminiscent of homomorphism).
There are two particularly important instances of hom-spaces: the endomorphism ring of a vector
space and the dual of a vector space.
The set of linear transformations of V into itself (an endomorphism on V ) is denoted by End(V ).
This vector space has the additional structure of being a ring with composition as the multiplication.
(See Exercise 10.2.13.)
The group of units of End(V ) consists of the invertible linear transformations of V into itself and
is called the general linear group of V , denoted GL(V ). Previously, we denoted by GLn (F ) the
group of n × n invertible matrices with entries in the field F . We see that GLn (F ) ≅ GL(F n ).
The notation GL(V ) is more general than our previous notation as the data concerning dimension
and field is contained in the properties and data of the vector space V . However, GL(V ) refers to
invertible linear transformations without immediate reference to a given basis. In particular, any
isomorphism GLn (F ) ≅ GL(F n ) is determined in reference to some basis on F n .

10.2.9 – The Dual of a Vector Space

Definition 10.2.28
Let V be a vector space over a field F . The dual of V , denoted V ∗ , is the vector space of
homomorphisms from V to F , i.e., V ∗ = Hom(V, F ).

Example 10.2.29. Let V = C 0 ([a, b], R) be the vector space over R of continuous real-valued
functions on the interval [a, b]. For some c ∈ [a, b], the evaluation function evc : V → R defined
by evc (f ) = f (c) is an element of the dual. Also, by the linearity properties of the integral, the
function F : V → R defined by

F (f ) = ∫_a^b f (t) dt

is a linear transformation and hence is an element of V ∗ . △
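Both functionals can be tested concretely on a finite-dimensional slice of V . The sketch below (hypothetical helper names, exact arithmetic via the standard library's fractions module) models polynomials of degree at most 2 by coefficient lists and verifies that evc and F respect addition in V:

```python
from fractions import Fraction

# A finite-dimensional stand-in for part of V = C^0([a,b], R):
# f(t) = p[0] + p[1] t + p[2] t^2, stored as a coefficient list.
def ev(c, p):
    """Evaluation functional ev_c(f) = f(c)."""
    return sum(coef * c**k for k, coef in enumerate(p))

def integral(a, b, p):
    """F(f) = the integral of f from a to b, exact on polynomials."""
    return sum(Fraction(coef, k + 1) * (b**(k + 1) - a**(k + 1))
               for k, coef in enumerate(p))

f = [1, 2, 3]                        # f(t) = 1 + 2t + 3t^2
g = [0, -1, 1]                       # g(t) = -t + t^2
fg = [x + y for x, y in zip(f, g)]   # f + g in V

# Both functionals are linear: they respect addition in V.
assert ev(2, fg) == ev(2, f) + ev(2, g)
assert integral(0, 1, fg) == integral(0, 1, f) + integral(0, 1, g)
print(ev(2, f), integral(0, 1, f))
```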

Suppose that V is finite dimensional with ordered basis B = (e1 , e2 , . . . , en ). Then considering
coordinates of vectors in V with respect to B defines an isomorphism between V and F n . As usual,
we write the coordinates of vectors as column vectors. We know from linear algebra that
any linear transformation F n → F is represented by a 1 × n matrix, which corresponds to a row
vector. So row vectors have an interpretation as representing elements in V ∗ (with respect to the
ordered basis B). The elements of V ∗ are technically vectors since V ∗ is a vector space in its own
right, but we often call an element of V ∗ a covector of V .
Associated to an ordered basis B, there is a standard ordered basis B ∗ = (e∗1 , e∗2 , . . . , e∗n ) of the
dual vector space defined by

e∗j (ei ) = δji , where δji = 1 if i = j and δji = 0 if i ≠ j.

With respect to B, the matrix of e∗j is the row vector

(0, . . . , 0, 1, 0, . . . , 0)

with the 1 in the jth entry.

Since the e∗j form a basis of V ∗ , we see that if V is finite dimensional, then dim V = dim V ∗ and the
mapping ei ↔ e∗i sets up an isomorphism between V and V ∗ . However, it is proper to distinguish
between vectors and covectors because their coordinates do not change in the same way under a
change of basis.

Proposition 10.2.30
Let B = (e1 , e2 , . . . , en ) and B′ = (f1 , f2 , . . . , fn ) be two ordered bases of a vector space V .
Let B ∗ and B′∗ be the respective associated dual bases. If the coordinate change matrix
from B to B′ coordinates is the n × n matrix M , then the coordinate change matrix from
B ∗ to B′∗ coordinates is M −1 .

Proof. From linear algebra, we know that the matrix M = (mij ) is defined by

ei = ∑_{j=1}^{n} mij fj . (10.4)

Consequently, for any vector a ∈ V , if [a]B = (ai ) and [a]B′ = (a′i ) as column vectors, then

[a]B′ = M [a]B ⇐⇒ a′i = ∑_{j=1}^{n} mij aj .

Let us call M̄ = (m̄ij ) the coordinate change matrix from B ∗ to B′∗ coordinates. Then

e∗k = ∑_{ℓ=1}^{n} f ∗ℓ m̄ℓk .

The order of writing is changed as compared to (10.4) to reflect the fact that coordinates of vectors
in V ∗ are row vectors. Then we have

δki = e∗k (ei ) = ∑_{ℓ=1}^{n} f ∗ℓ ( ∑_{j=1}^{n} mij fj ) m̄ℓk
= ∑_{ℓ=1}^{n} ∑_{j=1}^{n} mij f ∗ℓ (fj ) m̄ℓk = ∑_{ℓ=1}^{n} ∑_{j=1}^{n} mij δℓj m̄ℓk
= ∑_{j=1}^{n} mij ( ∑_{ℓ=1}^{n} δjℓ m̄ℓk ) = ∑_{j=1}^{n} mij m̄jk .

This last summation corresponds to the entries of the matrix product M M̄ , so we have proven that
I = M M̄ . Hence, we conclude that M̄ = M −1 . □
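Proposition 10.2.30 is easy to test numerically. In the sketch below (hypothetical bases, assuming numpy), each basis is stored as a matrix of column vectors, M is read off from the convention in (10.4), and the dual change matrix is computed directly from the dual bases and compared with M −1:

```python
import numpy as np

# Two ordered bases of R^3, stored as the columns of E (basis B) and F_ (B').
E = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
F_ = np.array([[2.0, 0.0, 1.0],
               [1.0, 1.0, 0.0],
               [0.0, 1.0, 1.0]])

# (10.4) reads e_i = sum_j m_ij f_j; column-wise this says E = F_ M^T,
# so M^T = F_^{-1} E.
M = np.linalg.solve(F_, E).T

# In standard dual coordinates, e*_k is row k of E^{-1} (so e*_k(e_i) = delta),
# and f*_l is row l of F_^{-1}. The relation e*_k = sum_l f*_l mbar_lk then
# says E^{-1} = Mbar^T F_^{-1}, i.e., Mbar^T = E^{-1} F_.
Mbar = (np.linalg.inv(E) @ F_).T

# Proposition 10.2.30: the dual change-of-coordinates matrix is M^{-1}.
assert np.allclose(Mbar, np.linalg.inv(M))
print(np.allclose(M @ Mbar, np.eye(3)))
```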

We could foreseeably decide to consider co-covectors of V , i.e., elements in V ∗∗ or higher duals.
However, using the bases as in Proposition 10.2.30, if (e∗∗1 , e∗∗2 , . . . , e∗∗n ) is the associated basis of
V ∗∗ with respect to the ordered basis (e∗1 , e∗2 , . . . , e∗n ) and if (f ∗∗1 , f ∗∗2 , . . . , f ∗∗n ) is the associated
basis of V ∗∗ with respect to (f ∗1 , f ∗2 , . . . , f ∗n ), then the change of coordinate matrix from
(e∗∗1 , e∗∗2 , . . . , e∗∗n ) to (f ∗∗1 , f ∗∗2 , . . . , f ∗∗n ) coordinates on V ∗∗ is (M −1 )−1 = M . Hence,
co-covectors change bases exactly as vectors do. Consequently, we do not distinguish co-covectors
from vectors.
Having introduced the dual, we now change the notation to the more standard one. When dealing
with coordinates of vector spaces at the same time as coordinates of the dual, it is typical to use
Having introduced the dual, we now change the notation to the more standard one. When dealing
with coordinates of vector spaces at the same time of coordinates of the dual, it is typical to use

superscripts for the coordinates of vectors and subscripts for coordinates of covectors. More precisely,
we usually refer to an ordered basis of V as B = (e1 , e2 , . . . , en ) and the associated dual basis of
V ∗ as B ∗ = (e∗1 , e∗2 , . . . , e∗n ). For a vector v ∈ V , we denote its coordinates with superscripts as
(v 1 , v 2 , . . . , v n ) ∈ F n so that

v = v 1 e1 + v 2 e2 + · · · + v n en = ∑_{i=1}^{n} v i ei .

We emphasize that these are superscripts and not powers on the symbol v. As a matrix, the
coordinates of v are depicted by a column vector (v 1 v 2 · · · v n )⊤ . For a covector µ ∈ V ∗ , we
denote the coordinates with subscripts as (µ1 , µ2 , . . . , µn ) ∈ F n so that

µ = µ1 e∗1 + µ2 e∗2 + · · · + µn e∗n = ∑_{i=1}^{n} µi e∗i .

As a matrix, the coordinates of µ are depicted by a row vector (µ1 µ2 · · · µn ). It is useful to
note that in the above summations, superscript indices are paired up with subscript indices.

Exercises for Section 10.2


1. Show that if T : V → W is a linear transformation between vector spaces, then T (0V ) = 0W , where
0V and 0W are the additive identity elements in V and W , respectively.
2. Let T : V → W be a linear transformation between vector spaces. Show that if a set of vectors
{T (v1 ), T (v2 ), . . . , T (vn )} in Im T is linearly independent, then the set of vectors {v1 , v2 , . . . , vn } in V
is linearly independent.
3. Let U, V, W be vector spaces and let L1 : U → V and L2 : V → W be linear transformations. Prove
that the composition L2 ◦ L1 : U → W is a linear transformation. If L1 and L2 are isomorphisms, is
it also true that L2 ◦ L1 is an isomorphism?
4. Prove Proposition 10.2.4.
5. Let V and W be finite-dimensional vector spaces over a field F . Prove that dim(V ⊕ W ) = (dim V ) +
(dim W ).
6. Let W be a subspace of a vector space V . Prove that dim W ≤ dim V .
7. Let V be a vector space over a field F and let U and W be subspaces of V . Prove that U ∪ W is
not necessarily a subspace of V .
8. Sum of Subspaces. Let V be a vector space and let U and W be subspaces of V . Define the sum of
the subspaces, denoted U + W as the set

U + W = {v ∈ V | v = u + w for some u ∈ U and w ∈ W }.

Show that U + W is a subspace of V .


9. Let V be a vector space over a field F and let U and W be subspaces of V . Prove that U ∩ W is a
subspace of V .
10. Let V be a vector space over a field F and let U and W be subspaces of V . Prove that dim(U + W ) =
dim U + dim W − dim(U ∩ W ). [See Exercise 10.2.8.]
11. Subspace Complement. Let U be a vector space and V a subspace of U . Another subspace W of U is
called a complement to V if V ∩ W = {0} and if every vector u ∈ U can be written as u = v + w for
some v ∈ V and some w ∈ W .
(a) Show that if W is a complement to V , then every vector u ∈ U can be written as u = v + w in
a unique way.
(b) Show that if W is a complement to V , then dim U = dim V + dim W .
12. Let F be a field, let K be a finite field extension of F , and let V be a vector space over K. Prove
that V is a vector space over F and that dimF V = [K : F ](dimK V ).
13. Let V be a vector space over a field F and define End(V ) = HomF (V, V ), as the space of endomor-
phisms on V (linear transformations of V into itself). Prove that (End(V ), +, ◦) is a ring, where ◦ is
function composition.

14. Let T : V → W be a linear transformation between vector spaces. Let U be a subspace of V . Prove
that T (U ) = {T (u) | u ∈ U } is a subspace of W . [This generalizes the fact that Im T is a subspace of
W since Im T = T (V ).]
15. Let T : V → W be a linear transformation between vector spaces. Let U be a subspace of W . Define
the set
T −1 (U ) = {v ∈ V | T (v) ∈ U }.
Prove that T −1 (U ) is a subspace of V . [This generalizes the fact that Ker T is a subspace of V because
Ker T = T −1 ({0}).]
16. Consider the vector space V = C 0 ([a, b], R) as in Example 10.2.29. Let g ∈ V .
(a) Prove that the function Ψg : V → R defined by

Ψg (f ) = ∫_a^b f (t)g(t) dt

is an element of V ∗ .
(b) Prove that the mapping g 7→ Ψg is an injective linear transformation from V to V ∗ .
(c) Let c ∈ [a, b]. Prove that the evaluation evc ∈ V ∗ is not equal to Ψg for any function g ∈ V .

10.3
Introduction to Modules
Vector spaces are a particular instance of a more general structure. As we saw in the previous
section, a vector space is an action structure in which a field acts on an abelian group. Modules are
“simply” an algebraic structure in which a ring acts on an abelian group.
Because modules generalize vector spaces, the reader should anticipate that modules have even
more uses than vector spaces. It is initially useful for intuition to think of modules as vector
spaces where the scalars come from a ring. However, since the ring of scalars could have zero
divisors, might not have inverses of nonzero elements, might not have unique factorization, might
be noncommutative and so on, not all the familiar and useful theorems in vector spaces have an
equivalent in the context of modules. Instead, other concepts, which are trivial (uninteresting) for
vector spaces, take on greater importance.

10.3.1 – Definition of a Module

Definition 10.3.1
Let R be a ring. A left R-module is an abelian group (M, +) together with a function
R × M → M , called scalar multiplication and denoted (r, m) ↦ rm or r · m, that satisfies
the following identities:
(1) ∀r ∈ R, ∀m, n ∈ M, r(m + n) = rm + rn;

(2) ∀r, s ∈ R, ∀m ∈ M, (r + s)m = rm + sm;


(3) ∀r, s ∈ R, ∀m ∈ M, r(sm) = (rs)m;
(4) ∀m ∈ M, 1m = m, whenever R possesses a multiplicative identity 1.

As with vector spaces, the elements in R are called scalars in relation to the elements in M (but
we do not call elements of M vectors).
The adjective “left” in this definition refers to the fact that the elements of R act on the left of
elements of M . We could change the axioms as appropriate to define a right R-module. If R is not
10.3. INTRODUCTION TO MODULES 487

commutative, then a left R-module might not be a right R-module. Example 10.3.11 illustrates this
difference. However, if R is commutative, every left R-module is in one-to-one correspondence with
a right R-module. (See Exercise 10.3.10.) Consequently, whenever R is commutative, we simply
refer to an R-module without distinguishing between right or left.
Nearly all theorems for left R-modules carry over with appropriate changes to right R-modules.
Consequently, unless we specify otherwise, we will state theorems in reference to left R-modules with
the implicit understanding that they are equally valid for right R-modules with appropriate changes
of left- to right-actions.

Proposition 10.3.2
Let 0R be the additive identity in a ring R and let 0M be the additive identity in a left
R-module M .
(1) 0R · m = 0M for all m ∈ M .
(2) r0M = 0M for all r ∈ R.

(3) For all r ∈ R and m ∈ M , we have (−r)m = −(rm) = r(−m).

Proof. (1) For all m ∈ M , we have 0R m + 0R m = (0R + 0R )m = 0R m. Adding −(0R m) to both


sides of the equality 0R m + 0R m = 0R m gives 0R m = 0M .
(2) For all r ∈ R, we have r0M + r0M = r(0M + 0M ) = r0M . Adding −(r0M ) to both sides of
the equality r0M + r0M = r0M gives r0M = 0M .
(3) Consider any r ∈ R and m ∈ M . Then

(−r)m + rm = (−r + r)m = 0R m = 0M ,

where the last equality holds by (1). Thus, (−r)m is the additive inverse to rm so (−r)m = −(rm).
Furthermore,
rm + r(−m) = r(m − m) = r0M = 0M ,
where the last equality holds by (2). Thus, r(−m) is the additive inverse to rm so r(−m) = −(rm). □

10.3.2 – Examples of Modules


As the introduction to this section promised, modules arise in many contexts. This list of examples
gives a glimpse of the ubiquity of modules in algebra.

Example 10.3.3 (Vector Spaces). Section 10.2 served as a motivation for modules. The reader
should notice how the axioms of vector spaces over a field F exactly match up to the axioms of a
module over the ring F . Hence, vector spaces are precisely modules over fields. 4

Example 10.3.4 (Trivial Module). Let R be any ring. The trivial abelian group of one element
{0} is a left R-module with the action defined by r · 0 = 0, and similarly a right R-module. △

Example 10.3.5. Let R be a ring. If we set M = R, then R is a left- or right-module over itself,
depending on whether we consider the multiplication on the left or the multiplication on the right
of a given element. △

Example 10.3.6 (Z-Modules). Consider the ring of integers Z and let M be a Z-module. (We
need not refer to a left- or right-module since Z is commutative.) If r is a positive integer, then the
axioms of the definition give

rm = (1 + 1 + · · · + 1)m = (1m) + (1m) + · · · + (1m) = m + m + · · · + m,

where each sum has r terms.

By Proposition 10.3.2, 0m = 0 and for any negative integer r, we have rm = −(|r|m). With
this notation, all the axioms of modules are simply the power rules for an abelian group (Proposition
3.2.13). Consequently, with the ring Z, the axioms (1–4) in Definition 10.3.1 do not impose
any further conditions beyond the requirement that M is an abelian group. Hence, the algebraic
structure of Z-modules is precisely the collection of abelian groups. △
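This equivalence can be exercised in a few lines. The sketch below (a made-up instance) equips the abelian group Z/6 with the only Z-action the axioms allow, namely repeated addition, and spot-checks the module axioms:

```python
# The abelian group Z/6 as a Z-module, with n·m given by repeated addition --
# exactly the action forced by the module axioms.
MOD = 6

def add(a, b):
    return (a + b) % MOD

def scalar(n, m):
    """n·m for n in Z and m in Z/6, defined by repeated addition."""
    total = 0
    for _ in range(abs(n)):
        total = add(total, m)
    return (-total) % MOD if n < 0 else total

# Spot-check module axioms (2), (3), and (4) of Definition 10.3.1.
for r in range(-7, 8):
    for s in range(-7, 8):
        for m in range(MOD):
            assert scalar(r + s, m) == add(scalar(r, m), scalar(s, m))
            assert scalar(r, scalar(s, m)) == scalar(r * s, m)
assert scalar(1, 5) == 5
print("Z/6 satisfies the Z-module axioms")
```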

The above example shows how Z-modules correspond to an algebraic structure that we already
encountered, abelian groups. In a similar way, it is not uncommon that for a given ring R, the
associated structure of left R-modules corresponds to a different algebraic structure, first encountered
in another way. Because of this regular occurrence, the algebraic structures of modules provide a
common context for many different algebraic structures.

Proposition 10.3.7
Let R be a ring and let M and N be left R-modules. The Cartesian product M × N
equipped with addition and scalar multiplication
(m1 , n1 ) + (m2 , n2 ) = (m1 + m2 , n1 + n2 ) and r(m, n) = (rm, rn)

is a left R-module.

Proof. (Left as an exercise for the reader. See Exercise 10.3.6.) 

Definition 10.3.8
The left R-module constructed in Proposition 10.3.7 is called the direct sum of M and N
and is denoted M ⊕ N .

As with vector spaces, the notion of a direct sum of two modules generalizes to the direct sum
of a finite number of modules and also to the direct sum and the direct product of any collection of
modules.

Example 10.3.9. Let R be a ring and let M = Rk = R ⊕ R ⊕ · · · ⊕ R, the direct sum of R with
itself k times (where k is a positive integer). Any element m ∈ M is a k-tuple m = (r1 , r2 , . . . , rk ).
We define addition component-wise and the left-action of R on M by

sm = s(r1 , r2 , . . . , rk ) = (sr1 , sr2 , . . . , srk ).

This makes Rk into a left-module over R. Likewise, Rk also has the structure of a right R-module if
we take multiplication of the k-tuple on the right. If R is not commutative, the corresponding left-
and right-modules are not necessarily in a natural one-to-one correspondence with each other. △

Example 10.3.10. Consider R = Z and M = {(x, y, z) ∈ Z3 | x + y + 2z ∈ 6Z}. Define the addition


component-wise and define the action n(x, y, z) = (nx, ny, nz). The set M is a subgroup of (Z3 , +).
Furthermore, it is not hard to see directly that the action of Z on M makes M into a Z-module.
Now Z3 is a free abelian group of rank 3. By Theorem 4.5.9, M must also be a free abelian group
of rank ≤ 3. In this example, it turns out that the rank of M is 3.
Note that if x + y + 2z = 6t, then x = 6t − y − 2z. Hence, the elements of M are

(x, y, z) = (6t − y − 2z, y, z) for arbitrary t, y, z ∈ Z
          = t(6, 0, 0) + y(−1, 1, 0) + z(−2, 0, 1).

Thus, M is a free abelian group of rank 3 with basis {(6, 0, 0), (−1, 1, 0), (−2, 0, 1)}. 4
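The change of coordinates above can be checked mechanically. The following Python sketch (our own illustration, not from the text) decomposes an element of M in the claimed basis and recomposes it; the helper names are ours.

```python
# Membership in M means x + y + 2z is divisible by 6; the coefficient of
# (6,0,0) is then t = (x + y + 2z)/6, and y, z are the other two coefficients.
def decompose(x, y, z):
    assert (x + y + 2 * z) % 6 == 0, "(x, y, z) is not in M"
    t = (x + y + 2 * z) // 6
    return (t, y, z)  # coefficients of (6,0,0), (-1,1,0), (-2,0,1)

def recompose(t, y, z):
    return (6 * t - y - 2 * z, y, z)

m = (10, 2, -3)  # 10 + 2 + 2*(-3) = 6 ∈ 6Z, so m ∈ M
print(decompose(*m), recompose(*decompose(*m)) == m)  # -> (1, 2, -3) True
```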

Example 10.3.11 (Matrix Multiplication). Let F be a field and let n be a positive integer.
Consider the ring Mn×n (F ) of n × n matrices with coefficients in F . Consider also the abelian group
10.3. INTRODUCTION TO MODULES 489

M = F n , with the operation of addition. Think of elements in M as column vectors. The usual
properties of matrix-vector multiplication satisfy axioms (1–4) in Definition 10.3.1. Therefore, the
ring Mn×n (F ) acts by left-multiplication on F n in such a way that F n is a left Mn×n (F )-module.
However, it is not possible to multiply an n × n matrix on the right of an n × 1 column vector. Hence,
in this setup F n is not a right-module. 4

Example 10.3.12 (Ideals; Quotient Rings). Let R be a ring and let I be a left ideal in R. Then
I satisfies all the axioms of a left R-module. If I is a right ideal in R, then I is a right R-module.
If I is a (two-sided) ideal, then I is both a left and right R-module. Furthermore, the quotient ring
R/I is also an R-module under the action

r · (s + I) = rs + I, in other words r · s̄ = rs (bars denoting cosets in R/I). 4

Example 10.3.13 (F [x]-modules). Let F be a field and let R be the polynomial ring R = F [x].
We propose to determine what a left F [x]-module may be. Let M be a left F [x]-module.
First note that F (as a subring of F [x]) must act on M according to the left R-module axioms.
Thus, M is a vector space over the field F . Now consider the action of the element x on M . The
axioms for a left-module show that the element x behaves as a linear transformation T on the F -
vector space M . Furthermore, the axioms impose no further restrictions on the linear transformation
T we use for x. Now for any polynomial p(x) = ad xd + · · · + a1 x + a0 ∈ F [x] and for any v ∈ M ,

p(x) · v = (ad xd + · · · + a1 x + a0 )v = ad T d (v) + · · · + a1 T (v) + a0 v, (10.5)

where T d denotes the d-fold composition T ◦ T ◦ · · · ◦ T .

Consequently, an F [x]-module consists of two pieces of data: an F -vector space V and a linear transformation
T : V → V . The above equation gives the action of any polynomial on a vector in V .
As a specific example, let F = R and consider the R[x] module that consists of the vector space
V = R2 and with the associated linear transformation T : R2 → R2 that, with respect to the
standard basis, has the matrix

A = ( 3 4
      2 3 ).

Then, for example,

(3x² − 2x + 4) · (4, −1) = 3A²(4, −1) − 2A(4, −1) + 4(4, −1)
                         = (132, 93) + (−16, −10) + (16, −4)
                         = (132, 79),

since 3A² = ( 51 72 ; 36 51 ) and −2A = ( −6 −8 ; −4 −6 ). 4
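The computation above can be reproduced numerically. Below is a small Python sketch (our own illustration, not part of the text) of the action (10.5): a polynomial, given by its list of coefficients, acts on a vector through powers of the matrix A.

```python
# x acts on R^2 through the matrix A from the example; p(x)·v follows (10.5).
A = [[3, 4], [2, 3]]

def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(2)) for i in range(2)]

def mat_mul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def poly_action(coeffs, v):
    """Apply p(x)·v, where coeffs = [a0, a1, ..., ad] lists ascending powers."""
    result = [0, 0]
    power = [[1, 0], [0, 1]]  # A^0 = identity
    for a in coeffs:
        Av = mat_vec(power, v)
        result = [result[i] + a * Av[i] for i in range(2)]
        power = mat_mul(power, A)
    return result

print(poly_action([4, -2, 3], [4, -1]))  # (3x^2 - 2x + 4)·(4, -1) -> [132, 79]
```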

Example 10.3.14 (Group Representations). Section 8.6 offered a brief introduction to group
representations. Let G be a finite group and let F be a field. Consider the group ring F [G] and
let V be a left F [G]-module. As in the previous example, F acts on V (as a subring of F [G]) and
hence V must be an F -module and therefore a vector space over the field F . Let g ∈ G. Axiom (1)
of Definition 10.3.1 shows that g(v1 + v2 ) = gv1 + gv2 for all v1 , v2 ∈ V . Furthermore, by axiom (3),
for any c ∈ F ,
c(gv) = (c1)(gv) = (c1g)v = (g(c1))v = g(c1v) = g(cv).
Hence, g acts on V as a linear transformation, which is invertible with inverse g −1 . Hence, an F [G]-
module defines a function ρ : G → GL(V ). Moreover, Axiom (3) in Definition 10.3.1 shows that ρ is
a homomorphism. A homomorphism ρ : G → GL(V ) is precisely a representation of G in the vector
space V . With this data, the remaining axioms for an F [G]-module are satisfied.
490 CHAPTER 10. MODULES AND ALGEBRAS

Conversely, any group representation ρ : G → GL(V ) defines an F [G]-module via the left action
 
(∑_{g∈G} ag g) · v = ∑_{g∈G} ag ρ(g)(v).

(The details of this proof are left as an exercise. See Exercise 10.3.7.)
In conclusion, representations of a group into vector spaces over a field F are in a one-to-one
correspondence with left F [G]-modules. 4
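As a tiny illustration of this correspondence (our own choice of representation, not from the text), take G = Z/2Z acting on R² with ρ(1) the reflection across the x-axis; an element of the group ring then acts by the displayed formula.

```python
# rho(0) = identity, rho(1) = reflection across the x-axis: a representation
# of G = Z/2Z on R^2 (an assumed example for illustration).
rho = {0: [[1, 0], [0, 1]],
       1: [[1, 0], [0, -1]]}

def act(ring_elt, v):
    """(sum_g a_g g)·v = sum_g a_g rho(g)(v); ring_elt maps g -> a_g."""
    out = [0, 0]
    for g, a in ring_elt.items():
        M = rho[g]
        for i in range(2):
            out[i] += a * sum(M[i][j] * v[j] for j in range(2))
    return out

# (2e + 3g)·(1, 1) = 2(1, 1) + 3(1, -1)
print(act({0: 2, 1: 3}, [1, 1]))  # -> [5, -1]
```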

Though it is valuable to compare modules to vector spaces (since after all vector spaces are just
modules over a field), modules generalize vector spaces so much that they differ in many significant
ways. It may be instructive for a student to review the proofs of theorems about vector spaces with
a view to noting clearly what properties of fields are required at each step.

10.3.3 – Submodules

Definition 10.3.15
Let R be a ring and let M be a left R-module. A submodule of M is a nonempty subset
N such that (N, +) is a subgroup of (M, +) and rN ⊆ N for all r ∈ R.

Every R-module M (left or right) always contains at least two submodules, namely the trivial
submodule {0} and M itself.
Example 10.3.16. Example 10.3.5 pointed out that R is a left R-module over itself. A submodule
of R is a subgroup of the additive group (R, +) that is also closed under scalar multiplication. This
is precisely the definition of a left ideal. In particular, not every subring of R is a submodule; the
submodules of R are exactly the left ideals of R. 4

Example 10.3.17. More generally than the previous example, if M is a left R-module and I is a
left ideal of R, then the set

IM = {a1 m1 + a2 m2 + · · · + ak mk | ai ∈ I, mi ∈ M }

is a submodule of M . Indeed, the sum of two linear combinations in IM is again a linear combination
in IM . Furthermore, for all α = a1 m1 + a2 m2 + · · · + ak mk ∈ IM and all r ∈ R, we have

rα = (ra1 )m1 + (ra2 )m2 + · · · + (rak )mk .

Since I is a left ideal in R, all the terms rai are in I. Hence, rα ∈ IM . 4

Similar to the One-Step Subgroup Criterion in the context of groups, the following proposition
is often convenient for proving that certain subsets of a module are submodules.

Proposition 10.3.18 (One-Step Submodule Criterion)


Let R be a ring with an identity 1 ≠ 0 and let M be a left R-module. A nonempty subset
N of M is a submodule if and only if for all x, y ∈ N and all r ∈ R, x + ry ∈ N .

Proof. (=⇒) Suppose that N is a submodule of M . Then for all x, y ∈ N and for all r ∈ R, we
have ry ∈ N since N is closed under scalar multiplication, and hence x + ry ∈ N .
(⇐=) Suppose that N is a nonempty subset of M such that for all x, y ∈ N and all r ∈ R,
we have x + ry ∈ N . Then setting r = −1, we deduce that x − y ∈ N . Hence, by the One-Step
Subgroup Criterion, (N, +) is a subgroup of (M, +). In particular, N contains 0. Then for all y ∈ N
and all r ∈ R, ry = 0 + ry ∈ N . Hence, N is a submodule of M . 
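For a finite sanity check of the criterion (our own, not part of the text), one can brute-force the subsets of Z/6Z viewed as a module over itself; by Example 10.3.16 the subsets that pass should be exactly the ideals of Z/6Z.

```python
from itertools import combinations

MOD = 6
R = range(MOD)  # the ring Z/6Z acting on itself

def is_submodule(N):
    """One-Step Criterion: N nonempty and x + r·y in N for all x, y in N, r in R."""
    return len(N) > 0 and all((x + r * y) % MOD in N
                              for x in N for y in N for r in R)

subs = [set(N) for k in range(1, MOD + 1)
        for N in combinations(range(MOD), k) if is_submodule(N)]
print(subs)  # the four ideals of Z/6Z
```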

Example 10.3.19 (F [x]-Submodules). Let F be a field. Recall from Example 10.3.13 that F [x]-
modules consist of a vector space V over F along with a linear transformation T : V → V , where
the action of polynomials on V is given by (10.5). Let W be an F [x]-submodule. Since W is closed
under addition and under scalar multiplication by elements in F , which is a subring of F [x], W
is a vector subspace of V . However, W must also be closed under the action of the element x. For
all w ∈ W , we have x · w = T (w). Thus, W must satisfy the additional condition that T (w) ∈ W
for all w ∈ W , i.e., W is invariant under T . Finally, the action (10.5) shows that any subspace invariant
under T satisfies p(x) · W ⊆ W .
Suppose that V is a finite-dimensional F -vector space with a basis B = {v1 , v2 , . . . , vk } such that
B′ = {v1 , v2 , . . . , vℓ }, where ℓ ≤ k, is a basis of W . Then the matrix of T with respect to B has the
block form

[T ]B = ( [T ]B′   ∗ )
        (   0     ∗ ),

where [T ]B′ is the ℓ × ℓ matrix (aij ) of the restriction of T to W with respect to B′ , the 0 denotes
a (k − ℓ) × ℓ block of zeros, and ∗ represents unconstrained entries.
As a first specific example, consider the R[x]-module V = R2 equipped with the linear transfor-
mation T whose matrix with respect to the standard basis is

A = ( 0 −1
      1  0 ).

Suppose W is a nontrivial submodule of V . Any nonzero vector w = (a, b) ∈ W , given in coordinates
with respect to the standard basis, satisfies x · w = Aw = (−b, a), which must also lie in W . Since
these vectors are orthogonal to each other, they are linearly independent. But then Span(w, xw) = R2 ,
so W = R2 . Consequently, the only submodules of this R[x]-module are {0} and V .
In contrast, consider the R[x]-module V = R2 equipped with the linear transformation T whose
matrix with respect to the standard basis is

B = ( 1 3
      3 1 ).

Suppose that W is a nontrivial submodule of V . The only nontrivial strict subspaces of V are
one-dimensional, so suppose that W = Span(w). Then W is an R[x]-submodule of V if and only if
T (w) ∈ W , so T (w) = λw for some constant λ. Hence, in this case where V is two-dimensional, the
nontrivial R[x]-submodules are the eigenspaces. With the matrix B, these are

W−2 = Span((−1, 1)) and W4 = Span((1, 1)),

where Wλ is the eigenspace for the eigenvalue λ. 4
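A quick computational check (ours, not the book's) that the invariant lines for B are exactly the two eigenspaces: Span(w) is invariant precisely when Bw is parallel to w, which we test with a 2 × 2 determinant.

```python
# Is Span(w) invariant under B? Exactly when Bw is parallel to w.
B = [[1, 3], [3, 1]]

def is_invariant_line(w):
    a, b = w
    Bw = (B[0][0] * a + B[0][1] * b, B[1][0] * a + B[1][1] * b)
    # w and Bw are parallel iff det(w | Bw) = 0
    return a * Bw[1] - b * Bw[0] == 0

print(is_invariant_line((-1, 1)),  # eigenvector for eigenvalue -2
      is_invariant_line((1, 1)),   # eigenvector for eigenvalue 4
      is_invariant_line((1, 0)))   # not an eigenvector -> True True False
```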
Example 10.3.20 (Fractional Ideals). Let R be an integral domain and let F be its field of
fractions. The field F is a R-module via the scalar multiplication r · f = rf in F for all r ∈ R and
all f ∈ F . A fractional ideal of R is defined as an R-submodule of F .
For example, if R = Z, then I = { a/7 | a ∈ Z } is a Z-submodule of Q. Following
the notation for usual ideals, we can write I = ( 1/7 ), indicating that I is generated by the set of
elements {1/7} as a Z-submodule of Q. The submodule I is not a ring, since it is not closed under
multiplication. Indeed, (1/7) · (1/7) = 1/49 ∉ I.
It is interesting to note that the set of nonzero fractional ideals associated to an integral domain R
forms a group under ideal multiplication. As with usual ideals, define the multiplication of fractional
ideals by the set of linear combinations,
IJ = {a1 b1 + a2 b2 + · · · + ak bk | ai ∈ I, bi ∈ J}.

If I, J, K are fractional ideals of R, then the ideal I(JK) consists of combinations

I(JK) = {a1 b1 c1 + a2 b2 c2 + · · · + ak bk ck | ai ∈ I, bi ∈ J, ci ∈ K}

and likewise for the ideal (IJ)K. Hence, multiplication of fractional ideals is associative. Since
integral domains have an identity, the ring R serves as the identity for ideal multiplication because
RI = IR = I. We claim that the inverse of a fractional ideal I is

I −1 = {x ∈ F | xI ⊆ R}.

First, we show that I −1 is a fractional ideal. Let x, y ∈ I −1 . Then for all a ∈ I, (x − y)a = xa − ya.
Since xa, ya ∈ R, then (x − y)a ∈ R as well, so x − y ∈ I −1 . If r ∈ R, then for all a ∈ I we
have (rx)a = r(xa), but since xa ∈ R, (rx)a is also in R so rx ∈ I −1 . This shows that I −1 is an
R-submodule of F . Furthermore,

I −1 I = {a1 b1 + a2 b2 + · · · + ak bk | ai ∈ I −1 , bi ∈ I}.

By definition of I −1 , for all 1 ≤ i ≤ k, the product ai bi is in R and so is any linear combination


a1 b1 + a2 b2 + · · · + ak bk . Hence, I −1 I ⊆ R. Let b ∈ I be a nonzero element. Since b is a nonzero
element of the field F , the multiplicative inverse b−1 is in F and, in particular, b−1 ∈ I −1 . Hence, the identity 1 = b−1 b is in the
product fractional ideal I −1 I. Then since I −1 I is an R-submodule of F , we deduce that R ⊆ I −1 I.
Therefore I −1 I = R, showing that I −1 is indeed the multiplicative inverse to the fractional ideal
I. 4
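The claim I −1 = 7Z for I = (1/7) can be sampled with exact rational arithmetic. This Python sketch is our own; it tests only integer candidates in a finite window, which suffices here because x · (1/7) ∈ Z already forces x to be an integer multiple of 7.

```python
from fractions import Fraction

# Sample I = (1/7) on a finite window and test which integers k satisfy k·I ⊆ Z.
I_window = [Fraction(a, 7) for a in range(-10, 11)]

def in_inverse(x):
    """True when x·a is an integer for every sampled a in I."""
    return all((x * a).denominator == 1 for a in I_window)

inv = [k for k in range(-30, 31) if in_inverse(Fraction(k))]
print(inv)  # the multiples of 7 in the window
```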

In Example 10.3.12 where the R-module is M = R/I, for all a ∈ I and all m ∈ M , we have
am = 0. This cancellation resembles the behavior of zero divisors but is different, since a is an element of the ring
R and m is an element of the module. This is an example of a more general concept, and it works
both ways: we may ask which elements of M are annihilated by a given ideal of R, and which
elements of R annihilate a given submodule of M .

Definition 10.3.21
Let R be a ring and let M be a left R-module. Let I be a right ideal in R. The (left-)annihilator
of I in M is the set

Ann(I) = {m ∈ M | am = 0 for all a ∈ I}.

Let N be a submodule of the module M . The (left-)annihilator of N in R is the set

Ann(N ) = {a ∈ R | an = 0 for all n ∈ N }.

For example, if R = Z, and M = Z/6Z, then the annihilator of M is the ideal 6Z.
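That Ann(Z/6Z) = 6Z can be confirmed by brute force on a window of integers (an illustration of ours, not from the text):

```python
# Brute-force Ann(M) for the Z-module M = Z/6Z on a window of integers.
M = range(6)

def annihilates(a):
    """True when a·m = 0 in Z/6Z for every m in M."""
    return all((a * m) % 6 == 0 for m in M)

ann_window = [a for a in range(-20, 21) if annihilates(a)]
print(ann_window)  # the multiples of 6 in the window
```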

Proposition 10.3.22
Let M be a left R-module.
(1) If I is a right ideal in R, then Ann(I) is a submodule of M .

(2) If N is a submodule of M , then Ann(N ) is an ideal (two-sided) of R.

Proof. (1) Let m1 , m2 ∈ Ann(I) and let r ∈ R. Let a ∈ I. Then a(m1 + rm2 ) = am1 + arm2 . Since
m1 ∈ Ann(I), then am1 = 0. Since I is a right ideal, ar ∈ I and thus (ar)m2 = 0 as well. Hence,
a(m1 + rm2 ) = 0 and by the One-Step Submodule Criterion, Ann(I) is a submodule of M .
(2) Let a, b ∈ Ann(N ) and let r ∈ R. For all n ∈ N , we have (a−b)n = an−bn = 0−0 = 0. Thus,
by the One-Step Subgroup Criterion, (Ann(N ), +) is a subgroup of (R, +). Also, (ra)n = r(an) = 0
so ra ∈ Ann(N ). Since N is a submodule of M , (ar)n = a(rn) = 0 because rn ∈ N . Thus,
ar ∈ Ann(N ), and we deduce that Ann(N ) is a two-sided ideal of R. 

Also associated to the issue of zero divisors and annihilators are torsion elements. An element
m in a left R-module M is called a torsion element if rm = 0 for some nonzero r ∈ R. The subset
of torsion elements in M is denoted by

Tor(M ) = {m ∈ M | ∃r ∈ R \ {0}, rm = 0}. (10.6)

The torsion subset is not always a submodule of M . See Exercise 10.3.20. We also say that a module
is a torsion module if every element in M is a torsion element, i.e., M = Tor(M ).

10.3.4 – Algebras
Occasionally, we encounter examples of modules that have additional structure. For example, R[x]
is an R-module with the usual multiplication of polynomials by scalars; but R[x] is also a ring and
so has a multiplication that behaves well with multiplication by scalars.

Definition 10.3.23
Let R be a commutative ring and let M, N, P be three R-modules. A function ϕ : M ×N →
P is called bilinear or R-bilinear if

(1) ϕ(m1 + m2 , n) = ϕ(m1 , n) + ϕ(m2 , n) for all m1 , m2 ∈ M and all n ∈ N ;


(2) ϕ(rm, n) = rϕ(m, n) for all r ∈ R, all m ∈ M , and all n ∈ N ;
(3) ϕ(m, n1 + n2 ) = ϕ(m, n1 ) + ϕ(m, n2 ) for all m ∈ M and all n1 , n2 ∈ N ;

(4) ϕ(m, rn) = rϕ(m, n) for all r ∈ R, all m ∈ M , and all n ∈ N .

The definition of a bilinear function generalizes in two ways. By changing the axioms (2) and
(4) to multiplication on the right, the definition can apply to right R-modules. Furthermore, it is
also possible to talk about multilinear functions but we delay discussing such functions until later.

Definition 10.3.24
Let R be a commutative ring. An R-algebra is a pair (M, [, ]) where M is an R-module
and where [, ] : M × M → M is an R-bilinear map.

By axioms (1) and (3) in Definition 10.3.23, the bilinear map behaves like a product in that it
distributes over the addition + in the module M . Intuitively speaking, axioms (2) and (4) require
that this bilinear map (product) on M behaves well with respect to the scalar multiplication.
We conclude this section with a few examples of R-algebras.
Example 10.3.25 (Trivial Algebra). Let R be a commutative ring. Every R-module can be
given a trivial R-algebra structure with [m1 , m2 ] = 0 for all m1 , m2 ∈ M . 4

Example 10.3.26 (Polynomial Algebra). The ring R[x], along with polynomial multiplication
as the bilinear map, has the structure of an R-algebra. This is why it is technically
proper to talk about the “polynomial algebra” over R. 4

Example 10.3.27 (Vector Cross Product). Obviously R3 is a vector space over the field R.
However, in analytic geometry, we encounter the cross product of vectors in R3 . The algebraic
properties of the vector cross product include

(u + v) × w = u × w + v × w        (ru) × w = r(u × w)
u × (v + w) = u × v + u × w        u × (rw) = r(u × w)

for all u, v, w ∈ R3 and all r ∈ R. Consequently, equipped with the vector cross product, not only is R3
an R-module (i.e., a vector space over R), it is also an algebra over R. It is interesting to recall that
(R3 , +, ×) is not a ring because × is not associative. 4
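The failure of associativity is easy to witness numerically. A minimal sketch (our own): compare (u × v) × w with u × (v × w) for one triple of vectors.

```python
def cross(u, v):
    """The vector cross product on R^3."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

u, v, w = (1, 0, 0), (1, 1, 0), (0, 0, 1)
left = cross(cross(u, v), w)
right = cross(u, cross(v, w))
print(left, right)  # -> (0, 0, 0) (0, 0, -1): they differ, so x is not associative
```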

Example 10.3.28 (Matrix Algebra). Let R be a commutative ring and consider the set Mn (R)
of n × n matrices with coefficients in R. The scalar multiplication R × Mn (R) → Mn (R), sending
(r, A) to rA, equips Mn (R) with the structure of an R-module. We already know that Mn (R) is a
ring with the addition and multiplication of matrices. We claim that Mn (R) is also an R-algebra.
The distributivity axiom for the ring Mn (R) states that

(A1 + A2 )B = A1 B + A2 B and A(B1 + B2 ) = AB1 + AB2

but scalar-matrix multiplication also satisfies c(AB) = (cA)B = A(cB). Hence, the ring Mn (R) is
an R-algebra. 4

Example 10.3.29 (Boolean Algebra). By Theorems 10.1.12 and 10.1.13, Boolean algebras are
Boolean rings and vice versa. It is useful to see how Boolean algebras are algebras by this new
definition. The property that x + x = 0 for all x in a Boolean ring, along with the other axioms,
implies that every Boolean ring is an F2 -algebra. Hence, Boolean algebras are F2 -algebras along
with some other axioms. However, not every F2 -algebra is a Boolean algebra; for example, a
trivial F2 -algebra is not a Boolean algebra. 4

Polynomial algebras and matrix algebras are specific instances of another class of algebras.
Definition 10.3.24 only required the pairing on M to be bilinear. In particular, [, ] need not be
associative. If the pairing is associative, then M also carries the structure of a ring. We give this
situation a specific name.

Definition 10.3.30
Let R be a commutative ring and let (A, [, ]) be an R-algebra. If [, ] is associative, then
A is called an associative R-algebra. If in addition A contains an identity for [, ], then the
associative algebra is called unital.

An associative R-algebra A is a ring in its own right but one in which the ring R serves as the set of scalars.
Exercise 10.3.34 gives another construction for unital associative R-algebras. Polynomial algebras
over a commutative ring R and a matrix algebra over a commutative ring R are unital associative
R-algebras. In contrast, R3 equipped with the addition and the cross product is a nonassociative
R-algebra.

Exercises for Section 10.3


1. Let R be a ring and let M be a left R-module. Prove that the action of R on M also induces a group
action of U (R) on M .
2. Let R be a ring and let M be a left R-module. Suppose that rm = 0 for some r ∈ R and some nonzero
m ∈ M . Prove that r does not have a left inverse in R.
3. Let M be a left R-module and let S be a subring of R. Show that M is a left S-module as well.
4. Let M be a left R-module and let ϕ : S → R be a unital ring homomorphism (i.e., a homomorphism
satisfying ϕ(1S ) = 1R , if S has an identity). Show that M is a left S-module if for all s ∈ S and all
m ∈ M we define sm = ϕ(s)m.
5. Show that there is no nontrivial action of Z[i] on Z that gives Z the structure of a Z[i]-module.
6. Prove Proposition 10.3.7.
7. Prove the details in the second half of Example 10.3.14. Conclude that a representation of G into an
F -vector space V makes V into an F [G]-module.
8. Let R be a ring with an identity 1 ≠ 0. Prove that Rn is a free left R-module.
9. Let F be a field and let R be the ring Mn (F ) of n × n matrices with coefficients in F .
(a) Prove that the F -vector space Mn×m (F ) is a left R-module.
(b) Prove that the F -vector space Mm×n (F ) is a right R-module.
10. Let R be a commutative ring and let M be a left R-module. Show that defining a scalar right-
multiplication by mr = rm gives M the structure of a right R-module.

11. Let R and S be rings. Let M be a left R-module and let N be a left S-module. Prove that M × N is
a left R ⊕ S-module when equipped with the following addition and scalar multiplication:
(m1 , n1 ) + (m2 , n2 ) = (m1 + m2 , n1 + n2 )
(r, s) · (m, n) = (rm, sn).

12. Let F be a field and consider the polynomial ring F [x, y].
(a) Prove that an F [x, y]-module consists of a vector space V over F along with commuting linear
transformations T1 and T2 .
(b) Explicitly describe the action of a polynomial p(x, y) on a vector in V .
13. Let R be a commutative ring and let M be a left R-module. Suppose that D is a multiplicatively
closed subset of R not containing 0. Define the equivalence relation ∼ on D × M by

(d1 , m1 ) ∼ (d2 , m2 ) ⇐⇒ r(d2 m1 − d1 m2 ) = 0 for some r ∈ D,

and let D−1 M denote the set of equivalence classes. Prove that D−1 M is a left D−1 R-module.


14. Consider the Z-module M = Zn . Decide with a proof which of the following subsets are submodules
of M .
(a) {(x1 , x2 , . . . , xn ) ∈ M | x1 + x2 + · · · + xn = 0}
(b) {(x1 , x2 , . . . , xn ) ∈ M | x1 + x2 + · · · + xn = 5}
(c) {(x1 , x2 , . . . , xn ) ∈ M | x1 + x2 + · · · + xn ∈ 5Z}
(d) {(x1 , x2 , . . . , xn ) ∈ M | x21 + x22 + · · · + x2n ∈ 5Z}
(e) {(x1 , x2 , . . . , xn ) ∈ M | xi ∈ iZ}
(f) {(x1 , x2 , . . . , xn ) ∈ M | (x1 + 2x2 + · · · + nxn )/(x1 + x2 + · · · + xn ) = 5} ∪ {(0, 0, . . . , 0)}
15. Let M be a left R-module and let N1 and N2 be submodules. Define the addition of submodules as

N1 + N2 = {m1 + m2 | m1 ∈ N1 , m2 ∈ N2 }.

Prove that N1 + N2 is a submodule of M .


16. Let M be a left R-module and let N1 and N2 be submodules.
(a) Prove that if N1 and N2 are submodules of M , then N1 ∩ N2 is a submodule of M .
(b) Prove that the intersection (not necessarily finite) of a collection of submodules of M is again a
submodule.
17. Let M be a left R-module and let I be an ideal contained in Ann(M ). Prove that M is a left
R/I-module if for all r ∈ R and all m ∈ M we define (r + I)m = rm.
18. Let N1 ⊆ N2 ⊆ · · · be a chain of submodules of an R-module M . Prove that ∪_{i=1}^{∞} Ni is a
submodule of M .
19. Consider the Z-module M = Z45 ⊕ Z12 . Find the annihilator of M .
20. Let R be a ring and let M be a left R-module. Prove that if R is an integral domain, then the set
of torsion elements Tor(M ), defined in (10.6), is a submodule of M . Give an example of a module M
with R not an integral domain where Tor(M ) is not a submodule of M .
21. Let R be an integral domain and let M1 and M2 be R-modules. Prove that Tor(M1 ⊕ M2 ) =
Tor(M1 ) ⊕ Tor(M2 ).
22. Consider the ring R = Z[x].
(a) Let M = F25 and define the scalar multiplication by Z on M as the usual one with integers and
where x acts on F25 by multiplication by the matrix

( 0 1
  0 0 ).

Prove that this scalar multiplication equips M with the structure of an R-module.
(b) Determine the annihilator Ann(M ).
23. Let M be a left R-module. Let I be a right ideal of R. Prove that I ⊆ Ann(Ann(I)). Give an example
to show that this containment might be strict.

24. Let M be a left R-module. Let N be a submodule of M . Prove that N ⊆ Ann(Ann(N )). Give an
example to show that this containment might be strict.
25. Consider the R[x]-module V = R2 , equipped with the linear transformation on V corresponding to
projection onto the line y = x. Prove that the only nontrivial proper submodules of V are Span((1, 1))
and Span((−1, 1)).
26. Consider the C[x]-module V = C2 , equipped with the linear transformation T : V → V given by

T (z1 , z2 ) = ( 0  1
               −1  0 ) (z1 , z2 ) = (z2 , −z1 ).

(a) Prove that V has exactly four submodules.


(b) Explain why this differs from the first specific example in Example 10.3.19.
27. Consider the R[x]-module V = R3 , equipped with the linear transformation T : V → V given by

T (x, y, z) = (  0 2 0
                −1 3 0
                 0 0 3 ) (x, y, z).

Determine all the R[x]-submodules of V . [Hint: There are precisely eight of them.]
28. Consider the R[x]-module V = R3 , equipped with the linear transformation T : V → V given by

T (x, y, z) = ( 0 1 0
                0 0 1
                0 0 0 ) (x, y, z).

Determine all the R[x]-submodules of V . [Hint: There are precisely 4 of them.]


29. Let (M, +) be an abelian group. Recall that End(M ) is the set of endomorphisms of M . For any
f, g ∈ End(M ), define the addition f + g and product f · g by

(f + g)(x) = f (x) + g(x) and (f · g)(x) = f (g(x)).

Also define the 0 on End(M ) as the 0 function and the 1 in End(M ) as the identity function on M .
(a) Prove that (End(M ), +, ·) is a ring with additive identity 0 and multiplicative identity 1.
(b) Prove that any structure that makes M into a left R-module defines a ring homomorphism from
R into (End(M ), +, ·).
30. Let R be a ring with identity. Let M be the abelian group (R, +).
(a) Prove that the left-multiplication of R on itself makes M into a left R-module.
(b) Use Exercise 10.3.29 to show that R is isomorphic to a subring of End(M ).
[The result of this exercise shows that every ring is isomorphic to a subring of some endomorphism
ring of an abelian group.]
31. Section 6.7 introduced the notion of an algebraic integer and in particular the concept of algebraic
closure. (See Definition 6.7.7.) Let R be a subring of a commutative ring S and suppose that S has
an identity 1 that is also in R. Prove that the following three statements are equivalent:
(a) s ∈ S is integral over R;
(b) R[s] is a finitely generated R-module;
(c) s is an element in some subring T , where R ⊆ T ⊆ S, and such that T is a finitely generated
R-module.
[Hint: For showing (c) ⇒ (a), let {t1 , t2 , . . . , tk } be a finite generating subset of T as an R-module.
Then
s ti = ∑_{j=1}^{k} aij tj

for some aij ∈ R. Prove that s solves some associated characteristic equation whose polynomial is
monic and in R[x].]
32. Use Exercise 10.3.31 to prove the following facts about integrality in ring extensions. Let R and S be
as in Exercise 10.3.31.
(a) If s and s0 are in S and integral over R, then s ± s0 and ss0 are also integral over R.

(b) The integral closure of R is a subring of S that contains R.


33. Let R be a commutative ring and consider the set Mn (R) of n × n matrices with coefficients in R.
Define the function [, ] : Mn (R) × Mn (R) → Mn (R) by [A, B] = AB − BA. Prove that [, ] is an
R-bilinear product on the R-module Mn (R). Show that the bilinear function [, ] is not associative on
Mn (R) and deduce that (Mn (R), [, ]) is an R-algebra that is not a ring.
34. Let R be a ring with identity 1R . Suppose that A is a ring with identity 1A that is also a left R-module
satisfying
r · (ab) = (r · a)b = a(r · b)
for all r ∈ R and all a, b ∈ A.
(a) Prove that the function f : R → A defined by f (r) = r · 1A is a ring homomorphism mapping
1R to 1A .
(b) Prove that f (R) is contained in the center C(A).
(c) Deduce that A has the structure of a unital associative R-algebra.
35. Let (V, [, ]) be an algebra over a field F (a vector space over F equipped with a bilinear function).
Suppose that V is finite dimensional and has an ordered basis B = (e1 , e2 , . . . , en ). Let γ^k_ij ∈ F for
1 ≤ i, j, k ≤ n be a collection of n^3 constants defined by

[ei , ej ] = ∑_{k=1}^{n} γ^k_ij ek .

(Note that k is a superscript index and not a power.) Prove that if

[v]B = (a1 , a2 , . . . , an ) and [w]B = (b1 , b2 , . . . , bn )

(written as column vectors), then

[v, w]B = (c1 , c2 , . . . , cn ), where ck = ∑_{i=1}^{n} ∑_{j=1}^{n} γ^k_ij ai bj .

[The collection of constants γ^k_ij is called the structure constants of the F -algebra (V, [, ]).]

10.4
Homomorphisms and Quotient Modules
Following the Outline in the preface, the next logical topic in the presentation of the algebraic
structure of modules is that of homomorphisms. This section also includes quotient modules since
they are closely related.

10.4.1 – R-Module Homomorphisms


Module homomorphisms are generalizations of linear transformations between vector spaces.

Definition 10.4.1
Let R be a ring and let M and N be left R-modules. A function ϕ : M → N is called an
R-module homomorphism if

(1) ∀m1 , m2 ∈ M , ϕ(m1 + m2 ) = ϕ(m1 ) + ϕ(m2 );


(2) ∀m ∈ M, ∀r ∈ R, ϕ(rm) = rϕ(m).

To provide a few examples of R-module homomorphisms, we consider some homomorphisms


associated to the examples of modules given in Section 10.3.2.

Example 10.4.2 (Vector Spaces). If F is a field, then F -modules are vector spaces over F . Fur-
thermore, the axioms for R-module homomorphisms are precisely those for a linear transformation
between vector spaces. 4

Example 10.4.3. Example 10.3.5 pointed out that a ring R is a left R-module by left-multiplication.
An R-module homomorphism ϕ : R → R satisfies different conditions than a ring homomorphism. Ax-
iom (2) in Definition 10.4.1 requires ϕ(rx) = rϕ(x) for all r, x ∈ R, whereas a ring homomorphism
f : R → R requires f (rx) = f (r)f (x). 4

Example 10.4.4 (Z-Modules). Example 10.3.6 established that Z-modules are precisely the
algebraic structure of abelian groups. Let M and N be abelian groups and let ϕ : M → N be
a function. Axiom (1) of Definition 10.4.1 requires that a Z-module homomorphism be a group
homomorphism between (M, +) and (N, +). Now let ϕ : M → N be a group homomorphism. Then
for all positive integers n,

ϕ(n · x) = ϕ(x + x + · · · + x) = ϕ(x) + ϕ(x) + · · · + ϕ(x) = n · ϕ(x),

where each of the middle sums has n terms.

By other properties of group homomorphisms, we also have ϕ(0 · x) = 0ϕ(x) and ϕ((−n)x) =
(−n)ϕ(x). Hence, homomorphisms between abelian groups automatically satisfy the second axiom
of R-module homomorphisms. We conclude that Z-module homomorphisms are precisely group
homomorphisms between abelian groups. 4

The following two definitions are standard for algebraic structures. The set in the third definition
(Definition 10.4.10) has particular value in the theory of modules.

Definition 10.4.5
An R-module homomorphism ϕ : M → N is called an isomorphism if it is also a bijective
function. If there exists an isomorphism between two R-modules M and N , then we say M and N
are isomorphic and we write M ≅ N .

Example 10.4.6 (F [x]-modules). Let F be a field and let V and W be two (left) F [x]-modules.
Suppose that x acts on V according to a linear transformation T : V → V while x acts on W
according to a linear transformation S : W → W . By considering Axiom (2) for the subring F in
F [x], an F [x]-module homomorphism must be a linear transformation ϕ : V → W . However, we
also need that ϕ(xv) = xϕ(v). Hence, an F [x]-module homomorphism is a linear transformation
ϕ : V → W such that ϕ ◦ T = S ◦ ϕ. In other words, the following diagram of functions commutes.

        ϕ
    V −−−−→ W

  T │          │ S
    ↓          ↓

    V −−−−→ W
        ϕ

By Definition 10.4.5, two F [x]-modules are isomorphic if the linear transformation ϕ is a bijection.
Consequently, two F [x]-modules V with T and W with S are isomorphic as F [x]-modules if and only
if V ≅ W as vector spaces via an isomorphism ϕ and T and S are similar linear transformations
with S = ϕ ◦ T ◦ ϕ−1 . 4
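The characterization above can be checked on sample matrices. In this sketch (our own choice of matrices, not from the text), we build S = ϕ ◦ T ◦ ϕ−1 and verify that the commuting-square condition ϕ ◦ T = S ◦ ϕ holds.

```python
# phi is an invertible change of basis; S = phi T phi^{-1} is similar to T,
# so phi is an F[x]-module isomorphism from (V, T) to (W, S).
def mat_mul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

T = [[3, 4], [2, 3]]
phi = [[1, 1], [0, 1]]
phi_inv = [[1, -1], [0, 1]]            # inverse of phi
S = mat_mul(mat_mul(phi, T), phi_inv)  # the similar transformation

print(S, mat_mul(phi, T) == mat_mul(S, phi))  # -> [[5, 2], [2, 1]] True
```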

Definition 10.4.7
Let ϕ : M → N be an R-module homomorphism.
(1) The kernel of ϕ is Ker ϕ = {m ∈ M | ϕ(m) = 0}.
(2) The image of ϕ is Im ϕ = {n ∈ N | n = ϕ(m) for some m ∈ M }.

The proof of the following proposition resembles the corresponding proofs for other algebraic structures, so we omit it.

Proposition 10.4.8
Let ϕ : M → N be an R-module homomorphism.
(1) The kernel of ϕ is a submodule of M .

(2) The image of ϕ is a submodule of N .

Example 10.4.9. Consider the Z-module Z3 and consider the subset M = {(x, y, z) ∈ Z3 | x + 2y +
7z = 0}. It is easy to check, using the definition, that M is a submodule. However, it is even
easier to observe that M is a submodule of Z3 because M = Ker ϕ for the Z-module homomorphism
ϕ(x, y, z) = x + 2y + 7z. 4

Definition 10.4.10
Let M and N be left R-modules. The set of R-module homomorphisms from M to N is
denoted by HomR (M, N ). The set of R-module endomorphisms on M , i.e., homomorphisms
from M to itself, is denoted by EndR (M ).

Proposition 10.4.11
If R is a commutative ring, then HomR (M, N ) is a left R-module.

Proof. If X is a set and (N, +) is an abelian group, then the set of functions Fun(X, N ), equipped
with addition of functions, is easily seen to be an abelian group. We note that HomR (M, N ) is a
nonempty subset of Fun(M, N ). Let ϕ, ψ ∈ HomR (M, N ). Then consider the function ϕ − ψ defined
by (ϕ − ψ)(x) = ϕ(x) − ψ(x). For all x, y ∈ M ,

(ϕ − ψ)(x + y) = ϕ(x + y) − ψ(x + y) = ϕ(x) − ψ(x) + ϕ(y) − ψ(y) = (ϕ − ψ)(x) + (ϕ − ψ)(y).

For all x ∈ M and all r ∈ R,

(ϕ − ψ)(rx) = ϕ(rx) − ψ(rx) = rϕ(x) − rψ(x) = r(ϕ − ψ)(x).

Hence, HomR (M, N ) is closed under subtraction so it is a subgroup of (Fun(M, N ), +).


The action of R on HomR (M, N ) defines (rϕ) as the function (rϕ)(x) = r(ϕ(x)). For all r, s ∈ R
and all x ∈ M ,
(rϕ)(sx) = rϕ(sx) = (rs)ϕ(x) = (sr)ϕ(x) = s(rϕ)(x)

where the third equality requires the commutativity of R. Thus, rϕ is an R-module homomorphism
so the scalar multiplication is well-defined. The axioms for scalar multiplication follow from the
properties of the scalar multiplication on N . 
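The operations in the proof above can be made concrete. The following is an assumed illustrative example (not from the text) with R = Z, M = Z2 , and N = Z, representing homomorphisms as Python functions and checking that ϕ − ψ and rϕ are again homomorphisms.

```python
# Sketch: the abelian-group and scalar operations on Hom_Z(Z^2, Z)
# from Proposition 10.4.11, with sample homomorphisms.

def phi(v):   # phi(x, y) = 3x + y, a Z-module homomorphism Z^2 -> Z
    return 3 * v[0] + v[1]

def psi(v):   # psi(x, y) = x - 2y
    return v[0] - 2 * v[1]

def sub(f, g):    # (f - g)(x) = f(x) - g(x)
    return lambda v: f(v) - g(v)

def scal(r, f):   # (r f)(x) = r f(x); well-defined since Z is commutative
    return lambda v: r * f(v)

v, w, r = (2, 5), (-1, 4), 7
diff = sub(phi, psi)
# phi - psi is additive and Z-linear, as in the proof.
assert diff((v[0] + w[0], v[1] + w[1])) == diff(v) + diff(w)
assert diff((r * v[0], r * v[1])) == r * diff(v)
# (r phi)(s x) = s ((r phi)(x)) uses commutativity of the ring.
assert scal(r, phi)((3 * v[0], 3 * v[1])) == 3 * scal(r, phi)(v)
```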

The following proposition greatly generalizes the fact that the set of square matrices over a
commutative ring R is a unital associative R-algebra.
500 CHAPTER 10. MODULES AND ALGEBRAS

Proposition 10.4.12
Let R be a commutative ring with an identity 1 ≠ 0. The triple (EndR (M ), +, ◦) is a unital
associative R-algebra.

Proof. (Left as an exercise for the reader. See Exercise 10.4.11.) 

When discussing algebras in Section 10.3.4, we introduced the concept of a bilinear function
out of the Cartesian product of two R-modules. The reader should notice that a bilinear function
(Definition 10.3.23) is a function that is an R-module homomorphism in both entries separately.

10.4.2 – Quotient Modules


Let R be a ring. The goal in defining a quotient structure on a left R-module M is to use the
equivalence classes of an equivalence relation to obtain a new left R-module whose addition and
scalar multiplication are inherited from the addition and scalar multiplication of M .
We say that an equivalence relation ∼ on M behaves well with respect to the operations if:

• whenever m1 ∼ m2 and n1 ∼ n2 then m1 + n1 ∼ m2 + n2 ;

• whenever m1 ∼ m2 , then rm1 ∼ rm2 for all r ∈ R.

We need ∼ to behave well with respect to the operations in order for the quotient structure to have
the desired properties.

Proposition 10.4.13
Let ∼ be an equivalence relation on a left R-module M that behaves well with respect to
the operations. The quotient set M/ ∼, i.e., the set of equivalence classes on M , is a left
R-module with operations:
[m]∼ + [n]∼ = [m + n]∼ and r[m]∼ = [rm]∼ . (10.7)

Proof. All the algebraic properties required for a left R-module will hold in M/ ∼ from the definitions
in (10.7). The only concern is whether the operations are well-defined. Let m1 , m2 ∈ [m], let
n1 , n2 ∈ [n], and let r ∈ R. Since the equivalence relation behaves well with respect to the operations
on M , then
m1 + n1 ∼ m2 + n2 and rm1 ∼ rm2
so the addition and the scalar multiplication on M/ ∼ are well-defined regardless of the choice of
representative for [m] and [n]. 

As with groups and rings, only certain equivalence relations behave well with respect to the
operations on M .

Proposition 10.4.14
Let R and M be as above. Let ∼ be an equivalence relation on M that behaves well with
respect to the operations on M . Then the equivalence class [0]∼ is a submodule of M .

Proof. Call U = [0]∼ . Let m, n ∈ U . Then m ∼ 0 and n ∼ 0. Since ∼ behaves well with respect to
the operations, m + n ∼ 0 + 0 = 0. Hence, U is closed under addition. Similarly, rm ∼ r0 = 0 so U
is closed under scalar multiplication. Hence, U is a submodule of M . 

More importantly, a converse holds.



Proposition 10.4.15
Let U be a submodule of the left R-module M . Define the relation ∼U on M by m1 ∼U m2
if and only if m2 − m1 ∈ U . Then ∼U is an equivalence relation on M that behaves well
with respect to the operations. Furthermore, the equivalence class of 0 is the submodule
U.

Proof. Suppose that m1 ∼U m2 and n1 ∼U n2 . Then m2 − m1 ∈ U and n2 − n1 ∈ U . Since U is


closed under addition,

(m2 − m1 ) + (n2 − n1 ) = (m2 + n2 ) − (m1 + n1 ) ∈ U,

so m1 + n1 ∼U m2 + n2 . Also, since U is closed under scalar multiplication, for all r ∈ R,

rm1 ∼U rm2 .

Thus, ∼U behaves well with respect to the operations on M .


The equivalence class of 0 ∈ M is

[0] = {m ∈ M | m ∼U 0} = {m ∈ M | 0 − m ∈ U }.

But this is the submodule U since −m is in a submodule if and only if m is in that submodule. 

Definition 10.4.16
Let U be a submodule of a left R-module M and let ∼U be the equivalence relation given in
Proposition 10.4.15. The left R-module structure on M/ ∼U is called the quotient R-module
of M by U and is denoted by M/U .

Of key importance, Proposition 10.4.15 shows that it is possible to construct the quotient of M
from any submodule U . Furthermore, Proposition 10.4.14 establishes that the quotient module with
respect to a submodule is the only type of set quotient on M that induces a left R-module structure
from the operations on M .
As with quotient groups or quotient rings, we write elements in the quotient left R-module M/U
as m + U or more briefly as m, where the submodule U is understood by context. Addition and
scalar multiplication in M/U are given by

(m + U ) + (n + U ) = (m + n) + U and r(m + U ) = (rm) + U

for all m, n ∈ M and all r ∈ R.


As with groups and rings, quotient modules are closely related to homomorphisms in the following
way. If U is a submodule of the left R-module M , then the function π : M → M/U defined by
π(m) = m + U is a homomorphism called the canonical projection.

10.4.3 – Isomorphism Theorems


As with groups and rings, the construction of quotient modules leads to four isomorphism theorems.
The proofs of these theorems follow from isomorphism theorems for abelian groups. All that remains
to be checked in each one is that the scalar multiplication behaves as it should for the theorem to
hold. Because of this similarity with previous proofs, we omit the proofs for modules.

Theorem 10.4.17 (First Isomorphism Theorem)


Let R be a ring and let ϕ : M → N be an R-module homomorphism between two left
R-modules. Then M/ Ker ϕ ∼ = Im ϕ.

Theorem 10.4.18 (Second Isomorphism Theorem)


Let R be a ring, let M be a left R-module and let N1 and N2 be submodules. Then
(N1 + N2 )/N1 ∼
= N2 /(N1 ∩ N2 ).

Theorem 10.4.19 (Third Isomorphism Theorem)


Let R be a ring, let M be a left R-module and let U ⊆ N be two nested submodules. Then
N/U is a submodule of M/U and (M/U )/(N/U ) ∼ = M/N .

Theorem 10.4.20 (Fourth Isomorphism Theorem)


Let R be a ring, let M be a left R-module and let U be a submodule. There is a bijection
between the submodules of M containing U and the submodules of M/U , given by the
correspondence N ↔ N/U . Furthermore, the correspondence preserves the submodule
partial order and if N1 and N2 are modules such that U ⊆ N1 , N2 ⊆ M , then

N1 /U + N2 /U ∼
= (N1 + N2 )/U and (N1 /U ) ∩ (N2 /U ) ∼
= (N1 ∩ N2 )/U.

10.4.4 – Quotient Vector Spaces


We saw the relevance and value of studying quotient groups and quotient rings. Most introductory
courses on linear algebra do not introduce the quotient construction for vector spaces because of its
abstract nature. With the theory for quotient R-modules at our disposal, we consider the application
to vector spaces and present new approaches to familiar theorems.

Proposition 10.4.21
Let V be a finite-dimensional vector space over a field F and let U be a subspace. Then

dim V /U = dim V − dim U.

Proof. Let {v1 , v2 , . . . , vr } be a basis of U and complete this set to a basis {v1 , v2 , . . . , vn } of V . We
show that B = {v r+1 , v r+2 , . . . , v n } is a basis of V /U . Obviously, {v 1 , v 2 , . . . , v n } spans V /U but
v 1 = v 2 = · · · = v r = 0 in V /U . Hence, B spans V /U . Now suppose that there exist coefficients
ci ∈ F such that
cr+1 v r+1 + cr+2 v r+2 + · · · + cn v n = 0.

Then

cr+1 vr+1 + cr+2 vr+2 + · · · + cn vn ∈ U = Span(v1 , v2 , . . . , vr )

=⇒ cr+1 vr+1 + cr+2 vr+2 + · · · + cn vn = d1 v1 + d2 v2 + · · · + dr vr
=⇒ −d1 v1 − d2 v2 − · · · − dr vr + cr+1 vr+1 + cr+2 vr+2 + · · · + cn vn = 0.

Now since {v1 , v2 , . . . , vn } is linearly independent, we deduce that all the coefficients are 0, and in
particular that cr+1 = cr+2 = · · · = cn = 0. Hence, B is linearly independent and thus a basis of
V /U .
We have shown that if dim V = n and dim U = r, then dim V /U = n − r. The proposition
follows. 

This result, along with the First Isomorphism Theorem, recovers the Rank-Nullity Theorem, an
important theorem in linear algebra that shows that dimensions are not lost under the action of a
linear transformation.

Theorem 10.4.22 (Rank-Nullity Theorem)


Let T : V → W be a linear transformation between vector spaces over a field F . Then

dim(Ker T ) + dim(T (V )) = dim V,

i.e.,
nullity(T ) + rank(T ) = dim V.

Proof. By the First Isomorphism Theorem, V / Ker T ∼


= T (V ). Then, by Proposition 10.4.21,

dim T (V ) = dim(V / Ker T ) = dim V − dim(Ker T ),

and the theorem follows. 
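The Rank-Nullity Theorem can be observed numerically. The following is a minimal sketch with an assumed sample matrix (not from the text), computing the rank of a linear map T : R4 → R3 and checking that rank plus nullity equals the dimension of the domain.

```python
# Sketch: numerical check of rank(T) + nullity(T) = dim V for a sample
# linear map T : R^4 -> R^3 given by the matrix A below.
import numpy as np

A = np.array([[1., 2., 0., 1.],
              [0., 1., 1., 0.],
              [1., 3., 1., 1.]])   # third row = first + second, so rank 2

rank = np.linalg.matrix_rank(A)    # dim T(V)
nullity = A.shape[1] - rank        # dim Ker T, by the theorem
assert rank == 2
assert rank + nullity == A.shape[1]   # 2 + 2 = dim V = 4
```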

Exercises for Section 10.4


1. Suppose that R has an identity 1 ≠ 0. Prove that a function ϕ : M → N between left R-modules is
an R-module homomorphism if and only if ϕ(rm1 + m2 ) = rϕ(m1 ) + ϕ(m2 ) for all m1 , m2 ∈ M and
all r ∈ R.
2. Let ϕ : M → N be an R-module homomorphism. Prove that ϕ is injective if and only if Ker ϕ = {0}.
3. Let M1 , M2 , . . . , Mk be a collection of R-modules. Define the ith projection function πi : M1 ⊕ M2 ⊕
· · · ⊕ Mk → Mi by πi (m1 , m2 , . . . , mk ) = mi . Prove that πi is an R-module homomorphism.
4. Let R be a ring and suppose that constants c1 , c2 , . . . , cn are in the center of R. Prove that the function
ϕ : Rn → R defined by ϕ(r1 , r2 , . . . , rn ) = c1 r1 + c2 r2 + · · · + cn rn is an R-module homomorphism.
5. Let R be a ring and let ϕ : M → N be an R-module homomorphism between two left R-modules.
(a) Prove that if U is a submodule of M , then ϕ(U ) is a submodule of N .
(b) Prove that if V is a submodule of N , then ϕ−1 (V ) is a submodule of M .
6. Let R = Q[x] and consider the module M = Q[x]/(x2 − 2). We know that as a ring Q[x]/(x2 − 2) ∼=
Q[√2] so we write elements in M as a + b√2.
(a) Determine x · (a + b√2).
(b) Determine xn · (a + b√2).
(c) Calculate (x3 − x2 + 3x + 1) · (a + b√2).
7. Let R be a ring. Let A = (aij ) ∈ Mn×m (R). Supposing that we write elements of Rn and Rm as
column vectors, prove that the function f : Rm → Rn defined by

       r1       a11  a12  · · ·  a1m    r1
       r2       a21  a22  · · ·  a2m    r2
  f    ..   =    ..   ..          ..    ..
        .        .    .           .      .
       rm       an1  an2  · · ·  anm    rm

is an R-module homomorphism if R is a commutative ring. Prove that f might not be an R-module
homomorphism when R is not commutative.
8. Let F be a field and let V and W be F [x]-modules such that x acts on V according to T : V → V
and x acts on W according to S : W → W . Prove that V ∼= W as F [x]-modules if and only if V and
W are isomorphic as vector spaces via an isomorphism ϕ with T and S similar, i.e., S = ϕ ◦ T ◦ ϕ−1 .
9. Let G be a group. Prove that an R[G]-module homomorphism is a homomorphism of group represen-
tations as defined in Definition 8.6.13.
10. Prove that HomZ (Z/mZ, Z/nZ) ∼ = Z/dZ, where d = gcd(m, n).
11. Prove Proposition 10.4.12.
12. Consider the Z-module M = Z2 and let N be the submodule N = {s(3, 1) + t(2, 5) | s, t ∈ Z}. Prove
that M/N ∼= Z13 . See Figure 10.1.
13. Consider the Z-module M = Z2 and let N be the submodule N = {s(4, 1) + t(2, 5) | s, t ∈ Z}. Determine
the isomorphism type of M/N .

Figure 10.1: Fundamental region in a quotient of Z2

14. Let M1 , M2 be left R-modules and let Ni be a submodule of Mi for i = 1, 2. Prove that

(M1 ⊕ M2 )/(N1 ⊕ N2 ) ∼
= (M1 /N1 ) ⊕ (M2 /N2 ).

15. Consider the cyclic group Z4 = ⟨z | z 4 = 1⟩ and the group ring R[Z4 ]. Let V = R4 and consider the
action of R[Z4 ] on V by scalar multiplication for real numbers and z acting on the standard basis
vectors by shifting through them, i.e., z~ei = ~ei+1 for i = 1, 2, 3 and z~e4 = ~e1 .
(a) Show that the data equips V with the structure of an R[Z4 ]-module.
(b) Prove that the subspace W = {(x1 , x2 , x3 , x4 ) | x1 − x2 + x3 − x4 = 0} is a submodule.
(c) Determine the structure of V /W and state the action of R[Z4 ] on it.

16. Let ϕ : M → M be an R-module homomorphism such that ϕ2 = ϕ. (We call ϕ idempotent.)


(a) Prove that Im ϕ ∩ Ker ϕ = {0}.
(b) Prove that M = Im ϕ + Ker ϕ.
(c) Deduce that M = Im ϕ ⊕ Ker ϕ.
[An idempotent endomorphism is called a projection onto the submodule U where U = Im ϕ.]

17. Let F be a field, V be a finite-dimensional vector space over F , and U a vector subspace. Prove that
V ∼
= U ⊕ V /U .
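As a computational aside related to Exercise 12 (whose answer, M/N ∼= Z13 , is stated in the exercise), the isomorphism type of Z2 /N for a rank-2 sublattice N can be read off from the 2 × 2 special case of the Smith normal form: the first invariant factor is the gcd of the matrix entries and the product of the invariant factors is | det |. A minimal sketch using only the standard library:

```python
# Sketch: invariant factors of Z^2 / span_Z{u, v} via the 2x2 special
# case of the Smith normal form: d1 = gcd of all entries, d1*d2 = |det|.
from math import gcd

def quotient_invariants(u, v):
    a, b = u
    c, d = v
    det = abs(a * d - b * c)       # index of N in Z^2 (assumed nonzero)
    d1 = gcd(gcd(a, b), gcd(c, d))
    return d1, det // d1           # Z^2/N  ≅  Z/d1 x Z/d2

# Exercise 12: N = span{(3, 1), (2, 5)} has det 13 and gcd 1.
assert quotient_invariants((3, 1), (2, 5)) == (1, 13)   # so M/N ≅ Z/13Z
```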

10.5
Free Modules and Module Decomposition
One of the topics in the Outline that we have not presented so far is that of conveniently describing
elements in an R-module. We saw that vector spaces have bases and that each basis allows us to
describe a vector by its coordinates. If M is a module over an arbitrary ring R, many properties about
bases, rank, and coordinates do not hold generally and require careful attention. For example, not
all modules have a basis. However, like many other structures, we can define the notion of generating
subsets. With R-modules, the study of how to describe elements in the module leads naturally to
the question of internal structure within the module and then to decomposition of the module into
direct sums. This section studies these topics.

10.5.1 – Generating Subsets

Definition 10.5.1
Let R be a ring and let M be a left R-module. Let S be a subset of M . A linear combination
of elements in M is an expression

r1 m1 + r2 m2 + · · · + rk mk (10.8)

such that k is a positive integer, ri ∈ R and mi ∈ S for i = 1, 2, . . . , k. The set of all linear
combinations of S is called the span of S and is denoted by Span(S) or ⟨S⟩ or (S).

To emphasize that the coefficients ri come from the ring R, we sometimes call an expression like
(10.8) an R-linear combination.
As with vector spaces, again note that a linear combination consists of a finite expression of terms
from S. There are different notations for the set of linear combinations because modules generalize
a variety of algebraic structures and these notations come from these other structures.
It is sometimes convenient to refer to a linear combination of elements in S by the notation

finite
X
rm m (10.9)
m∈S

where this notation assumes that the coefficients rm are 0 for all but a finite number of m ∈ S. We
sometimes refer to such a collection {rm }m∈S as an almost zero family of elements in R whenever
all but a finite number of the rm are 0. If S is finite with S = {m1 , m2 , . . . , mk }, then the notation
in (10.9) is identical to (10.8).

Proposition 10.5.2
Let R be a ring and let S be a subset of a left R-module M . Then Span(S) is a submodule
of M .

Proof. (Left as an exercise for the reader. See Exercise 10.5.1.) 

Definition 10.5.3
A left R-module M is said to be generated by S if M = Span(S). A left R-module M
is said to be finitely generated if it has a finite subset S such that M = Span(S). A left
R-module is called cyclic if it can be generated by a single element.

If S is finite there are a few common alternate notations for the span. If S = {a}, some authors
denote Span(a) by Ra. If S = {a1 , a2 , . . . , ak }, then it is not uncommon to write Ra1 +Ra2 +· · ·+Rak
for Span(a1 , a2 , . . . , ak ).
The Z-module Z3 is generated by {(1, 0, 0), (0, 1, 0), (0, 0, 1)} so Z3 is finitely generated. In
contrast, let S = {p1 (x), p2 (x), . . . , pn (x)} be any finite subset of the Z-module Z[x]. The set of
Z-linear combinations consists of polynomials of the form

r1 p1 (x) + r2 p2 (x) + · · · + rn pn (x).

By properties of the degree, we see that for all p(x) ∈ Span(S), deg p(x) ≤ max{deg pi (x)}. Thus,
Span(S) is never equal to Z[x]. Hence, Z[x] is not finitely generated as a Z-module.

10.5.2 – Linear Independence

Definition 10.5.4
A subset S of a left R-module M is said to be linearly independent if

r1 m1 + r2 m2 + · · · + rk mk = 0 in M =⇒ r1 = r2 = · · · = rk = 0

for all finite subsets {m1 , m2 , . . . , mk } ⊆ S. Otherwise, S is linearly dependent.

In linear algebra, we saw that linear independence implied the uniqueness of coefficients in a
linear combination. The same holds true for modules.

Proposition 10.5.5
If S is a linearly independent set in a left R-module M and two linear combinations are
equal,
finite            finite
 X                 X
     rm m    =          r′m m,
m∈S               m∈S
then rm = r′m for all m ∈ S.

Proof. Note that in both summations, only a finite number of the coefficients rm and r′m are nonzero,
hence the union of the subsets of S over which rm and r′m are nonzero is a finite subset. Thus, equality
of the two linear combinations gives

finite
 X
     (rm − r′m )m = 0.
m∈S

Since S is linearly independent, rm − r′m = 0 for all m ∈ S, so rm = r′m . 

10.5.3 – Basis
Following the organization for vector spaces, it is natural now to define the concept of a basis.

Definition 10.5.6
A subset S of a left R-module M is called a basis of M if M is generated by S and if S is
linearly independent. A left R-module is called free if it has a basis.

In Section 10.2.5, we saw that all vector spaces have bases. This key property of vector spaces
can be restated by saying that every F -vector space is a free F -module.

Example 10.5.7. In contrast, consider (Z/5Z, +) as a Z-module. Any subset S of Z/5Z except for
S = ∅ or S = {0} generates Z/5Z. However, 5m = 0 for all m ∈ Z/5Z, so no nonempty subset of
Z/5Z is linearly independent. Thus, Z/5Z does not have a basis, i.e., is not a free module. 4
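The key observation of Example 10.5.7 is a one-line computation; the following trivial sketch (not from the text) verifies that 5 annihilates every element of Z/5Z, so every singleton {m} already admits the nontrivial dependence relation 5 · m = 0.

```python
# Sketch: in the Z-module Z/5Z, the nonzero integer coefficient 5 kills
# every element, so no subset of Z/5Z is linearly independent over Z.
for m in range(5):
    assert (5 * m) % 5 == 0   # 5 · m = 0 in Z/5Z for every m
```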

In the above example, the failure to have a basis came from the presence of torsion elements in
the R-module. However, there are other reasons that may prevent the existence of a basis as the
following example shows.

Example 10.5.8. Let R be a commutative ring that is not a PID (principal ideal domain) and let
I be a nonprincipal ideal. A left ideal I is a left R-module. Let S be any minimal generating subset
of I. Since I is nonprincipal, S contains two distinct nonzero elements r1 , r2 ∈ S. Then

(r2 )r1 + (−r1 )r2 = 0



while r2 ≠ 0 and −r1 ≠ 0. Hence, S is not linearly independent. Since no generating set of I is
linearly independent, I does not have a basis.
As a specific example, Exercise 6.1.13 showed that I = (3, 2 + √−5) is not a principal ideal in
R = Z[√−5]. Since Z[√−5] is an integral domain, it has no zero divisors (torsion elements in this
case) so the phenomenon of not having a basis does not stem only from the presence of torsion
elements. 4

If M is a free left R-module with a basis B, since B spans M , then every element m ∈ M can
be written as a linear combination

m = r1 f1 + r2 f2 + · · · + rk fk

with ri ∈ R and fi ∈ B or alternatively as


finite
X
m= rb b, (10.10)
b∈B

where {rb }b∈B is an almost zero family of ring elements. By Proposition 10.5.5, since B is linearly
independent, this linear combination is unique. As in linear algebra, this balance between linear
independence and spanning leads to the notion of coordinates.

Definition 10.5.9
The unique almost zero family {rb }b∈B in (10.10) is called the coordinates of m with respect
to B.

Suppose that M has a finite basis B = {m1 , m2 , . . . , mk }. The unique expression of an element
m ∈ M as a linear combination

m = r1 m1 + r2 m2 + · · · + rk mk ,

defines a function f : M → Rk by f (m) = (r1 , r2 , . . . , rk ). It is obvious that the function is additive,


i.e., f (m + m′ ) = f (m) + f (m′ ). Also, if c ∈ R, then

cm = c(r1 m1 + r2 m2 + · · · + rk mk ) = cr1 m1 + cr2 m2 + · · · + crk mk

so f (cm) = (cr1 , cr2 , . . . , crk ) = c(r1 , r2 , . . . , rk ) = c(f (m)). Hence, the function f is an R-module
homomorphism. The function is obviously surjective and, since a basis is linearly independent,
Ker f = {0} so f is injective. Therefore, f is an isomorphism. This gives us the following theorem.

Theorem 10.5.10
Let M be a free left R-module with a finite basis B such that |B| = k. Then M ∼
= Rk .

Another key property of vector spaces proved in Section 10.2.5 is that any two bases of a vector
space have the same cardinality. In vector spaces, this gave us the notion of dimension. This result
is true for various classes of rings but not for arbitrary rings.

Theorem 10.5.11
Let R be a commutative ring with an identity 1 ≠ 0. Suppose that a free left R-module M
has a basis with k elements. Then every other basis is finite and has k elements.

Proof. First suppose that one basis {ei | 1 ≤ i ≤ k} is finite and that S is another basis, not
necessarily finite. Then for all i,

ei = ci1 mi,1 + ci2 mi,2 + · · · + cij(i) mi,j(i)



for some mi,j(i) ∈ S. The subset S ′ = {mi,u | 1 ≤ i ≤ k, 1 ≤ u ≤ j(i)} ⊆ S is finite with
|S ′ | ≤ j(1) + j(2) + · · · + j(k) and S ′ generates M . As a subset of the basis S, S ′ is still linearly
independent. Hence, S ′ is a basis. Furthermore, any s ∈ S − S ′ would be a linear combination of
elements of S ′ , contradicting the linear independence of S, so S ′ = S. Therefore, S is finite.
Now let {ei | 1 ≤ i ≤ k} and {fj | 1 ≤ j ≤ ℓ} be two bases of M . Then each element of one
basis is a linear combination of elements in the other, so

        k                        ℓ
        X                        X
  fj =      aji ei   and   ei =      bij fj
        i=1                      j=1

for some aji , bij ∈ R.
Assume that k ≠ ℓ. Without loss of generality, suppose that k < ℓ. Substitution gives

        k        ℓ                  k    ℓ                      ℓ     k
        X       X                   X   X                       X    X 
  fj =     aji      bij ′ fj ′   =          aji bij ′ fj ′   =            aji bij ′  fj ′ .
        i=1     j ′ =1             i=1 j ′ =1                  j ′ =1  i=1

Since {f1 , f2 , . . . , fℓ } is a basis of M , then

   k
   X                  1   if j = j ′
       aji bij ′ =                                                           (10.11)
   i=1                0   if j ≠ j ′ .

Set

       a11  a12  · · ·  a1k  0  · · ·  0             b11  b12  · · ·  b1ℓ
       a21  a22  · · ·  a2k  0  · · ·  0             b21  b22  · · ·  b2ℓ
  A =   ..   ..          ..  ..        ..   and  B =  ..   ..          ..
        .    .           .   .         .              .    .           .
       aℓ1  aℓ2  · · ·  aℓk  0  · · ·  0             bk1  bk2  · · ·  bkℓ
                                                      0    0   · · ·   0
                                                      ..   ..          ..
                                                       .    .           .
                                                      0    0   · · ·   0

so that A and B are both ℓ × ℓ matrices. Then (10.11) in matrix multiplication is AB = Iℓ . By
Proposition 5.3.11, since R is commutative, (det A)(det B) = 1. But this is a contradiction since
det A = det B = 0 because of the columns (resp. rows) of 0s in A (resp. B). This contradicts the
assumption that k ≠ ℓ. 

The following example describes a noncommutative ring that, as a left module over itself, has
bases of different cardinalities.

Example 10.5.12. Let M = Fun(N, Z) be the set of sequences into Z. This is naturally a Z-
module. Proposition 10.4.12 shows that the set of endomorphisms of M , namely R = EndZ (M ), is
a unital associative algebra, i.e., a ring with an identity 1 that is the identity endomorphism.
Consider the ring elements ϕe , ϕo ∈ R defined by ϕe (a)i = a2i and ϕo (a)i = a2i+1 , for all
sequences a = (ai )i∈N . The endomorphism ϕe (resp. ϕo ) takes a sequence a ∈ M and returns the
sequence of even (resp. odd) indexed terms. Consider also the elements ψe , ψo ∈ R that map a
sequence a ∈ M to a sequence with the terms,
              ai/2      if i is even                  0           if i is even
ψe (a)i =                              and  ψo (a)i =
              0         if i is odd                   a(i−1)/2    if i is odd.

The terms of ψe (a) are the terms of a but spaced out every other term and with a 0 for the odd
terms. Consequently, ψe is a right inverse of ϕe , namely ϕe ◦ ψe = 1. Similarly ϕo ◦ ψo = 1. It is
also easy to see that ϕe ◦ ψo = 0 and ϕo ◦ ψe = 0. Finally, we claim that

ψe ◦ ϕe + ψo ◦ ϕo = id,

the identity endomorphism on M . Indeed, for all sequences a ∈ M we have

                  ai   if i is even                        0    if i is even
ψe (ϕe (a))i =                        and   ψo (ϕo (a))i =
                  0    if i is odd                         ai   if i is odd

so the resulting addition is (ψe ◦ ϕe + ψo ◦ ϕo )(a)i = ai .


Obviously, the set {id} is a basis of R as a left R-module. Now let α ∈ R be any element. Then

α ◦ id = α ◦ (ψe ◦ ϕe + ψo ◦ ϕo ) = (α ◦ ψe ) ◦ ϕe + (α ◦ ψo ) ◦ ϕo .

In particular, α ∈ Span(ϕe , ϕo ) and so the set {ϕe , ϕo } generates R as a left R-module. Furthermore,
suppose that γ1 ◦ ϕe + γ2 ◦ ϕo = 0. Multiplying (composing) on the right by ψe gives

0 = γ1 ◦ ϕe ◦ ψe + γ2 ◦ ϕo ◦ ψe = γ1 ◦ 1 + γ2 ◦ 0 = γ1

and similarly, γ2 = 0. Hence, {ϕe , ϕo } is a linearly independent set. Thus, {ϕe , ϕo } is a basis of
R. Consequently, we have produced a basis of R consisting of 1 element and a basis consisting of 2
elements. By Theorem 10.5.10, we deduce that R satisfies the unusual property that R ∼= R2 . 4
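The identities used in Example 10.5.12 are easy to check by machine on a finite range of indices. The following sketch (an assumed encoding, not from the text) models sequences as Python functions N → Z and verifies the composition identities for the even/odd split endomorphisms.

```python
# Sketch: the endomorphisms of Example 10.5.12 on M = Fun(N, Z),
# with sequences modeled as Python functions from indices to integers.

def phi_e(a):  # even-indexed subsequence: phi_e(a)_i = a_{2i}
    return lambda i: a(2 * i)

def phi_o(a):  # odd-indexed subsequence: phi_o(a)_i = a_{2i+1}
    return lambda i: a(2 * i + 1)

def psi_e(a):  # spread a over the even slots, 0 on the odd slots
    return lambda i: a(i // 2) if i % 2 == 0 else 0

def psi_o(a):  # spread a over the odd slots, 0 on the even slots
    return lambda i: a((i - 1) // 2) if i % 2 == 1 else 0

a = lambda i: i * i + 1   # a sample sequence

for i in range(20):
    # phi_e ∘ psi_e = 1 and phi_o ∘ psi_o = 1 (right inverses) ...
    assert phi_e(psi_e(a))(i) == a(i)
    assert phi_o(psi_o(a))(i) == a(i)
    # ... the mixed composites vanish ...
    assert phi_e(psi_o(a))(i) == 0
    assert phi_o(psi_e(a))(i) == 0
    # ... and psi_e ∘ phi_e + psi_o ∘ phi_o = id, the key identity.
    assert psi_e(phi_e(a))(i) + psi_o(phi_o(a))(i) == a(i)
```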

Definition 10.5.13
Let R be an arbitrary ring and let M be a free R-module. If every basis of M has the same
cardinality, we call this cardinality the rank of M and denote it by rankR (M ).

The rank of a vector space over a field F is precisely the dimension. Theorem 10.5.11 shows that
if R is a commutative, unital ring and M is an R-module with a finite basis, then every basis of M
has the same cardinality. In this case, the concept of rank applies. As Example 10.5.12 shows, it is
possible for a module to be free but for the rank not to be defined.
By a trivial application of this terminology, we will call the trivial module {0} free of rank 0.

10.5.4 – Finitely Generated R-Algebras


We briefly turn to the concept of generating sets for R-algebras.
In the context of R-algebras, the concept of a generating set differs from the same terminology
for modules. For example, consider the set of polynomials F [x] over a field F . As an F -module, a
generating set for F [x] is S = {1, x, x2 , x3 , . . .}. However, as an F -algebra, S 0 = {1, x} is a generating
set. This is because applying the product (bilinear function) to elements of S 0 gives all the elements
of S and then F -linear combinations of S give all polynomials in F [x].

Definition 10.5.14
Let R be a commutative ring and let A be an R-algebra. Then A is a finitely generated
R-algebra if there exists a finite subset S ⊆ A such that the smallest subalgebra of A that
contains S is all of A.

According to this terminology, F [x] (equipped with its usual addition, multiplication, and scalar
multiplication) is not finitely generated as an F -module but is finitely generated as an F -algebra.
The same is true of F [x1 , x2 , . . . , xn ] for any positive integer n. In contrast, the polynomial ring
with a countably infinite number of variables F [x1 , x2 , x3 , . . .] gives an example of an F -algebra that
is not finitely generated.

10.5.5 – Direct Sums and Complete Reducibility


Corollary 10.2.17 established that every vector space has a basis; the same is no longer true for left
R-modules with an arbitrary ring R. Vector spaces also enjoy the property that if U is a subspace
of a vector space V , then there exists another subspace U ′ of V such that V = U ⊕ U ′ (called a

complementary subspace). This generalizes to the fact that if B = {v1 , v2 , . . . , vn } is a basis of a


vector space V , then V decomposes as a direct sum into one-dimensional subspaces as
V = Span(v1 ) ⊕ Span(v2 ) ⊕ · · · ⊕ Span(vn ). (10.12)
With modules, this property of decomposition is not as easy. Even if (which is not always the case)
a left R-module M has a basis, the module does not necessarily decompose into a direct sum like
(10.12).
Example 10.5.15. Let R = Z and let M = Z with usual integer multiplication on the left as the
scalar multiplication. Let m be a positive integer greater than 1. The ideal I = mZ is a strict
submodule of M . If another submodule (ideal) nZ satisfies mZ + nZ = Z, then there must exist
s, t ∈ Z such that sm + tn = 1. In other words, gcd(m, n) = 1 and m and n are relatively prime.
However, in that case, mZ ∩ nZ = mnZ. Since the intersection is nontrivial, Z is not isomorphic to
nZ ⊕ mZ. 4
On the other hand, consider the following example, in which something akin to (10.12) occurs
but in a way that may initially seem counterintuitive.
Example 10.5.16. Let R = R[x] and let V = R3 be the left R-module defined by x acting on R3
as the linear transformation T given with respect to the standard basis as
 
1 −1 0
T (~v ) = 1 1 0 ~v .
0 0 2

The linear transformation has the effect of rotating by π/4 and dilating by 2 around the origin in
the xy-plane and also dilating by a factor of 2 along the z-axis. The xy-plane U and the z-axis W
are T -invariant, so they are R[x]-submodules of V . It is easy to see that U ⊕ W = R3 .
Though U is a two-dimensional subspace of V , it has no T -invariant nontrivial subspaces. Hence,
the R[x]-submodule U of V is not the direct sum of any pair of one-dimensional vector spaces that
are R[x]-submodules. However, it is important to note that the previous observation, though correct,
is not particularly useful. We need to be careful not to mix algebraic structures too flippantly. Note
that
(c2 x + (c1 − c2 )) · ~e1 = c2 (~e1 + ~e2 ) + (c1 − c2 )~e1 = c1~e1 + c2~e2 .
Consequently, U = Span(~e1 ). It is also easy to see that W = Span(~e3 ). In other words, we deduce
that {~e1 , ~e3 } is a basis for the R[x]-module V , that V is free of rank 2, and that
V = Span(~e1 ) ⊕ Span(~e3 ).
We underscore that Span here is not an R-vector space span but an R[x]-module span. 4
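The computation in Example 10.5.16 can be checked numerically. The following sketch (with assumed sample coefficients) verifies that the polynomial c2 x + (c1 − c2 ) applied to ~e1 produces c1~e1 + c2~e2 , using the matrix of T from the example.

```python
# Sketch: (c2 x + (c1 - c2)) · e1 = c1 e1 + c2 e2 in Example 10.5.16,
# where x acts on R^3 as the matrix T below.
import numpy as np

T = np.array([[1., -1., 0.],
              [1.,  1., 0.],
              [0.,  0., 2.]])
e1 = np.array([1., 0., 0.])
e2 = np.array([0., 1., 0.])

c1, c2 = 4.0, -3.0
# The polynomial c2 x + (c1 - c2) acts as the matrix c2 T + (c1 - c2) I,
# so applying it to e1 gives c2 (e1 + e2) + (c1 - c2) e1.
result = c2 * (T @ e1) + (c1 - c2) * e1
assert np.allclose(result, c1 * e1 + c2 * e2)
```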
If a module M can be broken up into the direct sum M = M1 ⊕ M2 where M1 and M2 are
submodules, then this decomposition simplifies the effort to understand the internal structure of
the module. Indeed, we only then need to understand the structure in M1 and in M2 separately.
Because of property (10.12), the problem of decomposition does not figure at all in linear algebra. In
group theory, we saw how useful decomposition is, and the key relevant theorem was the Direct
Sum Decomposition Theorem. A similar theorem holds for modules. This theorem is looser than
the property (10.12) for vector spaces but is similar in nature.

Theorem 10.5.17 (Direct Sum Decomposition)


Let R be a ring and let M be an R-module. Let N1 , N2 , . . . , Nk be a finite collection of
submodules satisfying:

(1) M = N1 + N2 + · · · + Nk ;
(2) (N1 + N2 + · · · + Ni ) ∩ Ni+1 = {0} for all i = 1, 2, . . . , k − 1.
Then M ∼
= N1 ⊕ N2 ⊕ · · · ⊕ Nk .

Proof. We prove the theorem by induction on k. If k = 1, then the theorem is trivial. Now
suppose that the theorem holds for some positive integer k and suppose also that M has a collection
N1 , N2 , . . . , Nk+1 that satisfies (1) and (2). By (1), every element m ∈ M can be written as a sum

m = n1 + n2 + · · · + nk+1

where ni ∈ Ni for all 1 ≤ i ≤ k + 1. Suppose that m = n′1 + n′2 + · · · + n′k+1 as well. Then

0 = m − m = ((n1 + n2 + · · · + nk ) − (n′1 + n′2 + · · · + n′k )) + (nk+1 − n′k+1 ),

so
(n′k+1 − nk+1 ) = ((n1 + n2 + · · · + nk ) − (n′1 + n′2 + · · · + n′k )) .
By condition (2), we deduce that both sides of the above equality must be 0. Thus,

nk+1 = n′k+1 and n1 + n2 + · · · + nk = n′1 + n′2 + · · · + n′k .

The induction hypothesis then implies that ni = n′i for 1 ≤ i ≤ k, and hence for all 1 ≤ i ≤ k + 1. This shows that each
m ∈ M can be written uniquely as a sum of elements in N1 , N2 , . . . , Nk+1 . This defines a function
ϕ : M → N1 ⊕N2 ⊕· · ·⊕Nk+1 by setting ϕ(m) = (n1 , n2 , . . . , nk+1 ) whenever m = n1 +n2 +· · ·+nk+1
with ni ∈ Ni . This function is surjective and we just showed that it is injective. Furthermore, it is
easy to check that it is additive. Also, for all r ∈ R, since rni ∈ Ni for all ni ∈ Ni ,

ϕ(rm) = ϕ(rn1 + rn2 + · · · + rnk+1 ) = (rn1 , rn2 , . . . , rnk+1 ) = r(n1 , n2 , . . . , nk+1 ).

This finishes showing that ϕ is an R-module isomorphism. 

Example 10.5.15 underscored the fact that in modules, for every submodule there does not neces-
sarily exist a complementary submodule. This observation leads us to distinguish a few possibilities.

Definition 10.5.18
Let R be a ring and let M be a nonzero R-module.
(1) If M contains a strict nonzero submodule N , i.e., 0 ( N ( M , then M is called
reducible. Otherwise, M is called irreducible (or simple).

(2) If there exist nonzero submodules M1 and M2 such that M = M1 ⊕ M2 , then M is


called decomposable. Otherwise, M is called indecomposable.
(3) If M is a direct sum of irreducible submodules, then M is called completely reducible.

Every irreducible module is trivially indecomposable. However, Example 10.5.15 shows that Z
is indecomposable but not irreducible. Similarly, every module is a direct sum of indecomposable
submodules but those submodules need not be irreducible. Hence, every module is completely
decomposable (if we had bothered to define such an expression) but not every module is completely
reducible. In particular, Z, as a module over itself, is not completely reducible.
With the terminology of Definition 10.5.18, we can rephrase our above observation about vector
spaces by stating that if F is a field, then any nonzero F -module of dimension 2 or greater is
decomposable. In particular, the indecomposable modules are the one-dimensional vector spaces (all
isomorphic to F ). The irreducible modules are likewise the one-dimensional vector spaces.
This also implies that every finite-dimensional vector space is completely reducible.
With the examples given so far, the reader may suspect that the failure of a module to be
completely decomposable may come from the properties of the ring, like a discreteness property as
in Z or the possibility that torsion elements arise in quotient modules. However, consider the left
R[x]-module consisting of V = R2 , where x acts on R2 as multiplication by the matrix

    A = [ 2  1 ]
        [ 0  2 ] .
512 CHAPTER 10. MODULES AND ALGEBRAS

The subspace U = Span(~e1 ) is an eigenspace of A, so it is an R[x]-submodule of V . Assume there
exists another submodule W of V such that V = U ⊕ W . Then W is one-dimensional and invariant
under multiplication by A. Hence, W is an eigenspace of A. However, A has no other
one-dimensional eigenspaces besides U . Thus, the assumption of the existence of W leads to a
contradiction. Therefore, V is reducible with submodule U but it is indecomposable. Hence, V is not
completely reducible.
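This failure can be checked computationally. The following sketch (an illustration, not part of the text) uses SymPy to confirm that the matrix A above has 2 as its only eigenvalue, with a one-dimensional eigenspace:

```python
from sympy import Matrix

# The matrix through which x acts on V = R^2.
A = Matrix([[2, 1], [0, 2]])

# eigenvects() returns triples (eigenvalue, algebraic multiplicity, eigenspace basis).
eigs = A.eigenvects()

# A single eigenvalue 2 of algebraic multiplicity 2 ...
assert len(eigs) == 1 and eigs[0][0] == 2 and eigs[0][1] == 2
# ... whose eigenspace is only one-dimensional, spanned by (1, 0).
# So U = Span(e_1) has no invariant complement, and V is indecomposable.
assert len(eigs[0][2]) == 1
```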

Definition 10.5.19
If M1 is a submodule of M such that there exists another submodule M2 with M = M1 ⊕M2 ,
(1) M1 is called a summand of M ;
(2) M2 is called a complement of M1 in M ;
(3) We call M1 and M2 complementary submodules.

The following proposition gives a characterization of when a submodule M1 of a module M is a


summand of M .

Proposition 10.5.20
Let M1 be a submodule of the R-module M . Then M1 is a summand of M if and only
if there exists a projection of M onto M1 , i.e., an R-module homomorphism π : M → M
with π ◦ π = π and π(M ) = M1 .

Proof. First suppose that M = M1 ⊕M2 for some complementary submodule M2 . Then the function
π : M1 ⊕ M2 → M1 ⊕ M2 defined by π(x, y) = (x, 0) has M1 for an image and is the identity
function on the submodule M1 , which we identify with {(m1 , 0) | m1 ∈ M1 }. Hence, π is a projection
homomorphism.
Conversely, suppose that there exists a projection of M onto M1 . Let m ∈ M . Obviously,
m = π(m) + (m − π(m)). Note that π(m) ∈ Im π = M1 and

π(m − π(m)) = π(m) − π(π(m)) = π(m) − π(m) = 0,

so m − π(m) ∈ Ker π. Furthermore, if u ∈ Im π ∩ Ker π, then π(u) = u since u ∈ Im π and π(u) = 0
since u ∈ Ker π. Thus, u = 0 and Im π ∩ Ker π = {0}. This shows that Im π = M1 and Ker π are
complementary submodules. By the Direct Sum Decomposition, M = M1 ⊕ Ker π. 
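As a concrete illustration (not from the text), take M = Z², M1 = Span{(1, 0)}, and the complement M2 = Span{(1, 1)}. Writing (x, y) = (x − y)(1, 0) + y(1, 1) gives the projection π(x, y) = (x − y, 0) onto M1 along M2; a quick check confirms π is idempotent with image M1:

```python
def pi(v):
    """Projection of Z^2 onto M1 = Span{(1,0)} along M2 = Span{(1,1)}."""
    x, y = v
    return (x - y, 0)

# pi is idempotent: pi o pi = pi.
for v in [(5, 3), (-2, 7), (0, 0), (4, 4)]:
    assert pi(pi(v)) == pi(v)

# Every v splits as pi(v) + (v - pi(v)), with the second piece in M2 = Ker(pi).
v = (5, 3)
p = pi(v)
q = (v[0] - p[0], v[1] - p[1])   # q = (3, 3), a multiple of (1, 1)
assert (p[0] + q[0], p[1] + q[1]) == v
assert pi(q) == (0, 0)
```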

Exercises for Section 10.5


1. Prove Proposition 10.5.2.
2. Show that if M = M1 ⊕ M2 , then M1 ≅ M/M2 and M2 ≅ M/M1 .
3. Let R = Z[x] and consider the ideal I = (3, x + 1) in R. Prove that I is not a free R-module.
4. Let R = Q[x] and consider the module M = Q where the action of R on M is defined by (p(x), m) 7→
p(1)m.
(a) Prove that this action makes M into an R-module.
(b) Prove that M is a torsion module and that Ann(M ) = (x − 1).
(c) Deduce that M is not free.
5. Let N be a submodule of the R-module M . Prove that if N and M/N are finitely generated, then so
is M .
6. Prove that a commutative ring R is a PID if and only if every submodule of R is a free R-module.

7. Let R be a commutative ring. Prove that for any R-module homomorphism ϕ : Rn → Rm , there exists
a unique matrix A ∈ Mm×n (R) such that ϕ(r1 , r2 , . . . , rn ) can be written as (using column vectors)

    A (r1 , r2 , . . . , rn )ᵀ .

8. Let R be a commutative ring. Prove that an ideal I in R is a free R-module if and only if I is a
principal ideal generated by an element that is not a zero divisor.
9. Let R be an integral domain and let M be a finitely generated torsion module (i.e., Tor(M ) = M ).
Prove that M has a nonzero annihilator. Give an example of a torsion module whose annihilator is
the zero ideal.
10. Let R be an arbitrary ring and let M be a free module over R. Prove that if M has a finite basis,
then every basis of M is finite.
11. Show that the R-module HomR (Rm , Rn ) is free and of rank mn.
12. Let M, N1 , N2 be left R-modules. Prove the following R-module isomorphisms:
(a) HomR (N1 ⊕ N2 , M ) ≅ HomR (N1 , M ) ⊕ HomR (N2 , M ).
(b) HomR (M, N1 ⊕ N2 ) ≅ HomR (M, N1 ) ⊕ HomR (M, N2 ).
13. Let R be an arbitrary ring and let M and M ′ be free left R-modules. Let B be a basis of M and let
B ′ be a basis of M ′ . Prove that if |B| = |B ′ | then M ≅ M ′ . Do not assume that the bases are finite.
14. Find all the irreducible Z-modules.
15. Show that an R-module M is irreducible if and only if M ≠ {0} and M is cyclic with every nonzero
element as a generator.
16. (Schur’s Lemma) Show that if M1 and M2 are irreducible R-modules, then every homomorphism
between them is the 0 homomorphism or is an isomorphism. Deduce that if M is an irreducible
R-module, then EndR (M ) is a division ring. [Hint: See Proposition 10.4.12.]
17. Let M be an R-module and suppose that M = M1 ⊕ M2 , where M1 and M2 are nonisomorphic
irreducible R-modules. Prove that EndR (M ) ≅ EndR (M1 ) ⊕ EndR (M2 ) as unital associative R-algebras.
18. An element in a ring R is called a central idempotent element if it is idempotent and in the center of
R.
(a) Prove that if e is a central idempotent element and M is a left R-module, then M = eM ⊕(1−e)M .
(b) Let M be a left R-module. Prove that the central idempotent elements in the ring EndR (M )
are the projection functions.
(c) Clearly explain how part (a) is tantamount to Proposition 10.5.20.
19. Show that any direct sum of free left R-modules is again free.
20. Prove that if R is a ring such that R ≅ R2 , then R ≅ Rk for any positive integer k.

10.6 Finitely Generated Modules over PIDs, I
Section 4.5 presented the Fundamental Theorem of Finitely Generated Abelian Groups (FTFGAG).
From the perspective of modules, abelian groups are simply Z-modules. Having discussed free mod-
ules and generating modules by subsets, it is natural to wonder if FTFGAG generalizes to modules
over other rings. The first key theorem in establishing all parts of FTFGAG was Theorem 4.5.9,
which establishes that every submodule of a free Z-module is also free. Example 10.5.8 in the
previous section shows that FTFGAG cannot extend to rings that are not principal
ideal domains. However, FTFGAG does have an analogue for modules over a PID.

Both fields and Z are principal ideal domains. Consequently, since finite-dimensional vector
spaces and finitely generated abelian groups both fall under the umbrella of this section, we will see
how the theorems of this section resemble theorems already encountered in these two contexts.
Let R be a PID and M a finitely generated R-module generated by elements x1 , x2 , . . . , xn . If
(e1 , e2 , . . . , en ) is the standard (ordered) basis on Rn , then the function ϕ : Rn → M by
    ϕ(a1 e1 + a2 e2 + · · · + an en ) = a1 x1 + a2 x2 + · · · + an xn                    (10.13)

is a surjective homomorphism. Then by the First Isomorphism Theorem, M ≅ Rn / Ker ϕ. This
observation provides a strategy to study the structure of M : namely, first study the structure of
submodules of free modules over a PID and apply it to Ker ϕ.
The main theorem of this section gives a characterization of submodules of free modules over
PIDs. The following section gives a number of consequences of this structure theorem, along with
the Fundamental Theorem of Finitely Generated Modules over PIDs.

10.6.1 – Submodules of Free Modules


The main theorem of this section describes a “best” basis of a submodule in relation to a basis of
the free module.

Theorem 10.6.1
Let R be a PID, let L be a free R-module of rank n and let K be a nontrivial submodule.
Then K is free of rank m ≤ n and there exists a basis {x1 , x2 , . . . , xn } of L and elements
d1 , d2 , . . . , dm ∈ R such that di | dj if i ≤ j and {d1 x1 , d2 x2 , . . . , dm xm } is a basis of K.
Furthermore, if {z1 , z2 , . . . , zn } is another basis of L such that {d′1 z1 , d′2 z2 , . . . , d′s zs } is a
basis of K satisfying d′i | d′j when i ≤ j, then s = m and d′i is an associate of di for 1 ≤ i ≤ m.

This theorem directly generalizes Theorem 4.5.9 for abelian groups. However, the proof of
that theorem used integer division. Consequently, that proof could be modified to work
for any module over a Euclidean domain, but a slightly different approach is required to generalize
it to modules over PIDs. Since a PID is a UFD, every nonzero, nonunit element has a unique
factorization. For the purposes of this section, we define the length ℓ(a) of a nonzero element a in a
PID by ℓ(a) = 0 if a is a unit and ℓ(a) = r if the unique factorization of a involves r (not necessarily
distinct) prime factors.

Lemma 10.6.2
Let {f1 , f2 , . . . , fn } be a basis of a free module L of rank n. Suppose that s, t ∈ R are
relatively prime with ss′ + tt′ = 1 for some s′ , t′ ∈ R. There exist elements f1′ , f2′ ∈ L such
that f1 = sf1′ − t′ f2′ and f2 = tf1′ + s′ f2′ and {f1′ , f2′ , f3 , . . . , fn } is another basis of L.

Proof. Define f1′ = s′ f1 + t′ f2 and f2′ = −tf1 + sf2 . Then f1 = sf1′ − t′ f2′ and f2 = tf1′ + s′ f2′ ,
so L = Span(f1 , f2 , . . . , fn ) ⊆ Span(f1′ , f2′ , f3 , . . . , fn ), and the reverse inclusion holds trivially.
Also, for linear independence,

    c1 f1′ + c2 f2′ + c3 f3 + · · · + cn fn = 0
    =⇒ (s′ c1 − tc2 )f1 + (t′ c1 + sc2 )f2 + c3 f3 + · · · + cn fn = 0.

Thus, c3 = · · · = cn = 0 and

    [ s′  −t ] [ c1 ]   [ 0 ]          [ c1 ]   [  s   t  ] [ 0 ]   [ 0 ]
    [ t′   s ] [ c2 ] = [ 0 ]   =⇒    [ c2 ] = [ −t′  s′ ] [ 0 ] = [ 0 ] .

Thus, {f1′ , f2′ , f3 , . . . , fn } is linearly independent. The result follows. 
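A quick sanity check of the lemma (an illustration, not from the text), with R = Z, s = 3, t = 2 and the Bézout pair s′ = 1, t′ = −1:

```python
# Change-of-basis matrix C expressing (f1, f2) in terms of (f1', f2'):
# f1 = s f1' - t' f2' and f2 = t f1' + s' f2'.
s, t, sp, tp = 3, 2, 1, -1
assert s * sp + t * tp == 1          # Bezout relation ss' + tt' = 1

C = [[s, -tp], [t, sp]]              # [[3, 1], [2, 1]]
det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
assert det == 1                      # det = ss' + tt' = 1 is a unit in Z,
                                     # so {f1', f2'} is again a basis
```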



Proof (Theorem 10.6.1). Let {f1 , f2 , . . . , fn } be any basis of L. All nonzero elements of K are
expressed as linear combinations

    c1 f1 + c2 f2 + · · · + cn fn ,

with ci ∈ R. Consider the set of lengths

    {ℓ(c1 ) | c1 f1 + c2 f2 + · · · + cn fn ∈ K for some basis {f1 , f2 , . . . , fn } of L, c1 ≠ 0}.      (10.14)

By the well-ordering principle, this set has a minimum value; let d1 be a coefficient realizing it, so
that K contains some element of the form

    z1 = d1 f1 + c2 f2 + · · · + cn fn .

For j = 2, . . . , n, consider the ideals (d1 , cj ). Since R is a PID, (d1 , cj ) = (aj ) for some nonzero
aj ∈ R. Thus, aj | d1 and aj | cj . We claim that for each j, the element aj arises as a coefficient
of the first vector in some ordered basis of L. Suppose that aj = sd1 + tcj . Then we know that s
and t are relatively prime with ss′ + tt′ = 1 for some s′ , t′ ∈ R. By Lemma 10.6.2, L has a basis
{f1′ , f2 , . . . , fj′ , . . . , fn } where f1 = sf1′ − t′ fj′ and fj = tf1′ + s′ fj′ . With respect to this new basis,

    z1 = (sd1 + tcj )f1′ + c2 f2 + · · · + (−t′ d1 + s′ cj )fj′ + · · · + cn fn .

In particular, aj = sd1 + tcj arises as a coefficient of the first basis vector in some basis of L. By
the minimality of ℓ(d1 ), we deduce that ℓ(d1 ) ≤ ℓ(aj ). However, since aj | d1 , we deduce by unique
factorization that aj is an associate of d1 . In particular, (d1 ) = (aj ) and d1 | cj . Setting cj = d1 qj
for 2 ≤ j ≤ n, we have

    z1 = d1 (f1 + q2 f2 + · · · + qn fn ).

Set x1 = f1 + q2 f2 + · · · + qn fn . If K = Span(d1 x1 ) then we are done and {d1 x1 } is a basis of K.


If K ≠ Span(d1 x1 ), then there is a minimum value of ℓ(c2 ) for linear combinations

    z2 = k1 x1 + c2 f2 + · · · + cn fn ∈ K,

where f2 , . . . , fn ∈ L are such that {x1 , f2 , . . . , fn } is a basis of L and k1 is arbitrary in R. Let d2
be a coefficient realizing this minimum length, so that there exists some element z2 ∈ K with
z2 = k1 x1 + d2 f2 + c3 f3 + · · · + cn fn for some (new) cj ∈ R. Note that by the previous paragraph
d1 divides k1 , d2 , and cj for j ≥ 3. Now for j ≥ 3, consider the ideals (d2 , cj ). Since R is a PID,
(d2 , cj ) = (aj ) for some nonzero aj ∈ R, and hence, aj | d2 and aj | cj . By reasoning identical to
that of the previous paragraph, there exists a basis {x1 , f2′ , . . . , fn′ } of L such that aj arises as a
coefficient in the expression of z2 . Hence, by the minimality of ℓ(d2 ), we have ℓ(d2 ) ≤ ℓ(aj ). But,
again, since aj | d2 , the elements aj and d2 are associates; in particular, cj ∈ (d2 ), so d2 | cj .
Writing cj = d2 qj , we see that

    d2 f2 + c3 f3 + · · · + cn fn = d2 (f2 + q3 f3 + · · · + qn fn ).

Set x2 = f2 + q3 f3 + · · · + qn fn . If K = Span(d1 x1 , d2 x2 ) then we are done and {d1 x1 , d2 x2 } is a


basis of K.
The pattern continues and terminates only when it results in a basis

    {x1 , . . . , xm , fm+1 , . . . , fn }

of L such that {d1 x1 , d2 x2 , . . . , dm xm } is a basis of K for some nonzero elements di ∈ R such that
di | di+1 for 1 ≤ i ≤ m − 1.
Finally, suppose that {z1 , z2 , . . . , zn } is another basis of L such that {d′1 z1 , d′2 z2 , . . . , d′s zs } is
a basis of K satisfying d′i | d′j when i ≤ j. Since R is commutative, s = m by Theorem 10.5.11.
The definition of d1 by the minimality condition in (10.14) implies that d′1 is an associate of d1 . In
addition, the subsequent definitions of di for i ≥ 2 imply that d′i is an associate of di for all
1 ≤ i ≤ m. 

Definition 10.6.3
In the result of Theorem 10.6.1, we call the list of factors d1 , d2 , . . . , dm (only defined up
to multiplication by a unit) the invariant factors of the submodule K. We call the basis
{x1 , x2 , . . . , xn } of L ≅ Rn a K-preferred basis.

Note that the rank and invariant factors of the submodule K are uniquely defined whereas K
usually has more than one preferred basis.

10.6.2 – The Smith Normal Form


The proof of Theorem 10.6.1 does not show how to construct a K-preferred basis {x1 , x2 , . . . , xn }
or how to find the invariant factors. We study this problem now.
Let B = (f1 , f2 , . . . , fn ) be an ordered basis of a free module L of rank n. Suppose that a
submodule K of L is spanned by a set {y1 , y2 , . . . , y` }, where ` is not necessarily less than or equal
to n. Then there exist elements aij ∈ R such that for each j with 1 ≤ j ≤ `,
    yj = a1j f1 + a2j f2 + · · · + anj fn .

We can organize the data for the components of the yj as the columns of a matrix in Mn×ℓ (R),

    A = [ a11  a12  · · ·  a1ℓ ]
        [ a21  a22  · · ·  a2ℓ ]
        [  ⋮    ⋮    ⋱     ⋮  ]
        [ an1  an2  · · ·  anℓ ]

According to Theorem 10.6.1, there exists another ordered basis (x1 , x2 , . . . , xn ) of L and another
generating set {z1 , z2 , . . . , zm } of K such that zi = di xi for 1 ≤ i ≤ m, where di | dj whenever
i ≤ j. This leads to the following proposition.

Proposition 10.6.4
For all matrices A ∈ Mn×` (R), there exist S ∈ GLn (R) and T ∈ GL` (R) such that

0 ··· 0
 
d1
 0 d2 · · · 0 
 ..

.. . . .. 0
SAT =  .
 . . . 
 (10.15)
 0
 0 · · · dm 

 
0 0
where the nonzero entries di satisfy di | dj whenever i ≤ j. Furthermore, if S 0
and T 0 are other invertible matrices such that S 0 AT 0 is a diagonal matrix with entries
d01 , . . . , d0s , 0, . . . , 0 with di | dj whenever i ≤ j, then s = m and d0i is an associate of di .

Proof. Interpret A as the matrix in which the columns are the B-coordinates of elements for some
generating set X of a submodule K of a free module L. Multiplying A on the left by an invertible
matrix S ∈ Mn×n (R) corresponds to changing the basis of L and adjusting the coordinates of the
elements in the generating set X. Multiplying on the right by an invertible matrix T ∈ Mℓ×ℓ (R)
corresponds to replacing the generating set X with a set X ′ obtained as linear combinations of the
elements in X, as encoded by the matrix T . However, since T is invertible, Span(X) = Span(X ′ ),
and hence multiplying A on the right by T does not change the corresponding submodule K.
The proposition now follows by Theorem 10.6.1. 

Definition 10.6.5
The diagonal matrix in (10.15) is called the Smith normal form of A. The matrix S is
called the left-reducing matrix, while T is called the right-reducing matrix.

In linear algebra, the reduced row echelon form of a matrix is obtained via the Gauss-Jordan elim-
ination algorithm. This algorithm involves three row operations applied to a matrix A ∈ Mn×m (F ),
where F is a field: (1) interchange two rows (Ri ↔ Rj ); (2) scale (multiply) a row by an invertible
(nonzero) constant (Ri → cRi ); (3) replace row Ri with row Ri + cRj where Rj is another row and
c is any scalar (Ri → Ri + cRj ).
In the algorithm to find the Smith normal form of a matrix, along with its left-reducing matrix and
right-reducing matrix, we use not only row operations but also column operations. If A ∈ Mn×` (R)
is a matrix, a row or column operation on A corresponds equivalently to a matrix operation, as
stated in the following list. Recall that the matrix Eij is the square matrix that is 0s everywhere
except for a 1 in the (i, j) entry.

(1) Row swap operation Ri ↔ Rj . Interchanges row Ri and row Rj . The corresponding ma-
trix operation is A 7→ M A, where M is the identity matrix but with the ith and jth row
interchanged.

(2) Scale row operation Ri → aRi , where a is a unit a ∈ U (R). Replaces the row Ri with the a
multiple of itself. The corresponding matrix operation is A 7→ M A, where M = I + (a − 1)Eii .

(3) Replace row operation Ri → Ri + aRj , where i 6= j. Replaces row Ri with the row vector
Ri + aRj . (Note that a can be an arbitrary element from the ring R.) The corresponding
matrix operation is A 7→ M A where M = I + aEij .

(4) Column swap operation Ci ↔ Cj . Interchanges the column Ci and the column Cj . The
corresponding matrix operation is A 7→ AN , where N is the identity matrix but with the ith
and jth columns interchanged.

(5) Scale column operation Ci → aCi , where a ∈ U (R). Replaces the column Ci with the a
multiple of itself. The corresponding matrix operation is A 7→ AN , where N = I + (a − 1)Eii .

(6) Replace column operation Ci → Ci + aCj , where i 6= j. Replaces column Ci with the column
Ci + aCj . The corresponding matrix operation is A 7→ AN , where N = I + aEji .

Note that for each row operation M is an invertible n × n matrix and for each column operation
N is an invertible ℓ × ℓ matrix. Consequently, a sequence of row operations on A corresponds to a
product of invertible n × n matrices, composed from right to left; a sequence of column operations
on A corresponds to a product of invertible ℓ × ℓ matrices, composed from left to right.

Example 10.6.6. Let R = Z and consider the free module Z3 equipped with the standard ba-
sis {f1 , f2 , f3 }. Consider the module K = Span(y1 , y2 , y3 , y4 ) in Z3 where y1 = (2, 0, −1), y2 =
(9, −6, 6), y3 = (0, 3, 12), and y4 = (3, 6, 5). To find the Smith normal form of A, we perform the
following sequence of row and column operations:
   
    A = [  2   9   0   3 ]
        [  0  −6   3   6 ]
        [ −1   6  12   5 ]

R1 → R1 + R3:

    [  1  15  12   8 ]
    [  0  −6   3   6 ]
    [ −1   6  12   5 ]

R3 → R3 + R1:

    [ 1  15  12   8 ]
    [ 0  −6   3   6 ]
    [ 0  21  24  13 ]

C2 → C2 − 15C1:

    [ 1   0  12   8 ]
    [ 0  −6   3   6 ]
    [ 0  21  24  13 ]

C3 → C3 − 12C1:

    [ 1   0   0   8 ]
    [ 0  −6   3   6 ]
    [ 0  21  24  13 ]

C4 → C4 − 8C1:

    [ 1   0   0   0 ]
    [ 0  −6   3   6 ]
    [ 0  21  24  13 ]

C2 ↔ C4:

    [ 1   0   0   0 ]
    [ 0   6   3  −6 ]
    [ 0  13  24  21 ]

R3 → R3 − 2R2:

    [ 1   0   0   0 ]
    [ 0   6   3  −6 ]
    [ 0   1  18  33 ]

R2 ↔ R3:

    [ 1   0   0   0 ]
    [ 0   1  18  33 ]
    [ 0   6   3  −6 ]

R3 → R3 − 6R2:

    [ 1   0     0     0 ]
    [ 0   1    18    33 ]
    [ 0   0  −105  −204 ]

C3 → C3 − 18C2:

    [ 1   0     0     0 ]
    [ 0   1     0    33 ]
    [ 0   0  −105  −204 ]

C4 → C4 − 33C2:

    [ 1   0     0     0 ]
    [ 0   1     0     0 ]
    [ 0   0  −105  −204 ]

C4 → C4 − 2C3:

    [ 1   0     0   0 ]
    [ 0   1     0   0 ]
    [ 0   0  −105   6 ]

C3 → C3 + 18C4:

    [ 1   0   0   0 ]
    [ 0   1   0   0 ]
    [ 0   0   3   6 ]

C4 → C4 − 2C3:

    [ 1   0   0   0 ]
    [ 0   1   0   0 ]
    [ 0   0   3   0 ] .

This is the Smith normal form of A. We read off that the invariant factors of K are 1, 1, 3.
By keeping track of the row operations, and multiplying the corresponding matrices in order
from right to left, we calculate that

    S = [ 1   0  0 ] [ 1  0  0 ] [ 1   0  0 ] [ 1  0  0 ] [ 1  0  1 ]   [  1    0    1 ]
        [ 0   1  0 ] [ 0  0  1 ] [ 0   1  0 ] [ 0  1  0 ] [ 0  1  0 ] = [  1   −2    2 ]
        [ 0  −6  1 ] [ 0  1  0 ] [ 0  −2  1 ] [ 1  0  1 ] [ 0  0  1 ]   [ −6   13  −12 ] .
By keeping track of the column operations, and multiplying the corresponding matrices in order
from left to right, we calculate that

    T = [ 1  −8  −138  261 ]
        [ 0   0    18  −35 ]
        [ 0   0   −35   68 ]
        [ 0   1    36  −69 ] .
It is easy to check by direct computation that

    S [  2   9   0   3 ]       [ 1  0  0  0 ]
      [  0  −6   3   6 ] T  =  [ 0  1  0  0 ]
      [ −1   6  12   5 ]       [ 0  0  3  0 ] .
Having found the matrix S, we can also find a K-preferred basis. As in linear algebra, we can
think of the matrix A as representing a module homomorphism ϕ : Z4 → Z3 , given with respect
to the standard bases on Z4 and Z3 , respectively. The matrices S and T correspond to change of
bases on Z3 and Z4 , respectively. Then the matrix S is the change of basis matrix from the standard
basis to the K-preferred basis. Hence, S −1 is the change of basis matrix from the K-preferred basis
to the standard basis. Consequently, the columns of S −1 give the coordinates of the K-preferred
basis vectors expressed with respect to the standard basis.
In this example, in M3×3 (Z), we have

    S −1 = [  1    0    1 ]−1    [  2  −13  −2 ]
           [  1   −2    2 ]   =  [  0    6   1 ]
           [ −6   13  −12 ]      [ −1   13   2 ] .
In summary, the invariant factors of the submodule K are d1 = 1, d2 = 1, and d3 = 3. Furthermore,
with respect to the ordered basis (x1 , x2 , x3 ) of Z3 given by x1 = (2, 0, −1), x2 = (−13, 6, 13), and
x3 = (−2, 1, 2), a basis of K is {x1 , x2 , 3x3 }.
Looking forward to calculations that we will consider in the next section, we can now easily
determine the quotient module Z3 /K. By the result of Exercise 10.4.14, we deduce that

    Z3 /K = (Zx1 ⊕ Zx2 ⊕ Zx3 )/(Zx1 ⊕ Zx2 ⊕ Z3x3 )
          ≅ (Zx1 )/(Zx1 ) ⊕ (Zx2 )/(Zx2 ) ⊕ (Zx3 )/(Z3x3 )
          ≅ Z/3Z.

Observe that at the end of the above elimination process, we performed three consecutive column
replacement operations on C3 and C4 . These amount to three steps in the Euclidean algorithm to
find the greatest common divisor of −105 and −204. If the numbers in this example had been larger,
the procedure might have required similar back-and-forth steps at earlier stages. △
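The invariant factors computed above can be cross-checked without any elimination. Over a PID, the product d1 d2 · · · dk equals the gcd of all k × k minors of A (a standard characterization of the invariant factors, not proved in this section). The sketch below, an illustration rather than part of the text, applies this to the matrix of Example 10.6.6:

```python
from itertools import combinations
from math import gcd

def det(M):
    """Determinant of a small integer matrix by Laplace expansion."""
    n = len(M)
    if n == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j+1:] for row in M[1:]])
               for j in range(n))

def invariant_factors(A):
    """Invariant factors d_k = D_k / D_{k-1}, where D_k = gcd of all k x k minors."""
    rows, cols, D_prev, factors = len(A), len(A[0]), 1, []
    for k in range(1, min(rows, cols) + 1):
        minors = [det([[A[i][j] for j in js] for i in iset])
                  for iset in combinations(range(rows), k)
                  for js in combinations(range(cols), k)]
        D_k = 0
        for m in minors:
            D_k = gcd(D_k, m)   # math.gcd handles negative minors
        if D_k == 0:
            break               # all larger minors vanish
        factors.append(D_k // D_prev)
        D_prev = D_k
    return factors

A = [[2, 9, 0, 3], [0, -6, 3, 6], [-1, 6, 12, 5]]
print(invariant_factors(A))   # [1, 1, 3], matching the Smith normal form above
```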

The strategy in this example involved: (1) finding the greatest common divisor of least
length across a row or column of the matrix, (2) obtaining this greatest common
divisor as an entry by row or column operations, (3) moving it as far as possible toward the upper
left corner (pivot position), and, once a pivot position is created, (4) eliminating all nonzero terms
in the row and column of the pivot by using replace row and column operations.
The general strategy we just described applies for Z. For a general PID, we look for a greatest
common divisor that has the least length as defined just before Lemma 10.6.2.

10.6.3 – Useful CAS Commands


It should be obvious from Example 10.6.6 that calculating the Smith normal form of a matrix by
hand is time consuming.
The following Maple package and commands support the linear algebra calculations of this section.

Maple Function
with(LinearAlgebra); — Loads the linear algebra package. (Many commands.)
ColumnOperation — Depending on the options, performs any desired column operation on a matrix.
RowOperation — Depending on the options, performs any desired row operation on a matrix.
SmithForm(A); — Calculates the Smith normal form of the matrix A. Maple makes an educated
guess at the ring R based on the nature of the content of A. If the ring is a polynomial ring F [x],
the variable x should appear as an option, as SmithForm(A,x);. The syntax
U,V:=SmithForm(A,output=[’U’,’V’]); also calculates the left-reducing matrix U and the
right-reducing matrix V .
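Outside of Maple, a computer algebra check is also possible in Python. SymPy exposes a `smith_normal_form` routine; the sketch below is an illustration, and the exact normalization of the output (e.g., signs of the diagonal entries) may vary between versions:

```python
from sympy import Matrix, ZZ
from sympy.matrices.normalforms import smith_normal_form

# Square example over Z: diag(2, 3) has Smith normal form diag(1, 6),
# since gcd(2, 3) = 1 and the determinant is 6.
snf = smith_normal_form(Matrix([[2, 0], [0, 3]]), domain=ZZ)
print(snf)

# The diagonal entries are the invariant factors, up to units (signs).
assert abs(snf[0, 0]) == 1
assert abs(snf[1, 1]) == 6
```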

10.7 Finitely Generated Modules over PIDs, II
In the previous section, we proved a theorem about the structure of submodules of free modules
over a PID. This theorem led naturally to the concept of the Smith normal form for a matrix
A ∈ Mn×` (R), where R is a PID. This section develops a few applications of Theorem 10.6.1 and of
the Smith normal form.

10.7.1 – The Smith Normal Form of Module Homomorphisms


Let R be a PID and consider a module homomorphism ϕ : L → L0 between free R-modules. Let
B = (f1 , f2 , . . . , fn ) be an ordered basis of L and let B 0 = (g1 , g2 , . . . , gm ) be an ordered basis of L0 .
For each fj , the element ϕ(fj ) can be expressed in B ′ -coordinates as

    ϕ(fj ) = a1j g1 + a2j g2 + · · · + amj gm .

Define the m × n matrix A = (aij ). An element x ∈ L can be expressed in B-coordinates as

x = c1 f1 + c2 f2 + · · · + cn fn .

By properties of R-module homomorphisms, we have

    ϕ(x) = Σ_{j=1}^{n} Σ_{i=1}^{m} aij cj gi = Σ_{i=1}^{m} ( Σ_{j=1}^{n} aij cj ) gi .

Thus, the components of ϕ(x) are given by matrix multiplication of A by the n-tuple (c1 , c2 , . . . , cn )
viewed as a column “vector.” As in linear algebra, the matrix A is called the matrix of ϕ with respect
to B and B ′ .

Proposition 10.7.1
Let R be a PID and let ϕ : Rn → Rm be an R-module homomorphism. Then there exists
an ordered basis B of Rn and an ordered basis B 0 of Rm such that, with respect to these
bases, the matrix in Mm×n (R) of ϕ with respect to B and B ′ is

    [ D  0 ]
    [ 0  0 ] ,    where D = diag(d1 , d2 , . . . , dr )

and the nonzero entries di satisfy di | dj if i ≤ j.

Proof. Possibly after relabeling, by Theorem 10.6.1, Ker ϕ is a free submodule of Rn and there
is an ordered basis B = (x1 , x2 , . . . , xn ) of Rn such that {c1 xr+1 , c2 xr+2 , . . . , cn−r xn }, with
ci | cj whenever i ≤ j, is a basis of Ker ϕ. Denote by N the submodule of Rn spanned by
{x1 , x2 , . . . , xr }. Then ϕ restricts to an injective homomorphism ϕ|N : N → Rm . It is easy to check
that {ϕ(x1 ), ϕ(x2 ), . . . , ϕ(xr )} is linearly independent, which implies that Im ϕ is free of rank r.
By Theorem 10.6.1, there is a basis B ′ = {y1 , y2 , . . . , ym } of Rm such that {d1 y1 , d2 y2 , . . . , dr yr }
is a basis of Im ϕ. For i = 1, . . . , r, define x′i ∈ N as the unique element such that ϕ(x′i ) = di yi .
Then, with respect to the bases B = {x′1 , . . . , x′r , xr+1 , . . . , xn } and B ′ , the matrix of ϕ is given
by (10.15). 

The matrix in Proposition 10.7.1 is obviously the Smith normal form of the matrix of ϕ with
respect to the bases B and B ′ . By the uniqueness properties of the Smith normal form, we can call
this matrix the Smith normal form of ϕ.
Proposition 10.7.1 leads to the following interesting consequence for homomorphisms of free
modules over a PID. Recall from linear algebra that for a linear transformation T : V → V on
a finite-dimensional vector space V , only two possibilities occur: (1) T is invertible, in which case it
is both surjective and injective; (2) T is not invertible, in which case T is neither injective nor
surjective. With modules over a general PID, another possibility emerges.

Corollary 10.7.2
Let R be a PID and let ϕ : Rn → Rn be an R-module homomorphism between free modules
of rank n. Let A ∈ Mn×n (R) be the matrix of ϕ with respect to a basis B on the domain
and a basis B 0 on the codomain. One of the following mutually disjoint cases occurs:
(1) ϕ is not injective and not surjective if and only if det A = 0.
(2) ϕ is a bijection if and only if det A is a unit.
(3) ϕ is injective but not surjective if det A is a nonzero, nonunit element.

Proof. (Left as an exercise for the reader. See Exercise 10.7.11.) 



When F is a field, every element in F is either 0 or a unit. Consequently, case (3) in the above
corollary never occurs for linear transformations on a finite-dimensional vector space.
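A minimal illustration of case (3), not taken from the text: over R = Z, the map ϕ(x, y) = (2x, 3y) has matrix diag(2, 3) with determinant 6, a nonzero nonunit, so it is injective but its image 2Z ⊕ 3Z is a proper submodule:

```python
# phi: Z^2 -> Z^2 given by the matrix diag(2, 3); det = 6 is a nonzero
# nonunit in Z, so Corollary 10.7.2 predicts injective but not surjective.
def phi(v):
    x, y = v
    return (2 * x, 3 * y)

# Injective: phi(v) = 0 forces v = 0 (checked on a sample grid).
kernel = [(x, y) for x in range(-5, 6) for y in range(-5, 6)
          if phi((x, y)) == (0, 0)]
assert kernel == [(0, 0)]

# Not surjective: (1, 1) has no preimage, since 2x = 1 has no solution in Z.
assert all(phi((x, y)) != (1, 1) for x in range(-5, 6) for y in range(-5, 6))
```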

10.7.2 – Fundamental Theorem of Finitely Generated Modules over a PID


Theorem 10.6.1 allows us to classify all finitely generated modules over any PID.

Theorem 10.7.3
Let R be a PID and let M be a finitely generated R-module. Then

    M ≅ Rr ⊕ R/(d1 ) ⊕ R/(d2 ) ⊕ · · · ⊕ R/(dm )                    (10.16)

for some integer r ≥ 0 and some nonzero, nonunit elements d1 , d2 , . . . , dm ∈ R satisfying
di | dj if i ≤ j. Furthermore, this decomposition is unique in the following sense: If

    Rr ⊕ R/(d1 ) ⊕ R/(d2 ) ⊕ · · · ⊕ R/(dm ) ≅ Rs ⊕ R/(d′1 ) ⊕ R/(d′2 ) ⊕ · · · ⊕ R/(d′m ),

with di | dj and d′i | d′j whenever i ≤ j, then r = s and d′i is an associate of di for 1 ≤ i ≤ m.

Proof. Suppose that M is spanned by elements {x1 , x2 , . . . , xn }. Consider the homomorphism ϕ :
Rn → M defined in (10.13). Then M ≅ Rn / Ker ϕ.
By Theorem 10.6.1 there exists a basis {y1 , y2 , . . . , yn } of Rn and elements d1 , d2 , . . . , dm ∈ R
such that di | dj if i ≤ j and {d1 y1 , d2 y2 , . . . , dm ym } is a basis of Ker ϕ. Then

    M ≅ (Ry1 ⊕ Ry2 ⊕ · · · ⊕ Ryn )/(Rd1 y1 ⊕ Rd2 y2 ⊕ · · · ⊕ Rdm ym ).

By the result of Exercise 10.4.14,

    M ≅ (Ry1 )/(Rd1 y1 ) ⊕ (Ry2 )/(Rd2 y2 ) ⊕ · · · ⊕ (Rym )/(Rdm ym ) ⊕ Rn−m
      ≅ Rn−m ⊕ R/(d1 ) ⊕ R/(d2 ) ⊕ · · · ⊕ R/(dm ).

If di is a unit, then R/(di ) = {0} and we remove all such terms from the direct sum.
The uniqueness of the decomposition follows from the uniqueness of the invariant factors as
established in Theorem 10.6.1. 

Definition 10.7.4
The decomposition of M given in Theorem 10.7.3 is called the invariant factors decompo-
sition of M . The constant r is called the free rank of M and the elements d1 , d2 , . . . , dm
(only defined up to multiplication by a unit) are called the invariant factors.

For a submodule of a free module, we allowed the invariant factors to include units. However,
taking the quotient module Rn / Ker ϕ eliminates the invariant factors of Ker ϕ that are units.
We observe that for a module M satisfying (10.16), the torsion submodule is

    Tor(M ) ≅ R/(d1 ) ⊕ R/(d2 ) ⊕ · · · ⊕ R/(dm ).

This follows from Exercise 10.3.21 and the fact that the torsion submodule of a free module over a
PID is trivial. (See Exercise 10.7.1.)
In the language of annihilators, there exist elements z1 , z2 , . . . , zm ∈ Tor(M ) such that Tor(M ) =
Rz1 ⊕ Rz2 ⊕ · · · ⊕ Rzm with

Ann(z1 ) ⊇ Ann(z2 ) ⊇ · · · ⊇ Ann(zm ).

These annihilators are called the invariant factor ideals because Ann(zi ) = (di ) for all i.

As with the fundamental theorem of finitely generated abelian groups, this corresponding theorem
for modules over principal ideal domains has an elementary divisors form.
Let a be a nonzero element in a PID R. Every PID is a unique factorization domain, so
a = u p1^α1 p2^α2 · · · ps^αs , where u is a unit and the pi are distinct primes (also irreducible) in R.
Since the factorization is unique in R, for i ≠ j the ideals (pi^αi ) and (pj^αj ) are comaximal, in that
their sum is R. Consequently, by the Chinese Remainder Theorem,

    R/(a) ≅ R/(p1^α1 p2^α2 · · · ps^αs ) = R/((p1 )^α1 (p2 )^α2 · · · (ps )^αs )
          ≅ R/(p1^α1 ) ⊕ R/(p2^α2 ) ⊕ · · · ⊕ R/(ps^αs ).                    (10.17)

Returning to the decomposition (10.16), each R/(di ) decomposes according to (10.17). Recall
that if p and q are prime elements in a PID, then (p) = (q) if and only if p and q are associates of
each other. Hence, we can write (10.17) as

    R/(a) ≅ R/P1^α1 ⊕ R/P2^α2 ⊕ · · · ⊕ R/Ps^αs ,                    (10.18)

where each Pi is a prime ideal. Furthermore, in this decomposition (10.18), the prime ideals Pi and
the powers αi are uniquely defined, again by unique factorization. So applying this decomposition
to each R/(di ) in the invariant factors decomposition leads to the following restatement of the
fundamental theorem for finitely generated modules over a PID.

Theorem 10.7.5
Let R be a PID and let M be a finitely generated R-module. Then

    M ≅ Rr ⊕ R/P1^α1 ⊕ R/P2^α2 ⊕ · · · ⊕ R/Pt^αt                    (10.19)

for some integer r ≥ 0 and some (not necessarily distinct) prime ideal powers Pj^αj in R.
Furthermore, this decomposition is unique in the following sense: if

    Rr ⊕ R/P1^α1 ⊕ R/P2^α2 ⊕ · · · ⊕ R/Pt^αt ≅ Rs ⊕ R/Q1^β1 ⊕ R/Q2^β2 ⊕ · · · ⊕ R/Qu^βu ,

then r = s, t = u, and there is a permutation π ∈ St such that Qπ(i) = Pi and βπ(i) = αi
for all 1 ≤ i ≤ t.

Proof. The Chinese Remainder Theorem along with the invariant factor decomposition gives the
decomposition (10.19). For the uniqueness, it suffices to recover a unique invariant factor form from
(10.19) and the uniqueness of this theorem will follow from the uniqueness proven in Theorem 10.7.3.
Let Q1 , Q2 , . . . , Qk be the complete set of distinct prime ideals appearing anywhere in the
decomposition (10.19). Let ℓ be the maximum number of times any Qi appears in the decomposition.
Then the torsion part of M in (10.19) can be written as

    Tor(M ) = ⨁_{i=1}^{k} ( ⨁_{j=1}^{ℓ} R/Qi^αij )

where some of the αij may be 0 but, for each i, we have αi1 ≤ αi2 ≤ · · · ≤ αiℓ . Now for each
j, define the product ideal Ij = Q1^α1j Q2^α2j · · · Qk^αkj . Since the prime ideals Qi are distinct, by the
Chinese Remainder Theorem,

    R/Ij ≅ ⨁_{i=1}^{k} R/Qi^αij .

Consequently,
Tor(M ) = R/I1 ⊕ R/I2 ⊕ · · · ⊕ R/Iℓ .          (10.20)
Furthermore, since αij ≤ αi,j+1 , for each i we have Qi^αij ⊇ Qi^αi,j+1 . Hence, I1 ⊇ I2 ⊇ · · · ⊇ Iℓ
so the expression (10.20) is the invariant factor form of Tor(M ). 
10.7. FINITELY GENERATED MODULES OVER PIDS, II 523

Definition 10.7.6
We call the decomposition in (10.19) the elementary divisor form of a finitely generated
module over a PID. If Qi = (pi ) for each prime ideal Qi , then the elements pi^αij (only
defined up to multiplication by a unit) are called the elementary divisors.
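For R = Z the passage between the two forms is mechanical. The following sketch (the module Z/2 ⊕ Z/12 ⊕ Z/36 is a hypothetical example) splits invariant factors into elementary divisors with SymPy's factorint and then reassembles them:

```python
from collections import defaultdict
from sympy import factorint

# Hypothetical example: Z/2 ⊕ Z/12 ⊕ Z/36 with invariant factors d1 | d2 | d3.
invariant_factors = [2, 12, 36]

# Elementary divisors: the prime-power factors of each d_i (by the CRT).
elem = sorted((p, e) for d in invariant_factors for p, e in factorint(d).items())
assert [p**e for p, e in elem] == [2, 4, 4, 3, 9]

# Reassemble the invariant factors: d_m collects the largest power of each
# prime, d_{m-1} the next largest, and so on.
powers = defaultdict(list)
for p, e in elem:
    powers[p].append(e)
m = max(len(es) for es in powers.values())
recovered = [1] * m
for p, es in powers.items():
    for j, e in enumerate(sorted(es)):       # ascending exponents
        recovered[m - len(es) + j] *= p**e   # align the largest with d_m
assert recovered == invariant_factors
```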

As we pointed out in the motivation to this section, the Fundamental Theorem of Finitely Generated
Modules over a PID subsumes the Fundamental Theorem of Finitely Generated Abelian Groups.
Consequently, the structure theorems we just presented should feel familiar from Section 4.5.
However, as the following example illustrates, torsion modules over PIDs other than Z might not
look as familiar.
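For R = Z the invariant factors can be computed directly. A sketch using SymPy's smith_normal_form (the matrix is our own illustrative choice; normal form entries are only defined up to units, so we compare absolute values):

```python
from sympy import Matrix, ZZ
from sympy.matrices.normalforms import smith_normal_form

# Columns of A span a submodule K ⊆ Z^3 (illustrative example).
A = Matrix([[2, 4, 4], [-6, 6, 12], [10, -4, -16]])
S = smith_normal_form(A, domain=ZZ)
d = [abs(S[i, i]) for i in range(3)]
# Invariant factors 2 | 6 | 12, so Z^3/K ≅ Z/2 ⊕ Z/6 ⊕ Z/12.
assert d == [2, 6, 12]
```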
Example 10.7.7. Let R = Q[x] and let K be the submodule of Q[x]2 with invariant factors d1 =
(x2 − 2) and d2 = (x2 − 2)(x − 3). Then the invariant factor form of Q[x]2 /K is

    Q[x]/(x2 − 2) ⊕ Q[x]/((x2 − 2)(x − 3)).

The elementary divisor form of Q[x]2 /K is

    Q[x]/(x2 − 2) ⊕ Q[x]/(x2 − 2) ⊕ Q[x]/(x − 3) ≅ Q[√2]2 ⊕ Q,

where the structure of Q[√2] as a Q[x]-module is described in Exercise 10.4.6 and the structure of
Q as a Q[x]-module is given by p(x)a = p(3)a. △

Exercises for Section 10.7


1. Let L be a free module over a PID R. Prove that the torsion submodule Tor(L) is trivial.
2. Consider the free Z-module Z3 . Let f1 = (1, 2, 3), f2 = (4, 5, 6), and f3 = (7, 8, 9) and let K =
Span(f1 , f2 , f3 ). Find the invariant factors of K along with a K-preferred basis. Use this to determine
Z3 /K.
3. Consider the free Z-module Z3 . Let f1 = (2, 5, 10), f2 = (3, −4, 3), and f3 = (−7, 2, −9) and let
K = Span(f1 , f2 , f3 ). Find the invariant factors of K along with a K-preferred basis. Use this to
determine Z3 /K.
4. Consider the free Z-module Z3 . Let f1 = (10, 5, 20), f2 = (5, 0, 5), and f3 = (4, 3, 6) and let K =
Span(f1 , f2 , f3 ). Find the invariant factors of K along with a K-preferred basis. Use this to determine
Z3 /K.
5. Let R = Z[i]. Consider the homomorphism ϕ : R2 → R2 defined by ϕ(x1 , x2 ) = (2x1 + ix2 , (3 + i)x1 +
4x2 ). Find the invariant factors of Im ϕ and find an Im ϕ-preferred basis of the codomain R2 .
6. Let R = Z[i]. Suppose that the submodule K of Z[i]3 is spanned by f1 = (1, 5, 10), f2 = (2+i, 3−2i, 5i),
and f3 = (3 − i, 0, 1 + 4i). Using technology to assist with the row and column operations, find a K-
preferred basis of Z[i]3 along with the invariant factors of K. Determine the structure of the quotient
module Z[i]3 /K and show that the order is finite.
7. Let R = Q[x]. Suppose that the submodule K of Q[x]3 is spanned by f1 = (3x + 2, 7x − 5, x2 + 1),
f2 = (2x − 3, x + 7, 4), and f3 = (13, 11x − 31, 2x2 − 10). Using technology to assist with the row and
column operations, find a K-preferred basis of R3 along with the invariant factors of K. Determine
the structure of the quotient module R3 /K.
8. Let R = Q[x]. Suppose that the submodule K of Q[x]3 is spanned by f1 = (2x − 7, 2x − 16, 5x + 2),
f2 = (2x − 4, 2x − 10, 5x + 3), and f3 = (6x, 6x − 6, 15x + 13). Using technology to assist with the row
and column operations, find a K-preferred basis of R3 along with the invariant factors of K. Prove
that the quotient module R3 /K is isomorphic to Q[x] ⊕ Q and describe the action of Q[x] on the Q
component.
9. Let R = Z and consider the 2 × 2 matrix

       A = [ 105  70 ]
           [  42  30 ].

   Find the normal form of A. Observe that it has at least one 1 on the diagonal even though the greatest
   common divisor of elements down any column or across any row is greater than 1.

10. Let R be a PID and A ∈ Mn×n (R).


(a) Prove that the row operations allowed in calculating the Smith normal form can change the
determinant of A only by multiplication by a unit in R.
(b) Deduce that if det(A) is square-free, then the Smith normal form of A is the diagonal matrix
with all 1s down the diagonal, except for det(A) in the (n, n) entry.
11. Prove Corollary 10.7.2.
12. Show that the irreducible modules over a PID R consist of R and R/P , where P is a prime ideal.
13. Let R be an integral domain and let M be a nonprincipal ideal of R. Prove that M is torsion free but
is not a free R-module.
14. Let R be a PID and let M be a torsion module with invariant factors (d1 ) ⊇ (d2 ) ⊇ · · · ⊇ (ds ). Show
    that for any R-module homomorphism ϕ : M → N , the image ϕ(M ) is a torsion module with invariant
    factors (d1 ) ⊇ (d2 ) ⊇ · · · ⊇ (dt ) satisfying t ≤ s, and dt | ds , dt−1 | ds−1 , . . . , d1 | ds−t+1 .
15. Let R be a PID and let M be a torsion R-module (not necessarily finitely generated). For a prime
element p ∈ R, we define the p-primary component as the subset of all elements of M annihilated by
some positive power of p, i.e., {m ∈ M | pk m = 0 for some k ∈ N∗ }.
(a) Prove that the p-primary component of M is a submodule.
(b) Suppose that M1 and M2 are primary components of M for nonassociate primes. Prove that
M1 ∩ M2 = 0.
    (c) Suppose that Ann(M ) = (a), where a = p1^α1 p2^α2 · · · pk^αk , with p1 , p2 , . . . , pk pairwise nonassociate
        primes. Call Mi the pi -primary component of M . Prove that

M = M1 ⊕ M2 ⊕ · · · ⊕ Mk .

16. Let R = Z[x] and consider the free module M = R2 . Let f1 = (2, x), f2 = (x2 , x + 2), and f3 = (x, 3).
Prove that the submodule Span(f1 , f2 , f3 ) is not free.
17. Let R be a PID and let ϕ : Rn → Rm be an R-module homomorphism. Prove that the invariant
factors of Ker ϕ are all 1.
18. Give an example of an integral domain and a nonzero torsion module M such that Ann(M ) = 0.
Prove that if M is finitely generated, then Ann(M ) ≠ 0. (Do not assume that R is a PID.)
19. Let R be a PID. Prove that if an R-module M is a summand of a free R-module, then M is free.

10.8 Applications to Linear Transformations
At various points in this textbook, we saw effective applications of the Fundamental Theorem of
Finitely Generated Abelian Groups. However, its more general counterpart for finitely generated
modules over a PID, Theorem 10.6.1, leads to other profound consequences. This section and the
next two analyze consequences for F [x]-modules and implications for linear transformations between
vector spaces over the field F .
Recall that if F is a field, then an F [x]-module consists of a vector space V along with a linear
transformation T : V → V and that the action of F [x] on V is determined by xv = T (v) for all
v ∈ V and F [x]-linearity of the module action. Note that F [x] and any free F [x]-module F [x]n are
infinite-dimensional vector spaces over F . If V is finite-dimensional, then it is generated by its basis
elements as an F [x]-module, so it is finitely generated. By the Fundamental Theorem of Finitely Generated
Modules over a PID, the module V equipped with T has free rank 0 and is a torsion F [x]-module.

10.8.1 – The Characteristic Polynomial


We first remind the reader of a few concepts from linear algebra.

Let F be a field. Let T : V → V be a linear transformation from a vector space V over F


into itself. An eigenvalue of T is an element λ ∈ F such that there exists a nonzero vector v ∈ V
satisfying
T (v) = λv. (10.21)
A vector v satisfying (10.21) is called an eigenvector for the eigenvalue λ. By the properties of linear
transformations, the set of vectors that solve (10.21) is a subspace of V . This subspace is called the
eigenspace for λ. We denote this by Eλ .
Suppose that V has dim V = n with an ordered basis B. Let A be the n × n matrix of T with
respect to B and suppose we represent v as an n × 1 column vector. Then as a matrix equation,
(10.21) becomes Av = λv, which is equivalent to (A − λI)v = 0. Since (A − λI)v = 0 for a nonzero
vector v, we must have det(A − λI) = 0. This requirement provides a way to find the eigenvalues of T .
For a variable x, the polynomial det(xI − A) is a monic polynomial of degree n. It is called
the characteristic polynomial of A and is denoted by cA (x). Note that if B is the matrix of T with
respect to another ordered basis B 0 , then B = M AM −1 , where M is the change of coordinate matrix
from B to B 0 coordinates. Then
det(xI − B) = det(xI − M AM −1 ) = det(M (xI)M −1 − M AM −1 )
= det(M (xI − A)M −1 ) = det(M ) det(xI − A) det(M )−1 = det(xI − A).
Hence, cB (x) = cA (x). Since this polynomial is independent of the basis, we also call it the charac-
teristic polynomial of the linear transformation T and denote it by cT (x).
The following proposition is key for finding eigenvalues and the corresponding eigenspaces.

Proposition 10.8.1
Let T : V → V be a linear transformation. A value λ ∈ F is an eigenvalue of T if
and only if it is a root of cT (x) = 0. For each eigenvalue λ, the associated eigenspace is
Eλ = Ker(T − λI).

A particularly nice situation occurs when the matrix A is diagonalizable. In other words, there
exists an invertible matrix M such that A = M DM −1 , where D is a diagonal matrix. In this
case, the direct sum of all the eigenspaces of A is all of V . This is a restrictive situation since it
corresponds to a dilation by a factor of λ in each Eλ , with the linear transformation completed by
linearity on the rest of V .
There are two ways in which diagonalization fails to occur.
First, a linear transformation might fail to have eigenvalues if cT (x) has no roots in F . Consider
the linear transformation T of rotation in R3 by π/2 around the z-axis. The associated matrix with
respect to the standard basis is

    A = [ 0  −1  0 ]
        [ 1   0  0 ]
        [ 0   0  1 ]
and cT (x) = det(xI −A) = (x−1)(x2 +1). The only root in R is 1 and the eigenspace E1 is the z-axis,
along which the rotation acts as the identity. (Over the field extension C of R, the characteristic
polynomial splits completely into cT (x) = (x − 1)(x + i)(x − i), but the geometric interpretation is
no longer the same as in the vector space R3 .)
Second, even if the characteristic polynomial splits completely in F [x], if an eigenvalue λ has
algebraic multiplicity 2 or greater, then the eigenspace may still only have a dimension of 1. For
example, consider the matrix

    A = [  3  4 ]
        [ −1  7 ].
The characteristic polynomial is cA (x) = (x − 3)(x − 7) + 4 = (x − 5)2 . So 5 is the only eigenvalue
and it has algebraic multiplicity of 2. However,
   
    E5 = Ker [ −2  4 ] = Span( (2, 1) ),
             [ −1  2 ]

so E5 is not all of R2 . We call dim Eλ the geometric multiplicity of λ. Then for any eigenvalue λ,

1 ≤ geometric multiplicity of λ ≤ algebraic multiplicity of λ.

So, even if cA (x) splits completely over F , we have

    number of distinct roots of cA (x) ≤ Σλ dim Eλ ≤ dim V.

If the second inequality is strict, then A is not diagonalizable.
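The 2 × 2 matrix above can be checked mechanically; a sketch with SymPy:

```python
from sympy import Matrix

A = Matrix([[3, 4], [-1, 7]])
# eigenvects() returns (eigenvalue, algebraic multiplicity, eigenspace basis).
(lam, alg_mult, basis), = A.eigenvects()
assert lam == 5 and alg_mult == 2
assert len(basis) == 1              # geometric multiplicity 1 < 2
assert not A.is_diagonalizable()    # the second inequality is strict
```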

10.8.2 – Rational Canonical Form


Let V be a vector space over F of dimension n, equipped with the linear transformation T : V → V ,
and consider it as an F [x]-module. By Theorem 10.7.3, since V is a torsion module,

    V ≅ F [x]/(a1 (x)) ⊕ F [x]/(a2 (x)) ⊕ · · · ⊕ F [x]/(am (x))          (10.22)

for some polynomials ai (x) with a1 (x) | a2 (x) | · · · | am (x). Furthermore, these polynomials are
defined uniquely up to a unit. The units in F [x] are the nonzero constant polynomials. Therefore,
if we require that the polynomials ai (x) be monic, then they are uniquely defined. Note that not
only is V a torsion module but Ann(V ) = (am (x)).

Definition 10.8.2
The minimal polynomial of T : V → V , denoted by mT (x), is the monic polynomial am (x).

We use the term minimal polynomial (as opposed to largest invariant factor) because mT (x) is
the unique monic polynomial of least degree such that mT (x) · V = {0}.
We consider an individual factor ai (x). Each summand F [x]/(ai (x)) in the invariant factor
decomposition (10.22) is a vector space over F . Suppose that deg ai (x) = k and that ai (x) = xk +
pk−1 xk−1 +· · ·+p1 x+p0 . Then the natural ordered basis for F [x]/(ai (x)) is Bi = (1, x, x2 , . . . , xk−1 ).
In particular, dimF F [x]/(ai (x)) = deg ai .
Now x acts on V according to the linear transformation T . So consider the action of x by left
multiplication on F [x]/(ai (x)). Obviously, x · xj = xj+1 for 0 ≤ j ≤ k − 2, but

    x · xk−1 = xk = −p0 − p1 x − · · · − pk−1 xk−1

since ai (x) = 0 in F [x]/(ai (x)). Hence, the matrix corresponding to the action of x is
 
    [ 0  0  0  ···  0   −p0  ]
    [ 1  0  0  ···  0   −p1  ]
    [ 0  1  0  ···  0   −p2  ]          (10.23)
    [ 0  0  1  ···  0   −p3  ]
    [ ⋮  ⋮  ⋮   ⋱   ⋮    ⋮   ]
    [ 0  0  0  ···  1  −pk−1 ]

Definition 10.8.3
If a(x) = xk + pk−1 xk−1 + · · · + p1 x + p0 , the matrix in (10.23) is called the companion
matrix of a(x) and is denoted by Ca(x) .
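A quick way to check (10.23) is to build the companion matrix and compare characteristic polynomials. The helper below is our own sketch; the polynomial x^3 − 3x^2 + 5x − 6 reappears in Example 10.8.8:

```python
from sympy import Matrix, symbols, zeros

x = symbols('x')

def companion(p_coeffs):
    """Companion matrix of x^k + p_{k-1}x^{k-1} + ... + p_0, given [p_0, ..., p_{k-1}]."""
    k = len(p_coeffs)
    C = zeros(k, k)
    for i in range(1, k):
        C[i, i - 1] = 1              # 1s on the subdiagonal
    for i in range(k):
        C[i, k - 1] = -p_coeffs[i]   # last column: -p_0, ..., -p_{k-1}
    return C

# a(x) = x^3 - 3x^2 + 5x - 6, so p_0 = -6, p_1 = 5, p_2 = -3.
C = companion([-6, 5, -3])
assert C.charpoly(x).as_expr() == x**3 - 3*x**2 + 5*x - 6
```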

By the direct sum decomposition (10.22), there are vectors in V that correspond to the vectors
in each of the bases Bi . Furthermore, the union of all these bases constitutes a basis B of V and

    dim V = Σ_{i=1}^{m} deg ai (x).

With respect to this basis B, the matrix of T is the n × n block diagonal matrix

    [ Ca1 (x)     0      ···     0     ]
    [    0     Ca2 (x)   ···     0     ]          (10.24)
    [    ⋮        ⋮       ⋱      ⋮     ]
    [    0        0      ···  Cam (x)  ]

Definition 10.8.4
A matrix A ∈ Mn×n (F ) is in rational canonical form if it is in the block diagonal form in
(10.24) where ai (x) are monic polynomials in F [x] such that a1 (x) | a2 (x) | · · · | am (x). A
rational canonical form for a linear transformation T : V → V is any matrix representing
T that is in rational canonical form.

The uniqueness result of Theorem 10.7.3 along with the characterization of isomorphisms between
F [x]-modules given in Example 10.4.6 leads to the following classification theorem.

Theorem 10.8.5
Let V be a finite-dimensional vector space over the field F and let T : V → V be a linear
transformation. Then T has a rational canonical form, which is unique. Furthermore,
S : V → V and T : V → V are similar linear transformations if and only if S and T have
the same rational canonical form.

In terms of matrices, there exists a bijection between similarity classes of n × n matrices and
the set of matrices in rational canonical form. In other words, the set of n × n matrices in rational
canonical form is a complete set of distinct representatives of the similarity classes on Mn×n (F ).
It is a straightforward exercise in linear algebra to prove that the determinant of a block diagonal
matrix is a product of the determinants of the blocks:

        [ A1   0   ···   0  ]
    det [  0   A2  ···   0  ] = (det A1 )(det A2 ) · · · (det Am ).          (10.25)
        [  ⋮    ⋮    ⋱    ⋮  ]
        [  0    0   ···  Am ]

Consequently, we deduce that the invariant factors associated to a linear transformation T and the
characteristic polynomial are related as follows.

Proposition 10.8.6
Let T : V → V be a linear transformation on a vector space over a field F . Then the
characteristic polynomial cT (x) is the product of the invariant factors of T .

Proof. If A is a block diagonal matrix, then xI − A is also a block diagonal matrix. By (10.25),
det(xI − A) = det(xI − A1 ) det(xI − A2 ) · · · det(xI − Am ).
Consequently, to prove the proposition, it suffices to show that the characteristic polynomial of the
companion matrix Ca(x) of the monic polynomial a(x) is again a(x). Let Ca(x) be the companion
matrix in (10.23). Then, the characteristic polynomial is

        [  x   0   0  ···   0     p0     ]
        [ −1   x   0  ···   0     p1     ]
    det [  0  −1   x  ···   0     p2     ]          (10.26)
        [  ⋮   ⋮   ⋮   ⋱    ⋮     ⋮      ]
        [  0   0   0  ···   x    pk−2    ]
        [  0   0   0  ··· −1  x + pk−1   ]

We perform Laplace expansion down the rightmost column. For all i with 1 ≤ i ≤ k, the (i, k)-
minor of this determinant is the determinant of the ((i − 1) + (k − i)) × ((i − 1) + (k − i)) block
diagonal matrix

    [ L  0 ]
    [ 0  U ]

where L is the (i − 1) × (i − 1) lower triangular matrix with x on the diagonal and −1 on the
subdiagonal, and U is the (k − i) × (k − i) upper triangular matrix with −1 on the diagonal and x
on the superdiagonal.
By (10.25), the (i, k)-minor is (−1)k−i xi−1 . Consequently, the Laplace expansion of (10.26) is

    (−1)k+1 p0 (−1)k−1 + (−1)k+2 p1 (−1)k−2 x + · · · + (−1)2k (x + pk−1 )xk−1 ,

which is exactly a(x). 
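Proposition 10.8.6 is easy to sanity-check numerically. A sketch using the invariant factors a1(x) = x − 2 and a2(x) = x^3 − 3x^2 + 5x − 6 that appear in Example 10.8.8:

```python
from sympy import Matrix, symbols, expand

x = symbols('x')
C1 = Matrix([[2]])                               # companion matrix of x - 2
C2 = Matrix([[0, 0, 6], [1, 0, -5], [0, 1, 3]])  # companion of x^3 - 3x^2 + 5x - 6
D = Matrix.diag(C1, C2)                          # block diagonal (rational canonical form)
# The characteristic polynomial is the product of the invariant factors.
assert D.charpoly(x).as_expr() == expand((x - 2) * (x**3 - 3*x**2 + 5*x - 6))
```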

10.8.3 – Finding a Preferred Basis


What the Fundamental Theorem does not provide is a constructive approach to find a basis with
respect to which the matrix of a linear transformation T is in rational canonical form. The previous
section discussed two related topics: (1) how to deduce properties of V ≅ F [x]n / Ker ϕ from the
Smith normal form of ϕ; and (2) how to find a preferred basis of a submodule from a generating
set. However, in this situation, we need to find a preferred basis for Ker ϕ.
Identify V with F n and suppose that the linear transformation T : F n → F n has the matrix A
with respect to the standard basis on F n . Consider the composition of functions

    F [x]n --ψ--> F [x]n --ϕ--> F n

where ψ is the F [x]-module homomorphism which, with respect to the standard basis on F [x]n , has
the matrix xI − A. The F [x]-module homomorphism ϕ maps standard basis elements of F [x]n to
similarly indexed standard basis elements of F n . Hence,
 
    ϕ : (p1 (x), p2 (x), . . . , pn (x))ᵀ ↦ p1 (x) · e1 + p2 (x) · e2 + · · · + pn (x) · en
                                          = p1 (A)(e1 ) + p2 (A)(e2 ) + · · · + pn (A)(en ).

Consequently, the composition ϕ ◦ ψ satisfies

    ϕ ◦ ψ( (p1 (x), . . . , pn (x))ᵀ ) = ϕ( (xp1 (x), . . . , xpn (x))ᵀ − A (p1 (x), . . . , pn (x))ᵀ )
        = Ap1 (A)(e1 ) + Ap2 (A)(e2 ) + · · · + Apn (A)(en )
          − A(p1 (A)(e1 ) + p2 (A)(e2 ) + · · · + pn (A)(en ))
        = 0.

Therefore, ϕ ◦ ψ is the trivial homomorphism and so Im ψ ⊆ Ker ϕ.



On the other hand, note that the ith entry of

    (xI − A) (p1 (x), p2 (x), . . . , pn (x))ᵀ

is
    xpi (x) − ai1 p1 (x) − ai2 p2 (x) − · · · − ain pn (x).          (10.27)
Notice that given fixed aij and arbitrary polynomials pi (x), polynomials of the form in (10.27) are
arbitrary except that the constant is in the ideal Ii = (ai1 , . . . , ain ). This implies that as subsets of
F [x]n ,
F [x]n = Im ψ + F n ,
where by F n , we mean vectors in F [x]n consisting of constant polynomials. If v ∈ F [x]n is written
as v = q + b, where q ∈ Im ψ and b ∈ F n , then ϕ(v) = ϕ(q) + ϕ(b) = ϕ(b) since Im ψ ⊆ Ker ϕ. But
   
    ϕ( (b1 , b2 , . . . , bn )ᵀ ) = b1 · e1 + b2 · e2 + · · · + bn · en = (b1 , b2 , . . . , bn )ᵀ .

Hence, v ∈ Ker ϕ if and only if b is the zero vector. We deduce that Im ψ = Ker ϕ. In particular,
the columns of xI − A, each as an element in F [x]n , give a generating set for Ker ϕ.

Proposition 10.8.7
The Smith normal form of xI − A as an element in Mn×n (F [x]) is the n × n diagonal matrix

    [ In−m    0       0     ···     0     ]
    [   0   a1 (x)    0     ···     0     ]
    [   0     0     a2 (x)  ···     0     ]
    [   ⋮     ⋮       ⋮      ⋱      ⋮     ]
    [   0     0       0     ···  am (x)   ]

where the a1 (x), a2 (x), . . . , am (x) are the invariant factors of T : V → V as an F [x]-module.

Proof. Suppose the F [x]-module consisting of the vector space V and linear transformation T has

    V ≅ F [x]/(a1 (x)) ⊕ F [x]/(a2 (x)) ⊕ · · · ⊕ F [x]/(am (x)).

Then since it is a torsion module, the normal form contains no 0s on the diagonal. Also, the nonzero,
nonunit terms on the diagonal of the normal form are precisely the invariant factors. 
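As a small sanity check of Proposition 10.8.7, the matrix A = [3 4; −1 7] from Section 10.8.1 has cA(x) = (x − 5)^2, and xI − A reduces in a few explicit row and column operations (a sketch; the steps mirror the Smith normal form reduction):

```python
from sympy import Matrix, symbols, eye, expand

x = symbols('x')
A = Matrix([[3, 4], [-1, 7]])
B = x * eye(2) - A                       # [[x - 3, -4], [1, x - 7]]
B.row_swap(0, 1)                         # move the unit entry to the corner
B[1, :] = B[1, :] - (x - 3) * B[0, :]    # clear the rest of column 1
B[:, 1] = B[:, 1] - (x - 7) * B[:, 0]    # clear the rest of row 1
B[1, :] = -B[1, :]                       # rescale by the unit -1 to get a monic entry
B = B.applyfunc(expand)
# One nonunit invariant factor: a1(x) = (x - 5)^2 = x^2 - 10x + 25.
assert B == Matrix([[1, 0], [0, x**2 - 10*x + 25]])
```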

Since the Smith normal forms of two similar matrices are equal, we can also talk about the
Smith normal form of a linear transformation T : V → V , where V is finite dimensional.
This shows that we can calculate the invariant factors of a matrix (or linear transformation) by
finding the Smith normal form of A. Furthermore, we can use the method of Example 10.6.6 to find
a basis of V = F n with respect to which the linear transformation T has rational normal form.
Example 10.8.8. Consider the linear transformation T : R4 → R4 that with respect to the standard
basis has the matrix

    A = [  2  −4   6  −6 ]
        [  0   6  −6   6 ]
        [ −1   2  −2   3 ]
        [  0  −2   3  −1 ].

A quick calculation shows that the characteristic polynomial is cT (x) = (x − 2)2 (x2 − x + 3), where
x2 − x + 3 is irreducible over R. By Proposition 10.8.6, we deduce that one of two possibilities occurs:
(1) T has only one invariant factor, namely cT (x), or (2) it has 2 invariant factors a1 (x) = x − 2 and
a2 (x) = (x − 2)(x2 − x + 3).
Denote by V the R[x]-module of R4 equipped with the linear transformation T . We use row and
column operations on the 4 × 4 matrix xI − A to get the Smith normal form. Keeping track of the
row operations allows us to calculate the matrix S defined in Proposition 10.6.4. Using technology
(or doing the calculations by hand) we find that the Smith normal form of xI − A is
 
    [ 1  0    0           0          ]
    [ 0  1    0           0          ]          (10.28)
    [ 0  0  x − 2         0          ]
    [ 0  0    0   x3 − 3x2 + 5x − 6  ]
Note that x3 − 3x2 + 5x − 6 = (x − 2)(x2 − x + 3), so we deduce that V ≅ R[x]/(x − 2) ⊕
R[x]/(x3 − 3x2 + 5x − 6). Furthermore, the left-reducing matrix S is
    S = [      1/4            0        0          0          ]
        [ (1/6)x − 1/2      −2/3       1          0          ]
        [       1             0        0         −2          ]
        [ −x2 + x − 6      −2x + 2   3x − 6   3x2 − 6x + 12  ]
and so

    S −1 = [   4           0                 0               0    ]
           [ x − 6    (3/2)x − 3    −(3/4)x2 + (3/2)x − 3   −1/2  ]
           [  −2         x − 1         −(1/2)x2 + x − 2     −1/3  ]
           [   2           0                −1/2              0   ],
which gives basis elements of Ker ϕ as a submodule of R[x]4 , with entries listed as column vectors.
Denote the columns of S −1 by ξ1 , ξ2 , ξ3 , ξ4 . Then from the Smith normal form (10.28), we explicitly
have the R[x]-module

    V = (R[x])4 / Ker ϕ
      = (R[x]ξ1 ⊕ R[x]ξ2 ⊕ R[x]ξ3 ⊕ R[x]ξ4 )/(R[x]ξ1 ⊕ R[x]ξ2 ⊕ R[x]a1 (x)ξ3 ⊕ R[x]a2 (x)ξ4 )
      ≅ R[x]/(a1 (x)) ⊕ R[x]/(a2 (x)).
Notice that the module elements ξ1 and ξ2 become 0 in the quotient module. The summands
R[x]/(x − 2) and R[x]/(x3 − 3x2 + 5x − 6) are generated respectively by v1 = ϕ(ξ3 ) and v2 = ϕ(ξ4 ).
Note that we only use columns ξ3 and ξ4 of S −1 because these correspond to the nonunit invariant
factors of xI − A. We calculate these vectors v1 , v2 ∈ R4 by

    v1 = ϕ( (0, −(3/4)x2 + (3/2)x − 3, −(1/2)x2 + x − 2, −1/2)ᵀ )
       = (−(3/4)A2 + (3/2)A − 3I)(e2 ) + (−(1/2)A2 + A − 2I)(e3 ) − (1/2)I(e4 )
       = (3, −6, −3, 1)ᵀ

and

    v2 = ϕ( (0, −1/2, −1/3, 0)ᵀ ) = −(1/2)I(e2 ) − (1/3)I(e3 ) = (0, −1/2, −1/3, 0)ᵀ .
The summand R[x]/(x − 2) is one-dimensional so it is Span(v1 ). The summand R[x]/(x3 − 3x2 +
5x − 6) is three-dimensional so it is Span(v2 , x · v2 , x2 · v2 ). Setting

    M = [ v1  v2  xv2  x2 v2 ] = [  3    0     0     2   ]
                                 [ −6  −1/2   −1    −4   ]
                                 [ −3  −1/3  −1/3  −4/3  ]
                                 [  1    0     0     1   ],

we calculate

    M −1 AM = [ 2  0  0   0 ]
              [ 0  0  0   6 ]
              [ 0  1  0  −5 ]
              [ 0  0  1   3 ].
This is the rational canonical form of T , which is the matrix of T with respect to the ordered basis
of R4 listed as the columns of M . △
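The arithmetic in Example 10.8.8 can be verified in a few lines; a sketch with SymPy:

```python
from sympy import Matrix, Rational

A = Matrix([[2, -4, 6, -6], [0, 6, -6, 6], [-1, 2, -2, 3], [0, -2, 3, -1]])
v1 = Matrix([3, -6, -3, 1])
v2 = Matrix([0, Rational(-1, 2), Rational(-1, 3), 0])
M = Matrix.hstack(v1, v2, A * v2, A * A * v2)   # basis (v1, v2, x·v2, x^2·v2)
assert M.inv() * A * M == Matrix([
    [2, 0, 0, 0],
    [0, 0, 0, 6],
    [0, 1, 0, -5],
    [0, 0, 1, 3],
])   # the rational canonical form of T
```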

The previous example illustrates that knowing the characteristic polynomial is not sufficient to
determine the form of the invariant factor decomposition. Depending on the factorization of cT (x)
into irreducible polynomials in F [x], there is a finite number of possibilities for the rational canonical
form. This situation is not unlike determining the possible abelian groups of a given finite cardinality.
Example 10.8.9. Suppose that T : R4 → R4 is a linear transformation such that cT (x) = (x − 2)4 .
By Proposition 10.8.6, the invariant factors must multiply to cT (x). We list them below along
with the rational canonical form for each case. As always, the minimal polynomial is the largest (by
degree) of the invariant factors.

    a1 (x) = (x − 2)4
        [ 0  0  0  −16 ]
        [ 1  0  0   32 ]
        [ 0  1  0  −24 ]
        [ 0  0  1    8 ]

    (a1 (x), a2 (x)) = ((x − 2), (x − 2)3 )
        [ 2  0  0    0 ]
        [ 0  0  0    8 ]
        [ 0  1  0  −12 ]
        [ 0  0  1    6 ]

    (a1 (x), a2 (x)) = ((x − 2)2 , (x − 2)2 )
        [ 0  −4  0   0 ]
        [ 1   4  0   0 ]
        [ 0   0  0  −4 ]
        [ 0   0  1   4 ]

    (a1 (x), a2 (x), a3 (x)) = ((x − 2), (x − 2), (x − 2)2 )
        [ 2  0  0   0 ]
        [ 0  2  0   0 ]
        [ 0  0  0  −4 ]
        [ 0  0  1   4 ]

    (a1 (x), a2 (x), a3 (x), a4 (x)) = ((x − 2), (x − 2), (x − 2), (x − 2))
        [ 2  0  0  0 ]
        [ 0  2  0  0 ]
        [ 0  0  2  0 ]
        [ 0  0  0  2 ]

This list shows that there are 5 similarity classes of matrices in M4×4 (R) with cT (x) = (x − 2)4 .
Only the last canonical form corresponds to matrices that are diagonalizable. △
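Counting these similarity classes amounts to counting partitions, exactly as for abelian groups of order p^4. A sketch:

```python
def partitions(n, max_part=None):
    """Yield the partitions of n as non-increasing tuples."""
    if max_part is None or max_part > n:
        max_part = n
    if n == 0:
        yield ()
        return
    for k in range(max_part, 0, -1):
        for rest in partitions(n - k, k):
            yield (k,) + rest

# Chains a1(x) | a2(x) | ... multiplying to (x - 2)^4 correspond to
# partitions of 4: (4), (3,1), (2,2), (2,1,1), (1,1,1,1).
assert sum(1 for _ in partitions(4)) == 5
```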

Exercises for Section 10.8


1. Let A ∈ Mn×n (F ). Prove that if the characteristic polynomial of A is a product of distinct irreducible
polynomials in F [x], then A has only one invariant factor, namely cA (x).
2. Let A, B ∈ M2×2 (F ) that are not scalar matrices. Prove that A and B are similar if and only if
cA (x) = cB (x).
3. Determine all similarity classes of matrices A ∈ M3×3 (R) such that A3 = I. [Hint: The matrices A
have a minimal polynomial that divides x3 − 1.]
4. Determine all similarity classes of matrices A ∈ M3×3 (C) such that A3 = I. [Hint: The matrices A
have a minimal polynomial that divides x3 − 1.]
5. How many similarity classes are there for matrices in M6×6 (Q) that have the characteristic polynomial
of (x2 − x + 4)3 ? Give the rational canonical form for each one.
6. How many similarity classes are there for matrices in M6×6 (Q) that have the characteristic polynomial
of (x − 1)3 (x2 + x + 1)2 ? Give the rational canonical form for each one.

7. How many similarity classes are there for matrices in M6×6 (Q) that have the characteristic polynomial
of (x − 3)3 (x − 4)3 ? Give the rational canonical form for each one.
8. Find the number of conjugacy classes of GL3 (F2 ).
9. Find the number of conjugacy classes of GL2 (F3 ).

In Exercises 10.8.10 through 10.8.14, consider the linear transformation T : F n → F n whose matrix with
respect to the standard basis is the given matrix A. Find the rational canonical form of T and a basis B on
F n so that the matrix of T with respect to B is the rational canonical form.
   
10. F = R: a) A = [ 1  1  1 ]    b) A = [ 2  1  1 ]
                  [ 1  1  1 ]           [ 1  2  1 ]
                  [ 1  1  1 ];          [ 1  1  2 ].
11. F = R: a) A = [  13  −25   20 ]    b) A = [  18  −76   32 ]
                  [  −4   13   −8 ]           [  −4   21   −8 ]
                  [ −10   25  −17 ];          [ −16   76  −30 ].
12. F = Q and A = [   3  −1    0   0 ]
                  [   4   0    0   0 ]
                  [   7  −3    5   1 ]
                  [ −23  13  −14  −2 ].
13. F = F5 and A = [ 0  1  4 ]
                   [ 2  1  3 ]
                   [ 3  1  3 ].
14. F = R and A = [ −3  −10  25  −10 ]
                  [  1    4  −5    2 ]
                  [ −2   −4  12   −4 ]
                  [ −3   −6  15   −4 ].
15. Let n ≥ 2. Find the rational canonical form of the n × n matrix that consists of 1s in all entries except
for 2s down the diagonal.

10.9 Jordan Canonical Form

10.9.1 – The Jordan Canonical Form
By directly following the presentation in the previous section, it may seem reasonable to study a
canonical form for matrices induced from an elementary divisor expression

    V ≅ F [x]/(p1 (x)k1 ) ⊕ F [x]/(p2 (x)k2 ) ⊕ · · · ⊕ F [x]/(pm (x)km ),

where each pi (x) is a monic irreducible polynomial. Recall that the polynomial

p1 (x)k1 p2 (x)k2 · · · pm (x)km = a1 (x)a2 (x) · · · as (x),

where the a1 (x), a2 (x), . . . , as (x) are the nonunit monic invariant factors. The Jordan canonical form
takes this approach in the special case when the characteristic polynomial of the linear transformation
T : V → V splits completely in the field F . In particular, when F is algebraically closed, this occurs
for all linear transformations.
From now on, suppose that T is a linear transformation on a vector space V of dimension n over
a field F . Suppose further that cT (x) splits completely into

cT (x) = (x − λ1 )k1 (x − λ2 )k2 · · · (x − λs )ks .



Counted with algebraic multiplicity, this means that T has n = deg cT (x) eigenvalues. From the
elementary divisor form of the Fundamental Theorem of Finitely Generated Modules over a PID, as an
F [x]-module, V equipped with T has the decomposition
V ≅ F [x]/((x − λ1 )α(1)1 ) ⊕ F [x]/((x − λ1 )α(1)2 ) ⊕ · · · ⊕ F [x]/((x − λ1 )α(1)ℓ(1) )
  ⊕ F [x]/((x − λ2 )α(2)1 ) ⊕ F [x]/((x − λ2 )α(2)2 ) ⊕ · · · ⊕ F [x]/((x − λ2 )α(2)ℓ(2) )          (10.29)
  ⊕ ···
  ⊕ F [x]/((x − λs )α(s)1 ) ⊕ F [x]/((x − λs )α(s)2 ) ⊕ · · · ⊕ F [x]/((x − λs )α(s)ℓ(s) ),

where for each i with 1 ≤ i ≤ s, the powers satisfy α(i)1 ≥ α(i)2 ≥ · · · ≥ α(i)ℓ(i) ≥ 1 and α(i)1 +
α(i)2 + · · · + α(i)ℓ(i) = ki . Consequently, for each i, the finite sequence (α(i)1 , α(i)2 , . . . , α(i)ℓ(i) ) is
a partition of ki , the algebraic multiplicity of the eigenvalue λi . (See Sections 4.5.3 and 4.5.4 for a
few comments on partitions of integers.) We will refer to the partition associated to an eigenvalue
λ as α(λ) so that the content |α(λ)| is the algebraic multiplicity of λ in cT (x). The summand
F [x]/((x − λi )α(i)1 ) ⊕ F [x]/((x − λi )α(i)2 ) ⊕ · · · ⊕ F [x]/((x − λi )α(i)`(i) )
is called the (x − λi )-primary component of V .
We point out that from knowing only the characteristic polynomial cT (x), we can only tell the
content |α(λ)| and not the actual partition α(λ).
The main difference between the Jordan canonical form and the rational canonical form consists
of the basis (as a vector space over F ) used for a summand F [x]/(x − λ)m . In the rational canonical
form, we used the ordered basis (1, x, x2 , . . . , xm−1 ); the Jordan canonical form uses the ordered
basis

    B = ((x − λ)m−1 , . . . , (x − λ)2 , x − λ, 1).          (10.30)
The linear transformation T corresponds to the F [x]-action of x. As usual, the action of x on these
elements of B is

    x · (x − λ)m−1 = (x − λ + λ) · (x − λ)m−1 = (x − λ)m + λ(x − λ)m−1 = λ(x − λ)m−1
    x · (x − λ)m−2 = (x − λ + λ) · (x − λ)m−2 = (x − λ)m−1 + λ(x − λ)m−2
        ⋮                                                                          (10.31)
    x · (x − λ) = (x − λ + λ) · (x − λ) = (x − λ)2 + λ(x − λ)
    x · 1 = (x − λ + λ) · 1 = (x − λ) + λ.
Therefore, with respect to the basis B, the matrix of T is the m × m matrix

    Jλ,m = [ λ  1  0  ···  0  0 ]
           [ 0  λ  1  ···  0  0 ]
           [ 0  0  λ  ···  0  0 ]
           [ ⋮  ⋮  ⋮   ⋱   ⋮  ⋮ ]
           [ 0  0  0  ···  λ  1 ]
           [ 0  0  0  ···  0  λ ]

Definition 10.9.1
Let m be a positive integer. A matrix of the form Jλ,m is called a Jordan matrix or a
Jordan block .

More generally, if α is a partition of length ℓ(α) = r, then we denote by Jλ,α the block diagonal
matrix

    Jλ,α = [ Jλ,α1    0    ···    0    ]
           [   0    Jλ,α2  ···    0    ]
           [   ⋮      ⋮     ⋱     ⋮    ]
           [   0      0    ···  Jλ,αr  ]

The decomposition (10.29) leads to the following key theorem.

Theorem 10.9.2 (Jordan Canonical Form)


Let T : V → V be a linear transformation on a finite-dimensional vector space V over
the field F . Suppose that cT (x) splits completely in F [x]. There exists a basis of V with
respect to which the matrix of T is the block diagonal matrix

    [ Jλ1 ,α(1)      0       ···      0       ]
    [     0      Jλ2 ,α(2)   ···      0       ]
    [     ⋮          ⋮        ⋱       ⋮       ]
    [     0          0       ···  Jλs ,α(s)   ]

Furthermore, two linear transformations S, T : V → V are similar if and only if with respect
to some bases, they are represented by the same above matrix.

Proof. The block diagonal matrix above arises from the union of the bases of the form B on each
summand F [x]/(x − λ)m . The result about similarity comes from the uniqueness result of Theo-
rem 10.7.5 and the fact that V equipped with T is F [x]-module isomorphic to V equipped with S if
and only if S and T are similar. 

Definition 10.9.3
The form described in Theorem 10.9.2 is called the Jordan canonical form of T . Also, if A
is a matrix in Mn×n (F ), then the Jordan canonical form of A is the Jordan canonical form
of the linear transformation on F n defined by x ↦ Ax.
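SymPy can compute this form directly; a sketch using the matrix from Section 10.8.1 whose characteristic polynomial is (x − 5)^2:

```python
from sympy import Matrix

A = Matrix([[3, 4], [-1, 7]])      # c_A(x) = (x - 5)^2, geometric multiplicity 1
P, J = A.jordan_form()             # A = P J P^{-1}
assert J == Matrix([[5, 1], [0, 5]])   # a single Jordan block J_{5,2}
assert A == P * J * P.inv()
```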

If F is an algebraically closed field, Theorem 10.9.2 affirms that two matrices in Mn×n (F ) are
similar if and only if they have the same Jordan canonical form.
Any summand of the form F [x]/(x − λ)m in V as depicted in (10.29) is a T -stable subspace of
V . With respect to the ordered basis B given in (10.30), the restriction of T to M = F [x]/(x − λ)m
is the Jordan block Jλ,m , which has the single eigenvalue λ with algebraic multiplicity m. However,
on this summand Ker(T − λI) = Span((x − λ)m−1 ) and hence, for T restricted to M , the geometric
multiplicity of λ is 1.

Example 10.9.4. To give some examples of what the Jordan canonical form can look like, consider the following matrices. Most are in Jordan canonical form, though one is not. For those that are, we list the eigenvalues λ and associated partitions α(λ), together with the eigenspaces.


• $\begin{pmatrix} 2&1&0&0\\ 0&2&1&0\\ 0&0&2&1\\ 0&0&0&2 \end{pmatrix}$: λ = 2, α(2) = (4); E_2 = Span(e_1).
• $\begin{pmatrix} 2&1&0&0\\ 0&2&1&0\\ 0&0&2&0\\ 0&0&0&2 \end{pmatrix}$: λ = 2, α(2) = (3, 1); E_2 = Span(e_1, e_4).
• $\begin{pmatrix} 2&1&0&0\\ 0&2&0&0\\ 0&0&2&1\\ 0&0&0&2 \end{pmatrix}$: λ = 2, α(2) = (2, 2); E_2 = Span(e_1, e_3).
• $\begin{pmatrix} 2&1&0&0\\ 0&2&0&0\\ 0&0&2&0\\ 0&0&0&2 \end{pmatrix}$: λ = 2, α(2) = (2, 1, 1); E_2 = Span(e_1, e_3, e_4).
• $\begin{pmatrix} 2&0&0&0\\ 0&2&0&0\\ 0&0&2&0\\ 0&0&0&2 \end{pmatrix}$: λ = 2, α(2) = (1, 1, 1, 1); E_2 = Span(e_1, e_2, e_3, e_4).
• $\begin{pmatrix} 3&0&0&0\\ 0&3&0&0\\ 0&0&2&1\\ 0&0&0&2 \end{pmatrix}$: α(3) = (1, 1), α(2) = (2); E_2 = Span(e_3), E_3 = Span(e_1, e_2).
• $\begin{pmatrix} 1&0&0&0\\ 0&2&0&0\\ 0&0&3&1\\ 0&0&0&4 \end{pmatrix}$: eigenvalues 1, 2, 3, 4; not in Jordan canonical form.

Only the last matrix is not in Jordan canonical form. We can deduce that 1, 2, 3, and 4 are its eigenvalues, but the 1 in position (3, 4) makes the lower right 2 × 2 submatrix not a Jordan block. △

10.9.2 – Generalized Eigenvectors


We approached the topic of the Jordan canonical form of a linear transformation or of a matrix
from the perspective of F [x]-modules. It is useful to keep it connected to linear algebra. In linear
algebra, we use the following terminology.

Definition 10.9.5
Let T : V → V be a linear transformation.

• A generalized eigenvector of rank k of eigenvalue λ is a vector v ∈ V such that (T − λI)^k v = 0 but (T − λI)^{k−1} v ≠ 0. (If S : V → V is a linear transformation, by S^0 we mean the identity function on V.)

• We write E_{λ,k} = Ker(T − λI)^k.

• We call the generalized eigenspace of λ the set
$$E_{\lambda,\infty} = \{ v \in V \mid (T - \lambda I)^k v = 0 \text{ for some positive integer } k \}.$$

A generalized eigenvector of rank 1 of λ is a usual eigenvector of λ. The generalized eigenspace of λ is precisely the (x − λ)-primary component of V.
It is obvious from the definition that Eλ,k is a subspace of V for any eigenvalue λ and any
positive integer k. Since Eλ,∞ = Eλ,k0 , where k0 is the algebraic multiplicity of λ, then Eλ,∞ is also
a subspace of V .
The action of x on the basis vectors as described in (10.31) inspires the following concept as
well. A chain of generalized eigenvectors is a sequence of nonzero vectors v1 , v2 , . . . , vk such that
(A − λI)v1 = 0 and (A − λI)vi = vi−1 for 2 ≤ i ≤ k. In a chain of generalized eigenvectors of length
k, the vector vi is a generalized eigenvector of rank i.

Proposition 10.9.6
If the sequence v1 , v2 , . . . , vk is a chain of generalized eigenvectors for a given eigenvalue,
then {v1 , v2 , . . . , vk } is linearly independent.

Proof. By definition, (A − λI)v_1 = ~0 and (A − λI)v_i = v_{i−1} for 2 ≤ i ≤ k. Hence,
$$(A - \lambda I)^k v_m = \begin{cases} 0 & \text{if } k \ge m \\ v_{m-k} & \text{if } k < m \end{cases}$$
for all nonnegative integers k. Assume that {v1 , v2 , . . . , vk } is a linearly dependent set. Then there
exist constants ci ∈ F not all 0 such that
c1 v1 + c2 v2 + · · · + ck vk = ~0. (10.32)

Let m be the least index such that (c_1, c_2, . . . , c_k) = (d_1, d_2, . . . , d_m, 0, . . . , 0) is a solution to (10.32). By minimality, d_m ≠ 0. However,

$$(A - \lambda I)^{m-1}(d_1 v_1 + d_2 v_2 + \cdots + d_m v_m) = 0 \implies d_1 (A - \lambda I)^{m-1} v_1 + d_2 (A - \lambda I)^{m-1} v_2 + \cdots + d_m (A - \lambda I)^{m-1} v_m = 0.$$

All the terms cancel except for the last one. Hence, dm v1 = ~0. Since v1 is nonzero, we deduce that
dm = 0. This leads to a contradiction, so {v1 , v2 , . . . , vk } is linearly independent. 

In the special case when V ≅ F[x]/((x − λ)^k), any chain of generalized eigenvectors of length k is a basis of V. Furthermore, the algebraic multiplicity of λ is k = dim V, while the geometric multiplicity is 1.

Remark 10.9.7. On the other hand, the opposite situation, when V ≅ (F[x]/(x − λ))^k, corresponds to the case in which the algebraic multiplicity of λ equals the geometric multiplicity. In this case, any basis of the eigenspace of λ is a basis of V. △

Proposition 10.9.6 easily generalizes one step further to the following.

Proposition 10.9.8
Suppose that for 1 ≤ i ≤ r, the lists (vλi ,1 , . . . , vλi ,ki ) are chains of generalized eigenvectors
of distinct eigenvalues λi . The union of the lists is a linearly independent set of vectors.

Proof. (Left as an exercise for the reader. See Exercise 10.9.16.) 

10.9.3 – Finding the Jordan Canonical Form of a Matrix


By Theorem 10.9.2, simply knowing that the characteristic polynomial of a linear transformation
splits completely over the field F tells us that there is a basis B with respect to which the matrix of
the linear transformation T is in Jordan canonical form. Knowing the format of the canonical form
helps us to find it.

Proposition 10.9.9
For each eigenvalue λ of T , the associated partition α(λ) is the conjugate of the partition
β, where

β1 = dim(Eλ,1 ) and βi = dim(Eλ,i ) − dim(Eλ,i−1 ) for i ≥ 2.

(Note that βr is the last part of the partition β if dim(Eλ,r+1 ) = dim(Eλ,r ).)

Proof. Let λ be an eigenvalue of T . Consider the restriction of T to the generalized eigenspace Eλ,∞ .
As an F [x]-submodule, Eλ,∞ is isomorphic to

$$F[x]/(x - \lambda)^{\alpha_1} \oplus F[x]/(x - \lambda)^{\alpha_2} \oplus \cdots \oplus F[x]/(x - \lambda)^{\alpha_\ell}. \qquad (10.33)$$

This means that the partition associated to λ is α = (α_1, α_2, . . . , α_ℓ) with |α| = k, the algebraic
multiplicity of λ. The restriction of T to Eλ,∞ is the same as the action of x on the primary
component in (10.33). Furthermore, with respect to the Jordan basis on each summand of (10.33),
the matrix of T is the Jordan matrix Jλ,α .
Then by (10.31), the eigenspace E_λ is
$$E_\lambda = \operatorname{Span}\big( ((x-\lambda)^{\alpha_1 - 1}, 0, 0, \ldots, 0),\ (0, (x-\lambda)^{\alpha_2 - 1}, 0, \ldots, 0),\ \ldots,\ (0, 0, \ldots, 0, (x-\lambda)^{\alpha_\ell - 1}) \big).$$

In particular, the geometric multiplicity of λ is equal to the length of the partition α, which is also α′_1, the first part of the conjugate partition to α. We also notice that
$$(T - \lambda I)(E_{\lambda,\infty}) \cong F[x]/(x-\lambda)^{\alpha_1 - 1} \oplus F[x]/(x-\lambda)^{\alpha_2 - 1} \oplus \cdots \oplus F[x]/(x-\lambda)^{\alpha_\ell - 1}.$$

If α_i = 1, then the summand F[x]/(x − λ)^{α_i − 1} is the trivial module. Now the conjugate of the partition (α_1 − 1, α_2 − 1, . . . , α_ℓ − 1) is the partition α′ with its first part removed. For example, if α = (5, 2, 2, 1), then α′ = (4, 3, 1, 1, 1), while (α_1 − 1, α_2 − 1, α_3 − 1, α_4 − 1) = (4, 1, 1) has conjugate (3, 1, 1, 1) = (α′_2, α′_3, α′_4, α′_5). Thus,
$$\dim (T - \lambda I)(E_{\lambda,\infty}) = \alpha'_2 + \alpha'_3 + \cdots = k - \alpha'_1.$$
By the first isomorphism theorem, (x − λ)(E_{λ,∞}) ≅ E_{λ,∞}/Ker(x − λ) as F[x]-modules. Furthermore, x − λ acts on (x − λ)(E_{λ,∞}) as a linear transformation with Jordan canonical form of eigenvalue λ and partition (α_1 − 1, α_2 − 1, . . . , α_ℓ − 1). By a repeated application of this result, we deduce that
$$\dim (T - \lambda I)^j(E_{\lambda,\infty}) = k - (\alpha'_1 + \alpha'_2 + \cdots + \alpha'_j).$$
Since (T − λI)^j(E_{λ,∞}) ≅ E_{λ,∞}/Ker(T − λI)^j, we get dim E_{λ,j} = dim Ker(T − λI)^j = α′_1 + α′_2 + · · · + α′_j. We deduce that α′_j = dim E_{λ,j} − dim E_{λ,j−1}. □

Example 10.9.10. We perform all the following calculations assisted by a computer algebra system. Consider the 5 × 5 matrix
$$A = \begin{pmatrix} -2 & 3 & 2 & -2 & 2 \\ -2 & 3 & 1 & -1 & 1 \\ -7 & 5 & 5 & -3 & 4 \\ -3 & 2 & 1 & 1 & 2 \\ -2 & 2 & 1 & -1 & 3 \end{pmatrix}.$$
We calculated the characteristic polynomial to be c_A(x) = (x − 2)^5. Then the reduced row echelon forms of A − 2I and (A − 2I)^2 are
$$\begin{pmatrix} 1 & 0 & 0 & 0 & -1 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & -1 & -1 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 1 & -\tfrac12 & -\tfrac12 & \tfrac12 & -\tfrac12 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$
with (A − 2I)^3 = 0. We see that the dimensions of the generalized eigenspaces are dim E_{2,1} = 2, dim E_{2,2} = 4, and dim E_{2,3} = 5. Thus, by Proposition 10.9.9, α′ = (2, 2, 1), so the partition associated to λ = 2 is the conjugate α = (3, 2). Hence, the Jordan canonical form of A is
$$J_{2,(3,2)} = \begin{pmatrix} 2 & 1 & 0 & 0 & 0 \\ 0 & 2 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 & 0 \\ 0 & 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 0 & 2 \end{pmatrix}. \qquad △$$
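The rank computations in this example are easy to automate. The sketch below (assuming sympy) recovers β and the partition α of Example 10.9.10 from the nullities of powers of A − 2I, exactly as Proposition 10.9.9 prescribes:

```python
from sympy import Matrix, eye

# Matrix A of Example 10.9.10.
A = Matrix([[-2, 3, 2, -2, 2],
            [-2, 3, 1, -1, 1],
            [-7, 5, 5, -3, 4],
            [-3, 2, 1,  1, 2],
            [-2, 2, 1, -1, 3]])
lam, n = 2, 5
N = A - lam * eye(n)

# dim E_{lam,j} = n - rank((A - lam*I)**j); here the dimensions grow
# until they reach the algebraic multiplicity, which is all of n.
dims = [0]
j = 1
while dims[-1] < n:
    dims.append(n - (N**j).rank())
    j += 1

beta = [dims[i] - dims[i - 1] for i in range(1, len(dims))]
# alpha is the conjugate partition of beta (Proposition 10.9.9).
alpha = [sum(1 for b in beta if b >= k) for k in range(1, max(beta) + 1)]
print(dims[1:], beta, alpha)
```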

Exercises for Section 10.9


1. List the possible Jordan canonical forms for a matrix A ∈ M5×5 (R) that has cA (x) = (x − 2)3 (x + 1)2 .
2. List the possible Jordan canonical forms for a matrix A ∈ M6×6 (R) that has cA (x) = (x2 − 4)x4 .
3. List the possible Jordan canonical forms for a matrix A ∈ M6×6 (R) that has cA (x) = (x2 − 3)3 .
4. List the possible Jordan canonical forms for a matrix A ∈ M4×4 (C) that has cA (x) = (x2 + 1)2 .
5. List the possible Jordan canonical forms for a 5 × 5 matrix with eigenvalue 5 of algebraic multiplicity
1, and the eigenvalue 4 of algebraic multiplicity 4 and geometric multiplicity of 3.
6. List the possible Jordan canonical forms for a 5 × 5 matrix with eigenvalue λ1 of algebraic multiplicity
2 and geometric multiplicity of 1, and the eigenvalue λ2 of algebraic multiplicity 3 and geometric
multiplicity of 2.
7. Let F be a field. Prove that a linear transformation π : F n → F n that is a projection has n eigenvalues
counted with multiplicity, all of which are 1 or 0. Prove also that all Jordan blocks of the Jordan
canonical form have size 1. [Hint: Exercise 10.4.16.]
8. Determine the Jordan canonical forms of the following matrices:
(a) $\begin{pmatrix} 0&1&-1\\ 13&0&-7\\ -4&2&0 \end{pmatrix}$  (b) $\begin{pmatrix} 5&-4&-4\\ -1&4&1\\ 2&-3&0 \end{pmatrix}$.
9. Determine the Jordan canonical forms of the following matrices:
(a) $\begin{pmatrix} 3&-9&-9\\ -3&13&11\\ 4&-16&-14 \end{pmatrix}$  (b) $\begin{pmatrix} 2&-9&-9\\ -1&2&3\\ 2&-6&-7 \end{pmatrix}$.
10. List all Jordan canonical forms for a matrix with characteristic polynomial (x − λ)^5.
11. Suppose we work in the field F_2. Prove that the matrix
$$\begin{pmatrix} 1&0&1&0\\ 1&1&0&1\\ 0&1&0&1\\ 1&1&1&0 \end{pmatrix}$$
does not have any eigenvalues over F_2. However, prove that its Jordan canonical form exists over F_4 and find this Jordan canonical form.
12. Find the Jordan canonical form (over C) of the matrix A satisfying a_{ij} = 1 if i ≠ j and a_{ii} = 2.
13. Prove that
$$J_{\lambda,k}^n = \begin{pmatrix} \lambda^n & \binom{n}{1}\lambda^{n-1} & \binom{n}{2}\lambda^{n-2} & \cdots & \binom{n}{k-1}\lambda^{n-k+1} \\ 0 & \lambda^n & \binom{n}{1}\lambda^{n-1} & \cdots & \binom{n}{k-2}\lambda^{n-k+2} \\ 0 & 0 & \lambda^n & \cdots & \binom{n}{k-3}\lambda^{n-k+3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \lambda^n \end{pmatrix}.$$
14. Consider the matrix
$$A = \begin{pmatrix} \lambda & 1 & 0 \\ 0 & \lambda & 1 \\ 0 & 0 & \lambda \end{pmatrix}.$$
Express the matrix e^{At} as a matrix of functions in the variable t.
15. For a square matrix A, we define $e^A = \sum_{n=0}^{\infty} \frac{1}{n!} A^n$. Suppose that A is a square matrix over a field F such that c_A(x) splits completely over F. Let t be a free parameter. Prove that
$$e^{tJ_{\lambda,k}} = \begin{pmatrix} e^{\lambda t} & t e^{\lambda t} & \frac{t^2}{2!} e^{\lambda t} & \cdots & \frac{t^{k-1}}{(k-1)!} e^{\lambda t} \\ 0 & e^{\lambda t} & t e^{\lambda t} & \cdots & \frac{t^{k-2}}{(k-2)!} e^{\lambda t} \\ 0 & 0 & e^{\lambda t} & \cdots & \frac{t^{k-3}}{(k-3)!} e^{\lambda t} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & e^{\lambda t} \end{pmatrix}.$$
16. Prove Proposition 10.9.8.

10.10 Applications of the Jordan Canonical Form
10.10.1 – Finding a Basis for the Jordan Canonical Form
Though the Fundamental Theorem for Finitely Generated Modules over a PID allowed us to find
the Jordan canonical form of a linear transformation T : V → V (whose characteristic polynomial
splits completely over the base field), it did not provide a constructive method to find a basis B with
respect to which T has a matrix in Jordan canonical form. We consider this problem now.
Since we need to perform calculations, we must express vectors in coordinates with respect to
some basis. For the remainder of this section, suppose that V has a fixed basis E = {e1 , e2 , . . . , en }.
Suppose that with respect to E, the matrix of the linear transformation T in question is A. Also,
we consider vectors v ∈ V ∼ = F n as expressed in coordinates with respect to E.
Propositions 10.9.6 and 10.9.8 as well as Remark 10.9.7 offer a strategy to find a desired basis.
If B_λ is a basis of the generalized eigenspace of λ with respect to which T restricted to E_{λ,∞} is the Jordan submatrix of λ, then
$$B = \bigcup_{\lambda} B_\lambda$$
is a basis of V with respect to which the matrix of T is in Jordan canonical form.
We first calculate the characteristic polynomial of A and in particular determine the eigenvalues
and their algebraic multiplicities. Second, we need to calculate the geometric multiplicities of each
eigenvalue λ, i.e., the dimension of the eigenspaces dim Eλ .
We describe the algorithm in linear-algebraic terms, according to a few cases applied to each eigenvalue λ.
Case 1: dim Eλ is equal to the algebraic multiplicity. In this case, the submatrix pertaining
to λ of the Jordan canonical form is Jλ,(1,1,...,1) = λI. Also, the eigenspace Eλ is equal to the
generalized eigenspace. Then any basis for Eλ will do, which is any basis of Ker(A − λI). This
is a straightforward calculation that uses the reduced row echelon form.
(We do not give an example for this case since it is a common exercise in linear algebra.)
Case 2: The geometric multiplicity of λ is 1. In this case, the submatrix pertaining to λ of
the Jordan canonical form is the single Jordan block Jλ,k , where k is the algebraic multiplicity
of λ. By Proposition 10.9.6, a basis of the generalized eigenspace of λ will consist of a single
chain (v1 , v2 , . . . , vk ) of generalized eigenvectors.
• The vector v1 may be chosen as any nonzero vector in Eλ , i.e., any nonzero solution to
(A − λI)u = 0.
• For all i ≥ 2, the vector vi satisfies T (vi ) = λvi + vi−1 . Hence, select vi as any solution to
(A − λI)u = vi−1 .
• The inductive algorithm terminates when i = k, in which case, (A−λI)u = vk has no solutions.
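The three bullet points can be collected into a short routine. This is a sketch (assuming sympy; jordan_chain is a name chosen here, not a library function) that builds a single chain for an eigenvalue of geometric multiplicity 1:

```python
from sympy import Matrix, eye

def jordan_chain(A, lam):
    """Case 2: build one chain of generalized eigenvectors for an
    eigenvalue lam of geometric multiplicity 1."""
    n = A.rows
    N = A - lam * eye(n)
    chain = [N.nullspace()[0]]          # v1: any nonzero eigenvector
    while True:
        try:
            # Solve (A - lam*I) u = v_i; the matrix is singular, so use
            # gauss_jordan_solve, which returns a parametrized solution.
            u, params = N.gauss_jordan_solve(chain[-1])
            # Fix the free parameters to 0 to pick a particular solution.
            chain.append(u.subs({p: 0 for p in params}))
        except ValueError:              # no solution: the chain is complete
            return chain
```

Applied with λ = −2 to the 3 × 3 matrix of Example 10.10.1 below (with the (3, 3) entry taken as −2, consistent with the eigenspace computation there), this produces a chain of length 3 whose vectors assemble into the matrix M found in that example.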
Example 10.10.1. Suppose that the matrix A is
$$A = \begin{pmatrix} -3 & 0 & 1 \\ 1 & -1 & -1 \\ 0 & 1 & -2 \end{pmatrix}.$$
The characteristic polynomial is c_A(x) = x^3 + 6x^2 + 12x + 8 = (x + 2)^3. Hence, the only eigenvalue of A is −2, with an algebraic multiplicity of 3. We calculate the eigenspace E_{−2} by
$$(A + 2I)u = 0 \iff \begin{pmatrix} -1&0&1\\ 1&1&-1\\ 0&1&0 \end{pmatrix} u = 0 \iff u \in \operatorname{Span}\begin{pmatrix} 1\\0\\1 \end{pmatrix} = E_{-2}.$$
Since the geometric multiplicity is 1, we are in Case 2. The Jordan canonical form will be J_{−2,3}. Call v_1 the listed generating vector of E_{−2}. To find v_2, a generalized eigenvector of rank 2, we solve
$$(A + 2I)u = v_1 \iff \begin{pmatrix} -1&0&1\\ 1&1&-1\\ 0&1&0 \end{pmatrix} u = \begin{pmatrix} 1\\0\\1 \end{pmatrix} \iff u \in \begin{pmatrix} -1\\1\\0 \end{pmatrix} + E_{-2}.$$
Call v_2 = (−1, 1, 0)^⊤. Then, to find a rank 3 generalized eigenvector v_3, we solve
$$(A + 2I)u = v_2 \iff \begin{pmatrix} -1&0&1\\ 1&1&-1\\ 0&1&0 \end{pmatrix} u = \begin{pmatrix} -1\\1\\0 \end{pmatrix} \iff u \in \begin{pmatrix} 1\\0\\0 \end{pmatrix} + E_{-2}.$$
Call v_3 = (1, 0, 0)^⊤. The list (v_1, v_2, v_3) of vectors (expressed in coordinates with respect to E) is a chain of generalized eigenvectors for −2 and must be a basis. Setting
$$M = \begin{pmatrix} v_1 & v_2 & v_3 \end{pmatrix} = \begin{pmatrix} 1 & -1 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix},$$
it is an easy calculation to check that
$$M^{-1} A M = J_{-2,3} = \begin{pmatrix} -2 & 1 & 0 \\ 0 & -2 & 1 \\ 0 & 0 & -2 \end{pmatrix}.$$
Hence, A = M J_{−2,3} M^{−1}, as desired. △
The following example involves a combination of Case 1 and Case 2, one for each eigenvalue.
Example 10.10.2. Consider the linear transformation T : R^3 → R^3 defined by T(u) = Au with
$$A = \begin{pmatrix} -2 & 6 & -3 \\ -2 & 6 & -2 \\ 0 & 2 & 1 \end{pmatrix}.$$
The characteristic polynomial is c_A(x) = x^3 − 5x^2 + 8x − 4 = (x − 1)(x − 2)^2. We determine the eigenspaces as
$$E_1: (A - I)u = 0 \implies \begin{pmatrix} -3&6&-3\\ -2&5&-2\\ 0&2&0 \end{pmatrix} u = 0 \implies u \in \operatorname{Span}\begin{pmatrix} -1\\0\\1 \end{pmatrix} = E_1,$$
$$E_2: (A - 2I)u = 0 \implies \begin{pmatrix} -4&6&-3\\ -2&4&-2\\ 0&2&-1 \end{pmatrix} u = 0 \implies u \in \operatorname{Span}\begin{pmatrix} 0\\1\\2 \end{pmatrix} = E_2.$$
We call v_1 the generator of E_1 and v_2 the generator of E_2 listed above. Since the algebraic multiplicity of the eigenvalue 1 is 1, the Jordan submatrix associated to 1 is J_{1,1}. Since the algebraic multiplicity of the eigenvalue 2 is 2, while its geometric multiplicity is 1, the Jordan submatrix associated to 2 is J_{2,(2)}. Hence, the Jordan canonical form of A is
$$J = \begin{pmatrix} 1&0&0\\ 0&2&1\\ 0&0&2 \end{pmatrix}$$
(though we have the freedom to change the order of the Jordan blocks). For the block J_{2,(2)} we still need a generalized eigenvector of rank 2, so we solve
$$(A - 2I)u = v_2 \implies \begin{pmatrix} -4&6&-3\\ -2&4&-2\\ 0&2&-1 \end{pmatrix} u = \begin{pmatrix} 0\\1\\2 \end{pmatrix} \implies u \in \begin{pmatrix} 3/2\\1\\0 \end{pmatrix} + E_2.$$
We call v_3 = (3/2, 1, 0)^⊤. We can now set
$$M = \begin{pmatrix} v_1 & v_2 & v_3 \end{pmatrix} = \begin{pmatrix} -1 & 0 & 3/2 \\ 0 & 1 & 1 \\ 1 & 2 & 0 \end{pmatrix}$$
and it is a straightforward calculation to verify that M^{−1}AM = J. △
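The factorization claimed in Example 10.10.2 can be checked directly (assuming sympy):

```python
from sympy import Matrix, Rational

A = Matrix([[-2, 6, -3], [-2, 6, -2], [0, 2, 1]])
# Columns: v1 (eigenvector for 1), v2 (eigenvector for 2),
# v3 (generalized eigenvector of rank 2 for 2).
M = Matrix([[-1, 0, Rational(3, 2)],
            [ 0, 1, 1],
            [ 1, 2, 0]])
J = Matrix([[1, 0, 0], [0, 2, 1], [0, 0, 2]])
print(M.inv() * A * M == J)   # True
```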
Case 3: 1 < geometric multiplicity of λ < algebraic multiplicity of λ. Suppose that the algebraic multiplicity of λ is k. Then the partition α associated to λ is neither (k) nor (1, 1, . . . , 1), so the λ-Jordan submatrix of the Jordan canonical form is neither diagonal nor a single Jordan block.
We must find ℓ chains of generalized eigenvectors of eigenvalue λ, where ℓ is the geometric multiplicity of λ, such that the union of all the chains is linearly independent. This union will be a basis of the generalized eigenspace of λ. The following example illustrates the care required in finding these chains.
Example 10.10.3. Consider the linear transformation T : R^4 → R^4 such that, with respect to the standard basis, T(v) = Av with
$$A = \begin{pmatrix} 5&0&1&1\\ -2&2&-1&0\\ 0&1&3&-1\\ -4&-1&-2&2 \end{pmatrix}.$$
The characteristic polynomial is c_A(x) = x^4 − 12x^3 + 54x^2 − 108x + 81 = (x − 3)^4. So A has the eigenvalue 3 with an algebraic multiplicity of 4. We determine the eigenspace E_3 by
$$(A - 3I)u = 0 \implies \begin{pmatrix} 2&0&1&1\\ -2&-1&-1&0\\ 0&1&0&-1\\ -4&-1&-2&-1 \end{pmatrix} u = 0 \implies u \in \operatorname{Span}\left( \begin{pmatrix} -1\\0\\2\\0 \end{pmatrix}, \begin{pmatrix} -1\\2\\0\\2 \end{pmatrix} \right).$$
This shows that though the eigenvalue 3 has an algebraic multiplicity of 4, it has a geometric multiplicity of 2. With the information we have so far, we know that the Jordan canonical form must be
$$\begin{pmatrix} 3&1&0&0\\ 0&3&1&0\\ 0&0&3&0\\ 0&0&0&3 \end{pmatrix} \quad\text{or}\quad \begin{pmatrix} 3&1&0&0\\ 0&3&0&0\\ 0&0&3&1\\ 0&0&0&3 \end{pmatrix}, \qquad (10.34)$$
but we cannot yet determine which of these two it is. We are therefore looking for two chains of generalized eigenvectors. Observe that though the first vectors in the two chains will form a basis of E_3, we cannot assume that either of them is (−1, 0, 2, 0)^⊤ or (−1, 2, 0, 2)^⊤.
The rank 2 generalized eigenvectors solve the equation
$$\begin{pmatrix} 2&0&1&1\\ -2&-1&-1&0\\ 0&1&0&-1\\ -4&-1&-2&-1 \end{pmatrix} u = r_1 \begin{pmatrix} -1\\0\\2\\0 \end{pmatrix} + r_2 \begin{pmatrix} -1\\2\\0\\2 \end{pmatrix},$$
with neither r_1 nor r_2 yet determined. By moving the right-hand side over and collecting into a single matrix-vector equation, we rewrite this as
$$\begin{pmatrix} 2&0&1&1&1&1\\ -2&-1&-1&0&0&-2\\ 0&1&0&-1&-2&0\\ -4&-1&-2&-1&0&-2 \end{pmatrix} \begin{pmatrix} x_1\\x_2\\x_3\\x_4\\r_1\\r_2 \end{pmatrix} = \begin{pmatrix} 0\\0\\0\\0 \end{pmatrix}. \qquad (10.35)$$
The reduced row echelon form of this system is
$$\begin{pmatrix} 1&0&\frac12&\frac12&0&0\\ 0&1&0&-1&0&2\\ 0&0&0&0&1&1\\ 0&0&0&0&0&0 \end{pmatrix} \begin{pmatrix} x_1\\x_2\\x_3\\x_4\\r_1\\r_2 \end{pmatrix} = \begin{pmatrix} 0\\0\\0\\0 \end{pmatrix}. \qquad (10.36)$$

Note that r_1 and r_2 must satisfy r_1 + r_2 = 0. This implies that only a one-dimensional subspace of directions in E_3 arises as (A − 3I)u for generalized eigenvectors u of rank 2. Hence, we will be able to find one chain of length 1 and one of length 3, and therefore the Jordan canonical form of A is the first of the two possibilities listed in (10.34). From (10.36), it is straightforward to deduce that a generalized eigenvector of rank 2 has the form
$$u = s_1 \begin{pmatrix} -1\\0\\2\\0 \end{pmatrix} + s_2 \begin{pmatrix} -1\\2\\0\\2 \end{pmatrix} + s_3 \begin{pmatrix} 0\\-2\\0\\0 \end{pmatrix}$$
with s_3 ≠ 0. (If s_3 = 0, then the vector in E_{3,2} is actually in E_{3,1}, so it is only of rank 1.)
We proceed to find a generalized eigenvector of rank 3. We need to solve
$$(A - 3I)u = s_1 \begin{pmatrix} -1\\0\\2\\0 \end{pmatrix} + s_2 \begin{pmatrix} -1\\2\\0\\2 \end{pmatrix} + s_3 \begin{pmatrix} 0\\-2\\0\\0 \end{pmatrix}.$$
Repeating the same strategy as in (10.35), we solve the system
$$\begin{pmatrix} 2&0&1&1&1&1&0\\ -2&-1&-1&0&0&-2&2\\ 0&1&0&-1&-2&0&0\\ -4&-1&-2&-1&0&-2&0 \end{pmatrix} \begin{pmatrix} x_1\\x_2\\x_3\\x_4\\s_1\\s_2\\s_3 \end{pmatrix} = \begin{pmatrix} 0\\0\\0\\0 \end{pmatrix}.$$

The reduced row echelon form of this system is
$$\begin{pmatrix} 1&0&\frac12&\frac12&0&0&1\\ 0&1&0&-1&0&2&-4\\ 0&0&0&0&1&1&-2\\ 0&0&0&0&0&0&0 \end{pmatrix} \begin{pmatrix} x_1\\x_2\\x_3\\x_4\\s_1\\s_2\\s_3 \end{pmatrix} = \begin{pmatrix} 0\\0\\0\\0 \end{pmatrix}. \qquad (10.37)$$

From (10.37), one deduces that a generalized eigenvector of rank 3 may be taken of the form
$$u = t_1 \begin{pmatrix} -1\\0\\2\\0 \end{pmatrix} + t_2 \begin{pmatrix} -1\\2\\0\\2 \end{pmatrix} + t_3 \begin{pmatrix} -1\\0\\0\\0 \end{pmatrix}$$
with t_3 ≠ 0.

We can now easily get a chain of generalized eigenvectors of length 3. Use
$$v_3 = \begin{pmatrix} -1\\0\\0\\0 \end{pmatrix}, \quad v_2 = (A - 3I)v_3 = \begin{pmatrix} -2\\2\\0\\4 \end{pmatrix}, \quad v_1 = (A - 3I)v_2 = \begin{pmatrix} 0\\2\\-2\\2 \end{pmatrix}.$$
Finally, to find the fourth basis vector, we simply need a chain of length 1 (i.e., an eigenvector) that is not in Span(v_1, v_2, v_3). In other words, we need an eigenvector v_4 such that E_3 = Span(v_1, v_4). The vector v_4 = (−1, 0, 2, 0)^⊤ will do. Setting
$$M = \begin{pmatrix} v_1 & v_2 & v_3 & v_4 \end{pmatrix},$$
it is easy (with a computer) to verify that
$$M^{-1} A M = \begin{pmatrix} 3&1&0&0\\ 0&3&1&0\\ 0&0&3&0\\ 0&0&0&3 \end{pmatrix}.$$
This confirms that (v_1, v_2, v_3, v_4) is an ordered basis with respect to which the matrix of the linear transformation T is in Jordan canonical form. △
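The chain of Example 10.10.3 can be rebuilt and verified in a few lines (assuming sympy):

```python
from sympy import Matrix

A = Matrix([[ 5,  0,  1,  1],
            [-2,  2, -1,  0],
            [ 0,  1,  3, -1],
            [-4, -1, -2,  2]])
N = A - 3 * Matrix.eye(4)

v3 = Matrix([-1, 0, 0, 0])       # rank 3 generalized eigenvector
v2 = N * v3                      # rank 2
v1 = N * v2                      # rank 1: an eigenvector
v4 = Matrix([-1, 0, 2, 0])       # a second, independent eigenvector

M = Matrix.hstack(v1, v2, v3, v4)
print(M.inv() * A * M)           # the Jordan form J_{3,(3,1)}
```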

10.10.2 – Useful CAS Commands


Example 10.10.3 illustrates how difficult it can be to find a basis with respect to which a linear
transformation T : V → V is in Jordan canonical form. Because the Jordan canonical form is useful
for a number of applications, many computer algebra systems implement it.
The following command is in the LinearAlgebra package in Maple.
JordanForm(A); calculates the Jordan canonical form J of the matrix A, making the assumption that the coefficients are in C. The syntax JordanForm(A, output='Q'); also calculates a matrix M such that A = M J M^{−1}.
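Other computer algebra systems provide the same functionality. For instance, sympy's documented jordan_form method plays the role of Maple's JordanForm; a minimal sketch:

```python
from sympy import Matrix

# Matrix of Example 10.10.1 (with the (3, 3) entry taken as -2,
# consistent with the eigenspace computation in that example).
A = Matrix([[-3, 0, 1], [1, -1, -1], [0, 1, -2]])

# jordan_form returns the pair (P, J) with A = P * J * P**(-1);
# note that the transition matrix comes first in the returned pair.
P, J = A.jordan_form()
print(J)
```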

10.10.3 – Applications
Let A ∈ M_n(F), where F is a field. If A is diagonalizable, then there exists an invertible matrix M ∈ M_n(F) such that M^{−1}AM is a diagonal matrix D with the eigenvalues λ_1, λ_2, . . . , λ_n down the diagonal. Then we can find a formula for powers of A with
$$A^j = (M D M^{-1})^j = M D^j M^{-1} = M \begin{pmatrix} \lambda_1^j & 0 & \cdots & 0 \\ 0 & \lambda_2^j & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n^j \end{pmatrix} M^{-1}.$$

In Exercise 10.9.15, we discussed the exponential of a matrix, e^A. If A is diagonalizable, then
$$e^A = \sum_{j=0}^{\infty} \frac{1}{j!} A^j = \sum_{j=0}^{\infty} \frac{1}{j!} M D^j M^{-1} = M \left( \sum_{j=0}^{\infty} \frac{1}{j!} D^j \right) M^{-1} = M \begin{pmatrix} e^{\lambda_1} & 0 & \cdots & 0 \\ 0 & e^{\lambda_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & e^{\lambda_n} \end{pmatrix} M^{-1}.$$

These formulas for the power and the exponential of a matrix are predicated on the fact that A is
diagonalizable. Not every matrix is diagonalizable. However, every square matrix does have a unique
Jordan canonical form when considered with coefficients in E, the splitting field of the characteristic
polynomial c_A(x) over F. Exercises 10.9.13 and 10.9.15 establish formulas for the power and the exponential function of Jordan block matrices. These formulas are useful in various applications (e.g., difference equations and systems of linear ordinary differential equations).
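The diagonalizable-case formulas above translate directly into code. A sketch (assuming sympy; the 2 × 2 matrix is a made-up example with eigenvalues 2 and 5):

```python
from sympy import Matrix, diag, exp, symbols, simplify

A = Matrix([[4, 1], [2, 3]])     # eigenvalues 2 and 5
M, D = A.diagonalize()           # A = M * D * M**(-1)

j = symbols('j', integer=True, nonnegative=True)
# A**j = M * D**j * M**(-1), with D**j taken entrywise on the diagonal.
Aj = M * diag(D[0, 0]**j, D[1, 1]**j) * M.inv()

# e^A = M * e^D * M**(-1), again entrywise on the diagonal.
expA = M * diag(exp(D[0, 0]), exp(D[1, 1])) * M.inv()
print(Aj, expA)
```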

Example 10.10.4. Consider the sequence of integers {f_n}_{n=0}^{∞} defined by f_0 = 2, f_1 = 3, f_2 = 5, and
$$f_{n+3} = f_{n+2} + 8f_{n+1} - 12f_n \quad \text{for } n \ge 0.$$
The first few terms of this sequence are

n 0 1 2 3 4 5 6 7
fn 2 3 5 5 9 −11 1 −195

We can find a formula for this sequence in the following way. Define a new sequence {g_n} in Z^3 by
$$g_n = \begin{pmatrix} f_n \\ f_{n+1} \\ f_{n+2} \end{pmatrix}.$$

Then {g_n} satisfies the recurrence relation
$$g_{n+1} = \begin{pmatrix} f_{n+1} \\ f_{n+2} \\ f_{n+3} \end{pmatrix} = \begin{pmatrix} f_{n+1} \\ f_{n+2} \\ -12f_n + 8f_{n+1} + f_{n+2} \end{pmatrix} = \begin{pmatrix} 0&1&0\\ 0&0&1\\ -12&8&1 \end{pmatrix} g_n.$$

An explicit formula for g_n, and hence for f_n, comes from g_n = A^n g_0, where A is the 3 × 3 coefficient matrix appearing above. In order to calculate this, we must find a formula for A^n. The characteristic polynomial of A is x^3 − x^2 − 8x + 12 = (x − 2)^2(x + 3). Whether doing the work by hand or using technology, we can find that A = M J M^{−1}, where
$$J = \begin{pmatrix} -3&0&0\\ 0&2&1\\ 0&0&2 \end{pmatrix} \quad\text{and}\quad M = \frac{1}{25}\begin{pmatrix} 4&-30&21\\ -12&-60&12\\ 36&-120&-36 \end{pmatrix}.$$

Using Exercise 10.9.13 and the block diagonal structure of J, we deduce that
$$g_n = M J^n M^{-1} g_0 = \frac{1}{25}\begin{pmatrix} 4&-30&21\\ -12&-60&12\\ 36&-120&-36 \end{pmatrix} \begin{pmatrix} (-3)^n&0&0\\ 0&2^n&n2^{n-1}\\ 0&0&2^n \end{pmatrix} \frac{1}{12}\begin{pmatrix} 12&-12&3\\ 0&-3&-1\\ 12&-2&-2 \end{pmatrix} \begin{pmatrix} 2\\3\\5 \end{pmatrix} = \frac{1}{25}\begin{pmatrix} (-3)^n - 20n2^{n-1} + 49 \cdot 2^n \\ -3(-3)^n - 40n2^{n-1} + 78 \cdot 2^n \\ 9(-3)^n - 80n2^{n-1} + 116 \cdot 2^n \end{pmatrix}.$$
A formula for f_n is the first entry of g_n, so f_n = \frac{1}{25}\big((-3)^n - 20n2^{n-1} + 49 \cdot 2^n\big). △
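The closed form of Example 10.10.4 can be sanity-checked against the recurrence (assuming sympy):

```python
from sympy import Matrix, Rational

A = Matrix([[0, 1, 0], [0, 0, 1], [-12, 8, 1]])
g0 = Matrix([2, 3, 5])

def f_closed(n):
    # f_n = ((-3)**n - 20 n 2**(n-1) + 49 * 2**n) / 25, written with
    # 20 n 2**(n-1) = 10 n 2**n to stay in integer arithmetic.
    return Rational((-3)**n - 10 * n * 2**n + 49 * 2**n, 25)

# g_n = A**n g_0 has first entry f_n.
vals = [(A**n * g0)[0] for n in range(8)]
print(vals)   # [2, 3, 5, 5, 9, -11, 1, -195]
```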

Exercises for Section 10.10

In Exercises 10.10.1 through 10.10.6, for the following matrix A, find the Jordan canonical form J of A
along with a matrix M such that A = M JM −1 . If the field is not stated, assume that it is C.
 
1. A = $\begin{pmatrix} -1&-2&2\\ 2&4&-1\\ -3&-2&4 \end{pmatrix}$.
 
2. A = $\begin{pmatrix} -1&-2&2\\ 2&4&-1\\ -3&-2&4 \end{pmatrix}$.
 
3. A = $\begin{pmatrix} 1&1&0\\ 1&1&0\\ 0&1&1 \end{pmatrix}$ over the field F_2.
 
4. A = $\begin{pmatrix} 0&-1&1&0\\ 1&0&0&1\\ 0&0&0&-1\\ 0&0&1&0 \end{pmatrix}$.
 
5. A = $\begin{pmatrix} \lambda&1&1&1\\ 0&\lambda&1&1\\ 0&0&\lambda&1\\ 0&0&0&\lambda \end{pmatrix}$.
 
6. A = $\begin{pmatrix} -3&7&9&-6\\ 5&-10&-12&9\\ -1&1&3&-1\\ 7&-16&-17&14 \end{pmatrix}$.
7. Use Exercise 10.9.13 to give a formula for
$$\begin{pmatrix} 6&-7&-4\\ 1&0&-1\\ 3&-5&-1 \end{pmatrix}^n.$$

8. Use a CAS to find a formula for
$$\begin{pmatrix} 0&-1&2&1\\ 1&4&-1&0\\ -2&0&4&1\\ -4&-2&3&4 \end{pmatrix}^n$$
for all nonnegative integers n.
9. Find a formula for the terms of an integer sequence {f_n}_{n=0}^{∞} that satisfies f_0 = 0, f_1 = 1, f_2 = 1, and
$$f_{n+3} = 6f_{n+2} - 12f_{n+1} + 8f_n \quad \text{for } n \ge 0.$$

10. Suppose that a parametric curve ~x(t) in R^n satisfies the system of ordinary differential equations ~x′(t) = A~x(t), where A is an n × n matrix with coefficients in R. Prove that ~x(t) = e^{At} ~C, where ~C is a constant vector, is a solution to the differential equation. Also observe that ~x(0) = ~C.
11. Use Exercises 10.10.10 and 10.9.15 to solve the following system of differential equations
$$\begin{cases} x'(t) = -2x(t) + y(t) + 2z(t) \\ y'(t) = -3x(t) + 4y(t) + 2z(t) \\ z'(t) = -3y(t) + z(t) \end{cases}$$
subject to the initial condition (x(0), y(0), z(0)) = (2, 1, 1). [Use a CAS to assist calculations.]
12. Use Exercise 11.2.19 to deduce that if a matrix A ∈ M_4(Q) has the characteristic polynomial (x^2 − 2)^2, then it can only have one of the following two Jordan canonical forms:
$$\begin{pmatrix} \sqrt2&0&0&0\\ 0&\sqrt2&0&0\\ 0&0&-\sqrt2&0\\ 0&0&0&-\sqrt2 \end{pmatrix} \quad\text{or}\quad \begin{pmatrix} \sqrt2&1&0&0\\ 0&\sqrt2&0&0\\ 0&0&-\sqrt2&1\\ 0&0&0&-\sqrt2 \end{pmatrix}.$$

10.11 A Brief Introduction to Path Algebras
We conclude the chapter with an extended example that involves both algebras and modules. The
topic of path algebras has garnered interest in mathematical research in recent years.
In this section we only have room to describe the algebraic structures of a path algebra and
modules over path algebras. The reader should understand that, like other “brief introductions” in
this book, this section scratches the surface of this topic.

10.11.1 – Path Algebras


A path algebra is an algebra over a field generated by paths on a directed graph. We introduce some
common terminology in order to make this more precise.

Definition 10.11.1
A quiver is a quadruple Q = (V, E, h, t), where V and E are sets and h and t are functions h, t : E → V. The set V is called the set of vertices, the elements of E are called (directed) edges or arrows, and the functions h and t are called the head and tail functions, respectively.

A quiver is also known as a directed graph. Though the terminology of “directed graph” is more
common in the context of combinatorics, the term “quiver” is common in algebra.
Let K be a field and Q a quiver. We construct the path algebra K[Q] of the field K over Q as
follows. For each vertex v ∈ V , denote by ev a symbol representing the stationary path at the vertex
v. A path p is either e_v for some v ∈ V or a finite expression p = a_n · · · a_2 a_1, where a_1, a_2, . . . , a_n ∈ E are arrows such that h(a_i) = t(a_{i+1}) for i = 1, 2, . . . , n − 1. In intuitive terms, a path is a sequence of arrows strung together head-to-tail. Caveat: Like functions, we read the arrows in a path p = a_n · · · a_2 a_1 from right to left, so that a_1 is the first arrow in p while a_n is the last arrow.
We extend the head and tail functions from E to all paths as follows. We set h(e_v) = t(e_v) = v for all stationary paths with v ∈ V. If p = a_n · · · a_2 a_1, then h(p) = h(a_n) and t(p) = t(a_1).
The elements in the path algebra K[Q] are symbolic linear combinations of the form

c1 p1 + c2 p2 + · · · + cn pn

where c_i ∈ K and p_i is a path of the quiver Q. Note that the set of paths forms a basis for the
vector space K[Q]. The product on K[Q] is defined on basis elements by the concatenation of paths,
namely
$$(a_n \cdots a_2 a_1) \cdot (b_m \cdots b_2 b_1) = \begin{cases} a_n \cdots a_2 a_1 b_m \cdots b_2 b_1 & \text{if } h(b_m) = t(a_1) \\ 0 & \text{otherwise,} \end{cases}$$
$$(a_n \cdots a_2 a_1) \cdot e_v = \begin{cases} a_n \cdots a_2 a_1 & \text{if } t(a_1) = v \\ 0 & \text{otherwise,} \end{cases}$$
$$e_v \cdot (a_n \cdots a_2 a_1) = \begin{cases} a_n \cdots a_2 a_1 & \text{if } h(a_n) = v \\ 0 & \text{otherwise,} \end{cases}$$
$$e_v \cdot e_w = \begin{cases} e_v & \text{if } v = w \\ 0 & \text{otherwise.} \end{cases}$$

Finally, the product on K[Q] is determined by extending by distributivity and the above products
on paths.

[Figure 10.2 shows three quivers. Q1 has vertices 1 through 5 with arrows a : 1 → 2, b : 2 → 3, c : 3 → 4, and d : 3 → 5. Q2 has vertices 1, 2, 3 with parallel arrows a, b, c : 1 → 2, an arrow d : 2 → 3, and a loop f at vertex 3. Q3 has six vertices; its arrows include a cycle through vertex 4 formed by the arrows d, e, and f.]

Figure 10.2: Examples of quivers

Example 10.11.2. Consider the quiver Q1 as depicted in Figure 10.2. This quiver has 5 stationary
paths e1 , e2 , e3 , e4 , e5 . Each of the directed edges a, b, c, d is a path. The quiver also has three paths
of length 2 (ba, cb, and db) and two paths of length 3 (cba and dba). Recall that, like functions,
we read the sequence of arrows from right to left. So the path algebra K[Q1] is a vector space of dimension 14 over K. Suppose that α, β ∈ R[Q1] are

α = e2 + 3ba − 2c,
β = 5b − a + ba + 3e4 .

The sum and the product of these elements are

α + β = e2 + 3e4 − a + 5b − 2c + 4ba,
αβ = 5e2 b − e2 a + e2 ba + 3e2 e4 + 15bab − 3baa
+ 3baba + 9bae4 − 10cb + 2ca − 2cba − 2ce4
= 0 − a + 0 + 0 + 0 + 0 + 0 + 0 − 10cb + 0 − 2cba + 0
= −a − 10cb − 2cba,
βα = 5be2 + 15bba − 10bc − ae2 − 3aba + 2ac
+ bae2 + 3baba − 2bac + 3e4 e2 + 9e4 ba − 6e4 c
= 5b + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 − 6c
= 5b − 6c. △
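The bookkeeping in such computations is mechanical and easy to script. In the sketch below (plain Python; the arrow endpoints for Q1 are taken to be a : 1 → 2, b : 2 → 3, c : 3 → 4, d : 3 → 5, as read off Figure 10.2), an element of R[Q1] is a dictionary from paths to coefficients:

```python
# A stationary path at vertex v is written ('e', v); a path such as cba
# is the tuple ('c', 'b', 'a'), read right to left like composition.
HEAD = {'a': 2, 'b': 3, 'c': 4, 'd': 5}
TAIL = {'a': 1, 'b': 2, 'c': 3, 'd': 3}

def head(p):
    return p[1] if p[0] == 'e' else HEAD[p[0]]

def tail(p):
    return p[1] if p[0] == 'e' else TAIL[p[-1]]

def mul_paths(p, q):
    """Concatenation product p*q, or None when h(q) != t(p)."""
    if tail(p) != head(q):
        return None
    if p[0] == 'e':
        return q
    if q[0] == 'e':
        return p
    return p + q

def mul(x, y):
    """Product in the path algebra, extended by distributivity."""
    out = {}
    for p, c in x.items():
        for q, d in y.items():
            pq = mul_paths(p, q)
            if pq is not None:
                out[pq] = out.get(pq, 0) + c * d
    return {p: c for p, c in out.items() if c != 0}

alpha = {('e', 2): 1, ('b', 'a'): 3, ('c',): -2}
beta = {('b',): 5, ('a',): -1, ('b', 'a'): 1, ('e', 4): 3}
print(mul(alpha, beta))   # alpha*beta = -a - 10cb - 2cba
print(mul(beta, alpha))   # beta*alpha = 5b - 6c
```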

Though the quiver Q1 in Figure 10.2 produces a path algebra that is finite dimensional, the quiver Q2 has a path algebra K[Q2] that is infinite dimensional. The paths in the path algebra K[Q2] are
e_1, e_2, e_3, a, b, c, d, da, db, dc, f^n, f^n d, f^n da, f^n db, f^n dc
for any positive integer n. The path f^n intuitively means traveling around the loop f a total of n times. In particular, a loop-edge is not idempotent.
In a similar manner, the path algebra K[Q3] is also infinite dimensional because of the loop f ed. Though there is no loop-edge in Q3, the path f ed is a loop, and hence for all positive n, the path (f ed)^n consists of going around the loop, starting at vertex 4, n times.

Proposition 10.11.3
Let Q = (V, E, h, t) be a quiver where V is a finite set and K a field. The path algebra
K[Q] is a unital associative algebra with multiplicative unit
$$1 = \sum_{v \in V} e_v. \qquad (10.38)$$

Proof. Note that for all paths p, the product ev p = 0 unless v = h(p), in which case eh(p) p = p.
Similarly, pev = 0 if v 6= t(p) and is p if v = t(p). Let α ∈ K[Q] be a linear combination of paths
α = c1 p1 + c2 p2 + · · · + cn pn . Then
$$\left( \sum_{v \in V} e_v \right) \alpha = \sum_{i=1}^{n} \sum_{v \in V} c_i e_v p_i = \sum_{i=1}^{n} c_i e_{h(p_i)} p_i = \sum_{i=1}^{n} c_i p_i = \alpha.$$

Multiplication by α on the left also yields α. Thus, $\sum_{v \in V} e_v$ is an identity in K[Q].
That K[Q] is associative follows from how the product on paths consists of concatenation. Sec-
tion 3.8.3 discussed in detail how the operation of concatenation is associative. 

10.11.2 – Modules over Path Algebras

Let Q be a quiver and let K be a field. The product on K[Q] equips the path algebra with the
structure of a ring that is generally not commutative. We saw in earlier sections what data is
necessary to describe modules for various rings, e.g., F [x]-modules (Example 10.3.13) or group ring
modules (Example 10.3.14). In a similar way, we explore the data for a left K[Q]-module.
Let Q = (V, E, h, t) be a quiver with V finite and let W be a left K[Q]-module. By definition,
W must be an abelian group. For all vertices v, denote by Wv the subset ev W . The subsets Wv are
not submodules but just subgroups of W . By (10.38), for all m ∈ W ,

$$m = 1 \cdot m = \left( \sum_{v \in V} e_v \right) \cdot m = \sum_{v \in V} e_v \cdot m.$$

Hence, W is the sum of the W_v as subgroups. Furthermore, if e_i · m = e_j · n with i and j distinct vertices, then
$$e_i \cdot m = e_i \cdot (e_i \cdot m) = e_i \cdot (e_j \cdot n) = (e_i e_j) \cdot n = 0.$$
Thus, the intersection of distinct W_v subgroups is the trivial subgroup {0}, so W is the direct sum
$$W = \bigoplus_{v \in V} W_v.$$

Because ev is idempotent, Kev is a subring of K[Q] that is isomorphic to the field K. Hence,
since modules over fields are vector spaces, the action of the subring Kev on Wv equips Wv with
the structure of a vector space over K.
For each arrow a ∈ E, we have as a product of paths, a = aet(a) . Consequently, a · Wv = {0} for
all vertices v unless v = t(a). A priori, a · W might be any subset of W but since a · W = (eh(a) a) · W ,
then a · W is a subset of Wh(a) . Furthermore, by linearity and distributivity properties for modules,
a acts as a linear transformation ϕa : Wt(a) → Wh(a) .
The action of any path p = an · · · a2 a1 on W consists of the linear transformation

ϕan ◦ · · · ◦ ϕa2 ◦ ϕa1 : Wt(a1 ) → Wh(an ) . (10.39)

Finally, the action of any element in K[Q] follows from the action of any path and extending by
linearity to all of W . We have shown the following.

Proposition 10.11.4
The data of a K[Q]-module W consists of a pair ({Wv }v∈V , {ϕa }a∈E ) where Wv is a vector
space over K for each vertex v ∈ V and ϕa : Wt(a) → Wh(a) is a linear transformation for
each arrow a ∈ E. The module W is the direct sum (as vector spaces over K) of the Wv
for all v ∈ V and the action of K[Q] satisfies:

(1) ev acts on W by projection onto the summand Wv ;


(2) a path p acts on W according to (10.39);
(3) the action of any other element of K[Q] on W is determined by the previous two,
extended by linearity.

The data involving vector spaces as described by Proposition 10.11.4 motivates the alternate
terminology for K[Q]-modules: quiver representations.

Example 10.11.5. Consider the quiver Q1 shown in Figure 10.2. An R[Q1]-module consists of an
R-vector space Wv associated to each vertex v of the quiver and linear transformations ϕa : Wt(a) →
Wh(a). It is possible to depict a module with a diagram.
Consider the following R[Q1 ]-module W , in which we depict the linear transformations as given
with respect to the standard bases.

W1 = R,  W2 = R²,  W3 = R²,  W4 = R,  W5 = R,
with ϕa = (−1, 1)ᵀ : W1 → W2,  ϕb = [[3, 6], [1, 2]] : W2 → W3,
ϕc = [1  −3] : W3 → W4,  and  ϕd = [4  5] : W3 → W5.

An element of the module is a 5-tuple of vectors in R ⊕ R² ⊕ R² ⊕ R ⊕ R, with each component
lying in the vector space attached to a given vertex. Take for example w = (2, (3, 2)ᵀ, (−1, 3)ᵀ, 3, 5)
and consider the element α ∈ R[Q1] given by α = 3e1 + 2e3 + b − 5db. Then the action α · w is

α · w = 3e1 · w + 2e3 · w + b · w − 5db · w
      = (3 · 2, 0, 0, 0, 0) + (0, 0, 2(−1, 3)ᵀ, 0, 0) + (0, 0, [[3, 6], [1, 2]](3, 2)ᵀ, 0, 0)
        + (0, 0, 0, 0, −5 [4  5][[3, 6], [1, 2]](3, 2)ᵀ)
      = (6, 0, 0, 0, 0) + (0, 0, (−2, 6)ᵀ, 0, 0) + (0, 0, (21, 7)ᵀ, 0, 0) + (0, 0, 0, 0, −595)
      = (6, 0, (19, 13)ᵀ, 0, −595). △
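The computation in this example can be checked numerically. The sketch below assumes the maps read off the diagram (ϕa = (−1,1)ᵀ, ϕb = [[3,6],[1,2]], ϕc = [1,−3], ϕd = [4,5]) and recomputes each term of the action; note that d(b(w₂)) = 119, so the last component is −5 · 119 = −595.

```python
# Recomputing the action of alpha = 3e1 + 2e3 + b - 5db on
# w = (2, (3,2)^T, (-1,3)^T, 3, 5), component by component.

def mat_vec(M, v):
    return [sum(r[j] * v[j] for j in range(len(v))) for r in M]

phi_b = [[3, 6], [1, 2]]        # W2 -> W3
phi_d = [[4, 5]]                # W3 -> W5

w = {1: [2], 2: [3, 2], 3: [-1, 3], 4: [3], 5: [5]}

term_e1 = 3 * w[1][0]                               # 3 e1 . w lands in W1
term_e3 = [2 * x for x in w[3]]                     # 2 e3 . w lands in W3
term_b  = mat_vec(phi_b, w[2])                      # b . w lands in W3
term_db = mat_vec(phi_d, mat_vec(phi_b, w[2]))[0]   # d(b(w2)) lands in W5

alpha_w = {1: [term_e1],
           2: [0, 0],
           3: [term_e3[i] + term_b[i] for i in range(2)],
           4: [0],
           5: [-5 * term_db]}
```

Running this gives term_b = [21, 7], term_db = 119, and α · w = (6, 0, (19, 13)ᵀ, 0, −595).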

It is important to observe in Proposition 10.11.4 how modules of path algebras generalize many
situations in linear algebra. For example, everything in linear algebra about properties of a linear
transformation T : V → W between vector spaces over a field K falls under the purview of modules
of the path algebra K[Q] where Q is the simple quiver.

Qarrow :   1 --a--> 2

Similarly, the study of properties of a linear transformation T from a vector space V to itself
corresponds to studying K[Q]-modules for the simple loop quiver.

Qloop :   a single vertex 1 with one loop arrow a

For this latter quiver Qloop , the set {e1 , a, a², a³ , . . .} is a basis of the path algebra K[Q]. In the
algebra K[Q], since there is only one vertex, the ring identity is just the stationary path e1 . Fur-
thermore, by how elements multiply, we see that K[Q] is isomorphic to the polynomial ring K[x]
simply by mapping a to x. This recovers the earlier result that a left K[x]-module is defined by the
data of a vector space V equipped with a linear transformation T : V → V .
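This correspondence with K[x]-modules can be illustrated in code: under the isomorphism just described, the basis path aⁿ acts as the n-th power of a matrix for T, so an element c0 e1 + c1 a + c2 a² + · · · acts as the matrix polynomial c0 I + c1 T + c2 T² + · · ·. The matrix below is hypothetical.

```python
# Sketch: for the loop quiver, a K[Q_loop]-module is a vector space with one
# endomorphism T, and a "polynomial in a" acts as the same polynomial in T.

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def act(coeffs, T):
    """Action of c0*e1 + c1*a + c2*a^2 + ... as the matrix c0*I + c1*T + ..."""
    n = len(T)
    power = [[1 if i == j else 0 for j in range(n)] for i in range(n)]  # T^0 = I
    out = [[0] * n for _ in range(n)]
    for c in coeffs:
        for i in range(n):
            for j in range(n):
                out[i][j] += c * power[i][j]
        power = mat_mul(power, T)   # advance to the next power of T
    return out

T = [[0, 1], [0, 0]]   # a nilpotent endomorphism of K^2; T^2 = 0
```

For instance, 2e1 + 3a + 5a² acts as 2I + 3T + 5T² = [[2, 3], [0, 2]] here, since T² = 0.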

10.11.3 – K[Q]-Module Homomorphisms


As with any algebraic structure, we do not consider arbitrary functions between them but rather
homomorphisms. We know the definition for module homomorphisms, but it is useful to analyze
K[Q]-module homomorphisms in light of Proposition 10.11.4.
Let Q = (V, E, h, t) be a quiver and let K be a field. Let U and W be K[Q]-modules with
data ({Uv }v∈V , {ϕa }a∈E ) and ({Wv }v∈V , {ψa }a∈E ), respectively. A K[Q]-module homomorphism
f : U → W is first of all an additive function

f : ⊕v∈V Uv −→ ⊕v∈V Wv .

Because K[Q] contains the field K as the subring K · 1, and since f (cu) = cf (u) for all c ∈ K · 1,
then f is also a linear transformation between the direct sums. Furthermore, for all v ∈ V and all
u ∈ U , the linear transformation f satisfies f (ev u) = ev f (u). Now ev u is the projection of u onto
the Uv component of U and ev f (u) is the projection of f (u) onto the Wv component of W . Hence,
f maps Uv components to Wv components. In other words, f consists of a collection of linear
transformations fv : Uv → Wv for all v ∈ V . For each edge a ∈ E, a K[Q]-module homomorphism
also satisfies f (au) = af (u). Now,
a · u = ϕa (u) if u ∈ Ut(a) , and a · u = 0 otherwise,

and similarly for the action in W . Hence, the identity f (au) = af (u) translates into fh(a) (ϕa (u)) =
ψa (ft(a) (u)) for all u ∈ Ut(a) . If this identity holds, then by associativity of path concatenation and
by linearity of f , we obtain f (αu) = αf (u) for all α ∈ K[Q] and all u ∈ U . We have shown the following characterization of
K[Q]-module homomorphisms.

Proposition 10.11.6
Let U = ({Uv }v∈V , {ϕa }a∈E ) and W = ({Wv }v∈V , {ψa }a∈E ) be two K[Q]-modules for a
quiver Q = (V, E, h, t). A K[Q]-module homomorphism from U to W consists of a collection
of linear transformations fv : Uv → Wv for all v ∈ V satisfying

fh(a) ◦ ϕa = ψa ◦ ft(a) . (10.40)

A K[Q]-module homomorphism is an isomorphism if fv is an isomorphism for all v ∈ V .

The requirement in (10.40) for the linear transformations is the same as saying that for all arrows
a ∈ E, the following diagram is commutative.

Ut(a) --ϕa--> Uh(a)
  |              |
ft(a)          fh(a)
  v              v
Wt(a) --ψa--> Wh(a)
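The commutativity condition is easy to test on matrices. A minimal sketch with hypothetical data for the one-arrow quiver Qarrow:

```python
# Checking the intertwining condition (10.40): f_{h(a)} o phi_a = psi_a o f_{t(a)}.

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

phi_a = [[1, 2], [0, 1]]        # U_1 = K^2 -> U_2 = K^2
psi_a = [[1, 4], [0, 1]]        # W_1 = K^2 -> W_2 = K^2
f1    = [[2, 0], [0, 1]]        # f_{t(a)} : U_1 -> W_1
f2    = [[2, 0], [0, 1]]        # f_{h(a)} : U_2 -> W_2

def is_homomorphism(phi, psi, f_tail, f_head):
    # the square commutes iff the two composites agree as matrices
    return mat_mul(f_head, phi) == mat_mul(psi, f_tail)
```

With this data, `is_homomorphism(phi_a, psi_a, f1, f2)` holds, while replacing ψa by ϕa breaks the square.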

With the quiver Q1 in Figure 10.2, the overall diagram of linear transformations for a K[Q1]-module homomorphism f : U → W is a prism: the top face carries the diagram of U (with the maps ϕa, ϕb, ϕc, ϕd), the bottom face carries the diagram of W (with ψa, ψb, ψc, ψd), the vertical maps are f1, f2, f3, f4, f5, and all squares of functions are commutative.

10.11.4 – Direct Sums and Indecomposable Modules


Let Q = (V, E, h, t) be a quiver and let K be a field. Clearly, if U = ({Uv }v∈V , {ϕa }a∈E ) and
W = ({Wv }v∈V , {ψa }a∈E ) are K[Q]-modules, then their direct sum has the data

U ⊕ W = ({Uv ⊕ Wv }v∈V , {ϕa ⊕ ψa }a∈E ),

where ϕa ⊕ ψa means the linear transformation

(ϕa ⊕ ψa )(u, w) = (ϕa (u), ψa (w))

for all (u, w) ∈ Ut(a) ⊕ Wt(a) .
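In matrix terms, ϕa ⊕ ψa is the block-diagonal matrix diag(ϕa, ψa) acting on Ut(a) ⊕ Wt(a). A sketch:

```python
# The direct sum of two linear maps acts blockwise: [[A, 0], [0, B]].

def direct_sum(A, B):
    """Block-diagonal matrix with blocks A and B."""
    ra, ca = len(A), len(A[0])
    rb, cb = len(B), len(B[0])
    top = [row + [0] * cb for row in A]     # [A | 0]
    bot = [[0] * ca + row for row in B]     # [0 | B]
    return top + bot

phi = [[1, 2]]          # a map K^2 -> K
psi = [[3], [4]]        # a map K  -> K^2
```

Here `direct_sum(phi, psi)` is the 3 × 3 matrix [[1, 2, 0], [0, 0, 3], [0, 0, 4]] acting on K² ⊕ K.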


Every ring R leads to a different algebraic structure of R-modules. With many rings or classes
of rings, it is both interesting and fruitful to determine the indecomposable modules and how to
decompose a given module into indecomposable ones. With modules over the path algebra K[Q],
this is again the case. The theorems in this branch of algebra subsume and greatly expand many
results in linear algebra.
For every quiver Q = (V, E, h, t), we can always identify some indecomposable modules. For
any vertex p ∈ V , the collection of vector spaces {Uv }v∈V with Up = K and Uv = {0} for all
v ≠ p, with all the maps {ϕa }a∈E taken to be zero maps, is an indecomposable module. Also, for any arrow α ∈ E,
the collection of vector spaces {Uv }v∈V with Ut(α) = Uh(α) = K and Uv = {0} for all
other vertices, where all maps {ϕa }a∈E are zero maps except for ϕα, which is the identity map K → K, gives another indecomposable module.
For example, consider the theorem in linear algebra that states that for all linear transformations
T : V → W with dim V = m and dim W = n, there exist ordered bases B of V and B′ of W such
that with respect to them, T has the block matrix

[ Ir          0r×(m−r)     ]
[ 0(n−r)×r    0(n−r)×(m−r) ] ,

where r = rank T . To rephrase this result in terms of path algebras, we use the quiver Qarrow . We
can restate the theorem to say that

(V --T--> W) = (K --0--> 0)^(m−r) ⊕ (K --id--> K)^r ⊕ (0 --0--> K)^(n−r).

The individual indecomposable modules are: (K --0--> 0), which means the module K ⊕ {0} with the
trivial map from K to {0}; (K --id--> K), which means the module K ⊕ K with the identity from K to
K; and (0 --0--> K), which means the module {0} ⊕ K with the trivial inclusion map from {0} to K.
It is important to observe that though (K ⊕ {0}) ⊕ ({0} ⊕ K) ≅ K ⊕ K as vector spaces over K,

(K --0--> 0) ⊕ (0 --0--> K) ≅ (K --0--> K),

which is not isomorphic to (K --id--> K). In particular, (K --id--> K) is indecomposable.
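The multiplicities m − r, r, and n − r of the three indecomposables are determined by r = rank T. A sketch using exact rational arithmetic (the dictionary keys are just informal labels for the three summands):

```python
# For a matrix M of T : K^m -> K^n, compute r = rank T and the multiplicities
# of the indecomposables (K -> 0), (K -id-> K), and (0 -> K) for Q_arrow.

from fractions import Fraction

def rank(M):
    A = [[Fraction(x) for x in row] for row in M]
    r = 0
    for col in range(len(A[0])):
        piv = next((i for i in range(r, len(A)) if A[i][col] != 0), None)
        if piv is None:
            continue                       # no pivot in this column
        A[r], A[piv] = A[piv], A[r]
        A[r] = [x / A[r][col] for x in A[r]]
        for i in range(len(A)):
            if i != r and A[i][col] != 0:
                A[i] = [a - A[i][col] * b for a, b in zip(A[i], A[r])]
        r += 1
    return r

def multiplicities(M):
    n, m = len(M), len(M[0])               # M is n x m, so T : K^m -> K^n
    r = rank(M)
    return {'K->0': m - r, 'K->K (id)': r, '0->K': n - r}

T = [[1, 2, 3], [2, 4, 6]]                 # rank 1, from K^3 to K^2
```

Here `multiplicities(T)` returns two copies of (K → 0), one (K —id→ K), and one (0 → K).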
As another result, let K be an algebraically closed field and consider Theorem 10.9.2 and the
existence of the Jordan canonical form. Let V be a vector space over K and let T : V → V be
a linear transformation. The theorem states that there exists a basis B of V such that the matrix of T is a
block diagonal matrix in which the blocks are Jordan blocks. To interpret this in modules over path
algebras, we use the quiver Qloop . Theorem 10.9.2 can be restated to say that every K[Qloop ]-module
decomposes into indecomposable modules of the form Kⁿ with the loop a acting by a single Jordan block Jλ,n .

Example 10.11.7. We propose to decompose the module W in Example 10.11.5 into indecomposable submodules. Consider first a submodule W′ that contains the subspace (Span(1), 0, 0, 0, 0).
Applying the edge a we get a · (Span(1), 0, 0, 0, 0) = (0, Span((−1, 1)ᵀ), 0, 0, 0). Since W′ is closed under
the left action of the path algebra K[Q], this new subspace is in the submodule W′. Applying
b to this new subspace, we get b · (0, Span((−1, 1)ᵀ), 0, 0, 0) = (0, 0, Span((3, 1)ᵀ), 0, 0). Applying c to this gives
c · (0, 0, Span((3, 1)ᵀ), 0, 0) = (0, 0, 0, [1  −3](3, 1)ᵀ, 0) = (0, 0, 0, 0, 0). Then, applying d to (0, 0, Span((3, 1)ᵀ), 0, 0)
we obtain the subspace (0, 0, 0, 0, Span(17)). This shows that W′ is the following submodule.

W′1 = R,  W′2 = Span((−1, 1)ᵀ),  W′3 = Span((3, 1)ᵀ),  W′4 = {0},  W′5 = R,
with the maps ϕa, ϕb, ϕc, ϕd restricted from W.

Writing the action of the edges with respect to appropriate bases on each of the component vector
spaces, we see that W′ is isomorphic to the R[Q1]-module with R at vertices 1, 2, 3, 5 and {0} at
vertex 4, in which a, b, and d act as the identity and c acts as the zero map.
 
In W2 , the kernel of [[3, 6], [1, 2]] is Span((2, −1)ᵀ). Thus, we have found another submodule W″:

W″1 = {0},  W″2 = Span((2, −1)ᵀ),  W″3 = {0},  W″4 = {0},  W″5 = {0}.

This is isomorphic to the R[Q1]-module with R at vertex 2, {0} at every other vertex, and all
arrows acting as zero maps.

Finally, the kernel of the action of c as a subspace of W3 is Span((3, 1)ᵀ), while the kernel of the
action of d is Span((5, −4)ᵀ). Taking this second subspace, which is complementary to Span((3, 1)ᵀ),
and the submodule containing it, we have another submodule W‴:


W‴1 = {0},  W‴2 = {0},  W‴3 = Span((5, −4)ᵀ),  W‴4 = R,  W‴5 = {0}.

This is isomorphic to the R[Q1]-module with R at vertices 3 and 4, {0} at the other vertices, in
which c acts as the identity and all other arrows act as zero maps.

We have shown that the R[Q1]-module depicted in Example 10.11.5 decomposes into the three
indecomposable submodules W = W′ ⊕ W″ ⊕ W‴. △
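The kernel and image computations behind this decomposition can be verified numerically. The sketch below uses the maps read off the diagram; note that the line chosen in W3 for the third summand must be killed by d, so that its vertex-5 component can be {0} while W′ already carries all of W5:

```python
# Checks on the decomposition of Example 10.11.7:
# phi_b = [[3,6],[1,2]], c acts by [1,-3], d acts by [4,5].

def apply_row(row, v):
    return sum(r * x for r, x in zip(row, v))

def apply2(M, v):
    return [apply_row(M[0], v), apply_row(M[1], v)]

phi_b = [[3, 6], [1, 2]]
c_row = [1, -3]
d_row = [4, 5]

ker_c = [3, 1]                   # c kills (3,1)^T
ker_d = [5, -4]                  # d kills (5,-4)^T
ker_b = [2, -1]                  # b kills (2,-1)^T

checks = {
    'b maps (-1,1) onto (3,1)': apply2(phi_b, [-1, 1]) == [3, 1],
    'c kills (3,1)':            apply_row(c_row, ker_c) == 0,
    'd kills (5,-4)':           apply_row(d_row, ker_d) == 0,
    'b kills (2,-1)':           apply2(phi_b, ker_b) == [0, 0],
    'W2 splits':  (-1) * (-1) - 1 * 2 != 0,   # det[[-1,2],[1,-1]] != 0
    'W3 splits':  3 * (-4) - 1 * 5 != 0,      # det[[3,5],[1,-4]]  != 0
}
```

The last two determinant checks confirm that the chosen lines span W2 and W3, so the dimensions at every vertex add up to those of W.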

Exercises for Section 10.11


1. Determine the dimension over K of the path algebra K[Q] for the following quiver.

[Quiver diagram: numbered vertices with arrows a, b, c, d, e.]

2. Let Q be the quiver depicted by the diagram.

1 --a--> 2 --b--> 3 --c--> 4

Let α = 3e2 + 2e3 + 4a − c and β = 2a + 3b + 4c. Calculate a) α + β; b) αβ; c) βα; d) α²; e) β².


3. Let Q be the quiver depicted by the diagram.

1 ==a,b==> 2   (two arrows a and b, both from vertex 1 to vertex 2)

Let α = −e1 + 2e2 + 3b, β = 4e2 + a − 3b, and γ = a + b. Calculate a) αβ; b) βα; c) αγβ; d) γⁿ (for
all positive integers n).
4. Consider the quiver Q from Exercise 10.11.2 and consider the R[Q]-module W depicted by
 
3  
−2 2 1 0

2 −3 2 −1 2 3
R2 R R3 R2

Describe how the following R[Q] element α acts on the given element w ∈ W .
   
1
(a) α = 3e1 + 2ba − cb + 2e4 with w =  32 , 5, 2 , 51 .
 

3
   
0
(b) α = 2a + 3b + 4c with w =  −1 , −4, 1 , 24 .
 
2
4
5. Let Q be the quiver consisting of one vertex v and n arrows, each of which is a loop on the vertex v.
Prove that the path algebra of Q over the field K is the polynomial ring K⟨x1 , x2 , . . . , xn ⟩ in n noncommuting variables.
6. Let Q = (V, E, h, t) be a quiver with V and E both finite. Prove that the following conditions are
equivalent:
(a) dimK K[Q] is finite;
(b) the element ∑a∈E a is nilpotent;
(c) Q does not contain any cycles, i.e., paths (consisting possibly of a single arrow) p = an · · · a2 a1
such that h(an ) = t(a1 ).
7. For every quiver Q and any field K, prove that the group of units in K[Q] consists of the elements c1,
where c ∈ U (K).
8. Let Q = (V, E, h, t) be a quiver. Prove that a subset of the path algebra K[Q] is an ideal (two-sided)
if and only if it is {0} or of the form K[Q′], where Q′ is a connected component of Q. (A connected
component of Q consists of a subset V′ ⊆ V and E′ ⊆ E such that all arrows a ∈ E such that t(a) ∈ V′
or h(a) ∈ V′ are in E′.) In particular, deduce that if Q has only one connected component, then K[Q]
has only two ideals, the trivial one and itself.
9. Let Q be the following quiver.

[Quiver diagram: arrows a, b, c from vertices 2, 3, 4, respectively, into vertex 1.]

Prove that for any field K the following K[Q]-module is indecomposable.



the module with K² at vertex 1 and K at vertices 2, 3, 4, where a, b, c act by the column vectors
(1, 0)ᵀ, (0, 1)ᵀ, and (1, 1)ᵀ, respectively.

10. Consider the quiver Q with the diagram.

1 --a--> 2 --b--> 3

Let M be the left R[Q]-module depicted by the following diagram.


   
R³ --A--> R³ --B--> R³,  where  A = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]  and
B = [[3, 0, −1], [1, −2, 1], [1, 4, −3]].

Prove that M is isomorphic to the direct sum of the following 5 indecomposable modules:

M ≅ (R --> 0 --> 0) ⊕ (R --id--> R --> 0) ⊕ (R --id--> R --id--> R)
    ⊕ (0 --> R --id--> R) ⊕ (0 --> 0 --> R),

where the unlabeled arrows act as zero maps.

[Hint: Consider kernels of matrices and kernels of products of the matrices.]


11. Consider the quiver Q with the diagram.

1 --a--> 2 --b--> 3

Prove that for any field K, the path algebra K[Q] has exactly 6 indecomposable modules and show
what these modules are.

10.12 Projects
Project I. Modules over P(S). Study properties of modules over the ring (P(S), 4, ∩), where
S is a set. Can you determine a characterization of these modules? Discuss and give examples
of torsion and annihilators. Give examples of finitely and infinitely generated modules. Can
you imagine useful applications of this algebraic structure?

Project II. Smith Normal Form of an Integer Matrix. Write a program that obtains an
M -preferred basis and the invariant factors of a submodule M = Span(f1 , f2 , . . . , fm ) in Zn
where the module elements fi are given with respect to the standard basis. (Feel free to modify
the project to do the same problem over the PID Z[i].)

Project III. Decomposition of K[Q]-modules. Consider the quiver Q1 given in Figure 10.2
and let K be any field. Study the decomposition of K[Q1 ]-modules. Can you determine
whether there are a finite number of indecomposable modules? What are the indecomposable
modules? What are the irreducible modules? Given a particular K[Q]-module, can you provide
a procedure to determine its decomposition if it does decompose?

Project IV. Algebras and Structure Coefficients. Exercise 10.3.35 defined the notion of
structure constants of an algebra. Give some examples of algebras and their corresponding
structure coefficients. (For example, R³ with vector addition and cross product; or M2×2 (R)
with matrix addition and multiplication; or Q(∛2) over Q.) Find everything you can about
how the structure coefficients relate to the algebra. (What must the γᵏᵢⱼ satisfy if the algebra
is commutative, or is associative, or has an identity, or has inverses, or is a field? The structure
coefficients are given in reference to a basis; how do they change under a change of basis?)
Project V. Modules of Smooth Function Rings. Let X be an interval [a, b] ⊆ R, all of R,
or Rᵏ. Consider the ring C∞(X, R) of smooth functions f : X → R. (A function is smooth
if its derivatives of all orders exist and are continuous.) Explore what a C∞(X, R)-module
is. Can you relate it to any mathematical object that you already know? If not, can you
describe properties of modules? What do elements of a C ∞ (X, R)-module look like? Discuss
what simple and indecomposable C ∞ (X, R)-modules might be.
Project VI. Shear-Rotations in R⁴. As in Euclidean three-space, a rotation in R⁴ is a linear
transformation such that there is a basis of R⁴ with respect to which the rotation has the
matrix

[ cos θ  −sin θ  0  0 ]
[ sin θ   cos θ  0  0 ]
[ 0       0      1  0 ]
[ 0       0      0  1 ] .
In R⁴, there is enough room to allow for a linear transformation T : R⁴ → R⁴ that is similar
to

A = [ cos θ  −sin θ  1      0     ]
    [ sin θ   cos θ  0      1     ]
    [ 0       0      cos θ  −sin θ ]
    [ 0       0      sin θ   cos θ ] .
Find the Jordan canonical form of A. Find a formula for An for all positive n. Can there be a θ
for which T is periodic? Discuss properties of T . Study the action of T on the unit hypercube.
(Is the image still a hypercube of side length 1?) This project proposes the name "shear-rotation" for T. Is this a good name? Why or why not? Is there any linear transformation in
R³ that has similar properties? Why or why not?
11. Galois Theory

As mentioned at the beginning of Chapter 3, the axioms of group theory were first written down in
their present form by Évariste Galois. His purpose was not to study combinatorial properties of sets
equipped with a binary operation. Instead, he used groups to study symmetries within the roots
of a polynomial. Like many other mathematicians before him, the big open problem he hoped to
address was how to solve arbitrary polynomials using radicals.
This approach to studying polynomial equations uncovered a deep connection between field
extensions and groups. The study of this relationship became known as Galois theory. To fully
appreciate it, one needs group theory, but also field theory, as well as the concept of composition
series, and group actions. Many of the profound theorems in this theory come from understanding
the interplay between the structure of field extensions and the structure of groups.
Sections 11.1 and 11.2 introduce Galois theory, along with the Fundamental Theorem of Galois
theory. Sections 11.3 and 11.4 present a number of applications of Galois theory, including the
Fundamental Theorem of Algebra and fully answering the question of geometric constructibility of
regular n-gons. Then Sections 11.5 through 11.8 focus on studying the Galois groups of polynomials
over fields of characteristic 0 or p. The final section presents the landmark result that it is impossible
to solve arbitrary equations with radicals.
Though it is possible to define a Galois theory on field extensions of infinite degree, this book
restricts its study of Galois extensions to finite extensions. For further study of Galois theory, we
recommend [15, 48].

11.1 Automorphisms of Field Extensions
Instead of directly attacking the problem of studying symmetries among the roots of a polynomial,
Galois theory takes a step into abstraction and considers groups of transformations on field exten-
sions. After all, if F is a field and p(x) ∈ F [x] is an irreducible polynomial, then F [x]/(p(x)) is a
field extension of F . Furthermore, if char F = 0 and K is a finite field extension of F , then by the
Primitive Element Theorem (Theorem 7.6.13), K = F [x]/(p(x)) for some irreducible polynomial.

11.1.1 – Automorphisms of Field Extensions


Recall that an isomorphism of a field K with itself is called an automorphism. The collection of
automorphisms of K, denoted by Aut(K), is a group. Using the notation of group actions, it is not
uncommon to write σα for σ(α) for any σ ∈ Aut(K) acting on an element α ∈ K.

Definition 11.1.1
An automorphism σ ∈ Aut(K) is said to fix an element α if σα = α and σ is said to fix a
subfield F ⊆ K if σ|F = idF . If K/F is a field extension, we denote by Aut(K/F ) the set
of automorphisms of K that fix F .

Proposition 11.1.2
If F is a prime field, then Aut(F ) = {1}. In particular Aut(Q) = {1} and Aut(Fp ) = {1}.


Proof. Since 1² = 1 in F , any ring homomorphism σ : F → F satisfies σ(1)² = σ(1). Since
F is a field, σ(1) is equal to 1 or 0. If σ(1) = 0, then σ(a) = σ(a)σ(1) = 0 for all a, so σ is the trivial
homomorphism. Thus, since σ is an automorphism, σ(1) = 1.
Then σ(n · 1) = n · 1 for all positive integers n. The elements of the prime field consist of elements
of the form f = (a · 1)/(b · 1) = (a · 1)(b · 1)⁻¹. Thus, the image σ(f) satisfies
a · 1 = σ(a · 1) = σ(f (b · 1)) = σ(f )σ(b · 1).
Thus, σ(f ) = (a · 1)(b · 1)⁻¹ = f . Consequently, every automorphism of a prime field is trivial. □

Proposition 11.1.3
If K is a field and F a subfield, then Aut(K) is a group and Aut(K/F ) is a subgroup.

Proof. The set Aut(K/F ) is not empty since it contains the identity function on K. It is clear that
if σ, τ ∈ Aut(K/F ), then στ fixes the subfield F . Hence, στ ∈ Aut(K/F ). Suppose now that σ ∈
Aut(K/F ). By definition σ(a) = a for all a ∈ F . The inverse σ⁻¹ satisfies a = σ⁻¹(σ(a)) = σ⁻¹(a)
for all a ∈ F and so σ⁻¹ also fixes F . Thus, Aut(K/F ) is closed under taking inverses and the
proposition follows. □
The following proposition is the key to connecting properties of the group Aut(K/F ) with the
study of symmetries among roots of a polynomial.

Proposition 11.1.4
Let K/F be a field extension and let α ∈ K be algebraic over F . For any σ ∈ Aut(K/F ),
σα is also a root of the minimal polynomial mα,F (x) ∈ F [x].

Proof. Since α is algebraic over F , it has a minimal polynomial


mα,F (x) = cn xⁿ + · · · + c1 x + c0 .

Since mα,F (α) = 0, applying an automorphism σ ∈ Aut(K/F ) gives

σ(mα,F (α)) = 0  ⟹  σ(cn αⁿ + · · · + c1 α + c0) = 0
             ⟹  σ(cn)σ(α)ⁿ + · · · + σ(c1)σ(α) + σ(c0) = 0    since σ ∈ Aut(K)
             ⟹  cn σ(α)ⁿ + · · · + c1 σ(α) + c0 = 0           since σ fixes F.

Hence, σα is a root of mα,F (x). □

Example 11.1.5. Consider the field extension K = Q(√13) over Q and the group Aut(K/Q). Any
automorphism σ ∈ Aut(K/Q) satisfies

σ(a + b√13) = a + bσ(√13)

and, therefore, is determined uniquely by the action on the element √13. By Proposition 11.1.4,
σ(√13) must be a root of x² − 13 = 0, namely either √13 or −√13. This shows that Aut(K/Q) ≅ Z2
and is generated by the conjugation function σ(a + b√13) = a − b√13. △
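The conjugation map of this example can be checked in code by representing a + b√13 as the pair (a, b); a sketch:

```python
# Q(sqrt(13)) modeled as pairs (a, b) = a + b*sqrt(13) with exact rationals.
# We check that conjugation respects multiplication, fixes Q, and has order 2.

from fractions import Fraction as F

def mul(x, y):
    a, b = x
    c, d = y
    # (a + b s)(c + d s) = (ac + 13 bd) + (ad + bc) s, where s = sqrt(13)
    return (a * c + 13 * b * d, a * d + b * c)

def sigma(x):                  # conjugation: a + b*sqrt(13) -> a - b*sqrt(13)
    a, b = x
    return (a, -b)

x = (F(2), F(3))               # 2 + 3*sqrt(13)
y = (F(1), F(-5))              # 1 - 5*sqrt(13)
```

Multiplicativity σ(xy) = σ(x)σ(y) together with additivity (clear from the formula) makes σ a ring automorphism fixing Q.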

Example 11.1.6. Similarly to the previous example, consider the field extension K = Q(∛13) over
Q and the automorphism group Aut(K/Q). Any automorphism σ ∈ Aut(K/Q) satisfies

σ(a + b∛13 + c(∛13)²) = a + bσ(∛13) + c(σ(∛13))²

and, therefore, is determined uniquely by the action on the element ∛13. This is all essentially the
same as in Example 11.1.5. However, σ(∛13) must be a root of x³ − 13 = 0. This polynomial has 3
roots, two of which are complex. The field extension K is a subfield of R, so the only root of x³ − 13
in K is ∛13. Therefore, σ(∛13) = ∛13 for all σ ∈ Aut(K/Q) and so Aut(K/Q) = {1}. △

Example 11.1.7. Following the pattern of the previous two examples, consider the field extension K = Q(⁴√13) over Q. An automorphism σ ∈ Aut(K/Q) is uniquely determined by how it
acts on ⁴√13. The image σ(⁴√13) must be a root of x⁴ − 13. The roots of this polynomial are
⁴√13, i⁴√13, −⁴√13, −i⁴√13, but only two of these are in the real field K. Hence, again Aut(K/Q) ≅
Z2. △
√ √
Example 11.1.8. Consider the field extension F = Q(√2 + √3) over Q. (See Example 7.2.7.) The
minimal polynomial of α = √2 + √3 is mα,Q(x) = x⁴ − 10x² + 1 and the four roots of this polynomial
are
α1 = √2 + √3,   α2 = √2 − √3,   α3 = −√2 + √3,   α4 = −√2 − √3.
Let σ ∈ Aut(F/Q). Then according to Proposition 11.1.4, σ must permute the roots of mα,Q(x).
The permutation group S4 has order 4! = 24. However, Aut(F/Q) is not all of S4. In Example 7.2.7
we observed that √2, √3 ∈ F so all the roots of mα,Q(x) are in F. It is straightforward to check
that √2 = ½(α³ − 9α) and √3 = ½(11α − α³). Hence,

α1 = α,   α2 = α³ − 10α,   α3 = −α,   α4 = 10α − α³.

In particular, for i = 2, 3, 4 the element σ(αi) is completely determined by σ(α1). Consequently, we
deduce that Aut(F/Q) has order 4. It is obvious that if σ(α1) = α3 = −α1, then σ² = 1 as an
automorphism. Suppose now that σ(α1) = α2. We deduce that

σ²(α1) = σ(α2) = σ(α)³ − 10σ(α) = (√2 − √3)³ − 10(√2 − √3) = α.

So again, |σ| = 2. There is only one group of order 4 that contains at least two elements of order 2.
We deduce that Aut(F/Q) ≅ Z2 ⊕ Z2. △

11.1.2 – Linear Algebra of Field Embeddings


Elementary properties about field extensions presented in Section 7.1 took advantage of the fact
that an extension K of a field F is a vector space over F . If σ ∈ Aut(K/F ), then for all x, y ∈ K,
σ(x + y) = σ(x) + σ(y). Furthermore, for all c ∈ F , σ(cx) = σ(c)σ(x) = cσ(x), since c is fixed by
σ. Hence, σ is an F -linear transformation from K to itself. Since automorphisms are invertible,
we deduce that Aut(K/F ) is a subgroup of GLF (K). It is sometimes convenient to represent an
automorphism σ by the invertible matrix that represents σ with respect to some F -basis of K.
More generally, if K and L are extensions of a field F , then an embedding of K into L that fixes
F is an injective linear transformation in HomF (K, L). Like automorphisms, the set of embeddings
of a field K into L that fix F is not a vector subspace of HomF (K, L).
If we make no assumptions about fixing a common subfield, then we can still view embeddings
of K into L as elements in Fun(K, L), the vector space over L of all functions from K to L.

Proposition 11.1.9
Let σ1 , σ2 , . . . , σn be distinct embeddings (injective homomorphisms) of a field K into a
field L. Then {σ1 , σ2 , . . . , σn } is a linearly independent set in Fun(K, L).

Proof. Assume that the set {σ1 , σ2 , . . . , σn } is linearly dependent. Then there exists a nontrivial
linear combination of the σi that gives the 0 function. Let m be the least positive integer such
that there exists a nontrivial linear combination of the σi that gives the 0 function. Possibly after
relabeling the σi , suppose that
c1 σ1 + c2 σ2 + · · · + cm σm = 0 (11.1)
for some nonzero ci ∈ L. Since σ1 ≠ σm as functions, there exists some a ∈ K such that
σ1(a) ≠ σm(a). Then for all x ∈ K,
c1 σ1 (ax) + c2 σ2 (ax) + · · · + cm σm (ax) = 0
=⇒ c1 σ1 (a)σ1 (x) + c2 σ2 (a)σ2 (x) + · · · + cm σm (a)σm (x) = 0.

Multiplying (11.1) by σm (a) and subtracting by the above equation, we get


c1 (σm (a) − σ1 (a))σ1 (x) + c2 (σm (a) − σ2 (a))σ2 (x) + · · · + cm−1 (σm (a) − σm−1 (a))σm−1 (x) = 0
for all x ∈ K. Since c1 (σm (a) − σ1 (a)) ≠ 0, this linear combination is nontrivial. However, it
contradicts the minimality of m. This contradicts the assumption that {σ1 , σ2 , . . . , σn } is
linearly dependent. The proposition follows. □

11.1.3 – Structure within Aut(K/F )


In Examples 11.1.5 through 11.1.8, we observe that | Aut(K/F )| ≤ [K : F ]. This seemed to emerge
as a natural result in the exploration in Example 11.1.8. This turns out to be true. It is easy to show
this result for primitive extensions (see Exercise 11.1.4). The following example shows that even
knowing that | Aut(K/F )| ≤ [K : F ] for primitive fields does not lend itself to a proof by induction
for the general case.
√ √ √
Example 11.1.10. Let K = Q(∛13, √−3). As seen in Example 11.1.6, Aut(Q(∛13)/Q) = 1, so
|Aut(Q(∛13)/Q)| < [Q(∛13) : Q] = 3. It is easy to show that Aut(K/Q(∛13)) ≅ Z2, whose
cardinality is equal to [K : Q(∛13)].
Let us determine Aut(K/Q) directly. Automorphisms in Aut(K/Q) will be determined by how
they act on the generators ∛13 and √−3. By Proposition 11.1.4, there are at most three options
where to map ∛13 and at most two options where to map √−3. Hence, |Aut(K/Q)| ≤ 6. Consider
the functions ρ, τ ∈ Aut(K/Q) satisfying

ρ(∛13) = ∛13 (−1/2 + (1/2)√−3),   ρ(√−3) = √−3,
and
τ(∛13) = ∛13,   τ(√−3) = −√−3.

One of the particularities of this example is that the primitive third root of unity ζ3 = −1/2 + (1/2)√−3
is in Q(√−3). Notice that ζ3² = −1/2 − (1/2)√−3 = τ(ζ3). It is obvious that τ has order 2, but we also
notice that

ρ²(∛13) = ρ(∛13)ζ3 = ∛13 ζ3² = ∛13 (−1/2 − (1/2)√−3)

and that

ρ³(∛13) = ∛13 ζ3³ = ∛13.

Thus, ρ has order 3 in Aut(K/Q). We also observe that

τρ : ∛13 ↦ τ(∛13 ζ3) = ∛13 τ(ζ3) = ∛13 ζ3²,   √−3 ↦ τ(√−3) = −√−3,
ρ²τ : ∛13 ↦ ρ²(∛13) = ∛13 ζ3²,   √−3 ↦ ρ²(−√−3) = −√−3.

Thus, Aut(K/Q) = ⟨ρ, τ | ρ³ = τ² = 1, τρ = ρ⁻¹τ⟩ ≅ D3 = S3. In particular, |Aut(K/Q)| = 6. △
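The group structure found here can be verified by encoding an automorphism as a pair (j, s), acting as ∛13 ↦ ∛13 ζ3ʲ and √−3 ↦ (−1)ˢ√−3. This is a sketch; the composition rule uses σ(ζ3) = ζ3 when s = 0 and ζ3² when s = 1:

```python
# Example 11.1.10's automorphisms as pairs (j, s) with j mod 3 and s mod 2.

def compose(f, g):
    """f o g: apply g first, then f."""
    j1, s1 = f
    j2, s2 = g
    twist = 1 if s1 == 0 else 2        # f sends zeta3 to zeta3^twist
    return ((j1 + twist * j2) % 3, (s1 + s2) % 2)

rho = (1, 0)
tau = (0, 1)

def power(f, n):
    out = (0, 0)                       # the identity automorphism
    for _ in range(n):
        out = compose(f, out)
    return out

elements = {(j, s) for j in range(3) for s in range(2)}
```

This realizes the semidirect product Z3 ⋊ Z2 ≅ S3: ρ has order 3, τ has order 2, τρ = ρ²τ, and the group is nonabelian with 6 elements.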
In order to understand the internal structure of the automorphism group Aut(K/F ), we need to
understand the interplay between subgroups and subfields better.

Proposition 11.1.11
Let H ≤ Aut(K). The subset Fix(K, H) of elements in K fixed by H is a subfield of K.

Proof. Obviously, 0 ∈ Fix(K, H) so Fix(K, H) ≠ ∅. Let a, b ∈ Fix(K, H) and let σ ∈ H. Then
σ(a − b) = σ(a) − σ(b) = a − b, so a − b ∈ Fix(K, H). Thus, Fix(K, H) is a subgroup of (K, +).
Now suppose that a, b ∈ Fix(K, H) − {0}. Since σ restricts to a homomorphism U (K) → U (K),
σ(ab⁻¹) = σ(a)σ(b)⁻¹ = ab⁻¹. Thus, ab⁻¹ ∈ Fix(K, H) − {0}, and Fix(K, H) − {0} is a subgroup
of U (K). We conclude that Fix(K, H) is a subfield of K. □

Definition 11.1.12
The field Fix(K, H) in the above proposition is called the fixed subfield of K by H.

Proposition 11.1.13
Let K be a field. The association between subgroups of Aut(K) and subfields of K via

H ⟼ Fix(K, H)   and   F ⟼ Aut(K/F)

is inclusion reversing. More precisely, if F1 ⊆ F2 ⊆ K, then Aut(K/F2) ≤ Aut(K/F1),
and if H1 ≤ H2 ≤ Aut(K), then Fix(K, H2) ≤ Fix(K, H1). Furthermore, this association is
injective from left to right.

Proof. Suppose first that F1 ⊆ F2 ⊆ K. Let σ be an automorphism in Aut(K) that fixes F2 . Then
since F1 ⊆ F2 , σ fixes every element in F1 so σ ∈ Aut(K/F1 ). Thus, Aut(K/F2 ) ≤ Aut(K/F1 ).
Suppose that H1 ≤ H2 ≤ Aut(K). Let a ∈ Fix(K, H2 ). Since H1 ≤ H2 , the field element a is
fixed by all σ ∈ H1 . This implies that a ∈ Fix(K, H1 ). Thus, Fix(K, H2 ) ≤ Fix(K, H1 ).
Finally, suppose that Fix(K, H1 ) = Fix(K, H2 ). Then H1 fixes Fix(K, H2 ) and since Fix(K, H2 )
is the fixed field of H2 , then H1 ≤ H2 . Since H2 fixes Fix(K, H1 ), we also deduce that H2 ≤ H1 .
Thus, H1 = H2 . □

Theorem 11.1.14
Let G be a finite subgroup of Aut(K) and let F = Fix(K, G). Then [K : F ] = |G|.

Proof. Let |G| = n and write G = {σ1 , σ2 , . . . , σn }.


Suppose first that [K : F ] = m < n. Let ω1 , ω2 , . . . , ωm be a basis of K over F . For simplicity,
assume that σ1 = 1, the identity of G. Then the system

σ1 (ω1 )x1 + σ2 (ω1 )x2 + · · · + σn (ω1 )xn = 0
⋮
σ1 (ωm )x1 + σ2 (ωm )x2 + · · · + σn (ωm )xn = 0

is a system of m equations in n unknowns. This must have nontrivial solutions β1 , β2 , . . . , βn in K


since m < n.
Let a1 , a2 , . . . , am be arbitrary elements in F . Since all the automorphisms σi fix F , we have
σi (aj ) = aj . Therefore, multiplying the jth equation by aj and evaluating each xi = βi , gives

σ1 (a1 ω1 )β1 + σ2 (a1 ω1 )β2 + · · · + σn (a1 ω1 )βn = 0
⋮
σ1 (am ωm )β1 + σ2 (am ωm )β2 + · · · + σn (am ωm )βn = 0.

Call α = a1 ω1 + a2 ω2 + · · · + am ωm . Summing up the above equations we have

β1 σ1 (α) + β2 σ2 (α) + · · · + βn σn (α) = 0.

However, {ωj }j=1,…,m forms a basis of K over F , so since the aj are arbitrary, α is an arbitrary
element of K. Thus, we have shown that {σ1 , σ2 , . . . , σn } is linearly dependent, which contradicts
element of K. Thus, we have shown that {σ1 , σ2 , . . . , σn } is linearly dependent, which contradicts
Proposition 11.1.9. By contradiction, we conclude that m ≥ n.

Suppose now that n < m. We can find n + 1 linearly F -independent elements α1 , α2 , . . . , αn+1
in K. Then the system

σ1 (α1 )x1 + σ1 (α2 )x2 + · · · + σ1 (αn+1 )xn+1 = 0
⋮
σn (α1 )x1 + σn (α2 )x2 + · · · + σn (αn+1 )xn+1 = 0

has n equations in n + 1 unknowns. Therefore, the system has a nonzero solution (β1 , β2 , . . . , βn+1 )
in K. Furthermore, not all βi can be in F because otherwise, since σ1 is the identity, the first equation
in the system would produce a linear dependence of the αj over F . Thus, at least one βi ∉ F .
From the nontrivial solutions, choose one that has the least number of nonzero entries βi . Without
loss of generality, assume that β1 ∈ K − F and that for some 2 ≤ r ≤ n + 1 we have β1 , . . . , βr−1
are nonzero, βr = 1 and βi = 0 for i > r. Then our system of equations becomes

σi (α1 )β1 + · · · + σi (αr−1 )βr−1 + σi (αr ) = 0 (11.2)

for i = 1, 2, · · · , n.
Since β1 ∉ F , there exists i0 such that σi0 (β1 ) ≠ β1 ; otherwise, β1 would be in the fixed field of
G, which is F . Applying σi0 to the above equations (indexed by i) gives

    (σi0 σi )(α1 )σi0 (β1 ) + · · · + (σi0 σi )(αr−1 )σi0 (βr−1 ) + (σi0 σi )(αr ) = 0.        (11.3)

However, since {σ1 , σ2 , . . . , σn } is a group, as i runs through 1, 2, . . . , n, the compositions σi0 σi run
through all the σj in G. Therefore, the system in (11.3) consists of the equations of (11.2), reordered.
Subtracting the equations with corresponding σi , we obtain the new system

    σi (α1 )(β1 − σi0 (β1 )) + · · · + σi (αr−1 )(βr−1 − σi0 (βr−1 )) = 0.

However, this last set of equations shows that

    (β1 − σi0 (β1 ), β2 − σi0 (β2 ), . . . , βr−1 − σi0 (βr−1 ), 0, 0, . . . , 0)

is a solution to the original system. Furthermore, this solution is nontrivial since β1 − σi0 (β1 ) ≠ 0,
and it has fewer than r nonzero entries. This contradicts the minimality of r. By contradiction, the
assumption n < m is false, so m ≤ n. Combined with the first part of the proof, we conclude that
n = m. □

This theorem implies the following important corollary that offers an upper bound and a sharpness
condition on the size of the automorphism group Aut(K/F ).

Corollary 11.1.15
Let K/F be any field extension. Then | Aut(K/F )| ≤ [K : F ] with equality if and only if
F is the fixed field of Aut(K/F ).

Proof. (Left as an exercise for the reader. See Exercise 11.1.11.) 

11.1.4 – Definition of Galois Extensions


By Corollary 11.1.15, we see that field extensions K/F with | Aut(K/F )| = [K : F ] occupy a special
position, and we may suspect that they have special properties.

Definition 11.1.16
A finite field extension K/F is called a Galois extension if | Aut(K/F )| = [K : F ]. If
K/F is a Galois extension, then the automorphism group Aut(K/F ) is called the Galois
group of the extension and is denoted by Gal(K/F ).

Galois theory hinges on Galois extensions and the relationship between subgroups of Gal(K/F )
and fields L satisfying F ⊆ L ⊆ K. At present, it may seem like an intractable problem to determine
if a field extension is Galois. Corollary 11.1.15 can be restated to give the following criterion for a
Galois extension.

Corollary 11.1.17
An extension K/F is Galois if and only if F is the fixed field of Aut(K/F ).

Example 11.1.18. All quadratic extensions of Q are Galois. Recall that a quadratic extension
of Q involves a field Q(α), where α is a root of some irreducible quadratic polynomial m(x) =
x2 + bx + c ∈ Q[x]. The polynomial m(x) has two roots,

    (−b + √(b2 − 4c))/2    and    (−b − √(b2 − 4c))/2,

and if we label one of the roots as α, then the other one is −b − α. Consequently, there are two
automorphisms in Aut(Q(α)/Q): the identity function and σ such that σ(m + nα) = m + n(−b − α)
for m, n ∈ Q. Hence, Aut(Q(α)/Q) has order 2, which is equal to [Q(α) : Q]. We write
Gal(Q(α)/Q) ≅ Z2 . △
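As an illustration (ours, not the book's), the map σ(m + nα) = m + n(−b − α) can be checked to respect multiplication by encoding elements of Q(α) as coefficient pairs (m, n) and reducing with α² = −bα − c; the helper name `check_quadratic_automorphism` below is a hypothetical one we introduce:

```python
from fractions import Fraction
from itertools import product

def check_quadratic_automorphism(b, c):
    """In Q(alpha) with alpha^2 = -b*alpha - c (alpha a root of x^2 + bx + c),
    represent m + n*alpha as the pair (m, n) and check that
    sigma(m + n*alpha) = m + n*(-b - alpha) respects multiplication."""
    def mul(x, y):
        (m1, n1), (m2, n2) = x, y
        # (m1 + n1*a)(m2 + n2*a), reduced with a^2 = -b*a - c
        return (m1 * m2 - c * n1 * n2, m1 * n2 + m2 * n1 - b * n1 * n2)

    def sigma(x):
        m, n = x
        return (m - n * b, -n)       # m + n*(-b - alpha)

    samples = [(Fraction(i), Fraction(j)) for i, j in product((-2, 0, 1, 3), repeat=2)]
    return all(sigma(mul(x, y)) == mul(sigma(x), sigma(y))
               for x in samples for y in samples)

assert check_quadratic_automorphism(Fraction(0), Fraction(-2))   # alpha = sqrt(2)
assert check_quadratic_automorphism(Fraction(1), Fraction(1))    # alpha a root of x^2 + x + 1
```

Both sample cases use irreducible quadratics (x² − 2 and x² + x + 1), so Q(α) really is a quadratic extension in each.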

Example 11.1.19. The observations in Examples 11.1.6 and 11.1.7 show that neither Q(∛13)
nor Q(∜13) is a Galois extension over Q. In Example 11.1.10, we considered the field extension
Q(∛13, √−3) over Q, which is the splitting field of x3 − 13. We had already determined directly
that Q(∛13, √−3)/Q is a Galois extension, though Theorem 11.2.1 proves it in general. △

Galois extensions and their groups of automorphisms are the central themes of this chapter.

Exercises for Section 11.1


1. Prove directly that complex conjugation is an automorphism.
2. Consider the extension K = Q(√2, √7) over Q. Show that K/Q is a Galois extension and determine
Gal(K/Q).
3. Consider the extension K = Q(√2, √5, √7) over Q. Show that K/Q is a Galois extension and
determine Gal(K/Q).
4. Prove that | Aut(K/F )| ≤ [K : F ], where K is a simple extension of F (i.e., K = F (α) for some α ∈ K),
without using the methods of the proof of Theorem 11.1.14.
5. Let K be a field, let G be a finite subgroup of Aut(K), and let F = Fix(K, G). Prove that Aut(K/F ) =
G so that K/F is Galois with Galois group G.
6. Let G1 , G2 ≤ Aut(K) with G1 ≠ G2 . Prove that Fix(K, G1 ) ≠ Fix(K, G2 ).
7. This exercise guides the reader to show the surprising result that Aut(R/Q) = {1}.
(a) Prove that σ ∈ Aut(R/Q) maps positive reals to positive reals. [Hint: Consider how σ maps
squares.]
(b) Deduce that σ is an increasing function.
(c) Deduce that for all x, y ∈ R and all m ∈ Z,

    −1/m < x − y < 1/m  implies  −1/m < σ(x) − σ(y) < 1/m.
(d) Deduce that σ is a continuous function.
(e) Prove that the only continuous function on R that fixes Q is the identity function.
This proves the result that Aut(R/Q) = {1}.
8. Prove that the only continuous automorphisms of C are the identity and complex conjugation.
9. Consider the transcendental extension F (t) of a field F .
(a) Prove that an automorphism σ ∈ Aut(F (t)/F ) satisfies σ(t) = (at + b)/(ct + d) for some
a, b, c, d ∈ F with ad − bc ≠ 0.
(b) Prove that Aut(F (t)/F ) ≅ PGL2 (F ).

10. Let K and E be extensions of a field F . A nontrivial homomorphism K → E is an injection and so
corresponds to the situation F ⊆ K ⊆ E. Show that restriction of automorphisms in Gal(E/F ) to
K defines a homomorphism of Gal(E/F ) onto Gal(K/F ).
11. Prove Corollary 11.1.15.

11.2
Fundamental Theorem of Galois Theory
The definition of a Galois extension does not lend itself to readily identifying which field extensions
are Galois. This section first gives a full characterization of Galois extensions and then explores in
further detail the relationship between subgroups of Aut(K/F ) and fields L satisfying F ⊆ L ⊆ K.
A more detailed analysis of the association described in Proposition 11.1.13 leads to the Fundamental
Theorem of Galois Theory.

11.2.1 – Characterization of Galois Extensions


The examples in the previous section hinted that the automorphism group of a field extension K/F
is limited by whether K contains all the roots of a minimal polynomial of an element α ∈ K. The
following theorem shows that this suspicion is well-founded. In order to show that a splitting field
gives a Galois extension, we count the number of possible extensions of the identity map id : F → F
to isomorphic splitting fields of a given polynomial.

Theorem 11.2.1
Let E be the splitting field over F of a separable polynomial p(x) ∈ F [x]. Then the
extension E/F is Galois, i.e., | Aut(E/F )| = [E : F ].

Proof. It is convenient to prove a stronger result. We prove that (*) the number of ways of extending
an isomorphism ϕ : F → F′ to the splitting fields E and E′ of p(x) and p′(x) = ϕ(p(x)) is equal to
[E : F ] = [E′ : F′]. The theorem follows once we apply this result to F = F′.
We proceed by induction. If [E : F ] = 1, then E = F and any extension E → E′ of the
isomorphism ϕ : F → F′ involves a field E′ such that [E′ : F′] = 1. Hence, E′ = F′ and the only
extension of ϕ to E is ϕ itself.
Now suppose that (*) holds for all separable polynomials q(x) ∈ F [x] with a splitting field Eq
satisfying [Eq : F ] < n for some positive integer n ≥ 2. If p(x) ∈ F [x] has a splitting field E with
[E : F ] = n ≥ 2, then p(x) has at least one irreducible factor f (x) ∈ F [x] of degree greater than 1,
and similarly for p′(x) = ϕ(p(x)) and f′(x) = ϕ(f (x)). Let α be a root of f (x). If σ : E → E′ is an
extension of the isomorphism ϕ, then σ restricts to F (α) as σ|F (α) = τ : F (α) → F′(β), where β is
a root of f′(x).
              σ
      E -----------> E′
      |              |
    F (α) ---τ---> F′(β)        (τ = σ|F (α) )
      |              |
      F -----ϕ-----> F′

To count the number of extensions σ of the isomorphism ϕ, we count the number of such diagrams.
The number of extensions of ϕ to isomorphisms τ : F (α) → F′(β) is equal to the number of distinct
roots β of f′(x). Since deg f′(x) = [F′(β) : F′] = [F (α) : F ] and since p′(x) is separable, the number
of extensions τ is exactly [F (α) : F ].
Since E and E′ are splitting fields of p(x) over F (α) and of p′(x) over F′(β), we can use the
induction hypothesis on extending the isomorphism τ : F (α) → F′(β) to E → E′, since [E : F (α)]
is strictly less than n. There are exactly [E : F (α)] extensions of τ to maps σ : E → E′. Since
[E : F ] = [E : F (α)][F (α) : F ], the number of extensions of ϕ to maps σ : E → E′ is equal to
[E : F ]. This establishes (*).
Applying the result (*) to the isomorphism idF : F → F proves the theorem. □

Theorem 11.2.1 gives a particularly easy way of finding Galois extensions: Find a splitting field
of a separable polynomial. This situation is so common that we give it its own notation.

Definition 11.2.2
If f (x) ∈ F [x] is a separable polynomial over F , then the Galois group of the splitting field
of f (x) over F is called the Galois group of f (x) and is denoted by GalF (f (x)) or simply
Gal(f (x)), whenever the field F is understood from context.

Example 11.2.3. As an example, we calculate the Galois group over F3 of f (x) = x3 + 2x + 1. It
is easy to verify that this polynomial is irreducible. By Proposition 7.7.11, f (x) is separable, so we
can talk about the Galois group of f (x).
Let θ be a root of f (x) in a field extension of F3 , so that θ3 = −2θ − 1 = θ + 2. Now F3 (θ) =
F3 [θ] is a field of 3³ = 27 elements. By Theorem 7.7.12, F3 (θ) ≅ F27 and is the splitting field of f (x).
Consequently, f (x) splits completely in F27 . By factoring out (x − θ) and then factoring further, we
find that

    x3 + 2x + 1 = (x − θ)(x − (1 + θ))(x − (2 + θ)).

So the three roots of the polynomial are θ, θ + 1, and θ + 2. Each automorphism in the Galois
group Gal(f (x)) = Aut(F3 (θ)/F3 ) is uniquely determined by where it maps θ. There are only
three possibilities, one for each of the three roots. We deduce that
Gal(f (x)) = Gal(F27 /F3 ) ≅ Z3 , generated by the automorphism σ defined by σ(θ) = θ + 1. △
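The factorization above can be verified by brute-force arithmetic in F27; the following sketch (our own encoding, resting only on the relation θ³ = θ + 2 from the example) stores an element c0 + c1 θ + c2 θ² as the triple (c0, c1, c2):

```python
# Brute-force arithmetic in F27 = F3[theta]/(theta^3 + 2*theta + 1).
# An element c0 + c1*theta + c2*theta^2 is stored as the triple (c0, c1, c2).

def mul(u, v):
    """Multiply two elements of F27, reducing with theta^3 = theta + 2 (mod 3)."""
    prod = [0] * 5
    for i, a in enumerate(u):
        for j, b in enumerate(v):
            prod[i + j] = (prod[i + j] + a * b) % 3
    for k in (4, 3):                  # theta^4 = theta^2 + 2*theta, theta^3 = theta + 2
        carry, prod[k] = prod[k], 0
        prod[k - 3] = (prod[k - 3] + 2 * carry) % 3
        prod[k - 2] = (prod[k - 2] + carry) % 3
    return tuple(prod[:3])

def f(u):
    """Evaluate f(u) = u^3 + 2u + 1 in F27."""
    u3 = mul(mul(u, u), u)
    return tuple((a + 2 * b + e) % 3 for a, b, e in zip(u3, u, (1, 0, 0)))

theta = (0, 1, 0)
roots = [theta, (1, 1, 0), (2, 1, 0)]             # theta, theta + 1, theta + 2
assert all(f(r) == (0, 0, 0) for r in roots)      # all three are roots of f

# The Frobenius map u -> u^3 sends theta to theta + 2, another root of f.
assert mul(mul(theta, theta), theta) == (2, 1, 0)
```

The last assertion is exactly the relation θ³ = θ + 2 used in the example, confirming that the Frobenius automorphism permutes the three roots.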

Though Theorem 11.2.1 gives a way to find Galois extensions, it has a converse, which gives a
complete characterization of Galois extensions.

Theorem 11.2.4
A finite extension K/F is Galois if and only if K is the splitting field of a separable
polynomial over F . Furthermore, if K/F is Galois, then every irreducible polynomial in
F [x] which has a root in K is separable and has all its roots in K.

Proof. Theorem 11.2.1 shows that if K is the splitting field of a separable polynomial in F [x], then
K/F is Galois. We need to prove the converse.
Suppose that K/F is Galois and call G = Gal(K/F ). Let p(x) ∈ F [x] be a polynomial that
has a root α in K. Call G · α the orbit of α under the action of G on K and write G · α = {α1 =
α, α2 , . . . , αr }. Any automorphism τ ∈ G acts as a permutation on G · α. Therefore, the coefficients
of

    f (x) = (x − α)(x − α2 ) · · · (x − αr )

are fixed by all the elements of G, since any τ ∈ G simply permutes these linear factors; hence
f (x) ∈ F [x], because F is the fixed field of G.
Now suppose that p(x) ∈ F [x] is irreducible with root α in K. Then p(x) is a constant multiple of
the minimal polynomial mF,α (x), and since f (α) = 0, p(x) must divide f (x). However, f (x) must
also divide p(x), since each of the distinct αi is a root of p(x). Thus, adjusting multiplicative
constants, we can assume that p(x) = f (x). In particular, p(x) is separable with all its roots in K.
This proves the second part of the theorem.

Now suppose that β1 , β2 , . . . , βm is a basis of K over F . We just proved that each minimal
polynomial mβi ,F (x) is separable with all its roots in K. Define

g(x) = lcm(mβ1 ,F (x), mβ2 ,F (x), . . . , mβm ,F (x)) in K[x].

Again, this polynomial splits completely in K[x]. Furthermore, it contains all the roots of every
mβi ,F (x) so the action of any τ ∈ G simply permutes the linear factors. Thus, g(x) ∈ F [x]. Since all
the roots of g(x) are in K, then the splitting field E of g(x) over F is a subfield of K. On the other
hand, the splitting field of g(x) contains {β1 , β2 , . . . , βm }, which is a basis of K over F . Hence, K
is a subfield of E. We deduce that K = E and thus that K is the splitting field of some separable
polynomial. 

Because of this theorem, Galois extensions become the natural context in which to study roots
of polynomials.

Definition 11.2.5
Let K/F be a Galois extension and let α ∈ K. The (Galois) conjugates of α are the distinct
elements σ(α) for σ ∈ Gal(K/F ). If E is a subfield of K containing F , then σ(E) is called
the conjugate field of E over F .

In other words, the conjugates of α consist of the elements in the orbit of α under the action of
Gal(K/F ) on K. By Proposition 11.1.4 and Theorem 11.2.4, the conjugates of α are precisely the
set of roots of the minimal polynomial mα,F (x).

Example 11.2.6. Consider K = Q(∛13, ζ3 ). This is the splitting field over Q of the polynomial
p(x) = x3 − 13. Recall from Example 11.1.10 that G = Gal(K/Q) ≅ D3 , generated by ρ and τ as
described in that example. On the generators ∛13 and ζ3 , the automorphisms ρ and τ act as

    ρ(∛13) = ∛13 ζ3 ,  ρ(ζ3 ) = ζ3    and    τ (∛13) = ∛13,  τ (ζ3 ) = ζ3² .

Consider the element α = 1 − 3∛13 + 2(∛13)² . The conjugates of α are the distinct elements
one obtains by applying the elements of G. In this case:

    α1 = 1 − 3∛13 + 2(∛13)² ,   α2 = 1 − 3∛13 ζ3 + 2(∛13)² ζ3² ,   α3 = 1 − 3∛13 ζ3² + 2(∛13)² ζ3 .

Note there are only 3 conjugates even though |G| = 6. We can get some mileage out of this
observation. First note that the minimal polynomial of α is given by

    mα,Q (x) = (x − α1 )(x − α2 )(x − α3 ).

A calculation gives mα,Q (x) = x3 − 3x2 + 237x − 1236. We can use this result to calculate the
inverse of α in a way similar to the technique taught in college algebra called rationalizing the
denominator:

    1/α = α2 α3 /(α1 α2 α3 ) = (1 − 3∛13 ζ3 + 2∛169 ζ3² )(1 − 3∛13 ζ3² + 2∛169 ζ3 )/1236
        = (79 + 55∛13 + 7∛169)/1236.  △
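A floating-point sanity check (our illustration, not part of the text) of the conjugates, the minimal polynomial x³ − 3x² + 237x − 1236, and the inverse formula:

```python
import cmath

# Numerical check of Example 11.2.6 for alpha = 1 - 3*cbrt(13) + 2*cbrt(13)^2.
c = 13 ** (1.0 / 3.0)                 # the real cube root of 13
z = cmath.exp(2j * cmath.pi / 3)      # zeta_3, a primitive cube root of unity

# The three Galois conjugates of alpha.
alphas = [1 - 3 * c * z**k + 2 * c**2 * z**(2 * k) for k in range(3)]

# Expand (x - a1)(x - a2)(x - a3); poly holds coefficients in descending powers of x.
poly = [1 + 0j]
for a in alphas:
    poly = [poly[0]] + [poly[i] - a * poly[i - 1] for i in range(1, len(poly))] + [-a * poly[-1]]

assert max(abs(t.imag) for t in poly) < 1e-8      # coefficients are (numerically) real
coeffs = [round(t.real) for t in poly]
assert coeffs == [1, -3, 237, -1236]              # the minimal polynomial of alpha

# The rationalized inverse: 1/alpha = (79 + 55*cbrt(13) + 7*cbrt(169))/1236.
inv = (79 + 55 * c + 7 * c**2) / 1236
assert abs(inv - 1 / alphas[0]) < 1e-9
```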

11.2.2 – The Fundamental Theorem of Galois Theory


Proposition 11.1.13 already showed a close connection between subgroups of Aut(K) and subfields
of K, where K is any field. The Fundamental Theorem of Galois Theory shows that in the case of
Galois extensions K/F , this correspondence is very rich.
If K/F is a field extension, we denote by Sub(K/F ) the set of fields E such that F ⊆ E ⊆ K.

Theorem 11.2.7 (Fundamental Theorem of Galois Theory)


Let K/F be a Galois extension and set G = Gal(K/F ). There exists a bijection
Sub(K/F ) ←→ Sub(G) given by the correspondence:

    Sub(K/F )              Sub(G)
        E         −→      Aut(K/E)
    Fix(K, H)     ←−          H
Under this correspondence, the following are true:


(1) If E1 , E2 correspond to H1 , H2 respectively, then E1 ⊆ E2 if and only if H2 ≤ H1 .
(2) [K : E] = |H| and [E : F ] = |G : H|.
(3) K/E is Galois with Galois group Gal(K/E) = H.

(4) If E1 , E2 correspond to H1 , H2 respectively, then E1 ∩ E2 ↔ hH1 , H2 i and E1 E2 ↔
H1 ∩ H2 . Hence, the lattice of subfields of K containing F is dual (upside down) to the
lattice of subgroups of G.
(5) The isomorphisms of E into the algebraic closure F̄ of F that fix F are in
bijective correspondence with the cosets {σH} of H in G. In particular, E/F is
Galois if and only if H ⊴ G, and if this is the case, then Gal(E/F ) ≅ G/H.

Proof. By Theorem 11.2.4, we can suppose that K is the splitting field of some separable polynomial
f (x) ∈ F [x].
Proposition 11.1.13 already established injectivity from right to left as well as part (1).
We now prove surjectivity from right to left. For any E ∈ Sub(K/F ), we can view f (x) as a
polynomial in E[x]. Then K is the splitting field of f (x) over E and by Theorem 11.2.4, K/E is
Galois. By Corollary 11.1.17, E is the fixed field of Aut(K/E). Now, Aut(K/E) is a subgroup of
Aut(K/F ) and we have now proved that E = Fix(K, Aut(K/E)). This gives surjectivity from right
to left. This also establishes part (3).
For part (2), let E ∈ Sub(K/F ) with H = Gal(K/E) and thus also E = Fix(K, H). Then since
K/E is Galois, |H| = | Gal(K/E)| = [K : E]. It follows that

[E : F ] = [K : F ]/[K : E] = |G|/|H| = |G : H|.

For part (4), first consider two subfields E1 , E2 ∈ Sub(K/F ). If σ ∈ Aut(K/E1 ) ∩ Aut(K/E2 ),
then σ fixes everything in the composite field E1 E2 . Thus,

Aut(K/E1 ) ∩ Aut(K/E2 ) ≤ Aut(K/(E1 E2 )).

Conversely, if σ ∈ Aut(K/(E1 E2 )), then σ fixes every element of E1 and also every element of E2 .
Therefore,
Aut(K/(E1 E2 )) ≤ Aut(K/E1 ) ∩ Aut(K/E2 )
which shows that these two subgroups are equal. Second, consider subgroups H1 , H2 ≤ Gal(K/F ).
If a ∈ Fix(K, H1 ) ∩ Fix(K, H2 ), then a is fixed by every element of H1 and every element of H2 ,
so a ∈ Fix(K, hH1 , H2 i). Thus,

Fix(K, H1 ) ∩ Fix(K, H2 ) ⊆ Fix(K, hH1 , H2 i).

Conversely, if a ∈ Fix(K, hH1 , H2 i), then a is certainly in both Fix(K, H1 ) and in Fix(K, H2 ),
thereby establishing the reverse inclusion.
For part (5), let E = Fix(K, H) for H ≤ G. Every σ ∈ G gives an isomorphism σ|E : E → σ(E),
where σ(E) is a conjugate subfield of E in K. On the other hand, if we consider any field
isomorphism τ : E → E′, where E′ ⊆ F̄ and τ fixes F , then τ (E) is in fact contained in K. We see

Figure 11.1: Illustration of the Galois correspondence. [The figure shows two lattices linked by
the maps E ↦ Gal(K/E) and H ↦ Fix(K, H): the subgroup lattice of D3 , with D3 at the top,
hρi, hτ i, hτ ρi, and hτ ρ2 i in the middle row, and {1} at the bottom; and, mirrored below it, the
lattice of intermediate fields, with K at the top, Q(ζ3 ), Q(∛13), Q(∛13 ζ3 ), and Q(∛13 ζ3² ) in the
middle row, and Q at the bottom.]

this because if α ∈ E has minimal polynomial mα,F (x), then τ (α) is another root of mα,F (x), and
K contains all these roots. Since K is the splitting field of f (x) over E, it is also the splitting
field of τ (f (x)) = f (x) over τ (E). By Theorem 7.6.10, τ : E → τ (E) extends to an isomorphism
σ : K → K. Since σ fixes F (because τ does), we conclude that every isomorphism τ of E with
another field extension of F is the restriction of some σ ∈ Aut(K/F ).
Now two automorphisms σ1 , σ2 ∈ G restrict to the same embedding τ of E fixing F if and only if
σ2⁻¹σ1 is the identity map on E, which means that σ2⁻¹σ1 ∈ H, which in turn means that
σ1 H = σ2 H as cosets in G. Consequently, there is a bijection between the cosets of H and the
isomorphisms of E with some subfield of F̄ . In particular, we again see the result of part (2) that
the number of isomorphisms of E with a subfield of F̄ is |G : H| = [E : F ].
The extension E/F is Galois if and only if | Aut(E/F )| = [E : F ]. This will only be the case if
all the isomorphisms of E with some subfield of F̄ are actually isomorphisms of E with itself that
fix F . Thus, E/F is Galois if and only if σ(E) = E for all σ ∈ G. However, since H = Aut(K/E),
then σHσ⁻¹ = Aut(K/σ(E)), and we conclude from previous parts of the theorem that σ(E) = E
if and only if σHσ⁻¹ = H for all σ ∈ G. This concludes part (5). □

The Fundamental Theorem of Galois Theory can be restated in the following way. The function
Ψ : Sub(K/F ) ←→ Sub(Aut(K/F )) defined by Ψ(E) = Aut(K/E) is a monotonic bijection between
the lattices (Sub(K/F ), ⊇) and (Sub(Aut(K/F )), ≤) in which Galois extensions E/F correspond to
normal subgroups N = Aut(K/E) in Aut(K/F ). The statement that Ψ is a bijection between
lattices (as opposed to just posets) means that it must preserve greatest lower bounds and least
upper bounds. (Part (4).)
As a first example, let us revisit the running example we have considered regularly in this and
the previous section.

Example 11.2.8. Consider the Galois extension K = Q(∛13, ζ3 ) over Q. In Example 11.1.10, we
saw that Aut(K/Q) ≅ D3 , generated by ρ and τ defined by

    ρ(∛13) = ∛13 ζ3 ,  ρ(ζ3 ) = ζ3    and    τ (∛13) = ∛13,  τ (ζ3 ) = ζ3² .
The subgroup lattice of D3 is easy to draw. (See the top part of Figure 11.1.)

The fixed fields Fix(K, H) for the various H ∈ Sub(G) are Fix(K, hρi) = Q(ζ3 ), Fix(K, hτ i) =
Q(∛13), Fix(K, hτ ρi) = Q(∛13 ζ3 ), and Fix(K, hτ ρ2 i) = Q(∛13 ζ3² ). Hence, the Fundamental The-
orem of Galois Theory implies that the lattice of subfields Sub(K/F ) has the Hasse diagram as
depicted in the lower part of Figure 11.1. Intuitively, the two diagrams are related to each
other via a reflection through a horizontal line. △
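The group structure in this example can also be checked combinatorially. Each automorphism of K is determined by ∛13 ↦ ∛13 ζ3^i and ζ3 ↦ ζ3^j, so composition reduces to arithmetic on the exponent pairs (i, j); the encoding below is an illustrative sketch of ours, not the book's:

```python
# Model of Gal(K/Q) for K = Q(cbrt(13), zeta3): the automorphism with
# cbrt(13) -> cbrt(13)*zeta^i and zeta -> zeta^j is encoded as (i, j),
# with i taken mod 3 and j in {1, 2}.

def compose(s, t):
    """Return s o t (apply t first, then s)."""
    i_s, j_s = s
    i_t, j_t = t
    # s(t(cbrt13)) = cbrt13 * zeta^(i_s + i_t*j_s);  s(t(zeta)) = zeta^(j_s*j_t)
    return ((i_s + i_t * j_s) % 3, (j_s * j_t) % 3)

identity, rho, tau = (0, 1), (1, 1), (0, 2)

# Close {identity, rho, tau} under composition.
group = {identity, rho, tau}
while True:
    bigger = group | {compose(s, t) for s in group for t in group}
    if bigger == group:
        break
    group = bigger

assert len(group) == 6                              # |Gal(K/Q)| = [K : Q] = 6
assert compose(rho, compose(rho, rho)) == identity  # rho^3 = 1
assert compose(tau, tau) == identity                # tau^2 = 1
rho2 = compose(rho, rho)
assert compose(tau, rho) == compose(rho2, tau)      # tau rho = rho^2 tau: the D3 relations
```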
4 2
Example 11.2.9. Consider the polynomial f (x) = x4 − 6x2 + 6 ∈ Q[x]. The quadratic formula
gives x2 = 3 ± √3, so the four roots of the equation are

    √(3 + √3),  −√(3 + √3),  √(3 − √3),  −√(3 − √3).

Set α = √(3 + √3) and β = √(3 − √3). By Exercise 11.2.12, since αβ = √6 ∉ Q and
αβ(α2 − β2 ) = 6√2 ∉ Q, then Gal(f (x)) ≅ D4 and in particular, if K is the splitting field of f (x),
then [K : F ] = |D4 | = 8.
The splitting field of f (x) is obviously K = Q(α, β). However, we note that αβ = √6, so
K = Q(α, √6). Furthermore, since √3 = α2 − 3 ∈ Q(α), then K = Q(α, √2). Explicitly, β =
√2(α2 − 3)/α. Then

    8 = [K : F ] = [Q(α, √2) : Q(α)][Q(α) : Q] = 4[Q(α, √2) : Q(α)].

Therefore, [Q(α, √2) : Q(α)] = 2 and in particular √2 ∉ Q(α). The Galois group Gal(K/F ) is
generated by the action of automorphisms on α and on √2. We can use the generators

    ρ: ρ(√(3 + √3)) = √(3 − √3),  ρ(√2) = √2    and    τ: τ (√(3 + √3)) = √(3 + √3),  τ (√2) = −√2.

Notice that τ is a transposition in that τ 2 = 1. On the other hand,

    ρ2 (α) = ρ(β) = ρ(√2(α2 − 3)/α) = ρ(√2)(ρ(α)2 − 3)/ρ(α)
           = √2((3 − √3) − 3)/√(3 − √3) = −√6/√(3 − √3) = −√(3 + √3).

Thus, ρ2 (α) = −α. Consequently, ρ3 (α) = −β and ρ4 = 1. We also calculate that

    τ ρ(√(3 + √3)) = τ (√2(α2 − 3)/α) = −√(3 − √3),  τ ρ(√2) = −√2

and

    ρ3 τ (√(3 + √3)) = −√(3 − √3),  ρ3 τ (√2) = −√2.

We see explicitly that the generators of Gal(K/F ) have the same relations as D4 . The subgroup
lattice of D4 is the following:

                          Gal(K/F )
      hρ2 , τ i              hρi              hρ2 , τ ρi
  hτ i     hτ ρ2 i          hρ2 i          hτ ρi     hτ ρ3 i
                             {1}

According to the Fundamental Theorem of Galois Theory, the Hasse diagram of the lattice of
subextensions in Sub(K/F ) is:

                                K
  Q(α)     Q(β)           Q(√2, √3)           Q(α − β)     Q(α + β)
           Q(√3)            Q(√2)             Q(√6)
                                Q

Again, note that the above two diagrams are reflections of each other through a horizontal line. In
the above diagram, we took care to calculate the fixed subfields of K by determining elements that
remain invariant under the action of the corresponding subgroup H. A useful result for the above
calculations is that

    τ (β) = τ (√2(α2 − 3)/α) = τ (√2)(τ (α)2 − 3)/τ (α) = (−√2)(α2 − 3)/α = −β.

For example, if H = hρ2 , τ i, then Fix(K, H) is a subfield of K of degree 2 over Q. Consequently,
Fix(K, H) = Q(√a) for some a ∈ Q. Since √3 = α2 − 3, it is easy to see that √3 is fixed by ρ2 ,
since ρ2 (α2 − 3) = (ρ2 (α))2 − 3 = (−α)2 − 3 = √3. Furthermore, τ (α2 − 3) = α2 − 3 = √3. Thus,
Q(√3) ⊆ Fix(K, H), so knowing that [Fix(K, H) : Q] = 2, we deduce that Fix(K, H) = Q(√3).
By the Fundamental Theorem of Galois Theory, knowing the normal subgroups of Gal(K/F ) =
D4 , we deduce that the only nontrivial Galois extensions of Q inside K are K, Q(√2, √3), Q(√2),
Q(√3), and Q(√6). △
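The numerical identities underpinning this example are easy to spot-check in floating point (an illustration of ours, not from the text):

```python
import math

# Spot checks for f(x) = x^4 - 6x^2 + 6 from Example 11.2.9.
alpha = math.sqrt(3 + math.sqrt(3))
beta = math.sqrt(3 - math.sqrt(3))

f = lambda x: x**4 - 6 * x**2 + 6
assert all(abs(f(r)) < 1e-9 for r in (alpha, -alpha, beta, -beta))   # the four roots

assert abs(alpha * beta - math.sqrt(6)) < 1e-12                      # alpha*beta = sqrt(6)
assert abs(alpha**2 - 3 - math.sqrt(3)) < 1e-12                      # sqrt(3) = alpha^2 - 3
assert abs(math.sqrt(2) * (alpha**2 - 3) / alpha - beta) < 1e-12     # beta = sqrt(2)(alpha^2 - 3)/alpha
```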

By virtue of the close connection to group theory and Galois extensions, it is common to adapt
terms heretofore applied only to describe groups to field extensions.

Definition 11.2.10
A field extension K/F is called

(1) abelian if K/F is Galois and Gal(K/F ) is an abelian group;


(2) cyclic if K/F is Galois and Gal(K/F ) is a cyclic group.

Exercises for Section 11.2


1. Prove the following equivalent characterization to Galois extensions (adding to that in Theorem 11.2.4):
A finite extension K/F is Galois if and only if K is separable and normal over F .
2. Prove that if an irreducible cubic f (x) ∈ Q[x] has exactly one real root, then Gal(f (x)) ≅ S3 .
3. Let K = Q(√a), where a is a negative integer. Prove that K is not the subfield of a cyclic extension
of degree 4 of Q.
4. Calculate the Galois groups over Q of the following polynomials.
(a) x3 + x + 1
(b) (x2 − 3)(x3 − 2)
(c) x3 − 2x + 1
5. Calculate the Galois group Gal(f (x)) of f (x) = x4 + 2x2 − 2 ∈ Q[x].
6. Determine the Galois group of x3 − 17 over each of the following fields: (a) Q; (b) Q(√−3); (c) Q(∛17).
7. Find the Galois group of (x2 − 2)(x2 − 3) and draw the subfield lattice of its splitting field over Q.
8. Find the Galois group of x4 − 4x2 + 2 and draw the subfield lattice of its splitting field over Q.
9. Find the Galois group of (x3 − 2)(x3 − 3) and draw the subfield lattice of its splitting field over Q.

10. Find the Galois group of x4 − 13 and draw the subfield lattice of its splitting field over Q.
11. Let p1 , p2 , . . . , pn be distinct prime numbers and let K = Q(√p1 , √p2 , . . . , √pn ).
(a) Show that K/Q is Galois.
(b) Show that Gal(K/Q) is generated by the automorphisms σi defined by

    σi (√pj ) = −√pj if i = j,    and    σi (√pj ) = √pj if i ≠ j.

(c) Prove that σi σj = σj σi .
(d) Prove that √pi ∉ Q(√p1 , √p2 , . . . , √p̂i , . . . , √pn ) (where the hat notation means that the entry
is removed from the list). [Hint: By contradiction. Use σ(√pi ) = ±√pi for all σ ∈ Gal(K/Q).]
(e) Prove that Gal(K/Q) ≅ Z2 ⊕ · · · ⊕ Z2 (n copies).
12. Let p(x) = x4 − 2ax2 + b be an irreducible polynomial in Q[x]. Denote its roots by ±α and ±β and
call K the splitting field of p(x) over Q.
(a) Show that Gal(p(x)) is isomorphic to a subgroup of D4 .
(b) Prove that

    Gal(p(x)) ≅ Z4 if and only if √b √(a2 − b) ∈ Q;
    Gal(p(x)) ≅ Z2 ⊕ Z2 if and only if √b or √(a2 − b) ∈ Q;
    Gal(p(x)) ≅ D4 otherwise.

13. Let F be a field with char F ≠ 2. Suppose that c ∈ F is such that √c ∉ F and let K = F (√c).
Let α = a + b√c with a, b not both zero, and call L = K(√α). Set α′ = a − b√c. Prove that L is
Galois over F if and only if either αα′ or cαα′ is a square in F . Prove also that Gal(L/F ) is cyclic of
order 4 if and only if cαα′ is a square in F .
14. Let K = F (α) be a finite separable extension of a field F with [K : F ] prime. Let α = α1 , α2 , . . . , αp
be the conjugates of α over F . Prove that if α2 ∈ K, then K/F is Galois and that Gal(K/F ) ≅ Zp .
15. Let L/F be a Galois extension and let p be the smallest prime dividing [L : F ]. Prove that if K is a
subfield of L containing F such that [L : K] = p, then K/F is a Galois extension.
16. Let E/F be a normal extension of finite degree. Let K and L be fields with F ⊆ K, L ⊆ E. Assume
that E is separable over K and over L. Prove that E is separable over K ∩ L.
17. Find an example of fields F ⊆ K ⊆ E such that K/F is Galois and E/K is Galois but such that E/F
is not Galois.
18. The relation of normal subgroup is not transitive. In other words, if N1 and N2 are subgroups of G
such that N1 ⊴ N2 and N2 ⊴ G, then it is not necessarily true that N1 ⊴ G. Express this nontransitive
property in terms of Galois extensions under the Galois correspondence.
19. Let F be a field and let A ∈ Mn (F ). Let E be the splitting field of the characteristic polynomial
cA (x) over F . Let σ ∈ Gal(E/F ) be a field automorphism. Let Gal(E/F ) act on the vector space E n
by σ(v) = (σ(v1 ), σ(v2 ), . . . , σ(vn )) for all v = (v1 , v2 , . . . , vn ) ∈ E n . Prove that if w is a generalized
eigenvector of rank k with respect to the eigenvalue λ, then σ(w) is also a generalized eigenvector of
rank k with respect to the eigenvalue σ(λ).

11.3
First Applications of Galois Theory
The Galois correspondence established in Theorem 11.2.7 opens doors to many pathways of
investigation that relate properties of group theory and field extensions. This section explores a few initial
applications.

11.3.1 – Consequences of Group Isomorphism Theorems


By virtue of the Fundamental Theorem of Galois Theory, many results about the structure of groups
imply theorems about the structure of subfields within a field extension. Usually, the fundamental
theorem invites a simple reinterpretation of the group theoretic language into the field theoretic
context. We give only a few of the proofs and leave the others as exercises.

Proposition 11.3.1
Let K1 and K2 be Galois extensions of a field F . Then the intersection K1 ∩ K2 is Galois
over F and the composite field K1 K2 is Galois over F .

Proof. (Left as an exercise for the reader. See Exercise 11.3.3.) 

Proposition 11.3.2
Let K1 and K2 be Galois extensions of a field F with K1 ∩ K2 = F . Then

Gal(K1 K2 /F ) ∼
= Gal(K1 /F ) ⊕ Gal(K2 /F ).

Proof. (Left as an exercise for the reader. See Exercise 11.3.4.) 

Proposition 11.3.3
Let E be a finite separable extension of a field F . Then E is contained in an extension
K that is Galois over F and is minimal in the sense that in a fixed algebraic closure of K
every Galois extension of F containing E also contains K.

Proof. Since E is a finite extension of F , we have E = F (α1 , . . . , αn ) for some finite set of elements
αi algebraic over F . The splitting field K′ of the collection {mF,αi (x)} is Galois over F and contains
E. Now K′ might not be the smallest field extension that satisfies the desired property.
By the Fundamental Theorem of Galois Theory, K′ has only a finite number of subfields. By
Proposition 11.3.1, the intersection of two Galois extensions of F is again a Galois extension of F .
The desired field K is the intersection of all Galois subfields of K′ that contain E, since this
intersection is again Galois over F . □

Definition 11.3.4
The field K in the above proposition is called the Galois closure of E over F .

As a simple example, we observe that K = Q(∛13) is not Galois over Q. The field K′ in the
proof of Proposition 11.3.3 is Q(∛13, ζ3 ). There is no proper Galois subfield of K′ containing K, so
K′ is the Galois closure of K. If K is already Galois, then it is its own Galois closure.
In a variety of applications, when considering the properties of a field extension K/F , it is useful
to embed K in the Galois closure of K over F . In the Galois closure, we can use Galois theory and
then restrict the study to the extension K/F . The proofs of the following two propositions give a
first example where this strategy is effective.

Proposition 11.3.5
Let F be a field, K a Galois extension of F , and L any extension of F . Then the composite
field KL is Galois over L with

Gal(KL/L) ∼
= Gal(K/K ∩ L),

which in turn is isomorphic to a subgroup of Gal(K/F ).



Proof. This is an application of the Second Isomorphism Theorem. Recall that if B is a normal
subgroup of a group G and A is any subgroup of G, then B ⊴ AB, A ∩ B ⊴ A, and AB/B ≅ A/(A ∩ B).
Let L′ be an extension of L that is Galois over F , say the Galois closure of L over F . Then KL′
is a Galois field extension of F . Call G = Gal(KL′/F ). Let A be the subgroup of G corresponding
to L and let B be the subgroup corresponding to K under the Galois correspondence between
subfields of KL′ over F and subgroups of G. Since K is Galois over F , by the fundamental theorem,
B is a normal subgroup of G.
Again by the fundamental theorem, Fix(KL′, AB) = K ∩ L and Fix(KL′, A ∩ B) = KL. Thus,
we conclude that K is Galois over K ∩ L, that KL is Galois over L, and that

    Gal(KL/L) ≅ Gal(K/K ∩ L). □

Proposition 11.3.6
Suppose that K is a Galois extension of F and that L is any finite extension of F . Then

    [KL : F ] = [K : F ][L : F ] / [K ∩ L : F ].

Proof. Let L′ be the Galois closure of L over F . Let G = Gal(KL′/F ). Under the Galois corre-
spondence, let A correspond to L and B correspond to K. In particular, A = Gal(KL′/L) and
B = Gal(KL′/K). By Proposition 4.1.16,

    |AB| = |A||B| / |A ∩ B|.

However, by definition any Galois extension E of F has | Gal(E/F )| = [E : F ]. Hence,

    | Gal(KL′/(K ∩ L))| = [KL′ : K ∩ L] = [KL′ : L][KL′ : K] / [KL′ : KL].

But for any subfield E of KL′ containing F , we have [KL′ : E] = [KL′ : F ]/[E : F ]. The
proposition follows after simplification. □
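As a quick independent illustration (not from the text), the degree formula can be watched at work in the subfield lattice of a finite field: GF(p^a) ⊆ GF(p^b) exactly when a | b, so the compositum and intersection of GF(p^a) and GF(p^b) inside an algebraic closure have degrees lcm(a, b) and gcd(a, b) over GF(p), and the formula reduces to lcm(a, b) · gcd(a, b) = ab:

```python
from math import gcd

# In the lattice of finite fields over F = GF(p): [KL : F] = lcm(a, b) and
# [K ∩ L : F] = gcd(a, b), so Proposition 11.3.6 reads lcm(a, b) = a*b/gcd(a, b).

def lcm(a, b):
    """Smallest positive common multiple of a and b, found by direct search."""
    m = a
    while m % b != 0:
        m += a
    return m

for a, b in [(4, 6), (3, 5), (6, 10), (8, 12)]:
    assert lcm(a, b) * gcd(a, b) == a * b    # [KL:F][K∩L:F] = [K:F][L:F]
```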

11.3.2 – Norm and Trace of Algebraic Elements


Another application of the Fundamental Theorem of Galois Theory gives us a method to find im-
portant elements in a base field F related to algebraic elements over F .
Let K/F be any finite extension and let α ∈ K. Let L be a Galois extension of F that contains
K. By the fundamental theorem, K corresponds to a subgroup H ≤ Gal(L/F ) via H = Gal(L/K).
For all σ ∈ H, we have σ(α) = α because α ∈ K = Fix(L, H).

Definition 11.3.7
With the above fields, we define the norm of α from K to F as the product

    NK/F (α) = ∏σ σ(α),

where the product is taken over a complete set of distinct coset representatives σ of H in
Gal(L/F ).

This product is well-defined because if σ1 and σ2 are two different representatives of the same
coset, then σ2−1 σ1 ∈ H. Therefore, σ2−1 σ1 (α) = α and therefore σ1 (α) = σ2 (α). Hence, no matter
the choice of coset representative taken in the product, the value of σ(α) remains the same.
Note that if K is Galois, then we can take L = K and the product in Definition 11.3.7 runs over
all elements in Gal(K/F ).
574 CHAPTER 11. GALOIS THEORY

Proposition 11.3.8
The norm is a function NK/F : K → F satisfying NK/F (αβ) = NK/F (α)NK/F (β) for all
α, β ∈ K.

Proof. For all τ ∈ Gal(L/F ),


!
Y Y
τ (NK/F (α)) = τ σ(α) = (τ σ)(α).
σ σ

However, since the action of a group G on the cosets of a subgroup H is a transitive action, as σ
runs through a complete set of distinct coset representatives, so does τ σ. Hence,
Y
τ (NK/F (α)) = σ(α) = NK/F (α).
σ

Thus, NK/F (α) ∈ Fix(L, Gal(L/F )) = F .


Now let α, β ∈ K. Then
! !
Y Y Y Y
NK/F (αβ) = σ(αβ) = σ(α)σ(β) = σ(α) σ(β) = NK/F (α)NK/F (β).
σ σ σ σ 

The norm of an algebraic element α over a base field F provides a direct way to calculate the
inverse of α. In the product defining the norm, the coset representative of H fixes α, so α appears
explicitly in the product. Consequently,
!
1 1 Y
= σ(α) .
α NK/F (α)
σ: σ ∈H
/

We understand the product to run over all conjugates of α over F except α itself.
√ √
Example 11.3.9. Consider√ the element α = 2−3( 3 13)2 in K = Q( 3 13). We know that K/Q is not
Galois but that L = Q( 3 13, ζ3 ) is a Galois extension over Q. Using the notation of Example 11.2.6,
K = Fix(L, hτ i) so a distinct set of coset representatives of H = hτ i is 1, ρ, ρ2 . Hence,

NK/F (α) = αρ(α)ρ2 (α)


√ √ √
= (2 − 3( 13)2 )(2 − 3( 13)2 ζ32 )(2 − 3( 13)2 ζ3 )
3 3 3

= −4555.

Furthermore, we can use this norm to calculate


1 1 √ √ 1 √ √
(2 − 3( 13)2 ζ32 )(2 − 3( 13)2 ζ3 ) = − (4 + 117 13 + 6( 13)2 ).
3 3 3 3
=
α NK/F (α) 4555 4

The trace of an algebraic element is a similar construction as the norm.

Definition 11.3.10
We define the trace of α from K to F as the sum
X
TrK/F (α) = σ(α),
σ

where the sum is taken over a complete set of distinct coset representatives of H in
Gal(L/F ).
11.3. FIRST APPLICATIONS OF GALOIS THEORY 575

The sum is well-defined for the same reason that the norm was well-defined, despite the use of
coset representatives. We omit the proof for the following proposition since it is nearly identical to
the equivalent one for the norm.

Proposition 11.3.11
The trace is a function TrK/F : K → F satisfying TrK/F (α + β) = TrK/F (α) + TrK/F (β)
for all α, β ∈ K.

The definition of the norm and the trace of elements from a field K to a base field F illustrate
a rather common strategy in field theory. Since the automorphism groups of Galois extensions of
a base field have nice group properties, when studying an arbitrary finite field extension K/F it is
convenient to simply work in the Galois closure of K over F .

11.3.3 – The Fundamental Theorem of Algebra


The Fundamental Theorem of Algebra (that every polynomial in C[x] has a root in C) is often taught
in the algebra curriculum shortly after the introduction of complex numbers. Though very useful
for many applications of algebra, this theorem is difficult to prove. Its early introduction into the
curriculum is not unlike teaching the formula for the area of a disk in early classes in geometry: It
hides the difficulty involved in proving such an important result. One of the difficulties in the proof
comes from the interplay between algebra and topology. Most (if not all) of the proofs must take
into account a sort of continuity in the set of complex numbers, that there are no holes.
We conclude this section with a proof of the Fundamental Theorem of Algebra using Galois
theory. We point out ahead of time the strangeness of the statement “let F be a splitting field over
C”; this is a strange statement only because we are so familiar with the result that C is algebraically
closed.

Theorem 11.3.12 (Fundamental Theorem of Algebra)


The field C is algebraically closed.

Proof. To show that C is algebraically closed, we must show that every polynomial p(z) ∈ C[z] has
a root in C.
Let p(z) ∈ C[z] and consider the polynomial q(z) = p(z)p(z). If

p(z) = cn z n + · · · + c1 z + c0 ,

then the mth coefficient of q(z) is


X
ci cj .
i+j=m

However, we see that for all m with 0 ≤ m ≤ 2n, the mth coefficient is unchanged under complex
conjugation so we deduce that q(z) ∈ R[z]. Furthermore, if z0 is a root of q(z), then either p(z0 ) = 0
or p(z0 ) = 0, so either z0 or z0 is a root of p(z). Hence, in order to prove that every polynomial
p(z) ∈ C[z] has a root in C, it suffices to prove that every polynomial q(x) ∈ R[x] has a root in C.
We now prove by induction on ord2 (deg q(x)) that every polynomial q(x) ∈ R[x] has a root in C.
For the base case, assume that ord2 (deg(q(x)) = 0, i.e., that deg q(x) is odd. Then
 
lim q(x) = − lim q(x) = ±∞.
x→−∞ x→∞

Hence, for a, b ∈ R large enough in the negative sense and positive sense, q(a) and q(b) are of
opposite signs. Consequently, by the Intermediate Value Theorem, since polynomials are continuous
functions, there exists c ∈ [a, b] such that q(c) = 0. Thus, every odd degree real polynomial has a
real root.
576 CHAPTER 11. GALOIS THEORY

Now let k ≥ 1 and suppose that every polynomial q(x) ∈ R[x] with ord2 (deg q(x)) < k has a root
in C. Let F be a splitting field of q(x) over C, so that in F we write

q(x) = a(x − z1 )(x − z2 ) · · · (x − zn ),

where z1 , z2 , . . . , zn ∈ F and a = LC(q(x)). For a real parameter t, define the polynomials


Y
ft (z) = (z − (zi + zj + tzi zj )).
1≤i<j≤n

A priori, this polynomial is in the ring F [x], where F is an algebraic and Galois extension of C and
also of R. The degree of ft (z) is n2 = 21 n(n − 1) = 2k−1 m(n − 1). Since n is even, n − 1 is odd
and so ord2 (deg ft (z)) = k − 1. Any σ ∈ Gal(F/R) permutes the terms in the product of ft (z) so in
fact the coefficients of ft (z) are invariant under σ. Thus, ft (z) ∈ R[x]. By the induction hypothesis,
ft (z) has a root in C.
Every root of ft (z) is of the form zi + zj + tzi zj . There are only n2 pairs (i, j) but there are


infinitely many parameters t ∈ R, so by the pigeonhole principle there is a specific pair (i, j) such
that there exist two distinct parameter values t1 6= t2 such that

zi + zj + t1 zi zj = c1 ∈ C and zi + zj + t2 zi zj = c2 ∈ C.

By elementary algebra, we find that


c1 − c2 t2 c1 − t1 c2
zi zj = and zi + zj = .
t1 − t2 t2 − t1
This shows that zi and zj are roots of the quadratic polynomial

(t1 − t2 )z 2 + (t2 c1 − t1 c2 )z + (c1 − c2 ) = 0.

This polynomial is in C[x]. However, the quadratic formula gives explicit solutions in C for any
quadratic polynomial in C. Hence, both these specific zi and zj are in C. Consequently, we have
proven that q(x) has a root in C. This completes the induction proof and the theorem follows. 

The proof given above is due to Laplace in 1795. However, at the time that Laplace offered this
argument, it was incomplete because the theorem on the existence of splitting fields of a polynomial
had not been established. The essay by Remmert in [52] gives a excellent summary of the history
behind various proofs of the FTA, Fundamental Theorem of Algebra.

Exercises for Section 11.3


p √
1. Determine the Galois closure of Q( 5 − 3) over Q.
p
3 √ p
3 √
2. Let α = 1 + 2 + 1 − 2. Determine the Galois closure of Q(α) over Q.
3. Prove Proposition 11.3.1. [Hint: Use the characterization that finite Galois extensions of F are splitting
fields over F .]
4. Prove Proposition 11.3.2.
5. State and prove the field theory result that ensues from the Third Isomorphism Theorem of groups.

6. Let D be square-free. Prove that if K = Q( D), then the absolute value of the norm NK/Q is the
same as the field norm defined in Subsection 6.7.2.
p √
7. Let K = Q( 1 + 2) be an extension of Q.
(a) Find a Galois extension of Q that contains K.
p √
(b) Find the norm and the trace of 3 + 7 1 + 2 from K over Q.
p √
(c) Find the inverse of 3 + 7 1 + 2 in K.
p √
(d) Observe that
√ K is Galois over Q. Use this to find the norm and the trace of 3 + 7 1 + 2 from
K over Q( 2).
11.4. GALOIS GROUPS OF CYCLOTOMIC EXTENSIONS 577


8. Let n ∈ Z be cube-free and consider the extension K = Q( 3 n). Give formulas for the norm to Q, the
√ √ 2
trace to Q, and the inverse in K of the generic element α = a + b 3 n + c 3 n .
√ √
9. Consider the ring R = Z[ 3 2] inside its field of fractions K = Q( 3 2).

(a) Prove that the norm NK/Q , when restricted to R, is a multiplicative function from Z[ 3 2] to Z.
(b) Prove that the group of units U (R) is {α ∈ R | NK/Q (α) = ±1}.

(c) Find one unit in R = Z[ 3 2] that is neither 1 nor −1.
√ √ 2
10. Using Galois conjugates, find the minimal polynomial of Q of 3 − 3 7 + 2 3 7 .
11. Suppose that K/F is any field extension with [K : F ] = n. Let α ∈ K. Suppose that the minimal
polynomial of α over F is mα,F (x) = xd + ad−1 xd−1 + · · · + a1 x + a0 .
(a) Prove that d | n.
(b) Prove that there are d distinct Galois conjugates that are repeated n/d times in the product for
the norm NK/F (α).
n/d
(c) Deduce that NK/F (α) = (−1)n a0 .
(d) Deduce also that TrK/F (α) = − nd ad−1 .
12. Let K/F be a Galois extension and let σ ∈ Gal(K/F ).
(a) Prove that if α = β/σ(β), for some β ∈ K, then NK/F (α) = 1.
(b) Prove that if α = β − σ(β), for some β ∈ K, then TrK/F (α) = 1.
13. Let K/F be a finite extension. Prove that K is a simple extension of F (with K = F (α)) if and only
if there exist only finitely many subfields of K containing F .

11.4
Galois Groups of Cyclotomic Extensions
As we continue to develop the theory of Galois groups of polynomials, it is very useful to understand
the Galois groups of xn − a and more precisely xn − 1. In Section 7.5, we saw that xn − 1 can
be factored into cyclotomic polynomials so we propose to study the Galois groups of cyclotomic
polynomials Φn (x) ∈ Q[x].

11.4.1 – Cyclotomic Extensions


Let n be a positive integer. Recall that the roots of Φn (x) are precisely the primitive nth roots of
unity. Setting ζn = e2πi/n , recall that these primitive roots of unity are ζna with 1 ≤ a ≤ n and
gcd(a, n) = 1. Consequently, Q(ζn ) is the splitting field of Φn (x), which means that the cyclotomic
extension Q(ζn )/Q is Galois. An automorphism in Gal(Q(ζn )/Q) is uniquely determined by how
it maps ζn , which must be to some other primitive nth root of unity. This leads to the following
simple theorem.

Theorem 11.4.1
The Galois group of Q(ζn ) is isomorphic to U (n) = U (Z/nZ) given explicitly by

U (n) −→ Gal(Q(ζn )/Q)


a 7−→ σa = (ζn 7→ ζna ).

With this result and the Fundamental Theorem of Galois Theory, we can construct primitive
field elements for all the subfields of Q(ζp ), where p is an odd prime. Recall that [Q(ζp ) : Q] = p − 1
and that the cyclotomic polynomial Φp (x) is

Φp (x) = xp−1 + xp−2 + · · · + x + 1.


578 CHAPTER 11. GALOIS THEORY

Though it is natural to use 1, ζp , ζp2 , . . . , ζpp−2 as a basis of Q(ζp ) over Q, we can alternatively use
the elements ζp , ζp2 , . . . , ζpp−2 , ζpp−1 because
1 = −ζp − ζp2 − · · · − ζpp−2 − ζpp−1 .
Furthermore, by Proposition 7.5.2, U (Z/pZ) = U (Fp ) is a cyclic group so Gal(Q(ζp )/Q) ∼
= Zp−1 . A
generator of U (Z/pZ) is called a primitive root modulo p.
Now let H be any subgroup of G = Gal(Q(ζp )/Q). Define the element
X
αH = σ(ζp ).
σ∈H

Then for all automorphisms τ ∈ H, we have


X X
τ (αH ) = τ (σ(ζp )) = (τ σ)(ζp ).
σ∈H σ∈H

However, the action by left-multiplication of τ on the elements of H is a permutation on H. There-


fore, in the above product τ σ runs through all the elements of H so
X
τ (αH ) = σ 0 (ζp ) = αH .
σ 0 ∈H

Consequently, αH is in the fixed field Fix(Q(ζp ), H).


Conversely, suppose that τ ∈ G is not in H and τ (αH ) = αH . Any τ ∈ Gal(Q(ζp )/Q) simply
permutes elements in the basis that we chose for Q(ζp ) over Q, precisely because the basis elements
are all the roots of the minimal polynomial Φp (x). Thus, τ (αH ) is the sum of some basis elements of
Q(ζp ). By the properties of a basis, the condition that τ (αH ) = αH implies that there exists σ ∈ H
such that τ (ζp ) = σ(ζp ), as one of the terms in αH . But automorphisms on Q(ζp ) are completely
determined by their action on ζp so if τ −1 σ(ζp ) = ζp , then τ = σ. This is a contradiction since we
assumed τ ∈ / H. Thus, αH is not fixed by any automorphism not contained in H. We conclude that
Fix(Q(ζp ), H) = Q(αH ).
This result gives a process for constructing all the field extensions of Q that are subfields of a
given Q(ζp ), where p is an odd prime.
Example 11.4.2. Let p = 7. The Galois group is Gal(Q(ζ7 )/Q) = U (F7 ) ∼ = Z6 . A generator for
U (Z/7Z) is 3. (Note that 2 is not a generator.) A subgroup lattice of U (F7 ) is
U (F7 )

h2i

h6i

{1}

With H = h2i = {1, 24} a generator of Fix(Q(ζ7 ), H) is ζ7 + ζ72 + ζ74 . Similarly, a generator of
the subfield Fix(Q(ζ7 ), h6i) is ζ7 + ζ76 . Hence, by the Fundamental Theorem of Galois Theory, the
subfield structure of Q(ζ7 ) is
Q(ζ7 )

Q(ζ7 + ζ7−1 )

Q(ζ7 + ζ72 + ζ74 )

Q
11.4. GALOIS GROUPS OF CYCLOTOMIC EXTENSIONS 579

It is not hard to find minimal polynomials for α = ζ7 + ζ72 + ζ74 and β = ζ7 + ζ76 . Note that

α2 = ζ72 + ζ74 + ζ7 + 2ζ73 + 2ζ75 + 2ζ76 .

However, 1 + ζ7 + ζ72 + · · · + ζ76 = 0 since these roots of unity are the distinct roots of x7 − 1 = 0.
Thus, we notice that α2 = −2 − α. Hence, α is a root of x2 + x + 2.
For β we calculate that

β 2 = ζ72 + ζ75 + 2 and β 3 = ζ72 + 3ζ7 + 3ζ76 + ζ74 .

We can observe that

β 3 + β 2 = (1 + ζ7 + ζ72 + · · · + ζ76 ) + 1 + 2(ζ7 + ζ76 ).

So we conclude that β 3 + β 2 − 2β − 1 = 0. Since Q(β) has degree 3, then m(x) = x3 + x2 − 2x − 1


must be the minimal polynomial of ζ7 + ζ7−1 over Q. 4

The following example illustrates how the process of defining αH does not necessarily work for
Q(ζn ) where n is not a prime number.
Example 11.4.3. Consider the field Q(ζ9 ). This is Galois over Q with Galois group G = U (Z/9Z) ∼ =
Z6 . We can write G = {σ1 , σ2 , σ4 , σ5 , σ7 , σ8 }, where σa (ζ9 ) = ζ9a . This group has two nontrivial
subgroups H1 = {σ1 , σ4 , σ7 } and H2 = {σ1 , σ8 }.
For either H ≤ G, define αH ∈ Q(ζ9 ) as the sum of the Galois conjugates of ζ9 in Q(ζ9 ), namely
X
αH = σ(ζ9 ).
σ∈H

Note that αH is fixed by H so αH ∈ Fix(Q(ζ9 ), H). In these cases,

α1 = αH1 = ζ9 + ζ94 + ζ97 and α2 = αH2 = ζ9 + ζ98 = ζ9 + ζ9−1 .

Since the sum of the primitive 9th roots of unity is 0, it is not hard to show as we did in the
previous example that α2 is a root of x3 − 3x + 1 = 0 and that this polynomial is irreducible. So
(as expected), [Q(α2 ) : Q] = 3. However, we observe that

α12 = ζ92 + ζ98 + ζ95 + 2ζ95 + 2ζ98 + 2ζ92 = 3(ζ92 + ζ95 + ζ98 ) = 3ζ9 α1 .

This implies that α1 solves x(x − 3ζ9 ) = 0. By the triangle inequality |α1 | < 3, while |3ζ9 | = 3.
Hence, α1 = 0. In particular, α1 is fixed by all of G and not just by H1 . Consequently, the element
α1 is not a generator for Fix(Q(ζ9 ), H1 ). On the other hand, by the same reasoning as above,
ζ9 ζ94 ζ97 = ζ93 is in Fix(Q(ζ9 ), H1 ). But ζ93 = ζ3 is not in Q but has degree 2. Hence, we find that
Fix(Q(ζ9 ), H1 ) = Q(ζ3 ). 4

The reason why αH might not be a primitive element of simple extension of Q in Q(ζn ), when
n is composite follows from the fact that, though {ζna | 0 ≤ a ≤ ϕ(n) − 1} is a basis of Q(ζn ), the
set {ζna | a ∈ U (n)} is not a basis of Q(ζn ). For example, with n = 9, the element ζ93 = ζ3 is not in
{ζ9a | a ∈ U (9)}.

Corollary 11.4.4
If n has the prime decomposition n = pa1 1 pa2 2 · · · pakk , then

Gal(Q(ζn )/Q) ∼
= Gal(Q(ζpa1 1 )/Q) × Gal(Q(ζpa2 2 )/Q) × · · · × Gal(Q(ζpak )/Q). k

Proof. We know that Gal(Q(ζn )/Q) ∼


= U (Z/nZ). By Corollary 5.6.19,
Gal(Q(ζn )/Q) ∼
= U (p1α1 ) ⊕ U (pα
αk
2 ) ⊕ · · · ⊕ U (pk ),
2

and the corollary follows. 


580 CHAPTER 11. GALOIS THEORY

Corollary 11.4.5
The order of the Galois field extension is [Q(ζn ) : Q] = φ(n), where φ is the Euler totient
function.

11.4.2 – Constructible Regular n-gons


Galois theory answers the old question about constructible regular n-gons, which we had begun to
answer in Section 7.4. Recall that Theorem 7.4.8 stated that if a real number α is constructible, then
[Q(α) : Q] = 2k for some nonnegative integer k. In that section, we also pointed out that a regular
n-gon is constructible with a compass and a straightedge if and only if cos(2π/n) is a constructible
number.
The nth roots of unity form the vertices of a regular n-gon in the complex plane, so the algebraic
properties of the roots of unity are related to the constructibility of a regular n-gon. Consider the
root of unit ζn = e2πi/n . Since ζn = cos(2π/n) + i sin(2π/n), we easily deduce that
1
cos(2π/n) = (ζn + ζn−1 ).
2
Hence, cos(2π/n) ∈ Q(ζn ). Conversely, since
p
ζn = cos(2π/n) + i sin(2π/n) = cos(2π/n) + cos2 (2π/n) − 1,

then ζn is in an extensions of degree 2 of Q(cos(2π/n)). This leads to the following proposition.

Proposition 11.4.6
A regular n-gon is constructible if and only if [Q(ζn ) : Q] = 2k for some nonnegative integer
k. Consequently, a regular n-gon is constructible if and only if φ(n) = 2k .

Proof. The discussion motivating the proposition showed that if a regular n-gon is constructible then
[Q(cos(2π/n)) : Q] is a power of 2, which in turn implies that [Q(ζn ) : Q] is a power of 2. However,
the converse is not at all obvious.
Suppose that [Q(ζn ) : Q] = 2k . Then Gal(Q(ζn )/Q) ∼ = U (n) has an order that is a power of 2.
Since the group U (n) is abelian, the subgroup of Gal(Q(ζn )/Q) corresponding to complex conjugation
σc is a normal subgroup. Hence, by the Fundamental Theorem of Galois Theory, Q(cos(2π/n)) is
Galois over Q and Gal(Q(cos(2π/n))/Q) is a quotient group of Gal(Q(ζn )/Q) and hence is an abelian
group whose order is a power of 2.
By the Fundamental Theorem of Finitely Generated Abelian Groups, an abelian group G whose
order is a power of 2, say 2k , has the form

G∼
= Z2a1 ⊕ Z2a2 ⊕ · · · Z2as .

Since a cyclic group Zm has a subgroup of order d for all d | m, then it is possible to find a subgroup
G2 of order 2k−1 . By induction, there exists a sequence of subgroups

{1} = Gk ≤ Gk−1 ≤ · · · ≤ G1 ≤ G0 = G

such that |Gi−1 : Gi | = 2 for all i with 1 ≤ i ≤ k. Let {1} = Hk ≤ Hk−1 ≤ · · · ≤ H1 ≤ H0 be


such a sequence of subgroups for Gal(Q(cos(2π/n))/Q). Because |Hi−1 : Hi | = 2, then Hi E Hi−1
for 1 ≤ i ≤ k. Define the fixed fields Ki = Fix(Q(cos(2π/n)), Hi ). In particular, K0 = Q and
Kk = Q(cos(2π/n)).
The field K1 is a quadratic extension of Q, so K1 = Q(α1 ) for some algebraic element α1 of
degree 2 over Q. But then α1 is a constructible number. Similarly, for all 1 ≤ i ≤ k, Ki = Ki−1 (αi ),
where αi is quadratic over Ki−1 . By an induction argument, we see that each αi is constructible
and every element in Ki is constructible. In particular, cos(2π/n) ∈ Kk is a constructible number,
11.4. GALOIS GROUPS OF CYCLOTOMIC EXTENSIONS 581

so the regular n-gon is constructible with a compass and a straightedge. This establishes the first if
and only if statement.
Finally, the second if and only if statement holds because [Q(ζn ) : Q] = φ(n). 

Before we give the (almost) final word on constructible regular polygons, we must introduce the
notion of a Fermat prime.

Definition 11.4.7
A Fermat prime is a prime number of the form 2` + 1.

Proposition 11.4.8
If 2` + 1 is a prime number then ` itself is a prime power, ` = 2r .

Proof. (Left as an exercise for the reader. See Exercise 11.4.1.) 

The only known Fermat primes are 3, 5, 17, 257, 65537. It is an unsolved problem if there are
any other Fermat primes, let alone an infinite number of Fermat primes. Note that
5
22 + 1 = 4294967297 = 641 × 6700417,
`
which gives a counterexample to the hypothesis that 22 + 1 is prime for all positive integers `.

Theorem 11.4.9
A regular n-gon is constructible with a compass and a straightedge if and only if n is the
product of 2m times a product of distinct Fermat primes.

Proof. Suppose that n has the prime factorization n = pa1 1 pa2 2 · · · pakk . The regular n-gon is con-
structible if and only if φ(n) is a power of 2. But

φ(n) = φ(pa1 1 )φ(pa2 2 ) · · · φ(pakk )


= (pa1 1 − pa1 1 −1 )(pa2 2 − p2a2 −1 ) · · · (pakk − pakk −1 ).

Suppose that p1 = 2 then any a1 will do because 2a1 − 2a1 −1 = 2a1 −1 . On the other hand, if pi is
an odd prime, then pai i − pai i −1 = pai i −1 (pi − 1) is a possibly a power of 2 if and only if ai = 1 and
pi = 2m + 1 for some positive integer m. Hence, pi must be a Fermat prime and can only occur at
most once in the prime decomposition of n. 

In 1796, at the age of 19, Gauss expressed cos 2π



17 using rational numbers, addition, multipli-
cation, subtraction, division, and square roots. This result showed that it is possible to construct
with a compass and a straightedge one edge of a regular 17-gon inscribed in a given circle. From
one edge, it is easy to construct the rest of the 17-gon. An explicit construction for the 17-gon was
not found until many years later in 1825 by Johannes Erchinger. At the time that Gauss proved
his result, it represented the first significant progress in the study of constructible polygons since
antiquity. He was so pleased with this particular result that, despite all his other accomplishments
in mathematics, many years later he requested that a regular heptadecagon be chiseled onto his
tombstone. (Between the technical difficulty of performing the exact construction in stone and that
the result would too closely resemble a circle, the stonemason declined.)
From a historical perspective, Theorem 11.4.9 stands as a pinnacle of mathematical achievement.
The theorem answers an ancient problem, what regular n-gons are constructible with compass and
straightedge, by giving a complete characterization. The final-final word on constructible regular
n-gons would involve determining which primes are Fermat primes, a number theory problem that
remains open to this day.
582 CHAPTER 11. GALOIS THEORY

The problem of constructibility of regular n-gons compellingly illustrates the deep interconnect-
edness of mathematics. The original problem came out of classical geometry and drew attention
from mathematicians all over the world. Theorem 11.4.9 that answers the problem almost to com-
plete satisfaction uses advanced techniques of algebra, including group theory, field theory, and the
combination of the two that results from Galois theory. Now, a complete solution appears to hide
in the realm of number theory and properties of prime numbers.

Exercises for Section 11.4


1. Prove that if 2` + 1 is a prime number then ` itself must be a power of 2.
2. Prove that complex conjugation is an automorphism σ in Gal(Q(ζn )/Q) for all n. Prove also that
Fix(Q(ζn ), hσi) = Q(ζn + ζn−1 ) = L and show that L = Q(ζn ) ∩ R.
3. Using the angle doubling formula, find a formula (possibly recursive) for 2 cos 2πn . Use this to deduce


a formula for ζ2n .


4. Finish Example 11.4.3 by giving the complete lattice of subfields of Q(ζ9 ), listing each field extension
of Q by generators.
5. Give the complete lattice of subfields of Q(ζ11 ), listing each field extension of Q as a simple extension.
Give minimal polynomials for each generator of a subfield of Q(ζ11 ).
6. Give the complete lattice of subfields of Q(ζ13 ), listing each field extension of Q as a simple extension.
Give minimal polynomials for each generator of a subfield of Q(ζ13 ).
7. Give the complete lattice of subfields of Q(ζ12 ).
8. Give the complete lattice of subfields of Q(ζ20 ).
9. Give generators of Gal(Q(ζ20 )/Q) by explicitly describing the generating automorphisms.
10. Show that for every finite abelian group G there exists some positive integer n such that G is isomorphic
to a quotient group of U (Z/nZ). Deduce that there exists a subfield K of a cyclotomic extension of
Q such that Gal(K/Q) ∼ = G.
11. Consider the primitive nth root of unity ζn and let K = Q(ζn ).
(a) Prove that if n is a power of a prime p, then NK/Q (1 − ζn ) = p.
(b) Prove that if n is divisible by at least two distinct primes, then NK/Q (1 − ζn ) = 1.
12. Working in Q(ζ7 ), find a polynomial f ∈ Q[x] such that GalQ (f (x)) = ∼ Z3 .


13. This exercise guides an explicit calculation of cos 17 in terms of radicals. Set ζ = ζ17 and observe
that cos 2π
17
= 12 (ζ + ζ 16 ).
2 4 u
(a) Show that U (Z/17Z) is generated by 3 and show that H1 = h3 i, H2 = h3 i, and H3 = h3 i are
the unique subgroups of index 2, 4, and 8 respectively.
(b) Call
X a
η1 = ζ + ζ 2 + ζ 4 + ζ 8 + ζ 9 + ζ 13 + ζ 15 + ζ 16 = ζ ,
a∈H1
3 5 6 7 10 11 12 14
η2 = ζ + ζ + ζ + ζ + ζ +ζ +ζ +ζ .

Using Figure 11.2, show that η2 < 0 < η1 . Prove also that η1 + η2 = −1 and that η1 η2 = −4.
Deduce that η1 and η2 solve the equation x2 + x − 4 = 0. Using the inequalities for η1 and η2 ,
give explicit formulas (using radicals) for both.
(c) Now call
X a
ε1 = ζ + ζ 4 + ζ 13 + ζ 16 = ζ , ε3 = ζ 3 + ζ 5 + ζ 12 + ζ 14 ,
a∈H2
2 8 9 15
ε2 = ζ + ζ + ζ + ζ , ε4 = ζ 6 + ζ 7 + ζ 10 + ζ 11 .

Using Figure 11.2, show that 0 < ε2 < ε1 . Prove also that ε1 + ε2 = η1 and that ε1 ε2 = −1.
Deduce that ε1 and ε2 solve the equation x2 − η1 x − 1 = 0. Using the inequalities for ε1 and ε2 ,
give explicit formulas (using radicals) for both.
(d) Repeat and change the previous part as needed to find explicit formulas (using radicals) for ε3
and ε4 .
11.5. SYMMETRIES AMONG ROOTS; THE DISCRIMINANT 583

ζ5 ζ4
ζ3
ζ6
ζ2
ζ7
ζ1
8
ζ

ζ9
ζ 16
10
ζ
ζ 15
ζ 11
ζ 14
ζ 12 ζ 13

Figure 11.2: Regular heptadecagon

(e) Now call


X
γ1 = ζ + ζ 16 = ζa,
a∈H3
4 13
γ2 = ζ + ζ .

Using Figure 11.2, show that 0 < γ2 < γ1 . Prove also that γ1 + γ2 = ε1 and that γ1 γ2 = ε3 .
Deduce that γ1 and γ2 solve x2 − ε1 x + ε3 = 0. Using the inequalities for ε1 and ε2 , give explicit
formulas (using radicals) for both.
(f) Conclude the exercise by using the previous part to show that
r !
√ √ √ √ √
  q q q
2π 1
cos = −1 + 17 + 2(17 − 17) + 2 17 + 3 17 − 2(17 − 17) − 2 2(17 + 17) .
17 16

14. Find an expression of cos(2π/15) as nested square roots.


15. Let p be a prime greater than 2 and write ζ = ζp . Denote by α ∈ C the number
p−1
X 2
α= ζk .
k=0

(a) Prove that Q(α) is the unique subfield of Q(ζp ) of degree 2 over Q.
(b) Prove that αα = p.
[The element α is called a Gauss sum.]

11.5
Symmetries among Roots; The Discriminant
The beginning of this chapter posed the study of symmetries among roots of polynomials as one of
the main motivations for developing Galois theory. Our approach jumped straight to the study of
automorphisms of fields and their relevance to the motivating problem. In this section, we approach
from a different direction to find the largest possible Galois group of a given polynomial.
584 CHAPTER 11. GALOIS THEORY

11.5.1 – Elementary Symmetric Polynomials

Definition 11.5.1
Let F be a field. A multivariable polynomial p(x1 , x2 , . . . , xn ) ∈ F [x1 , x2 , . . . , xn ] is called
symmetric if p(xσ(1) , xσ(2) , . . . , xσ(n) ) = p(x1 , x2 , . . . , xn ) for all σ ∈ Sn .

For example, the polynomial f (x1 , x2 , x3 ) = 2x31 + 2x32 + 2x33 − 7x1 x2 x3 is symmetric in the
variables. The polynomial g(x1 , x2 , x3 ) = x21 x2 + x22 x3 + x23 x1 is not symmetric because, if σ = (1 2)
then
g(xσ(1) , xσ(2) , xσ(3) ) = g(x2 , x1 , x3 ) = x22 x1 + x21 x3 + x23 x2 6= g(x1 , x2 , x3 ).
Definition 11.5.1 extends to rational expressions. With the concept of automorphisms at our
disposal, we observe that a permutation σ of the variables in a rational expression

p(x1 , x2 , . . . , xn )
q(x1 , x2 , . . . , xn )

is an automorphism ωσ ∈ Aut(F (x1 , x2 , . . . , xn )/F ). This gives an embedding of Sn into the auto-
morphism group Aut(F (x1 , x2 , . . . , xn )/F ). We would like to determine and to study properties of
Fix(F (x1 , x2 , . . . , xn ), Sn ).
In the ring F [x1 , x2 , . . . , xn ][x], consider the polynomial

q(x) = (x − x1 )(x − x2 ) · · · (x − xn ). (11.4)

This polynomial possesses n distinct roots, namely the indeterminates x1 , x2 , . . . , xn . Note that
if f (x) ∈ F [x] is any monic polynomial of degree n with roots α1 , α2 , . . . , αn ∈ F listed with
multiplicity, then f (x) is the image of q(x) under the evaluation homomorphism that maps each
indeterminate xi to αi . Expanding the factored expression in (11.4) gives
 
Y
n n−1
q(x) = x − (x1 + x2 + · · · + xn )x + xi xj  xn−2 + · · · + (−1)n (x1 x2 · · · xn ).
1≤i<j≤n

The coefficients of q(x), as polynomials in x1 , x2 , . . . , xn , play an important role. For the following
definition, recall that if U is a set, then Pk (U ) = {A ∈ P(U ) |A| = k}.

Definition 11.5.2
For any k ∈ {1, 2, . . . , n}, the kth elementary symmetric polynomials in x1 , x2 , . . . , xn is
X
sk (x1 , x2 , . . . , xn ) = xa1 xa2 · · · xak ,
A∈Pk ({1,2,...,n})

where each subset A of {1, 2, . . . , n} is understood as A = {a1 , a2 , . . . , ak }. We also denote


s0 (x1 , x2 , . . . , xn ) = 1.

With the elementary symmetric polynomials, the expression in (11.4) can be written as
n
X
(x − x1 )(x − x2 ) · · · (x − xn ) = (−1)k sk (x1 , x2 , . . . , xn )xn−k . (11.5)
k=0

We see that the polynomials sk (x1 , x2 , . . . , xn ) are symmetric polynomials in two different ways.
First, for all σ ∈ Sn ,

(x − xσ(1) )(x − xσ(2) ) · · · (x − xσ(n) ) = (x − x1 )(x − x2 ) · · · (x − xn ).


11.5. SYMMETRIES AMONG ROOTS; THE DISCRIMINANT 585

Hence, upon expanding the product on the left-hand side, we deduce that
sk (xσ(1) , xσ(2) , . . . , xσ(n) ) = sk (x1 , x2 , . . . , xn )
for all k and all σ ∈ Sn . For a second way to see that the sk are symmetric, consider the action of
σ on Pk ({1, 2, . . . , m}) via σ · {a1 , a2 , . . . , ak ) = {σ(a1 ), σ(a2 ), . . . , σ(ak )}. Then
X
sk (xσ(1) , xσ(2) , . . . , xσ(n) ) = xσ(a1 ) xσ(a2 ) · · · xσ(ak )
A∈Pk ({1,2,...,n})
X
= xa1 xa2 · · · xak
σ −1 ·A∈Pk ({1,2,...,n})
X
= xa1 xa2 · · · xak
A∈Pk ({1,2,...,n})

= sk (x1 , x2 , . . . , xn ).
As an explicit example, the four elementary symmetric polynomials in x1 , x2 , x3 , x4 are
s1 (x1 , x2 , x3 , x4 ) = x1 + x2 + x3 + x4 ,
s2 (x1 , x2 , x3 , x4 ) = x1 x2 + x1 x3 + x1 x4 + x2 x3 + x2 x4 + x3 x4 ,
s3 (x1 , x2 , x3 , x4 ) = x1 x2 x3 + x1 x2 x4 + x1 x3 x4 + x2 x3 x4 ,
s4 (x1 , x2 , x3 , x4 ) = x1 x2 x3 x4 .

Theorem 11.5.3
Every symmetric rational expression f (x1 , x2 , . . . , xn ) is a rational expression in the ele-
mentary symmetric polynomials as

f (x1 , x2 , . . . , xn ) = g(s1 , s2 , . . . , sn ).

Proof. Consider the subfield F (s1 , s2 , . . . , sn ) of F (x1 , x2 , . . . , xn ). The variables x1 , x2 , . . . , xn are


the roots of the polynomial
q(x) = xn − s1 xn−1 + s2 xn−2 − · · · + (−1)n sn .
Consequently, each generator xi is algebraic over F (s1 , s2 , . . . , sn ). Furthermore, F (x1 , x2 , . . . , xn )
is the splitting field of q(x) over F (s1 , s2 , . . . , sn ), so F (x1 , x2 , . . . , xn ) is a Galois extension of
F (s1 , s2 , . . . , sn ). By Theorem 7.6.3,
[F (x1 , x2 , . . . , xn ) : F (s1 , s2 , . . . , sn )] ≤ n!
since deg q(x) = n. However, for each σ ∈ Sn , the automorphism ωσ is a distinct nontrivial automor-
phism in Aut(F (x1 , x2 , . . . , xn )/F (s1 , s2 , . . . , sn )). Since the extension is Galois, then this implies
that [F (x1 , x2 , . . . , xn ) : F (s1 , s2 , . . . , sn )] ≥ |Sn | = n!. Consequently, we have equality and
Aut(F (x1 , x2 , . . . , xn )/F (s1 , s2 , . . . , sn )) ∼
= Sn .
By the Fundamental Theorem of Galois Theory, F (s1 , s2 , . . . , sn ) = Fix(F (x1 , x2 , . . . , xn ), Sn ). The
theorem follows. 
This theorem is sometimes called the Fundamental Theorem on Symmetric Functions. It is
possible to prove this theorem without the Fundamental Theorem of Galois Theory but the above
proof is much shorter than many alternative proofs.

Corollary 11.5.4
Every symmetric polynomial in F [x1 , x2 , . . . , xn ] is a polynomial in the elementary sym-
metric polynomials.
586 CHAPTER 11. GALOIS THEORY

Proof. The set of symmetric polynomials in F [x1 , x2 , . . . , xn ] is

Fix(F (x1 , x2 , . . . , xn ), Sn ) ∩ F [x1 , x2 , . . . , xn ].

By Theorem 11.5.3, Fix(F (x1 , x2 , . . . , xn ), Sn ) = F (s1 , s2 , . . . , sn ). An element p(x1 , x2 , . . . , xn )/r(x1 , x2 , . . . , xn ) in F (x1 , x2 , . . . , xn ) is a polynomial if and only if the polynomial r(x1 , x2 , . . . , xn ) is a nonzero constant. In particular, a rational expression in F (s1 , s2 , . . . , sn ) is a polynomial in F [x1 , x2 , . . . , xn ] if and only if the denominator of the rational expression is a nonzero constant. The corollary follows. 

Theorem 11.5.3 along with the above corollary give an interesting application for symmetric
expressions of roots of polynomials.
Let F be a field. If a polynomial p(x) ∈ F [x] splits completely in a field extension K of F , then

p(x) = pn (x − α1 )(x − α2 ) · · · (x − αn ) (11.6)

for some α1 , α2 , . . . , αn ∈ K and where pn = LC(p(x)) is the leading coefficient of p(x). Then

p(x) = pn (x^n − s1 (α1 , α2 , . . . , αn )x^{n−1} + s2 (α1 , α2 , . . . , αn )x^{n−2} − · · ·
            + (−1)^{n−1} sn−1 (α1 , α2 , . . . , αn )x + (−1)^n sn (α1 , α2 , . . . , αn )).
In particular, the elementary symmetric polynomials applied to the roots of p(x) are in the field F .
Corollary 11.5.4 gives the following proposition.

Proposition 11.5.5
Let p(x) ∈ F [x]. Let α1 , α2 , . . . , αn be the roots of p(x) (counted with multiplicity). Then
any symmetric polynomial of the roots is in F .

Proof. Every symmetric polynomial in the roots of p(x) is of the form r(s1 , s2 , . . . , sn ), with the sk
evaluated at the roots of p(x), where r ∈ F [s1 , s2 , . . . , sn ]. Hence, the evaluation of r on the roots is
in F . 

Example 11.5.6. Consider the polynomial p(x) = x^3 − 3x^2 + 7x + 5. Suppose that the three roots of p(x) are α1 , α2 , α3 (not necessarily distinct). Without knowing the roots explicitly, we can calculate α1^3 + α2^3 + α3^3 . We expand

(α1 + α2 + α3 )^3 = α1^3 + α2^3 + α3^3 + 3α1^2 α2 + 3α2^2 α1 + 3α1^2 α3 + 3α3^2 α1
                        + 3α2^2 α3 + 3α3^2 α2 + 6α1 α2 α3

and

α1^2 α2 + α2^2 α1 + α1^2 α3 + α3^2 α1 + α2^2 α3 + α3^2 α2
        = (α1 α2 + α1 α3 + α2 α3 )(α1 + α2 + α3 ) − 3α1 α2 α3 .

Denoting by sk the kth elementary symmetric polynomial applied to the αi , we have

α1^3 + α2^3 + α3^3 = s1^3 − 3(s2 s1 − 3s3 ) − 6s3 = s1^3 − 3s2 s1 + 3s3 .

From the coefficients of p(x), we see that s1 = 3, s2 = 7, and s3 = −5. Thus, by (11.5),

α1^3 + α2^3 + α3^3 = 3^3 − 3 · 7 · 3 + 3(−5) = −51. △
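The computation in this example can be checked mechanically: the identity α1^3 + α2^3 + α3^3 = s1^3 − 3 s1 s2 + 3 s3 can first be tested on arbitrary numbers and then applied to the coefficients of p(x). A minimal Python sketch (the function name is ours):

```python
from itertools import combinations
from math import prod

def power_sum_3(s1, s2, s3):
    # p3 = s1^3 - 3 s1 s2 + 3 s3, as derived in the example above
    return s1**3 - 3*s1*s2 + 3*s3

# Sanity-check the identity on arbitrary "roots"
alphas = (2.0, -1.5, 0.25)
s1 = sum(alphas)
s2 = sum(prod(c) for c in combinations(alphas, 2))
s3 = prod(alphas)
assert abs(sum(a**3 for a in alphas) - power_sum_3(s1, s2, s3)) < 1e-9

# For p(x) = x^3 - 3x^2 + 7x + 5 we have s1 = 3, s2 = 7, s3 = -5
print(power_sum_3(3, 7, -5))  # -51
```
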

Though we study multivariable polynomial rings in more depth in Chapter 12, some terminology is valuable here. In the polynomial ring F [x1 , x2 , . . . , xn ], a term x1^{a1} x2^{a2} · · · xn^{an} is said to have total degree a if a = a1 + a2 + · · · + an . A polynomial p(x1 , x2 , . . . , xn ) is called homogeneous of degree k if it consists entirely of terms of total degree k. We observe that the elementary symmetric polynomials sk (x1 , x2 , . . . , xn ) are homogeneous of degree k.
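The homogeneity claim is easy to spot-check numerically: scaling every variable by t must scale sk by t^k. A small sketch, reusing a hypothetical elem_sym helper of our own:

```python
from itertools import combinations
from math import prod

def elem_sym(k, xs):
    return sum(prod(c) for c in combinations(xs, k))

xs, t = (3, 1, 4, 1, 5), 7
for k in range(1, len(xs) + 1):
    # s_k(t*x1, ..., t*xn) == t^k * s_k(x1, ..., xn)
    assert elem_sym(k, tuple(t * x for x in xs)) == t**k * elem_sym(k, xs)
print("homogeneity verified")
```
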
11.5. SYMMETRIES AMONG ROOTS; THE DISCRIMINANT 587

11.5.2 – Algebraic Relations among the Roots of a Polynomial


The proof of Theorem 11.5.3 contains another important result with implications for Galois groups of polynomials. We proved that Gal(F (x1 , x2 , . . . , xn )/F (s1 , s2 , . . . , sn )) ≅ Sn . This is the Galois group of the polynomial q(x) in (11.4). However, this particular case is a situation in which there are no algebraic relations among the roots of the polynomial even over the field F (s1 , s2 , . . . , sn ).
Let α1 , α2 , . . . , αn be in some extension K of a field F . We say that there exists an algebraic
relation over F between the elements α1 , α2 , . . . , αn if for some polynomial g ∈ F [x1 , x2 , . . . , xn ],

g(α1 , α2 , . . . , αn ) = 0.

Since g can have a constant term in F , we can equivalently consider algebraic relations to include
conditions of the form g(α1 , α2 , . . . , αn ) ∈ F .
If p(x) ∈ F [x] is a polynomial with deg p(x) = n, then any automorphism σ ∈ GalF (p(x))
permutes the roots of p(x). This gives the fundamental result.

Proposition 11.5.7
If p(x) ∈ F [x] is a separable polynomial with deg p(x) = n, then GalF (p(x)) ≤ Sn .

This proposition is more precise than Theorem 7.6.3, which simply put the upper bound of n! on the degree of the splitting field of p(x) over F .
Algebraic relations among the roots put conditions on the automorphisms in Gal(p(x)). For
example, suppose that p(x) is the product of two relatively prime irreducible polynomials p1 (x) and
p2 (x) of degree n1 and n2 . Then, any automorphism σ ∈ Gal(p(x)) can only permute the roots
of p1 (x) and permute the roots of p2 (x). Then Gal(p(x)) is contained in a subgroup of Sn that is
isomorphic to Sn1 ⊕ Sn2 . This observation can be restated as follows.

Proposition 11.5.8
A Galois extension K of F is such that Gal(K/F ) is a transitive subgroup of Sn if and only
if K is the splitting field of some irreducible polynomial f (x) ∈ F [x] with deg f (x) = n.

Proof. (Left as an exercise for the reader. See Exercise 11.5.1.) 

As a particular case of this, suppose that the roots of a separable polynomial p(x) satisfy the relation that α1 ∈ F . We consider Gal(p(x)) as a subgroup of Sn , the group of bijections on {α1 , α2 , . . . , αn }, the set of roots of p(x), with σ(αi ) = ασ(i) . The subgroup H = {σ ∈ Sn | σ(1) = 1} of Sn fixes the element α1 . Furthermore, for all σ ∈ Sn , we have σ(α1 ) = αi if and only if σ ∈ (1 i)H. As i runs through 1, 2, . . . , n, the subsets (1 i)H run through the cosets of H in Sn , which partition Sn . Hence, since α1 ∈ F , every σ ∈ Gal(p(x)) fixes α1 , and so Gal(p(x)) ≤ H. By observing that H ≅ Sn−1 , we can state that Gal(p(x)) is a subgroup of Sn−1 .
The following proposition gives a general principle.

Lemma 11.5.9
Let p(x) be a separable polynomial in F [x] and suppose that the roots {α1 , α2 , . . . , αn }
satisfy g(α1 , α2 , . . . , αn ) ∈ F for some multivariable polynomial g ∈ F [x1 , x2 , . . . , xn ]. In
the action of Sn on F [x1 , x2 , . . . , xn ], let H be the stabilizer of g. Suppose that

g(ασ(1) , ασ(2) , . . . , ασ(n) ) ≠ g(ατ (1) , ατ (2) , . . . , ατ (n) ) whenever σH ≠ τ H. (11.7)

Then GalF (p(x)) ≤ H.

Proof. The cosets of H partition Sn . Since the polynomial g(x1 , x2 , . . . , xn ) is fixed by any σ ∈ H,
then
g(α1 , α2 , . . . , αn ) = g(ασ(1) , ασ(2) , . . . , ασ(n) )

for all σ ∈ H. The condition (11.7), that representatives from different cosets of H give different values when applied to g, implies that σ fixes g(α1 , α2 , . . . , αn ) if and only if σ ∈ H.
Since all the automorphisms in Gal(p(x)) fix F , the condition g(α1 , α2 , . . . , αn ) ∈ F implies that
Gal(p(x)) ≤ H. 

Example 11.5.10. Suppose that a separable polynomial p(x) ∈ F [x] has deg p(x) = 4 and suppose
that the roots α1 , α2 , α3 , α4 satisfy the relation

α1 α2 + α3 α4 ∈ F.

This algebraic relation puts a condition on automorphisms σ ∈ GalF (p(x)). It is not hard to see that the subgroup

H = ⟨(1 3 2 4), (1 2)⟩ ≤ S4
leaves the polynomial x1 x2 + x3 x4 fixed. Conversely, if σ ∈ S4 − H, then σ · (x1 x2 + x3 x4 ) is either x1 x3 + x2 x4 or x1 x4 + x2 x3 . It is an easy algebra exercise to show that x1 x2 + x3 x4 = x1 x3 + x2 x4 implies
(x1 − x4 )(x2 − x3 ) = 0. Since p(x) is separable, these three polynomials applied to α1 , α2 , α3 , α4
give distinct values. It is easy to calculate that

σ(α1 α2 + α3 α4 ) = α1 α2 + α3 α4   if σ ∈ ⟨(1 3 2 4), (1 2)⟩,
σ(α1 α2 + α3 α4 ) = α1 α3 + α2 α4   if σ ∈ (2 3)⟨(1 3 2 4), (1 2)⟩,
σ(α1 α2 + α3 α4 ) = α1 α4 + α2 α3   if σ ∈ (2 4)⟨(1 3 2 4), (1 2)⟩.

Each of the above three cases is distinct, corresponding to the three cosets of ⟨(1 3 2 4), (1 2)⟩. Consequently, σ ∈ S4 fixes α1 α2 + α3 α4 if and only if σ ∈ ⟨(1 3 2 4), (1 2)⟩. This illustrates Lemma 11.5.9 and we deduce that Gal(p(x)) ≤ ⟨(1 3 2 4), (1 2)⟩. It is easy to check that this subgroup of S4 is isomorphic to D4 . △
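The subgroup and coset structure in this example can be confirmed by brute force over all 24 permutations. In this sketch (helper names are ours), the three polynomials in the orbit of x1 x2 + x3 x4 are told apart by evaluating at a point where they take distinct values:

```python
from itertools import permutations

def act(sigma, point):
    """Value of (sigma . g) at the point, where g = x1 x2 + x3 x4."""
    y = [point[sigma[i]] for i in range(4)]
    return y[0] * y[1] + y[2] * y[3]

# x1x2+x3x4, x1x3+x2x4, x1x4+x2x3 take the values 34, 20, 16 at this point,
# so a single evaluation distinguishes the three polynomials in the orbit.
point = (1, 2, 4, 8)
identity = (0, 1, 2, 3)

H = [s for s in permutations(range(4)) if act(s, point) == act(identity, point)]
orbit = {act(s, point) for s in permutations(range(4))}
print(len(H), len(orbit))  # 8 3  -- |H| = 8 (a copy of D4), three cosets
```
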

Example 11.5.11. As a nonexample, suppose that p(x) ∈ F [x] is a separable polynomial and suppose that the product of two of the roots (without loss of generality α1 and α2 ) is in F . Let H = {σ ∈ Sn | σ · x1 x2 = x1 x2 }. This subgroup has order 2(n − 2)! and is isomorphic to S2 ⊕ Sn−2 . For various σ ∈ Sn , the product ασ(1) ασ(2) can take on up to n(n − 1)/2 values. However, even if p(x) is separable, there is no guarantee that αi αj ≠ α1 α2 for {i, j} ≠ {1, 2}. Hence, we cannot deduce that Gal(p(x)) ≤ H. △

11.5.3 – The Discriminant of a Polynomial


The discriminant of a polynomial is a particular symmetric function of its roots that has many important properties.
Let x1 , x2 , . . . , xn be n variables. Recall from Example 3.4.15 the Vandermonde polynomial

∏_{1≤i<j≤n} (xi − xj ).

Also recall from Section 3.4.3 that σ ∈ Sn maps a pair (i, j) ∈ Tn to a pair (σ(i), σ(j)), which is either another pair in Tn or another pair in Tn but with the entries inverted. By Definition 3.4.10, the number of times the entries are inverted is called the number of inversions inv(σ). Hence,

∏_{1≤i<j≤n} (xσ(i) − xσ(j) ) = (−1)^{inv(σ)} ∏_{1≤i<j≤n} (xi − xj ) = sign(σ) ∏_{1≤i<j≤n} (xi − xj ).

Consequently, the square of the Vandermonde polynomial

∏_{1≤i<j≤n} (xi − xj )^2

is a symmetric polynomial, called the discriminant on n variables x1 , x2 , . . . , xn .



Definition 11.5.12
Let p(x) = pn x^n + · · · + p1 x + p0 ∈ F [x] with roots α1 , α2 , . . . , αn (listed with multiplicity) in some field extension K of F . The element in F defined by

∆(p) = pn^{2n−2} ∏_{1≤i<j≤n} (αi − αj )^2

is called the discriminant of the polynomial p(x).

The factor pn^{2n−2} may seem superfluous, but we will explain its value as we present more properties of the discriminant.

Proposition 11.5.13
A polynomial p(x) ∈ F [x] has a double root if and only if ∆(p) = 0.

Proof. If a polynomial p(x) has degree n, then pn ≠ 0. Hence, ∆(p) = 0 if and only if ∏_{1≤i<j≤n} (αi − αj )^2 = 0. In turn, this is equivalent to αi = αj for some pair (i, j). 

This proposition is particularly interesting because it puts an equation on the set of polynomials with a double root. In other words, the equation ∆ = 0, which is expressed as an equation in the coefficients {a0 , a1 , . . . , an−1 } of the polynomial p(x) = x^n + an−1 x^{n−1} + · · · + a1 x + a0 , gives a locus (a “hypersurface” in R^n ) of the polynomials with double roots.

Example 11.5.14. Consider the general quadratic ax^2 + bx + c. This has two roots α1 and α2 , not necessarily distinct. According to Definition 11.5.12, the discriminant is a^2 (α1 − α2 )^2 . However,

(α1 − α2 )^2 + 4α1 α2 = α1^2 + 2α1 α2 + α2^2 = (α1 + α2 )^2 .

We recover the well-known formula for the discriminant of a quadratic:

∆ = a^2 (α1 − α2 )^2 = a^2 ((α1 + α2 )^2 − 4α1 α2 ) = a^2 ((−b/a)^2 − 4(c/a)) = b^2 − 4ac. △
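Definition 11.5.12 can be applied directly when the roots are known. A minimal sketch using exact rationals (the helper name is ours), checked against b^2 − 4ac on the quadratic 2x^2 + 3x + 1 = 2(x + 1)(x + 1/2):

```python
from fractions import Fraction
from itertools import combinations
from math import prod

def disc_from_roots(lead, roots):
    """Delta(p) = lead^(2n-2) * prod_{i<j} (alpha_i - alpha_j)^2."""
    n = len(roots)
    return lead**(2 * n - 2) * prod((a - b)**2 for a, b in combinations(roots, 2))

a, b, c = 2, 3, 1
roots = [Fraction(-1), Fraction(-1, 2)]
print(disc_from_roots(a, roots), b**2 - 4 * a * c)  # 1 1
```
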

Proposition 11.5.15
Let p(x) ∈ F [x] be a separable polynomial with discriminant ∆. Then ∆ ∈ F . Furthermore, √∆ ∈ F if and only if GalF (p(x)) is a subgroup of An .

Proof. Since ∆ is a symmetric function in the roots of p(x), then by Proposition 11.5.5, ∆ ∈ F . Call K the splitting field of p(x) over F .
Up to multiplication by a unit, √∆ is pn^{n−1} times the Vandermonde polynomial evaluated at the roots of p(x). The subgroup of Sn that preserves the Vandermonde polynomial is An . Since σ · √∆ takes on distinct values corresponding to distinct cosets of An , Lemma 11.5.9 shows that √∆ ∈ F implies Gal(p(x)) ≤ An . Conversely, if Gal(p(x)) ≤ An , then every automorphism in Gal(K/F ) fixes √∆, so √∆ ∈ F since the extension is Galois. 

11.5.4 – The Resultant


Let f (x) ∈ F [x]. We observed earlier that the discriminant of a polynomial is 0 if and only if
the polynomial has a double root in some field extension of F . Section 7.7 discussed the equivalent
criterion that f (x) has a double root if and only if f (x) and the derivative Dx (f (x)) are not relatively
prime. This second equivalent statement will lead to an efficient way of calculating the discriminant
of polynomials.
Let a(x) = am xm + · · · + a1 x + a0 and b(x) = bn xn + · · · + b1 x + b0 be two polynomials in F [x].
The polynomials a(x) and b(x) are not relatively prime if and only if they have a common divisor

d(x) with deg d(x) ≥ 1. Then

(b(x)/d(x)) a(x) = (a(x)/d(x)) b(x).
Conversely, suppose that f (x) and g(x) are nonzero polynomials satisfying deg f (x) ≤ n − 1,
deg g(x) ≤ m − 1, and f (x)a(x) = g(x)b(x). Let f (x) be a nonzero polynomial of least degree
such that f (x)a(x) = g(x)b(x) for some polynomial g(x). Since b(x)a(x) = a(x)b(x), then

(b(x) − q(x)f (x))a(x) = (a(x) − q(x)g(x))b(x)

for any q(x) ∈ F [x]. In particular, the remainder r(x) of the polynomial division of b(x) by f (x)
is such that there exists a polynomial r2 (x) satisfying r(x)a(x) = r2 (x)b(x). Either r(x) = 0 or
deg r(x) < deg f (x). However, by the minimality of the degree of f (x), we deduce that r(x) = 0
and hence that f (x) divides b(x). Then d(x) = b(x)/f (x) has degree 1 or greater. Then

(b(x)/d(x)) a(x) = g(x)b(x) =⇒ b(x)a(x) = d(x)g(x)b(x) =⇒ a(x) = g(x)d(x).

Thus, d(x) is a common divisor of a(x) and b(x) of degree at least 1, and hence a(x) and b(x) are
not relatively prime. We have proven the following result.

Proposition 11.5.16
Two polynomials a(x) and b(x) in F [x] are not relatively prime if and only if there exist
nonzero polynomials f (x) and g(x) satisfying deg f (x) ≤ n − 1, deg g(x) ≤ m − 1, and
f (x)a(x) = g(x)b(x).

Interestingly enough, given two polynomials a(x) and b(x), the problem of finding the coefficients
of two polynomials f (x) and g(x) described in Proposition 11.5.16 is a linear algebra problem.
Writing f (x) = fn−1 xn−1 + · · · + f1 x + f0 and g(x) = gm−1 xm−1 + · · · + g1 x + g0 , the polynomial
equation f (x)a(x) − g(x)b(x) = 0 is

(fn−1 am − gm−1 bn )x^{m+n−1}
  + (fn−1 am−1 + fn−2 am − gm−1 bn−1 − gm−2 bn )x^{m+n−2}
  + · · · + (f0 a0 − g0 b0 ) = 0.

All the coefficients of powers of x must be 0. Consider the system of m + n equations corresponding
to the powers xm+n−1 , xm+n−2 , . . . , 1 in the m + n variables fn−1 , . . . , f1 , f0 , −gm−1 , . . . , −g1 , −g0 .
The coefficient matrix of this system is the (m + n) × (m + n) matrix

    ⎡ am                      bn                      ⎤
    ⎢ am−1  am                bn−1  bn                ⎥
    ⎢  ⋮    am−1  ⋱           ⋮     bn−1  ⋱           ⎥
    ⎢ a0     ⋮    ⋱   am      b0     ⋮    ⋱    bn     ⎥     (11.8)
    ⎢       a0    ⋱   am−1          b0    ⋱    bn−1   ⎥
    ⎢             ⋱    ⋮                  ⋱     ⋮     ⎥
    ⎣                 a0                       b0     ⎦

      └───── n columns ─────┘  └───── m columns ─────┘

in which the jth of the first n columns carries the coefficients am , am−1 , . . . , a0 shifted down by j − 1 rows, and the jth of the last m columns carries bn , bn−1 , . . . , b0 shifted down by j − 1 rows.

Definition 11.5.17
The resultant of two polynomials a(x) and b(x) of degree m and n respectively, written
R(a, b), is the determinant of the matrix in (11.8).

Example 11.5.18. Let a(x) = 2x^2 + 3x + 1 and b(x) = 5x^2 − x + 2. Then

           | 2  0   5   0 |
R(a, b) =  | 3  2  −1   5 | = 120. △
           | 1  3   2  −1 |
           | 0  1   0   2 |
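The construction of (11.8) and the computation in Example 11.5.18 can be reproduced with exact rational arithmetic. A sketch (function names are ours, not from the text); coefficient lists are given highest degree first:

```python
from fractions import Fraction

def sylvester(a, b):
    """Sylvester matrix of a(x), b(x) as in (11.8)."""
    m, n = len(a) - 1, len(b) - 1
    M = [[Fraction(0)] * (m + n) for _ in range(m + n)]
    for j in range(n):                      # n shifted columns of a's coefficients
        for i, coef in enumerate(a):
            M[i + j][j] = Fraction(coef)
    for j in range(m):                      # m shifted columns of b's coefficients
        for i, coef in enumerate(b):
            M[i + j][n + j] = Fraction(coef)
    return M

def det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [row[:] for row in M]
    size, sign, result = len(M), 1, Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            sign = -sign
        result *= M[col][col]
        for r in range(col + 1, size):
            factor = M[r][col] / M[col][col]
            for c in range(col, size):
                M[r][c] -= factor * M[col][c]
    return sign * result

def resultant(a, b):
    return det(sylvester(a, b))

print(resultant([2, 3, 1], [5, -1, 2]))  # 120, matching Example 11.5.18
```
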

Since the polynomial equation f (x)a(x) − g(x)b(x) = 0 leads to a homogeneous linear system in the coefficients of f (x) and g(x), there is always the trivial solution. Furthermore, the system has a nontrivial solution if and only if R(a, b) = 0. We have proven the following proposition.

Proposition 11.5.19
Two polynomials a(x), b(x) ∈ F [x] have a common root (possibly in a field extension of F )
if and only if R(a, b) = 0.

We now return to the study of the discriminant and apply the resultant techniques to a polynomial
and its derivative. Proposition 11.5.19 gives the following corollary.

Corollary 11.5.20
Let p(x) ∈ F [x]. The following are equivalent:
(1) p(x) is not separable;
(2) ∆(p) = 0;

(3) R(p, Dx (p)) = 0, where Dx (p) is the derivative of p(x) with respect to x.

Both the discriminant of p(x) and the resultant R(p, Dx (p)) are multivariable polynomials in
the coefficients of p(x). The last two equivalent conditions of the corollary show that, with the
assumption pn 6= 0, one of these multivariable polynomials is 0 if and only if the other one is 0. In
fact, in the exercises we will show the following more efficient way of calculating the discriminant.

Proposition 11.5.21
Let p(x) = pn x^n + · · · + p1 x + p0 ∈ F [x] be a polynomial. Then

∆(p) = (−1)^{n(n−1)/2} (1/pn ) R(p, Dx (p)). (11.9)

Relationship (11.9) offers a motivation for the factor of pn^{2n−2} in the definition of the discriminant. From the resultant matrix (11.8) for R(p, Dx (p)), we see that when performing the Laplace expansion to calculate the determinant, R(p, Dx (p)) is a homogeneous polynomial in the coefficients p0 , p1 , p2 , . . . , pn . Furthermore, the top row of the matrix (11.8) for R(p, Dx (p)) has only two nonzero entries, namely pn and npn . Then by the linearity property of the determinant, R(p, Dx (p)) is divisible by pn . So, the importance of the pn^{2n−2} factor in Definition 11.5.12 is encapsulated in the following proposition.

Proposition 11.5.22
For a generic polynomial p(x) = pn xn + · · · + p1 x + p0 , the discriminant is a homogeneous
polynomial in the coefficients p0 , p1 , . . . , pn of degree 2n − 2.
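Proposition 11.5.21 turns the discriminant into a determinant computation. A self-contained sketch (it repeats the Sylvester-matrix helpers; all names are ours), checked against b^2 − 4ac and against the general cubic formula of Exercise 11.5.18 on x^3 + 3x^2 − 7:

```python
from fractions import Fraction

def sylvester(a, b):
    """Sylvester matrix (11.8); a, b are coefficient lists, highest degree first."""
    m, n = len(a) - 1, len(b) - 1
    M = [[Fraction(0)] * (m + n) for _ in range(m + n)]
    for j in range(n):
        for i, coef in enumerate(a):
            M[i + j][j] = Fraction(coef)
    for j in range(m):
        for i, coef in enumerate(b):
            M[i + j][n + j] = Fraction(coef)
    return M

def det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [row[:] for row in M]
    size, sign, result = len(M), 1, Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            sign = -sign
        result *= M[col][col]
        for r in range(col + 1, size):
            factor = M[r][col] / M[col][col]
            for c in range(col, size):
                M[r][c] -= factor * M[col][c]
    return sign * result

def deriv(p):
    n = len(p) - 1
    return [coef * (n - i) for i, coef in enumerate(p[:-1])]

def discriminant(p):
    """Delta(p) = (-1)^(n(n-1)/2) * R(p, Dx(p)) / p_n, as in (11.9)."""
    n = len(p) - 1
    sign = -1 if (n * (n - 1) // 2) % 2 else 1
    return sign * det(sylvester(p, deriv(p))) / Fraction(p[0])

print(discriminant([1, 3, 1]))      # x^2 + 3x + 1  ->  5  (= b^2 - 4ac)
print(discriminant([1, 3, 0, -7]))  # x^3 + 3x^2 - 7  ->  -567
```
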

11.5.5 – Useful CAS Commands


The following Maple commands are relevant to this section.

Maple Command              Function
discrim(a,x);              Implements (11.9) to calculate the discriminant of the polynomial a with variable x.
resultant(a,b,x);          Calculates the resultant of a(x) and b(x).
convert(p, 'elsymfun');    Converts the symmetric polynomial p into an expression in the elementary symmetric polynomials in the relevant variables.

Exercises for Section 11.5


1. Show that a Galois extension K of F is such that Gal(K/F ) is a transitive subgroup of Sn if and only
if K is the splitting field of some irreducible polynomial f (x) ∈ F [x] with deg f (x) = n.
2. Let p(x) ∈ F [x] be a palindromic polynomial of degree 2n. Prove that | GalF (p(x))| ≤ 2^n n!.
3. Prove that the Vandermonde polynomial in n variables satisfies the following identity:

   ∏_{1≤i<j≤n} (xi − xj ) = det ⎡ 1  x1  x1^2  · · ·  x1^{n−1} ⎤
                                ⎢ 1  x2  x2^2  · · ·  x2^{n−1} ⎥
                                ⎢ ⋮   ⋮    ⋮           ⋮      ⎥
                                ⎣ 1  xn  xn^2  · · ·  xn^{n−1} ⎦

4. Let α1 , α2 , α3 be the roots of x^3 − 3x^2 + 4x + 1 ∈ Q[x]. Find the value of α1^3 + α2^3 + α3^3 in Q.
5. Let α1 , α2 , α3 be the roots of 2x^3 + 3x^2 + 4x − 1 ∈ Q[x]. Find the value of α1^2 α2^2 + α1^2 α3^2 + α2^2 α3^2 in Q.
6. Consider the usual action of Sn on F [x1 , x2 , . . . , xn ] defined by permuting the variables according to σ. Define Hp as the stabilizer in Sn of p, namely Hp = {σ ∈ Sn | σ · p = p}. Prove that both

   ∑_τ τ · p     and     ∏_τ τ · p

   are symmetric polynomials, where both the sum and the product run over a set of distinct coset representatives τ of Hp in Sn .
7. Consider the action of Sn on F [x1 , x2 , . . . , xn ] as described in the previous exercise. Set n = 4 and
consider p(x1 , x2 , x3 , x4 ) = x1 x2 + x3 x4 .
(a) Calculate the stabilizer of p and determine its isomorphism type.
(b) Deduce that Q(x1 , x2 , x3 , x4 ) = (x1 x2 + x3 x4 )(x1 x3 + x2 x4 )(x1 x4 + x2 x3 ) is a symmetric polynomial.
(c) Prove that Q(x1 , x2 , x3 , x4 ) = s1^2 s4 + s3^2 − 4s2 s4 .
(d) Suppose that f (x) = x^4 − 3x^3 + 2x + 5 ∈ Q[x]. Calculate Q(α1 , α2 , α3 , α4 ), where αi are the four roots (possibly listed with multiplicity) of f (x).
8. This exercise finds a cyclic extension of degree 3 over Q.
   (a) Prove that the function f : C − {0, 1} → C − {0, 1} defined by f (x) = 1/(1 − x) has order 3.
   (b) Suppose that the roots of a polynomial are α, f (α), f (f (α)). Find s1 , s2 , and s3 of these roots.
   (c) Deduce that for all q ∈ Q, a polynomial of the form x^3 − qx^2 + (q − 3)x + 1 has three distinct real roots and that the splitting field is a cyclic extension of Q of degree 3.
9. Calculate the resultant of the following pairs of polynomials:
(a) a(x) = 5x^2 + 4x − 3 and b(x) = x^2 + x + 3;
(b) a(x) = x^2 − 3x + 2 and b(x) = 2x^3 − 3x^2 + 2x − 1.
10. Let p(x) be an arbitrary polynomial. Prove that R(p(x), x − α) = (−1)^n p(α), where n = deg p.
11. Let p(x) = pn x^n + · · · + p1 x + p0 be a polynomial in F [x]. Suppose that, listed with multiplicity, the roots of p(x) in its splitting field are α1 , α2 , . . . , αn . Writing p(x) = pn (x − α1 ) · · · (x − αn ), prove that

    ∏_{i=1}^n p′(αi ) = (−1)^{n(n−1)/2} pn^n ∏_{1≤i<j≤n} (αi − αj )^2 .

12. In the polynomial ring F (x1 , x2 , . . . , xm , y1 , y2 , . . . , yn )[t], consider the polynomials

    a(t) = A(t − x1 )(t − x2 ) · · · (t − xm ),
    b(t) = B(t − y1 )(t − y2 ) · · · (t − yn ).

    (a) Show that R(a, b) is A^n B^m multiplied by a polynomial expression that is symmetric in the symbols x1 , x2 , . . . , xm and symmetric in y1 , y2 , . . . , yn .
    (b) Show that R(a, b) is a polynomial that is homogeneous of degree mn in the variables x1 , x2 , . . . , xm , y1 , y2 , . . . , yn .
    (c) Since R(a, b) = 0 whenever a(t) and b(t) have a common root, show that R(a, b) is divisible by every polynomial xi − yj for 1 ≤ i ≤ m and 1 ≤ j ≤ n and deduce that

        R(a, b) = A^n B^m ∏_{i=1}^m ∏_{j=1}^n (xi − yj ).

    (d) Show that

        R(a, b) = A^n ∏_{i=1}^m b(xi ) = (−1)^{mn} B^m ∏_{j=1}^n a(yj ).

13. Apply the previous two exercises to the situation a(x) = p(x) and b(x) = p′(x) to deduce Proposition 11.5.21.
14. Calculate the discriminant of x^3 + 3x^2 − 7 ∈ Q[x] using Proposition 11.5.21.
15. Calculate the discriminant of x^4 + 2x + 1 ∈ Q[x] using Proposition 11.5.21.
16. Prove that the discriminant of x^n + a ∈ Q[x] is (−1)^{n(n−1)/2} n^n a^{n−1} .
17. Prove that the discriminant of x^n + cx + d is (−1)^{n(n−1)/2} n^n d^{n−1} + (−1)^{(n−1)(n−2)/2} (n − 1)^{n−1} c^n .
18. Use (11.9) to prove that the discriminant of the general cubic p(x) = ax^3 + bx^2 + cx + d is

    ∆(p) = −27a^2 d^2 + 18abcd − 4ac^3 − 4b^3 d + b^2 c^2 .

19. Let p be an odd prime. Use Exercises 11.4.11 and 11.5.16 to prove that the discriminant of the cyclotomic polynomial Φp (x) is ∆(Φp ) = (−1)^{(p−1)/2} p^{p−2} .
20. Let p(x) = pn x^n + · · · + p1 x + p0 be a generic polynomial. Prove that if a term C ∏_{k=0}^n pk^{ik} appears in the discriminant ∆(p), then the powers ik satisfy both of the following conditions:

    ∑_{k=0}^n ik = 2n − 2     and     ∑_{k=0}^n k·ik = n(n − 1).

    [Hint: Consider homogeneity in the coefficients of p(x) and homogeneity in the roots of p(x).]

11.6
Computing Galois Groups of Polynomials
We are now in a position to start a systematic study of the Galois groups of polynomials. Without loss of generality, in this section we work with monic polynomials.

11.6.1 – Reducible Polynomials


Suppose that a polynomial p(x) ∈ F [x] is reducible with p(x) = p1 (x)p2 (x). Let K1 be the splitting
field of p1 (x) over F and let K2 be the splitting field of p2 (x) over F . By the propositions in
Section 11.3.1, K1 K2 and K1 ∩ K2 are Galois extensions over F .
Since K1 ∩ K2 is an extension of F inside K1 K2 , then Gal(K1 K2 /K1 ∩ K2 ) ⊴ Gal(K1 K2 /F ) and

Gal(K1 K2 /F )/ Gal(K1 K2 /K1 ∩ K2 ) ≅ Gal(K1 ∩ K2 /F ).

By Proposition 11.3.2,

Gal(K1 K2 /K1 ∩ K2 ) ≅ Gal(K1 /K1 ∩ K2 ) ⊕ Gal(K2 /K1 ∩ K2 ).

Therefore, Gal(K1 K2 /F ) is a group that contains Gal(K1 /K1 ∩ K2 ) ⊕ Gal(K2 /K1 ∩ K2 ) as a normal
subgroup such that the quotient group thereof is Gal(K1 ∩ K2 /F ).
Example 11.6.1. As a relatively simple example, consider the polynomial p(x) = (x^3 − 2)(x^3 − 3). The splitting field of p(x) is K1 K2 , where K1 = Q(∛2, ζ3 ) and K2 = Q(∛3, ζ3 ). It is not hard to determine that K1 is not a subfield of K2 and vice versa. Consequently, by field degree considerations, we deduce that K1 ∩ K2 = Q(ζ3 ). Now Gal(K1 /K1 ∩ K2 ) ≅ Z3 , generated by the automorphism σ that satisfies σ(∛2) = ∛2 ζ3 . Similarly, Gal(K2 /K1 ∩ K2 ) ≅ Z3 , generated by the automorphism τ that satisfies τ (∛3) = ∛3 ζ3 . Finally, Gal(K1 ∩ K2 /Q) ≅ Z2 , generated by complex conjugation ρ, which satisfies ρ(ζ3 ) = ζ̄3 = 1/ζ3 . We can give a presentation of Gal(K1 K2 /Q) as

Gal(K1 K2 /Q) ≅ ⟨ρ, σ, τ | ρ^2 = σ^3 = τ^3 = 1, στ = τ σ, ρσ = σ^{−1} ρ, ρτ = τ^{−1} ρ⟩. △

11.6.2 – Cubic Polynomials


Let F be a field of characteristic char F ≠ 2 and let f (x) = x^3 + ax^2 + bx + c be an irreducible cubic polynomial in F [x]. Call K the splitting field of f (x). By Proposition 11.5.8, Gal(f (x)) is a transitive subgroup of S3 . There are only two options: A3 ≅ Z3 and S3 . Either way, Gal(f (x)) contains a 3-cycle. By Proposition 11.5.15, if the discriminant ∆(f ) is a square in F , then Gal(f (x)) ≤ A3 , and so must be isomorphic to A3 . By the definition of the discriminant, √∆(f ) ∈ K. If √∆(f ) ∉ F , then F ⊊ F (√∆(f )) ⊊ K. Since [F (√∆(f )) : F ] = 2, then 2 divides | Gal(f (x))| as well. We have proved the following proposition.

Proposition 11.6.2
Let f (x) ∈ F [x] be an irreducible cubic, where char F ≠ 2. Two mutually exclusive cases occur:
(1) If √∆(f ) ∈ F , then Gal(f (x)) ≅ Z3 .
(2) If √∆(f ) ∉ F , then Gal(f (x)) ≅ S3 .

Example 11.6.3. Consider the polynomial f (x) = x^3 − 2x^2 + x − 1. Using the strategy in Section 7.3, replace x with y + 2/3, so f (y + 2/3) = y^3 − (1/3)y − 25/27. The discriminant of the polynomial is

∆ = −27 (−25/27)^2 − 4 (−1/3)^3 = −621/27 = −23.

Since the discriminant is not a square in Q, Gal(f (x)) ≅ S3 . △
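Proposition 11.6.2 suggests a mechanical test for irreducible cubics over Q with integer coefficients: compute ∆(f ) with the general cubic formula of Exercise 11.5.18 and check whether it is a perfect square. A sketch under those assumptions (irreducibility is not checked here; the helper names are ours):

```python
from math import isqrt

def cubic_disc(a, b, c, d):
    # Discriminant of ax^3 + bx^2 + cx + d (Exercise 11.5.18)
    return -27*a*a*d*d + 18*a*b*c*d - 4*a*c**3 - 4*b**3*d + b*b*c*c

def galois_of_irreducible_cubic(a, b, c, d):
    D = cubic_disc(a, b, c, d)
    if D > 0 and isqrt(D)**2 == D:
        return "Z3"          # sqrt(Delta) rational: Gal is A3 = Z3
    return "S3"              # otherwise Gal is S3

print(galois_of_irreducible_cubic(1, -2, 1, -1))  # S3 (Delta = -23)
print(galois_of_irreducible_cubic(1, 0, -3, -1))  # Z3 (Delta = 81)
```
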


The condition √∆(f ) ∈ F is equivalent to | Gal(f (x))| = 3, which means that the splitting field of f (x) over F is F (β), where β is one of the roots of f (x). This implies the following corollary to Proposition 11.6.2.

Corollary 11.6.4
Let f (x) be an irreducible cubic in F [x] and let β be a root of f (x) in the splitting field of
f (x) over F . Then f (x) splits completely in F (β) if and only if ∆(f ) is a square in F .

11.6.3 – Quartic Polynomials


Let f (x) = x^4 + ax^3 + bx^2 + cx + d be a quartic polynomial in a field F of characteristic char F ≠ 2. Section 7.3.2 presented Ferrari’s method to solve the quartic equation. To determine the Galois group of f (x), it is useful to refer to various steps in Ferrari’s strategy.
By a translation of variable x = y − a/4, the polynomial becomes f (x) = g(y) = y^4 + py^2 + qy + r. Recall the resolvent cubic θf (t) = t^3 − pt^2 − 4rt + (4rp − q^2 ) that arose in Ferrari’s method to solve the quartic. Solving the resolvent equation θf (t) = 0 leads to the ability to express g(y) = 0 as the product of two quadratic equations.
Let α1 , α2 , α3 , α4 be the four roots of g(y), possibly listed with multiplicity. Define
β1 = α1 α2 + α3 α4 ,
β2 = α1 α3 + α2 α4 ,
β3 = α1 α4 + α2 α3 .

Proposition 11.6.5
The elements β1 , β2 , β3 are the roots of the resolvent cubic θf (t).

Proof. Consider the polynomial h(t) = (t − β1 )(t − β2 )(t − β3 ). Expanding gives

h(t) = t^3 − (β1 + β2 + β3 )t^2 + (β1 β2 + β1 β3 + β2 β3 )t − β1 β2 β3 .

Call si the ith elementary symmetric polynomial in the αj elements. We calculate the elementary symmetric polynomials in β1 , β2 , β3 . First, β1 + β2 + β3 = s2 . Next,
β1 β2 + β1 β3 + β2 β3
  = α1^2 α2 α3 + α1 α2^2 α4 + α1 α3^2 α4 + α2 α3 α4^2 + α1^2 α2 α4 + α1 α2^2 α3 + α1 α3 α4^2 + α2 α3^2 α4
    + α1^2 α3 α4 + α1 α2 α3^2 + α1 α2 α4^2 + α2^2 α3 α4
  = s1 s3 − 4s4 .
By Exercise 11.5.7, β1 β2 β3 = s1^2 s4 + s3^2 − 4s2 s4 . From the g(y) polynomial, we see that s1 = 0, s2 = p, s3 = −q, and s4 = r, and thus h(t) = t^3 − pt^2 − 4rt + (4pr − q^2 ), which is precisely the resolvent cubic θf (t). 
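Proposition 11.6.5 can be illustrated on a depressed quartic whose roots are known. Taking (y^2 − 1)(y^2 − 4) = y^4 − 5y^2 + 4, the three βi should be exactly the roots of θf (t). A sketch with exact rationals (the helper names are ours):

```python
from fractions import Fraction as F

def resolvent_cubic(p, q, r):
    """Coefficients of theta_f(t) = t^3 - p t^2 - 4 r t + (4 p r - q^2)."""
    return [F(1), -F(p), -4 * F(r), 4 * F(p) * F(r) - F(q)**2]

def evaluate(coeffs, t):
    # Horner evaluation with exact arithmetic
    acc = F(0)
    for c in coeffs:
        acc = acc * t + c
    return acc

# g(y) = (y^2 - 1)(y^2 - 4) = y^4 - 5y^2 + 4 with roots 1, -1, 2, -2
a1, a2, a3, a4 = F(1), F(-1), F(2), F(-2)
betas = [a1*a2 + a3*a4, a1*a3 + a2*a4, a1*a4 + a2*a3]   # -5, 4, -4
theta = resolvent_cubic(F(-5), F(0), F(4))              # t^3 + 5t^2 - 16t - 80
print([evaluate(theta, b) == 0 for b in betas])  # [True, True, True]
```
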

Proposition 11.6.6
If θf (t) is the resolvent cubic of a quartic polynomial f , then ∆(θf ) = ∆(f ). In particular,
θf (t) is separable if and only if f (x) is separable.

Proof. Since all the roots of g(y) differ from the roots of f (x) by a fixed constant, then ∆(f ) = ∆(g).
Now
∆(θf ) = (β1 − β2 )^2 (β1 − β3 )^2 (β2 − β3 )^2 .
We also have
β1 − β2 = α1 α2 + α3 α4 − α1 α3 − α2 α4 = (α1 − α4 )(α2 − α3 ),
β1 − β3 = α1 α2 + α3 α4 − α1 α4 − α2 α3 = (α1 − α3 )(α2 − α4 ),
β2 − β3 = α1 α3 + α2 α4 − α1 α4 − α2 α3 = (α1 − α2 )(α3 − α4 ).

We deduce that

∏_{1≤i<j≤3} (βi − βj )^2 = ∏_{1≤i<j≤4} (αi − αj )^2

and the proposition follows. The concluding remark follows from Corollary 11.5.20. 

The Galois group Gal(f (x)) is a subgroup of S4 . A renumbering of the roots of f (x) corresponds
to conjugation by the permutation that defines the renumbering. The labeling of the roots is not
intrinsically important. Consequently, we do not care as much about the specific subgroup of S4 that
is equal to Gal(f (x)) as we care about the isomorphism type of Gal(f (x)). Indeed, two conjugate
subgroups in a group are isomorphic. So we only need to compute Gal(f (x)) up to conjugacy in S4 .

Theorem 11.6.7
Let F be a field with char F ≠ 2 and let f (x) = x^4 + ax^3 + bx^2 + cx + d ∈ F [x] be a monic irreducible quartic polynomial.

(1) If θf (t) is irreducible over F , then
    (a) if √∆(f ) ∉ F , then Gal(f (x)) ≅ S4 ;
    (b) if √∆(f ) ∈ F , then Gal(f (x)) ≅ A4 .
(2) If θf (t) has a unique root β in F , then
    (a) if 4β + a^2 − 4b ≠ 0 and √(∆(f )(4β + a^2 − 4b)) ∉ F , or if 4β + a^2 − 4b = 0 and √(∆(f )(β^2 − 4d)) ∉ F , then Gal(f (x)) ≅ D4 ;
    (b) otherwise, Gal(f (x)) ≅ Z4 .
(3) If θf (t) splits completely over F , then Gal(f (x)) ≅ Z2 ⊕ Z2 .

Proof. Since f (x) is a quartic, we view Gal(f (x)) as a subgroup of S4 , where S4 acts on the set of roots of f (x). Call K the splitting field of f (x) over F .
For part (1), let α1 be a root of f (x) and, by Proposition 11.6.5, let β1 be a root of θf (t).
Obviously, α1 and β1 are in K. Since f (x) is irreducible, then [F (α1 ) : F ] = 4. Since θf (t) is
irreducible, then [F (β1 ) : F ] = 3. Thus, [K : F ] is divisible by both 3 and 4. Hence, [K : F ] is
divisible by 12. Now A4 is the only subgroup of S4 of index 2. (If there were another subgroup H
of index 2, then A4 ∩ H would be a subgroup of A4 of order 6. In Example 3.6.7, the lattice of A4
shows there is no such subgroup.) Part (1) follows by Proposition 11.5.15.
For parts (2) and (3), suppose that θf (t) has a root β in F . Without loss of generality (by relabeling the roots of f (x) if necessary), suppose that β = β1 = α1 α2 + α3 α4 . In Example 11.5.10, we saw that the condition β ∈ F implies that Gal(f (x)) ≤ ⟨(1 3 2 4), (1 2)⟩. Since f (x) is irreducible, we know that 4 = [F (α1 ) : F ] divides | Gal(f (x))| = [K : F ]. Thus, Gal(f (x)) is a subgroup of
⟨(1 3 2 4), (1 2)⟩ of order 4 or 8. It is not hard to show that the possibilities are

⟨(1 2), (3 4)⟩, ⟨(1 2)(3 4), (1 3)(2 4)⟩, ⟨(1 3 2 4)⟩, ⟨(1 3 2 4), (1 2)⟩. (11.10)

However, by Proposition 11.5.8, Gal(f (x)) must be a transitive subgroup of S4 . This rules out the
first of the four possibilities, leaving only the latter three.
We deal first with case (3). By Corollary 11.6.4, the cubic θf splits completely over F if and only if ∆(f ) = ∆(θf ) is a square in F . Since the discriminant of f is a square, by Proposition 11.5.15, Gal(f (x)) ≤ A4 . The only subgroup listed in (11.10) that lies in A4 is ⟨(1 2)(3 4), (1 3)(2 4)⟩, which is isomorphic to Z2 ⊕ Z2 .
Finally, we address part (2). By Corollary 11.6.4, this time we deduce that √∆(f ) ∉ F , which also implies, by Proposition 11.5.15, that Gal(f (x)) is not contained in A4 . Consequently, Gal(f (x)) is either ⟨(1 3 2 4), (1 2)⟩ or ⟨(1 3 2 4)⟩. We proceed to distinguish between these two cases.

First, we observe that

(α1 + α2 − α3 − α4 )^2 = (α1 + α2 + α3 + α4 )^2 − 4(α1 α3 + α1 α4 + α2 α3 + α2 α4 )
                       = σ1 (αi )^2 − 4(σ2 (αi ) − β)
                       = 4β + a^2 − 4b.
Then √(∆(f )(4β + a^2 − 4b)) = √∆(f ) (α1 + α2 − α3 − α4 ), so it is an element of the splitting field K. Suppose that 4β + a^2 − 4b ≠ 0. Since (1 3 2 4) is odd, (1 3 2 4) · √∆(f ) = −√∆(f ). Furthermore,

(1 3 2 4) · (α1 + α2 − α3 − α4 ) = −(α1 + α2 − α3 − α4 ).

Thus, (1 3 2 4) · √(∆(f )(4β + a^2 − 4b)) = √(∆(f )(4β + a^2 − 4b)), so this element is fixed by the subgroup ⟨(1 3 2 4)⟩. As for the transposition (1 2), since it is odd, it is easy to see that

(1 2) · √∆(f ) (α1 + α2 − α3 − α4 ) = −√∆(f ) (α1 + α2 − α3 − α4 ).

Since char F ≠ 2, the permutation (1 2) acts nontrivially on this element. Thus, under the assumption that 4β + a^2 − 4b ≠ 0, we conclude that (1 2) ∈ Gal(f (x)) if and only if √(∆(f )(4β + a^2 − 4b)) ∉ F .
Suppose now that 4β + a^2 − 4b = 0. The above argument no longer holds since we cannot use √(∆(f )(4β + a^2 − 4b)) to determine whether some permutations of the roots are in Gal(f (x)) or not. We claim that β^2 − 4d ≠ 0. Note that

β^2 − 4d = (α1 α2 + α3 α4 )^2 − 4α1 α2 α3 α4 = (α1 α2 − α3 α4 )^2 .

Assume both of the following relations hold simultaneously among the roots:

α1 + α2 − α3 − α4 = 0    and    α1 α2 − α3 α4 = 0.

Then α1 = α3 + α4 − α2 , so α1 α2 − α3 α4 = 0 becomes (α3 + α4 − α2 )α2 − α3 α4 = 0, which factors into (α3 − α2 )(α2 − α4 ) = 0. This is a contradiction since f (x) is separable. Consequently, 4β + a^2 − 4b = 0 implies that α1 + α2 − α3 − α4 = 0, which in turn implies that α1 α2 − α3 α4 ≠ 0, so β^2 − 4d ≠ 0.
Similarly to the previous case, we observe that

(1 3 2 4) · √(∆(f)(β² − 4d)) = (1 3 2 4) · √∆(f) (α1α2 − α3α4) = √(∆(f)(β² − 4d)).

However,

(1 2) · √(∆(f)(β² − 4d)) = (1 2) · √∆(f) (α1α2 − α3α4) = −√(∆(f)(β² − 4d)).

So (1 2) ∈ Gal(f(x)) if and only if Gal(f(x)) = ⟨(1 3 2 4), (1 2)⟩, which holds if and only if √(∆(f)(β² − 4d)) ∉ F. This covers all the cases and completes the proof. □

Example 11.6.8. Consider the polynomial f(x) = x⁴ + 2x³ + 2x² + 2 in Q[x]. By Eisenstein's Criterion, this polynomial is irreducible. First we effect the shift x = y − 1/2 and get

g(y) = f(y − 1/2) = y⁴ + (1/2)y² − y + 37/16.

Proceeding according to Ferrari's method, the resolvent of g(y) is

θf(t) = t³ − (1/2)t² − (37/4)t + 29/8.

In Theorem 11.6.7, the first thing we need to test is whether θf(t) is irreducible. Since it is a cubic, we simply need to test whether it has a root in Q. By the Rational Root Theorem, the only possible rational roots are of the form

± (divisor of 29)/(divisor of 8).
598 CHAPTER 11. GALOIS THEORY

This gives us 16 possibilities. Testing them all shows that none of these possibilities is a root. As a cubic with no root in Q, θf(t) is irreducible, so we are in part (1) of Theorem 11.6.7. Using the general formula for the discriminant of the cubic calculated in Exercise 11.5.18, we calculate that

∆(f) = ∆(θf) = 3136 = 56².

Hence, √∆(f) ∈ Q, so by Theorem 11.6.7, Gal(f(x)) ≅ A4. △
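For readers with access to a computer algebra system, the arithmetic in this example is easy to double-check. The following SymPy sketch (our own; the text itself uses Maple) verifies the irreducibility of f and the discriminant computation:

```python
from sympy import symbols, Poly, Rational, discriminant

x, t = symbols('x t')

# f(x) = x^4 + 2x^3 + 2x^2 + 2 from the example
f = Poly(x**4 + 2*x**3 + 2*x**2 + 2, x)
assert f.is_irreducible            # Eisenstein's Criterion at p = 2 also shows this

# the resolvent cubic theta_f(t) displayed above
theta = Poly(t**3 - Rational(1, 2)*t**2 - Rational(37, 4)*t + Rational(29, 8), t)

# the two discriminants agree and equal 56^2, so sqrt(disc) is rational
d = discriminant(f.as_expr(), x)
assert d == discriminant(theta.as_expr(), t) == 3136 == 56**2
```

Since the discriminant is a perfect square and the resolvent cubic is irreducible, the theorem gives Gal(f(x)) ≅ A4, exactly as computed by hand.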

11.6.4 – The Polynomial x^p − a ∈ Q[x]


We propose to study the Galois groups of polynomials of the form f(x) = x^p − a ∈ Q[x], where p is an odd prime and a is not a pth power in Q. These polynomials are the minimal polynomials of the radicals ᵖ√a and, in particular, they are irreducible.
We know that the roots of f(x) are ᵖ√a ζp^k with k = 0, 1, ..., p − 1. Consequently, the splitting field of x^p − a over Q is K = Q(ᵖ√a, ζp). Now [Q(ᵖ√a) : Q] = p while [Q(ζp) : Q] = p − 1. Since gcd(p, p − 1) = 1, then by Theorem 7.2.13, [K : Q] = p(p − 1).
The Galois group Gal(K/Q) is determined by how automorphisms act on the generators of K over Q. So Gal(f(x)) is generated by

σ : ᵖ√a ↦ ᵖ√a ζp, ζp ↦ ζp    and    τ : ᵖ√a ↦ ᵖ√a, ζp ↦ ζp^r,

where r is a primitive root modulo p, i.e., a generator of U(Fp). It is not too hard to check that this group has the presentation

Gal(f(x)) = ⟨σ, τ | σ^p = τ^(p−1) = 1, τστ⁻¹ = σ^r⟩.

More precisely, the Galois group of x^p − a is the holomorph (see Exercise 9.3.17) of Zp. Equivalent notations are

Gal(f(x)) = Fp ⋊ U(Fp) = Zp ⋊ Aut(Zp) = Hol(Zp).
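A quick SymPy sketch (ours; the specific values p = 5 and a = 2 are sample choices) confirms the two coprime degrees that combine to give [K : Q] = p(p − 1):

```python
from sympy import symbols, Poly, cyclotomic_poly, gcd

x = symbols('x')
p, a = 5, 2                          # a sample odd prime and a non-5th-power in Q

f = Poly(x**p - a, x)                # minimal polynomial of the real 5th root of 2
assert f.is_irreducible

phi = Poly(cyclotomic_poly(p, x), x) # minimal polynomial of zeta_5 over Q
assert phi.is_irreducible and phi.degree() == p - 1

# gcd(p, p - 1) = 1, so the two degrees multiply: [K : Q] = p(p - 1)
assert gcd(p, p - 1) == 1
print(f.degree() * phi.degree())     # 20
```

Here f and phi are the minimal polynomials of ⁵√2 and ζ5, of degrees 5 and 4, so the splitting field has degree 20 = 5 · 4 over Q.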

11.6.5 – Kronecker Analysis (Optional)


We now explore a method designed by Kronecker to compute the Galois groups of polynomials from
another perspective. This approach is not particularly efficient but it is amenable to techniques
using computer algebra systems for separable polynomials of low degree. Furthermore, it provides
a simple proof for Dedekind’s Theorem, which we present in the following section.
Let F be any field and let f(x) ∈ F[x] be a separable polynomial of degree n. Let α1, α2, ..., αn be the roots of f(x) in the splitting field E. Using new variables u1, u2, ..., un, y, define the Galois resolvent of f as

su(y) = ∏_{σ∈Sn} (y − (u1 ασ(1) + u2 ασ(2) + · · · + un ασ(n))) ∈ E[u1, u2, ..., un, y].

Since the roots of f (x) are distinct, then su (y) is also separable.
We consider the action of two different groups on E[u1, u2, ..., un, y]: the automorphism group Gal(E/F), which acts on the roots α1, α2, ..., αn, and the symmetric group Sn, which acts by permuting the variables u1, u2, ..., un. We investigate how these two actions interact.
Let ρ ∈ G = Gal(E/F). Then ρ permutes the roots of f(x) so that ρ(αi) = αψ(ρ)(i), where ψ : G → Sn is a homomorphic embedding of G into Sn. Thus,

ρ · su(y) = ∏_{σ∈Sn} (y − (u1 ρ(ασ(1)) + u2 ρ(ασ(2)) + · · · + un ρ(ασ(n))))
          = ∏_{σ∈Sn} (y − (u1 αψ(ρ)(σ(1)) + u2 αψ(ρ)(σ(2)) + · · · + un αψ(ρ)(σ(n)))).

Since the product defining su(y) runs over all permutations in Sn, we have the equality {ψ(ρ)σ | σ ∈ Sn} = Sn for all ρ ∈ G. Hence, ρ · su(y) = su(y), so all the coefficients of su(y) are fixed by every ρ ∈ Gal(E/F) and hence lie in F. Thus, su(y) ∈ F[u1, u2, ..., un, y].
Now consider the Sn action. Any τ ∈ Sn acts on the y-linear terms of su(y) as

τ · (u1 ασ(1) + u2 ασ(2) + · · · + un ασ(n)) = uτ(1) ασ(1) + uτ(2) ασ(2) + · · · + uτ(n) ασ(n)
                                            = u1 ασ(τ⁻¹(1)) + u2 ασ(τ⁻¹(2)) + · · · + un ασ(τ⁻¹(n)).

Hence, τ also permutes the y-linear factors of su(y).


By Corollary 6.5.6, the polynomial ring F[u1, u2, ..., un, y] = F[u1, u2, ..., un][y] is a UFD, and similarly when the field of coefficients is E. Consequently, su(y) factors into irreducible polynomials in F[u1, u2, ..., un][y]. Let h(y) be any irreducible factor of su(y) in F[u1, u2, ..., un][y]. In E[u1, u2, ..., un][y], h(y) is a product of the form

h(y) = ∏_{σ∈H} (y − (u1 ασ(1) + u2 ασ(2) + · · · + un ασ(n))),

where H is a subset of Sn . Any τ ∈ Sn acts on h(y) by


τ · h(y) = ∏_{σ∈H} (y − (uτ(1) ασ(1) + uτ(2) ασ(2) + · · · + uτ(n) ασ(n)))
         = ∏_{σ∈H} (y − (u1 αστ⁻¹(1) + u2 αστ⁻¹(2) + · · · + un αστ⁻¹(n))).

The new polynomial τ · h(y) is also irreducible in F [u1 , u2 , . . . , un ][y] because if it were not then
h(y) = τ −1 · (τ · h(y)) would be reducible. However, τ · h(y) consists of a product of linear terms
that appear in the product for su (y), so τ · h(y) is an irreducible factor of su (y). Since Sn acts
transitively on the linear terms of su (y), we have proven the following useful lemma.

Lemma 11.6.9
The orbit of h(y) under this action of Sn on F [u1 , u2 , . . . , un ][y] consists precisely of the
irreducible factors of su (y).

The Galois resolvent of Kronecker's analysis gives the following characterization of the Galois group of a polynomial.

Theorem 11.6.10
Let f (x) ∈ F [x] be monic and separable of degree n. Let su (y) and h(y) be as above. Then
GalF (f (x)) is isomorphic to the stabilizer Gh of h(y) under the action of Sn permuting
u1 , u2 , . . . , un . In other words,

G = GalF(f(x)) ≅ {τ ∈ Sn | τ · h(y) = h(y)} = Gh.

Proof. Let h(y) be any irreducible factor of su (y) and let ω ∈ Sn be a permutation such that

y − (u1 αω(1) + u2 αω(2) + · · · + un αω(n) ) (11.11)

is a linear factor of h(y) in E[u1 , u2 , . . . , un , y]. Consequently,


h(y) = ∏_{σ∈H} (y − (u1 ασω(1) + u2 ασω(2) + · · · + un ασω(n))), (11.12)

where H is a subset of Sn that contains 1. Suppose that τ ∈ ω⁻¹Hω, say with τ = ω⁻¹σ0ω. Then by the above discussion, τ · h(y) is another irreducible factor of su(y). Because σ0 ∈ H, the y-linear term indexed by σ0ω is a factor of h(y) in E[u1, u2, ..., un, y], whereas the y-linear term indexed by σ0ωτ⁻¹ = σ0ω(ω⁻¹σ0ω)⁻¹ = ω is a factor of τ · h(y). However, the y-linear term indexed by ω is also a factor of h(y). Hence, τ · h(y) = h(y). This shows that ω⁻¹Hω ⊆ Gh.
Conversely, suppose that τ ∈ Gh. Then h(y) is a product of the y-linear terms indexed by σωτ⁻¹ with σ ∈ H. However, one of these y-linear factors corresponds to the permutation ω. Hence, there exists σ0 ∈ H such that σ0ωτ⁻¹ = ω. Then τ = ω⁻¹σ0ω, which shows that Gh ⊆ ω⁻¹Hω.
Thus, H = ωGh ω −1 . In particular, H is a subgroup of Sn , not merely a subset.
Now consider the polynomial

h̃(y) = ∏_{ρ∈G} (y − (u1 ρ(αω(1)) + u2 ρ(αω(2)) + · · · + un ρ(αω(n))))
     = ∏_{ρ∈G} (y − (u1 αψ(ρ)(ω(1)) + u2 αψ(ρ)(ω(2)) + · · · + un αψ(ρ)(ω(n)))). (11.13)

The action of G on h̃(y) simply permutes the linear factors in the product expression so h̃(y) is fixed
by the Galois group G and hence h̃(y) ∈ F [u1 , u2 , . . . , un , y]. The term in (11.11) corresponds to
ρ = 1 ∈ G so this term divides both h(y) and h̃(y) in E[u1 , u2 , . . . , un , y]. Since ρ · h(y) = h(y), then
all the linear factors of h̃(y) divide h(y) so h̃(y) divides h(y) in E[u1 , u2 , . . . , un ][y]. This means that
h(y) = h̃(y)q(y) in E[u1 , u2 , . . . , un , y]. However, for all ρ ∈ G,

ρ · h(y) = (ρ · h̃(y))(ρ · q(y)) =⇒ h(y) = h̃(y)(ρ · q(y)).

Thus, ρ · q(y) = q(y) for all ρ ∈ G, and hence q(y) ∈ F [u1 , u2 , . . . , un ][y], which means that h̃(y)
divides h(y) in F [u1 , u2 , . . . , un ][y]. Since h(y) is irreducible, we deduce that h(y) = h̃(y).
Identifying (11.13) with (11.12) we deduce that ψ(G) = H = ωGh ω −1 . Since ψ(G) is conjugate
to Gh in Sn , then ψ(G) is isomorphic to Gh . 

Example 11.6.11. Consider the polynomial f(x) = x³ + 2x² − x − 1 ∈ Q[x]. By the Rational Root Theorem, f(x) has no rational roots and, since it is a cubic, f(x) is irreducible. Let s1, s2, and s3 be the elementary symmetric functions evaluated on the roots α1, α2, α3 of f(x). Numerically, s1 = −2, s2 = −1, and s3 = 1.
Calculating the Galois resolvent su (y) by hand is particularly onerous. Expanding it as a poly-
nomial expression in y, u1 , u2 , u3 , α1 , α2 , α3 produces a multivariable polynomial with 4096 terms
before collecting like terms. However, computer algebra systems can simplify the work. In Chap-
ter 12, we discuss powerful computation techniques to work with ideals in multivariable polynomial
rings. Furthermore, computer algebra systems combine a number of algorithms to be able to factor
multivariable polynomials. Using these techniques, we can show that

su(y) = (y³ + (2u1 + 2u2 + 2u3)y² + (5u1u2 + 5u1u3 + 5u2u3 − u1² − u2² − u3²)y
       + (8u1u2u3 − u1³ − u2³ − u3³ − 3u1u2² − 3u2u3² − 3u3u1² + 4u1²u2 + 4u2²u3 + 4u3²u1))
     × (y³ + (2u1 + 2u2 + 2u3)y² + (5u1u2 + 5u1u3 + 5u2u3 − u1² − u2² − u3²)y
       + (8u1u2u3 − u1³ − u2³ − u3³ + 4u1u2² + 4u2u3² + 4u3u1² − 3u1²u2 − 3u2²u3 − 3u3²u1)).

Choose h(y) to be the first factor in the above product. The coefficients of y² and y are symmetric in u1, u2, u3. However, the last coefficient is not; instead, it is stabilized by the cyclic subgroup ⟨(1 2 3)⟩. By Theorem 11.6.10, GalQ(f(x)) ≅ Z3 = A3. △
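The displayed factorization can be sanity-checked numerically (a sketch of ours, not the text's method): specialize the variables at, say, (u1, u2, u3) = (1, 2, 3), substitute these sample values into the two displayed cubic factors (their constant terms become 29 and 43), and compare with the resolvent built from floating-point approximations of the roots.

```python
import itertools
import numpy as np

# approximate roots of f(x) = x^3 + 2x^2 - x - 1 (all three are real)
roots = np.roots([1, 2, -1, -1]).real
u = (1.0, 2.0, 3.0)                  # sample specialization of u1, u2, u3

# Galois resolvent s_u(y): product over all six permutations in S3
su = np.poly1d([1.0])
for sigma in itertools.permutations(range(3)):
    c = sum(u[i] * roots[sigma[i]] for i in range(3))
    su *= np.poly1d([1.0, -c])

# the two cubic factors from the text, specialized at u = (1, 2, 3)
f1 = np.poly1d([1.0, 12.0, 41.0, 29.0])
f2 = np.poly1d([1.0, 12.0, 41.0, 43.0])
assert np.allclose(su.coeffs, (f1 * f2).coeffs)
```

The agreement of all seven coefficients (up to floating-point error) is consistent with the symbolic factorization above.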

Obviously, it is easier to find this result by calculating the discriminant ∆(f) = 49, which implies the same result by Proposition 11.6.2. However, the method of Kronecker analysis lends itself better to algorithmic methods for computing the Galois group of a polynomial. In fact, some computer algebra systems implement this method and can calculate the Galois group of polynomials up to degree 9 or more. (In Maple, the command is simply galois. See the help files on how to use it.)

Exercises for Section 11.6


1. Determine the Galois group of x³ − 3x + 1 in Q[x].
2. Determine the Galois group of x³ + x + 5 in Q[x].
3. Determine the Galois group of x³ + x² − 2x + 1 in Q[x].
4. Determine the Galois group of x³ − x − 1 in a) Q[x]; b) F13[x]; c) F23[x].
5. Determine the Galois group of x⁴ − 49 over Q.
6. Determine the Galois group of x⁴ + 2 over Q.
7. Determine the Galois group of x⁴ + 1 over Q.
8. Determine the Galois group of x⁴ + 2x² − x + 2 over Q.
9. Determine the Galois group of x⁴ + 8x + 12 over Q.
10. Determine the Galois group of x⁴ + 5x³ + 10x² + 10x + 5 over Q.
11. Determine the Galois group of x⁸ − 2 over Q.
12. Prove that the polynomial x⁴ − px² + q is irreducible for p and q odd primes. Prove that the Galois group of any such polynomial is D4.
13. Consider the polynomial f(x) = x⁴ + x + b. Prove that the Galois group of f(x) over Q is S4 or A4, and the latter occurs if and only if 256b³ − 27 is a square in Q.
14. Determine the Galois group of the palindromic polynomial x⁴ + ax³ + bx² + ax + 1 for various values of a and b.
15. Prove that there is no polynomial f(x) ∈ Q[x] with deg f(x) < 8 such that Gal(f(x)) is isomorphic to Q8. [Hint: Determine the isomorphism type of the Sylow 2-subgroups in S6.]
16. Determine the Galois group of x⁵ − 2x³ − 2x² + 4 over Q.
17. Prove that there are five nonisomorphic transitive subgroups of S5. (These are isomorphic to Z5, D5, Hol(Z5), A5, and S5.) Deduce that irreducible quintic polynomials must have a Galois group isomorphic to one of these. Show that A5 does not contain S5 or Hol(Z5) and conclude that an irreducible quintic f(x) ∈ F[x] with a discriminant that is a square in F has a Galois group that is isomorphic to A5, D5, or Z5.
18. Let f(x) ∈ Q[x] be an irreducible quintic polynomial that has exactly 3 real roots. Let K be the splitting field of f(x) over Q.
(a) Show that complex conjugation is an automorphism in Gal(K/Q).
(b) Use Exercise 3.5.42 to conclude that GalQ(f(x)) ≅ S5.
19. Consider the polynomial f(x) = x⁶ − 2ax³ + b in Q[x] and let K be the splitting field of f(x) over Q.
(a) Show that the six roots of this equation are ζ3^j ∛(a + (−1)^i √(a² − b)) with i = 0, 1 and j = 0, 1, 2.
(b) Observe that ∛b = ∛(a + √(a² − b)) ∛(a − √(a² − b)), √−3, and √(a² − b) are in K and deduce that K is an extension of degree 1 or 3 of L = Q(√−3, √(a² − b), ∛b).
(c) Show that regardless of a and b, the field L is a Galois extension of Q.
(d) Consider the element θ ∈ K defined by

θ = ∛(a + √(a² − b))/∛(a − √(a² − b)) + ∛(a − √(a² − b))/∛(a + √(a² − b)).

Prove that θ is a root of the polynomial g(y) = y³ − 3y − (4a²/b − 2).
(e) Show that the discriminant of g(y) is ∆ = (12a/b)² · 3(b − a²).
(f) Prove that if b − a² ≠ −1, then Gal(K/Q) ≅ Gal(x³ − b) ⊕ Gal(g(y)).
20. Use Exercise 11.6.19 to prove that the Galois group of x⁶ − 6x³ + 7 is isomorphic to S3 ⊕ S3.
21. Use Exercise 11.6.19 to prove that the Galois group of x⁶ − 10x³ + 8 is isomorphic to D6.
22. Use Exercise 11.6.19 to prove that the Galois group of x⁶ − 6x³ + 8 is isomorphic to S3.

11.7
Fields of Finite Characteristic
11.7.1 – Galois Groups of Finite Fields
The characterization of finite fields given in Section 7.7.2 allows us to calculate the group of automorphisms of finite extensions of finite fields. In fact, these turn out to be particularly easy.
Recall that if F is a finite field, then |F| = p^n for some prime p and some positive integer n. Such a field has Fp as its prime subfield with [F : Fp] = n. Furthermore, for each prime p and each positive integer n, there exists a unique (up to isomorphism) field F of order p^n, denoted F_{p^n}, namely the splitting field of x^(p^n) − x ∈ Fp[x]. Consequently, F is a Galois extension of Fp.
If f(x) ∈ Fp[x] is an irreducible polynomial of degree n, then Fp[x]/(f(x)) is an extension of Fp of degree n. By the uniqueness of finite fields of a given order, Fp[x]/(f(x)) ≅ F_{p^n}. We deduce that f(x) splits completely in F_{p^n} and that f(x) must divide x^(p^n) − x.
The Frobenius automorphism σp : F_{p^n} → F_{p^n} defined by σp(α) = α^p fixes Fp. Therefore, σp ∈ Gal(F_{p^n}/Fp). However, a stronger result holds.

Proposition 11.7.1
The Galois group Gal(F_{p^n}/Fp) is the cyclic group of order n, generated by the Frobenius automorphism σp.

Proof. We know that |Gal(F_{p^n}/Fp)| = [F_{p^n} : Fp] = n, so the order of σp divides n. Assume that σp^d = id for some strict divisor d of n. Then for all α ∈ F_{p^n},

σp^d(α) = α ⇐⇒ α^(p^d) = α ⇐⇒ α^(p^d) − α = 0.

This condition would mean that every α ∈ F_{p^n} lies in F_{p^d}, but F_{p^d} is a strict subfield of F_{p^n}, so we have arrived at a contradiction. Consequently, we conclude that σp has order n, and thus Gal(F_{p^n}/Fp) is generated by σp. The proposition follows. □
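As a concrete illustration of Proposition 11.7.1 (our own sketch, with variable names of our choosing), take F8 = F2[x]/(x³ + x + 1) and iterate the Frobenius map t ↦ t² on the class α of x; it returns to α after exactly three steps and not before, so σ2 has order 3:

```python
from sympy import symbols, Poly

x = symbols('x')
m = Poly(x**3 + x + 1, x, modulus=2)   # irreducible over F_2, so F_2[x]/(m) is F_8
alpha = Poly(x, x, modulus=2)          # the class of x generates F_8 over F_2

# iterate the Frobenius automorphism sigma_2 : t -> t^2
orbit, t = [], alpha
for _ in range(3):
    t = (t * t).rem(m)
    orbit.append(t)

# sigma_2 has order 3: alpha^(2^3) = alpha, while alpha^2 and alpha^4 differ from alpha
assert orbit[2] == alpha
assert orbit[0] != alpha and orbit[1] != alpha
```

Replacing x³ + x + 1 by an irreducible polynomial of degree n over Fp gives the analogous check that σp has order n in Gal(F_{p^n}/Fp).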

Since Gal(F_{p^n}/Fp) is abelian, every subgroup is a normal subgroup. By the Galois correspondence, every field extension K of Fp in F_{p^n} is Galois. After all, such a K must satisfy K ≅ F_{p^d}, where d | n. All subgroups of Gal(F_{p^n}/Fp) are cyclic of the form ⟨σp^d⟩ for some divisor d of n. Under the Galois correspondence, the subgroup ⟨σp^d⟩ of Gal(F_{p^n}/Fp) corresponds to the subfield

Fix(F_{p^n}, σp^d) = {α ∈ F_{p^n} | α^(p^d) = α} = F_{p^d}.

Thus,

Gal(F_{p^n}/F_{p^d}) = ⟨σp^d⟩ ≅ Z_{n/d}.

Note that n/d = [F_{p^n} : F_{p^d}]. This proves the following generalization of Proposition 11.7.1.

Proposition 11.7.2
Let F be any finite field of order q and let K be an extension of F of degree m. Then K/F
is a Galois extension with Gal(K/F ) ∼ = Zm .

11.7.2 – Roots of Unity in Finite Fields


Section 11.4 studied the roots of unity over Q along with cyclotomic extensions. These extensions
offered specific examples of Galois groups. In Section 11.8, we will revisit them one more time for one
of the most celebrated applications of Galois theory. However, in order for the results of Section 11.8
to be relevant over finite fields, we need to briefly consider roots of unity over Fp .

Definition 11.7.3
Let F be any field, not necessarily finite. The splitting field of x^n − 1 over F is the nth cyclotomic field over F and is denoted F^(n). The roots of x^n − 1 in F^(n) are called the nth roots of unity, and we denote this subset by μn.

Proposition 11.7.4
Let F be a field of characteristic p. Suppose that n = p^k m where p ∤ m. Then μn = μm, and it is a cyclic group of order m under multiplication in F^(n).

Proof. First suppose that k = 0, so that p ∤ n. Then x^n − 1 and its derivative Dx(x^n − 1) = nx^(n−1) have no common roots. Hence, x^n − 1 is separable. Thus, |μn| = n. To see that μn is a subgroup of U(F^(n)), first note that μn is nonempty since 1 ∈ μn. Also, for any α, β ∈ μn,

(αβ⁻¹)^n = α^n β^(−n) = 1^n · 1^(−n) = 1,

so αβ⁻¹ ∈ μn. By the One-Step Subgroup Criterion, μn is a group. By Proposition 7.5.2, μn is cyclic.
Now suppose that k > 0. Then, using the Frobenius automorphism σp,

(x^m − 1)^(p^k) = σp^k(x^m − 1) = σp^k(x^m) − σp^k(1) = x^(p^k m) − 1 = x^n − 1.

Hence, x^n − 1 is not separable but is a product of p^k copies of x^m − 1. The proposition follows. □

In a field F of characteristic p (whether F is finite or not), there are no primitive nth roots of unity when p | n. Consequently, we only consider nth roots of unity when p ∤ n.
In any field F of characteristic p, if p ∤ n we define the nth cyclotomic polynomial ΦF,n(x) in a manner analogous to cyclotomic polynomials over Q. If F = Fp, then the recursive definition in Z[x],

x^n − 1 = ∏_{d|n} Φd(x),

reduces modulo p. Thus, ΦFp,n(x) is the polynomial in Fp[x] obtained by reducing the coefficients of Φn(x) ∈ Z[x] modulo p.
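Both facts are easy to confirm with SymPy (a sketch of ours; n = 12 and p = 5 are sample values with p ∤ n):

```python
from sympy import symbols, cyclotomic_poly, divisors, expand, prod, factor_list

x = symbols('x')
n, p = 12, 5                           # sample values: p does not divide n

# x^n - 1 is the product of the cyclotomic polynomials Phi_d for d | n
lhs = expand(prod(cyclotomic_poly(d, x) for d in divisors(n)))
assert lhs == expand(x**n - 1)

# reducing mod p: since p does not divide n, x^n - 1 stays separable over F_p,
# so every irreducible factor occurs with multiplicity one
_, factors = factor_list(x**n - 1, modulus=p)
assert all(mult == 1 for _, mult in factors)
```

Replacing 5 by a prime dividing 12 (say p = 2 or 3) would instead exhibit the repeated factors predicted by Proposition 11.7.4.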

11.7.3 – Dedekind’s Theorem


Given a polynomial f(x) ∈ Z[x] and a prime p, let f̄(x) denote the polynomial in Fp[x] obtained by reduction modulo p. It turns out that the Galois group of f̄(x) over Fp gives information about the Galois group of f(x) over Q.

Theorem 11.7.5 (Dedekind’s Theorem)


Let f(x) ∈ Z[x] be monic and separable of degree n. Let p be a prime number such that p ∤ ∆(f). Let

f̄(x) = f̄1(x) f̄2(x) · · · f̄r(x)

be the factorization of f̄(x) into irreducible factors in Fp[x]. Set di = deg f̄i(x).
(1) GalFp(f̄(x)) is cyclic of order lcm(d1, d2, ..., dr).
(2) GalQ(f(x)) contains an element that acts on the roots of f(x) according to a permutation of cycle type (d1, d2, ..., dr), i.e., a permutation of disjoint cycles of lengths d1, d2, ..., dr.

Proof. For part (1), note that the discriminant ∆ of a polynomial in Z[x] is an integer polynomial in the coefficients. Since f(x) ∈ Z[x], then ∆(f) is an integer and ∆(f) reduces to ∆(f̄) in Fp. If p ∤ ∆(f), then ∆(f̄) ≠ 0 and so f̄ is separable.
The polynomial f̄ splits completely in the finite field F_{p^m} if and only if f̄i splits completely in F_{p^m} for all i. This is equivalent to f̄i dividing x^(p^m) − x and therefore, by Theorem 7.7.12, to di dividing m. Hence, f̄ splits completely in the finite field F_{p^m} if and only if lcm(d1, d2, ..., dr) | m. Since GalFp(f̄(x)) is cyclic, generated by the Frobenius automorphism whose order is the degree of the splitting field of f̄ over Fp, part (1) follows.
For part (2), consider the universal Galois resolvent for a monic polynomial of degree n,

Su(y) = ∏_{σ∈Sn} (y − (u1 xσ(1) + · · · + un xσ(n))) ∈ Z[x1, ..., xn, u1, ..., un, y].

Being symmetric in x1, x2, ..., xn, we also have Su(y) ∈ Z[e1, ..., en, u1, ..., un, y], where ei is the ith elementary symmetric polynomial in the variables x1, x2, ..., xn. The Galois resolvent su(y) of a specific f(x) = x^n + fn−1 x^(n−1) + · · · + f1 x + f0 ∈ Z[x] is obtained from Su(y) by mapping ei ↦ (−1)^i fn−i. However, this mapping commutes with reduction modulo p, so the Galois resolvent of f̄ ∈ Fp[x] is equal to the reduction of su(y) modulo p, namely s̄u(y).
Let h(y) be an irreducible factor of su (y) in Q[u1 , . . . , un ][y]. In Exercise 11.7.8 we prove that
when h(y) is multiplied by an element in Q to become monic in y, the resulting polynomial is in
Z[u1 , . . . , un ][y]. Hence, we can assume that h(y) ∈ Z[u1 , . . . , un ][y]. By Theorem 11.6.10, the Galois
group G = GalQ (f (x)) is isomorphic to the stabilizer Gh = {σ ∈ Sn | σ · h = h}.
Let h̄(y) ∈ Fp[u1, u2, ..., un, y] be the reduction of h(y) modulo p. This is not necessarily irreducible. Let ḡ(y) be an irreducible factor of h̄(y). Then ḡ(y) is an irreducible factor of s̄u, so by Theorem 11.6.10, the Galois group Ḡ = GalFp(f̄(x)) is isomorphic to the stabilizer Gḡ.
We now show that Gḡ ⊆ Gh. Assume that this is not the case. Then there exists τ ∈ Sn such that τ · ḡ = ḡ but τ · h = h1 ≠ h. By Lemma 11.6.9, h1 is another irreducible factor of su(y). Hence, su(y) = h(y)h1(y)q(y) in Z[u1, ..., un][y] for some polynomial q. Reducing modulo p gives s̄u(y) = h̄(y)h̄1(y)q̄(y). The action of Sn on the u-variables is compatible with reduction modulo p, so h̄1 = τ · h̄. Since ḡ divides h̄, we see that ḡ = τ · ḡ divides h̄1. Thus, ḡ² divides s̄u. However, this is a contradiction because s̄u(y) is separable. We conclude that Gḡ ⊆ Gh. This implies that Ḡ is isomorphic to a subgroup of G.
Finally, by the proof of part (1), Ḡ, viewed as a subgroup of Sn, contains a permutation of cycle type (d1, d2, ..., dr). Hence, GalQ(f(x)) contains an element that acts on the roots according to a permutation of the same cycle type. □

Dedekind’s Theorem allows us to obtain a lower bound on the Galois group of a polynomial
f (x). By showing that the Galois group contains permutations of a given cycle type, it is sometimes
possible to conclude that Gal(f (x)) contains a certain subgroup of Sn .

Example 11.7.6. Consider the quintic polynomial f (x) = x5 − 3x2 + 3x + 3 ∈ Z[x]. By Eisenstein’s
Criterion, we see that f (x) is irreducible. Since f (x) is irreducible, the Galois group is a transitive
subgroup of S5 (acting on the set of roots of f (x)). All transitive subgroups of S5 have a 5-cycle.
Using factorization algorithms in a computer algebra system, we can factor this polynomial over
various finite fields. In particular, reduced modulo 23,

f̄(x) = (x + 3)(x + 4)(x + 12)(x² + 4x + 12),

so GalF23(f̄(x)) is not just isomorphic to Z2 but, under its injection into S5, is conjugate to a subgroup generated by a single transposition. By Dedekind's Theorem, GalQ(f(x)) contains both a 5-cycle and a 2-cycle. By Exercise 3.5.42, GalQ(f(x)) ≅ S5. △
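The factorization modulo 23 is reproducible in SymPy (a sketch of ours; note that with modulus=23 SymPy normally prints coefficients in the symmetric range −11, ..., 11, so the factor x + 12 appears as x − 11):

```python
from sympy import symbols, Poly, factor_list

x = symbols('x')
f = x**5 - 3*x**2 + 3*x + 3

assert Poly(f, x).is_irreducible      # Eisenstein's Criterion at p = 3 also applies

# cycle type (1, 1, 1, 2): three linear factors and one quadratic factor mod 23
_, factors = factor_list(f, modulus=23)
degrees = sorted(Poly(g, x).degree() for g, _ in factors)
assert degrees == [1, 1, 1, 2]
```

The list of factor degrees is exactly the cycle type that Dedekind's Theorem transfers to an element of GalQ(f(x)).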

Using Dedekind's Theorem requires factoring f̄(x) over the finite field Fp. Berlekamp's Algorithm provides an efficient method to factor polynomials over finite fields. Many computer algebra systems implement this algorithm, making Dedekind's Theorem practical for finding information about the Galois group of a monic polynomial with integer coefficients. In particular, Berlekamp's algorithm efficiently determines when f̄(x) is irreducible, which in turn implies that f(x) is irreducible in Z[x].

11.7.4 – Useful CAS Commands


The following Maple command is relevant to this section.
Maple Function
Factor(a) mod p; — Calculates the factorization of the polynomial a(x) ∈ Fp[x] using Berlekamp's algorithm.

Exercises for Section 11.7


1. Prove that an algebraically closed field must be infinite.
2. Determine the splitting field over Fp of the polynomial x^p − x + a for a ≠ 0. Explicitly show that this extension is cyclic by showing that σ(α) = α + 1 generates the Galois group.
3. Prove that in Fp[x],

x^(p²) − 2x^p + x = ∏_{a∈Fp} (x^p − x + a).

4. Consider the polynomial f(x) = x⁵ + 20x − 32 ∈ Z[x]. Use a CAS to show that it is irreducible and to calculate its discriminant. Deduce that the Galois group of f(x) over Q is A5.
5. Give a table that lists the number of elements of a given cycle type for each of the transitive subgroups of S5. Deduce that if we know a transitive subgroup of S5 contains both a 4-cycle and a 3-cycle, then it is all of S5.
6. Use the previous exercise to show that x⁵ + 3x⁴ + 1 ∈ Q[x] has Galois group S5.
7. Consider the polynomial f(x) = x⁷ + 14x⁴ − 24 ∈ Z[x]. Use a CAS to calculate its discriminant. Find the factorization modulo 5 to deduce that f(x) is irreducible. Consider f(x) modulo 41 and use Exercise 3.5.44 to show that the Galois group is A7.
8. The rings Z[x1, x2, ..., xn] and Q[x1, x2, ..., xn] are UFDs.
(a) Prove that if f ∈ Z[x1, x2, ..., xn] is irreducible and nonconstant, then it is irreducible as an element of Q[x1, x2, ..., xn].
(b) Suppose that g ∈ Z[x1, x2, ..., xn] and that f is an irreducible factor of g as an element in Q[x1, x2, ..., xn]. Prove that there exists c ∈ Q* such that cf is an irreducible factor of g in Z[x1, x2, ..., xn].

11.8
Solvability by Radicals
Much of the early development of modern algebra arose out of the effort to understand solutions
to polynomial equations. One of the central goals involved finding explicit formulas for roots of
a polynomial equation and, in particular, a formula involving radicals. From antiquity until the
16th century, scholars only knew how to solve the quadratic equation (and some equations that
quickly reduced to it). The Cardano-Tartaglia-Ferrari method to solve the cubic and the quartic
(Section 7.3), first published in 1545, offered hope that similar methods might exist to solve higher
degree polynomials with radicals.
Many mathematicians subsequently attempted to find a formula for the roots of the general
quintic equation. Mathematicians used countless techniques, many of which are beyond the natural scope of most undergraduate programs (elliptic functions, hypergeometric series, etc.). Some methods met with success, but formulas using radicals remained elusive.
As we have seen in earlier sections of this book, field theory and Galois theory solved many
problems in mathematics that had remained open for centuries. Field theory also offers a framework
to describe a real number obtained from the rationals by a combination of radicals. Here again,
Galois theory establishes a startling and unexpected result: There does not exist a formula using
radicals for solutions to a general polynomial p(x) ∈ C[x] of degree deg p ≥ 5.

11.8.1 – Radical and Solvable Extensions

Definition 11.8.1
A field extension K/F is called radical if there is a chain of fields

F = K0 ⊆ K1 ⊆ · · · ⊆ Kn−1 ⊆ Kn = K with Ki = Ki−1 (γi ) (11.14)

where γimi ∈ Ki−1 for some positive integer mi , and this for all 1 ≤ i ≤ n.

We point out that cyclotomic extensions of any field are radical extensions. Indeed, a cyclotomic extension of F is F(ζ), where ζ is a primitive root of unity, which means that ζ^n = 1 for some n.

Example 11.8.2. The field extension Q(⁵√(1 − √3 + √2))/Q is a radical extension because of the following chain of subfields:

K0 = Q,
K1 = K0(γ1) with γ1² = 2 ∈ K0,
K2 = K1(γ2) with γ2² = 3 ∈ K1,
K3 = K2(γ3) with γ3⁵ = 1 − √3 + √2 ∈ K2. △

Definition 11.8.3
Let F be a field. We say that an element α (in some field extension of F ) is solvable by
radicals over F if α is in a radical extension of F .

This definition makes precise the notion that α is obtained by successive additions, subtractions,
multiplications, divisions, and nth roots, starting from elements in F .

Example 11.8.4. Consider the polynomial f(x) = x³ − 4x² + x + 1 ∈ Q[x]. It is easy to check that f(x) has no rational roots, so, since it is a cubic, it is irreducible. The discriminant is ∆(f) = 169 = 13². By Theorem 7.3.2, since ∆(f) > 0, f(x) has three real roots, so the splitting field K of f(x) over Q is a subfield of R. Furthermore, since ∆(f) is a square, then by Proposition 11.6.2, K = Q(α), where α is any one of the roots of f(x). Hence, [K : Q] = 3.
It turns out that K is not a radical extension. Assume K is radical. Then K = Q(ᵐ√a) for some a ∈ Q and some m ≥ 3. Since K is Galois over Q, the polynomial x^m − a would split completely in K so that, in particular, ᵐ√a ζm ∈ K. However, we already saw that K ⊆ R. Hence, the assumption that K is a radical extension leads to a contradiction. △
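Both numerical facts used in this example, the square discriminant and the three real roots, are quick to confirm with SymPy (our own sketch):

```python
from sympy import symbols, Poly, discriminant, real_roots

x = symbols('x')
f = x**3 - 4*x**2 + x + 1

assert Poly(f, x).is_irreducible           # no rational roots, so the cubic is irreducible
assert discriminant(f, x) == 169 == 13**2  # square discriminant, so Gal(f) is A_3
assert len(real_roots(Poly(f, x))) == 3    # all roots real, so K sits inside R
```

These two checks are exactly the ingredients of the argument: the square discriminant forces a cyclic Galois group of order 3, while the real roots keep K inside R and rule out a radical tower.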

This example shows that if K is radical over F and L is a field such that F ⊆ L ⊆ K, then L is
not necessarily radical over F . However, every element of L is solvable by radicals. This motivates
the following more refined definition.

Definition 11.8.5
Let F be a field. An extension L/F is called solvable if F ⊆ L ⊆ K, where K is a radical
extension of F .

The splitting field K introduced in the previous example is a solvable extension of Q even though
it is not a radical extension. We extend this definition to a single polynomial.

Definition 11.8.6
Let f (x) ∈ F [x]. If the splitting field of f (x) is a solvable extension over F , then f (x) is
said to be solvable by radicals.

The main theorem of this section, Galois’ Theorem, depends on properties of radical and solvable
extensions, so we develop those properties here.

Proposition 11.8.7
Let K/F be a radical extension.
(1) If L/K is a radical extension, then L/F is radical.
(2) For any field extension L of F, the extension KL/L is radical.
(3) If L/F is also a radical extension, then LK/F is radical.

Proof. In all parts of this proof, let F = K0 ⊆ K1 ⊆ · · · ⊆ Kr−1 ⊆ Kr = K be a chain of subfields as in (11.14) with Ki = Ki−1(γi), where γi^{mi} ∈ Ki−1 for 1 ≤ i ≤ r.
For part (1), it is easy to see from the definition of a radical extension that concatenating the defining chain for K/F with the defining chain for L/K results in a chain of the form (11.14) for L/F. This part follows immediately.
For part (2), let L/F be any field extension. We have a chain of subfields

L = K0L ⊆ K1L ⊆ · · · ⊆ Kr−1L ⊆ KrL = KL. (11.15)

For each i ≥ 1, the composite field KiL is Ki−1(γi)L = Ki−1L(γi). Furthermore, γi^{mi} ∈ Ki−1 ⊆ Ki−1L, so (11.15) is a chain that makes KL/L a radical extension.
For part (3), let L/F be a radical extension. By part (2), KL/L is a radical extension. However, since L/F is radical, then by part (1), KL/F is also a radical extension. □

Proposition 11.8.8
If K/F is a separable and radical extension, then the Galois closure of K is also a radical
extension over F .

Proof. Let F = K0 ⊆ K1 ⊆ · · · ⊆ Kr−1 ⊆ Kr = K be a chain of subfields as in (11.14).
Since K is a finite extension of F, then K = F(α1, α2, ..., αn) for some elements αi ∈ K. The Galois closure E of K over F is the composite of the splitting fields of the minimal polynomials of each αi. For every σ ∈ Gal(E/F), we have a chain of subfields

F = σ(F) = σ(K0) ⊆ σ(K1) ⊆ · · · ⊆ σ(Kr−1) ⊆ σ(Kr) = σ(K). (11.16)

Recall that Ki = Ki−1(γi) with γi^{mi} ∈ Ki−1. Thus, σ(Ki) = σ(Ki−1)(σ(γi)) with σ(γi)^{mi} ∈ σ(Ki−1). Thus, the chain in (11.16) shows that σ(K) is another radical extension of F.
Now E is the composite of σ(K) for all σ ∈ Gal(E/F). A repeated application of Proposition 11.8.7(3) establishes that E is a radical extension of F. □

Corollary 11.8.9
If a finite and separable extension L/F is solvable, then so is the Galois closure of L.

Proof. If L/F is finite and separable, then L is contained in a separable and radical extension K of F. By Proposition 11.8.8, the Galois closure E of K is a radical extension of F. The Galois closure of L is the composite of all the fields σ(L) for σ ∈ Gal(E/F) and hence is a subfield of E. Since E/F is a radical extension, the Galois closure of L is a solvable extension of F. □

11.8.2 – Galois’ Theorem


Galois and, shortly before him, Abel developed group theory for the purposes of the following theorem
along with its immediate Corollary 11.8.11. Rather than write this section as a mystery novel, we state this
profound theorem first and then develop the propositions necessary for its proof.

Theorem 11.8.10 (Galois’ Theorem)


Let char F = 0 and let L/F be a Galois extension. Then L/F is a solvable extension if and
only if Gal(L/F ) is a solvable group.

Corollary 11.8.11
Suppose that char F = 0. The polynomial p(x) ∈ F [x] can be solved by radicals if and only
if GalF (p(x)) is solvable.

Though this textbook defined a solvable group in Section 9.1, from a historical perspective, it is
the property of solvable field extensions that inspired the group theoretic concept. Corollary 11.8.11
establishes a complete characterization of polynomials for which the roots can be expressed by a
combination of radicals. If F = Q, there exist polynomials with Galois group Sn for all positive
integers n. Consequently, Corollary 11.8.11 immediately leads to the following theorem, whose
historical importance cannot be overstated.

Theorem 11.8.12
The general equation of degree n in Q[x] cannot be solved by radicals for n ≥ 5.

This theorem closed the book on the search for a formula solving general polynomial equations using
radicals by affirming that such a solution does not exist for polynomials of degree 5 or more. That
formulas do exist for polynomials of degree 2, 3, and 4 follows from the fact that S2 , S3 , and S4
(along with all their subgroups) are solvable groups.
This is extremely interesting because many students of algebra tend to hold the intuition that
algebraic elements consist of everything that can be expressed with radicals. Theorem 11.8.12
establishes that this is not enough. The set of elements that are solvable by radicals is a field in itself.
(See Exercise 11.8.7.) However, it is a strict subfield of the algebraic closure of Q. In Exercise 11.7.6,
we showed that the Galois group of f (x) = x5 + 3x4 + 1 is S5 . Hence, the roots of f (x) cannot be
written using addition, subtraction, multiplication, division, and nth roots, starting from rational
numbers.
In order to complete the proof of Theorem 11.8.10, we need to apply properties of the Galois
correspondence to solvable extensions. This begins with a characterization of cyclic extensions.

Proposition 11.8.13
Let F be a field of characteristic not dividing n which contains the nth roots of unity. Then
F (γ) where γ n ∈ F is a cyclic extension of F of degree dividing n.

Proof. Let µn denote the group of the nth roots of unity. For each automorphism σ ∈ G = Gal(F (γ)/F ),
there exists ζσ ∈ µn such that σ(γ) = ζσ γ. Now for any two σ, τ ∈ G,

(στ )(γ) = σ(τ (γ)) = σ(ζτ γ) = ζτ σ(γ) = ζτ ζσ γ = ζσ ζτ γ.

Consequently, the mapping σ ↦ ζσ provides an injective homomorphism of Gal(F (γ)/F ) into µn .


By the First Isomorphism Theorem, we deduce that Gal(F (γ)/F ) is isomorphic to a subgroup of µn .
However, µn is a finite subgroup of the group of units of a field so it is cyclic by Proposition 7.5.2.
Hence, Gal(F (γ)/F ) is cyclic. 

This proposition has a converse.


11.8. SOLVABILITY BY RADICALS 609

Proposition 11.8.14
Let F be a field of characteristic not dividing n that contains the nth roots of unity. Any
cyclic extension of F of degree n is of the form F (γ), where γ n ∈ F .

Proof. Let K be a cyclic extension of F with Gal(K/F ) generated by the automorphism σ. Let
α ∈ K and consider the element

γ = α + ζ −1 σ(α) + ζ −2 σ 2 (α) + · · · + ζ −n+1 σ n−1 (α), (11.17)

where ζ is a primitive nth root of unity. Since ζ n = 1 in F and σ n = id, then

ζ −1 σ(γ) = ζ −1 σ(α) + ζ −2 σ 2 (α) + ζ −3 σ 3 (α) + · · · + ζ −n σ n (α)
= α + ζ −1 σ(α) + ζ −2 σ 2 (α) + · · · + ζ −n+1 σ n−1 (α) = γ.

Thus, σ(γ) = ζγ. We observe now that σ(γ n ) = ζ n γ n = γ n . Hence, γ n is fixed by σ and therefore,
γn ∈ F .
By Proposition 11.1.9, since σ i : K → K for 0 ≤ i ≤ n − 1 are distinct automorphisms that fix
F , then they are linearly independent over F as functions. Hence, the function

x ↦ x + ζ −1 σ(x) + ζ −2 σ 2 (x) + · · · + ζ −n+1 σ n−1 (x)

is nonzero so there exists an α ∈ K such that the associated γ defined in (11.17) is nonzero. Since
σ i (γ) = ζ i γ and ζ is primitive, no power σ i with 0 < i < n fixes γ. By the Galois correspondence, γ is not in any strict subfield
of K. Hence, K = F (γ). Since γ n ∈ F , the proposition follows. 

The above proof involved the expression

α + ζ −1 σ(α) + ζ −2 σ 2 (α) + · · · + ζ −n+1 σ n−1 (α)

for some α ∈ K and some primitive nth root of unity ζ. This quantity is called the Lagrange resolvent of
α with σ. It is particularly useful in the situation when the base field F contains the nth roots of
unity.

Example 11.8.15. As a simple example of the Lagrange resolvent, let ζ be a primitive 3rd root of unity
and consider the cyclic extension Q(ζ, ∛5) of Q(ζ). Let α = 1 + ∛5 and let σ be the automorphism
with σ(∛5) = ζ ∛5. The Lagrange resolvent is

γ = (1 + ∛5) + ζ 2 (1 + ζ ∛5) + ζ(1 + ζ 2 ∛5) = (1 + ζ + ζ 2 ) · 1 + 3 ∛5 = 3 ∛5.

We see that γ 3 = 135 ∈ Q(ζ). 4
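The arithmetic in this example is easy to check numerically. The following sketch (plain floating-point Python standing in for a CAS; the variable names are ours) evaluates the resolvent directly.

```python
import cmath

zeta = cmath.exp(2j * cmath.pi / 3)   # a primitive 3rd root of unity
c = 5 ** (1 / 3)                      # the real cube root of 5
alpha = 1 + c                         # alpha = 1 + 5^(1/3)

# sigma sends 5^(1/3) to zeta*5^(1/3), so sigma(alpha) = 1 + zeta*c, etc.
gamma = alpha + zeta**-1 * (1 + zeta * c) + zeta**-2 * (1 + zeta**2 * c)

print(gamma)       # approximately 3 * 5^(1/3), with negligible imaginary part
print(gamma**3)    # approximately 135
```

Up to rounding error, the resolvent collapses to 3 ∛5 exactly as the symbolic computation predicts.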



Example 11.8.16. For a more complicated example of the Lagrange resolvent, let ζ = (−1 + i√3)/2
be a primitive 3rd root of unity and consider the cyclic extension K = Q(ζ, α) of Q(ζ) where α is a root of
p(x) = x3 − 7x − 7. The discriminant is ∆(p) = 49 so GalQ (p(x)) ∼= Z3 and this is also true for the
Galois group of p(x) over Q(ζ).
Let α1 , α2 , α3 be the roots of p(x) and let σ ∈ Gal(K/Q(ζ)) be the automorphism that maps (α1 , α2 , α3 ) to (α2 , α3 , α1 ).
Using a CAS, we can show that the Lagrange resolvent of α1 with σ is an element γ such that
γ 3 = 105 + 21ζ, which is an element of Q(ζ). 4
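The CAS computation mentioned in this example can be reproduced approximately with floating-point arithmetic. In the sketch below, the bisection routine and the particular labeling of the three real roots of p(x) are our own illustrative choices; with this labeling, the resolvent cubes to 105 + 21ζ up to rounding error.

```python
def p(x):
    return x**3 - 7*x - 7

def bisect(f, lo, hi, tol=1e-13):
    # simple bisection; assumes a sign change on [lo, hi]
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

zeta = complex(-0.5, 3 ** 0.5 / 2)   # zeta = (-1 + i*sqrt(3))/2

# the three real roots of p(x); p changes sign on each interval below
a1 = bisect(p, 3.0, 4.0)       # roughly  3.0489
a2 = bisect(p, -2.0, -1.5)     # roughly -1.6920
a3 = bisect(p, -1.5, -1.0)     # roughly -1.3569

# Lagrange resolvent of a1 with sigma: (a1, a2, a3) -> (a2, a3, a1)
gamma = a1 + zeta**-1 * a2 + zeta**-2 * a3
print(gamma**3)   # approximately 105 + 21*zeta = 94.5 + 18.1865...i
```

The three roots are real because ∆(p) = 49 > 0, which is why simple bisection on real intervals suffices here.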

We are now in a position to prove Galois’ Theorem.

Proof (of Galois’ Theorem). Let L be a Galois extension over a field F of characteristic 0.
First, suppose that L/F is a solvable extension. Then F ⊆ L ⊆ K, where K is a radical extension
as in (11.14). Expand this chain to the chain

F = K0 ⊆ K1′ ⊆ K1 ⊆ K2′ ⊆ K2 ⊆ · · · ⊆ Kn′ ⊆ Kn = K,



where Ki′ = Ki−1 (ζmi ) with ζmi an mi th root of unity and Ki = Ki′ (γi ) with γimi ∈ Ki−1 . By the Galois
correspondence, we have

1 = Gal(K/Kn ) ⊆ Gal(K/Kn′ ) ⊆ Gal(K/Kn−1 ) ⊆ · · · ⊆ Gal(K/K1 ) ⊆ Gal(K/K1′ ) ⊆ Gal(K/K0 ).


(11.18)
We know that cyclotomic extensions are abelian. By definition, Ki′ is Galois over Ki−1 and by the
Galois correspondence, Gal(K/Ki′ ) E Gal(K/Ki−1 ) and

Gal(K/Ki−1 )/ Gal(K/Ki′ ) ∼= Gal(Ki′ /Ki−1 )

is an abelian group. Also, by Proposition 11.8.13, Ki is a cyclic extension of Ki′ . Again by the
Galois correspondence, Gal(K/Ki ) E Gal(K/Ki′ ) and

Gal(K/Ki′ )/ Gal(K/Ki ) ∼= Gal(Ki /Ki′ )

is a cyclic group. Hence, the chain (11.18) shows that Gal(K/F ) is a solvable group.
Since L/F is a Galois extension, Gal(K/L) is a normal subgroup of Gal(K/F ) and

Gal(L/F ) ∼= Gal(K/F )/ Gal(K/L).

By Exercise 9.1.8, quotient groups of solvable groups are solvable so Gal(L/F ) is solvable.
Now suppose that Gal(L/F ) is a solvable group. Let n = [L : F ] and let ζ be a primitive nth
root of unity over F . Adjoin ζ to both L and F to obtain the following diagram of fields.

L(ζ)

F (ζ)

Since L(ζ) = LF (ζ) and both F (ζ) and L are Galois over F , then by Proposition 11.3.5,

Gal(L(ζ)/F (ζ)) ∼
= Gal(L/L ∩ F (ζ)),

which is a subgroup of Gal(L/F ). As a subgroup of a solvable group, Gal(L(ζ)/F (ζ)) is solvable.


By Exercise 9.1.7, we deduce that there is a chain of subgroups

{1} = G0 ≤ G1 ≤ G2 ≤ · · · ≤ Gs = Gal(L(ζ)/F (ζ))

such that Gi−1 E Gi and Gi /Gi−1 is a cyclic group of prime order for all 1 ≤ i ≤ s. Set Li =
Fix(L(ζ), Gi ). The Galois correspondence produces a chain of subfields

F (ζ) = Ls ⊆ Ls−1 ⊆ · · · ⊆ L1 ⊆ L0 = L(ζ).

The Galois correspondence also implies that Li−1 is a Galois extension of Li with Gal(Li−1 /Li ) ∼=
Gi /Gi−1 .
The extension Li−1 /Li is a cyclic extension of prime degree pi , which divides | Gal(L/F )|. (See Exercise 11.8.4.)
Since ζ ∈ Li , then ζ raised to an appropriate power is a primitive pi th root of unity.
Hence, we can use Proposition 11.8.14 and deduce that Li−1 = Li (γi ) with γipi ∈ Li . Consequently,
L(ζ)/F (ζ) is a radical extension. Furthermore, F (ζ)/F is obviously a radical extension so L(ζ)/F
is a radical extension. Since F ⊆ L ⊆ L(ζ), we conclude that L/F is a solvable extension. 

Galois’ Theorem motivated the definition of solvable groups. Because the problem of solving
polynomials by radicals had such historical importance, Galois’ Theorem channeled attention toward
deciding properties of solvable groups. Consequently, theorems about solvable groups imply results
about solvability of polynomials by radicals. For example, consider the Feit-Thompson Theorem
that states that if a group has odd order, then it is solvable. Combined with Corollary 11.8.11,
the Feit-Thompson Theorem implies that every polynomial p(x) ∈ F [x], where char F = 0, whose
splitting field E has [E : F ] odd, is solvable by radicals.

Exercises for Section 11.8



1. Let F = Q(a^{1/n}) where a is a positive rational number such that xn − a is irreducible. Let E be a field
with Q ⊆ E ⊆ F such that [E : Q] = d. Prove that E = Q(a^{1/d}).

2. Let F = Q(a^{1/n}) where a is a positive rational number such that xn − a is irreducible. Prove that if n
is odd, then F contains no nontrivial subfields that are Galois over Q.
3. Suppose that E/F is an abelian extension, that [E : F ] = n, and that F contains a primitive nth root
of unity. Show that E is a splitting field over F of some polynomial of the form (xn1 − a1 )(xn2 −
a2 ) · · · (xnr − ar ), where ai ∈ F . [Hint: Prove ni | n.]
4. Suppose that L/F is a Galois extension and that F ⊆ Fi−1 ⊆ Fi ⊆ L with Fi Galois over Fi−1 . Prove
that | Gal(Fi /Fi−1 )| divides | Gal(L/F )|.
5. Let p be an odd prime. For all integers k with 2 ≤ k ≤ p − 2, define ak = 0 if
C(p − 2, k − 2) + C(p − 2, k − 1) + C(p − 2, k) is even and ak = p if this sum is odd, where
C(n, j) denotes the binomial coefficient. Consider the polynomial

f (x) = x^p + Σ_{k=2}^{p−2} ak x^k + (p − 1)x + 1.

(a) Prove that f (x) reduces to xp − x + 1 in Fp [x] and reduces to (x2 + x + 1)(x + 1)p−2 in F2 [x].
(b) Deduce that GalQ (f (x)) ∼= Sp .
6. Suppose that xn − a ∈ Q[x] is irreducible. Prove that the splitting field E of xn − a has index [E : Q]
equal to nφ(n) or nφ(n)/2.
7. Let Q be the algebraic closure of Q. Let S be the subset of elements in Q that are solvable by radicals
over Q. (This is a strict subset by Theorem 11.8.12.) Prove that S is a subfield of Q.
8. Let C be the field of constructible numbers over Q. Prove that C is a subfield of S, the subfield of
elements in Q that are solvable by radicals. (See the previous exercise.)
9. Suppose that we have a chain of fields F ⊆ L ⊆ K ⊆ R, where the extension L/F is Galois and the
extension K/F is radical. Prove that [L : F ] ≤ 2.
10. Let D be a square-free integer and let a ∈ Q − {1}. Prove that Q(√(a√D)) cannot be a cyclic extension
of degree 4 over Q.
11. Let f (x) ∈ Q[x] be of degree n and let G = GalQ (f (x)). Define q(x) as the polynomial q(x) = f (x2 ).
Prove that GalQ (q(x)) is a subgroup of the wreath product Z2 ≀ρ G where ρ : G → Sn corresponds to
G acting on the set of roots of f (x).
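The coefficients in Exercise 5 can be sanity-checked by machine for a small prime. The sketch below (our own illustration, not part of the exercise) builds f(x) for p = 5 from the definition of the ak and reduces its coefficients mod p and mod 2.

```python
from math import comb

p = 5
# a_k is 0 or p according to the parity of the sum of three binomial coefficients
a = {k: p * ((comb(p - 2, k - 2) + comb(p - 2, k - 1) + comb(p - 2, k)) % 2)
     for k in range(2, p - 1)}

# coefficient list of f(x) = x^p + sum a_k x^k + (p - 1)x + 1, index = degree
coeffs = [0] * (p + 1)
coeffs[p] = 1
for k, ak in a.items():
    coeffs[k] += ak
coeffs[1] += p - 1
coeffs[0] += 1

print([c % p for c in coeffs])  # [1, 4, 0, 0, 0, 1], i.e., x^5 - x + 1 mod 5
print([c % 2 for c in coeffs])  # [1, 0, 1, 1, 0, 1], i.e., (x^2+x+1)(x+1)^3 mod 2
```

Such a check for one prime is of course not a proof of part (a), only a plausibility test of the construction.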

11.9 Projects
Project I. Galois Groups of Bicubics. Study the Galois groups of polynomials in Q[x] of the
form x6 + ax4 + bx2 + c. Can you determine the possible orders of some Galois groups for
certain values of a, b, or c? Can you determine the group structure of the Galois groups?

Project II. Quaternion Galois Groups. Exercise 11.6.15 shows that there is no polynomial
p(x) ∈ Q[x] of degree less than 8 such that GalQ (p(x)) ∼ = Q8 . Try to find a polynomial of
degree 8 in Q[x] that has a Galois group isomorphic to Q8 . Try to extend your result to
characterize all irreducible polynomials of degree 8 whose Galois group is Q8 .
Project III. Lagrange Resolvents. Example 11.8.16 gave a calculation of a Lagrange resolvent
associated to a root of a nontrivial cubic polynomial in Q[x]. The Lagrange resolvent for the
root of a cubic is generally not a symmetric polynomial in the roots. Can you justify the
result of the numerical calculation for γ 3 of Example 11.8.16 on theoretical grounds? Can
you provide a similar formula for γ 3 for an arbitrary cubic p(x) ∈ Q[x] such that GalQ (p(x)) ∼= Z3 ?
Can you generalize to higher degree polynomials?

Project IV. Nested Polynomials. Let p(x), q(x) ∈ Q[x]. Can you determine anything about
the Galois group of p(q(x)) from the Galois groups Gal(p(x)) and Gal(q(x))? Start by calculating
some examples. (Feel free to use a CAS to assist with calculations.)
Project V. Dynatomic Polynomials. In the exercises of Section 7.5, we introduced the concept
of dynatomic polynomials. The study of Galois groups of dynatomic polynomials ΦP,n (x) given
a polynomial P (x) ∈ Q[x] is challenging. Try to discover what you can about the Galois groups
Gal(ΦP,n ) for various P and for various n. Can you find an upper bound on | Gal(ΦP,n )|? Can
you determine any internal structure to Gal(ΦP,n )?
12. Multivariable Polynomial Rings

Linear algebra encompasses many topics but, at the introductory levels, it studies solution sets to
systems of linear equations in multiple variables. This theory has applications in many branches of
mathematics, in computer science, and in the natural and social sciences. The structures introduced
in solving systems of linear equations generalized to the theory of vector spaces, and motivated the
concepts of linear transformation, kernel, image, subspaces, and so on. However, the study of solving
linear equations in multiple variables could also be generalized in a different direction, namely the
study of systems of polynomial equations in multiple variables.
In relatively recent years (recent for the history of mathematics), mathematicians have discovered
a number of algorithms that make the study of systems of polynomial equations far more tractable
than they might appear at first glance. The natural context in which to study systems of polynomial
equations is the context of multivariable polynomial rings or modules over such rings. This chapter
introduces methods used in applications of multivariable polynomial rings.
Sections 12.1 through 12.3 provide the theoretical underpinnings to studying systems of poly-
nomial equations. First, we introduce the concept of Noetherian modules and Noetherian rings,
which describe certain finiteness conditions present in multivariable polynomial rings. We then in-
troduce the notion of affine space and an affine variety, the set of solutions of a system of polynomial
equations. We also provide a complete correspondence between ideals and affine varieties.
In Sections 12.4 through 12.7, we present algorithms related to polynomial rings and introduce
the concept of a Gröbner basis of an ideal, a generating set that is optimal for the application of
many algorithms. We also illustrate the value of these algorithms for solving systems of polynomial
equations. Finally, Section 12.8 gives a brief introduction to algebraic geometry, a vast field that
stems from applying the theory of multivariable polynomial rings to study geometric concepts.
The ability to solve systems of polynomial equations has found innumerable applications in
computation. Consequently, mathematicians and scientists have developed many computer implementations
of polynomial division, Buchberger’s algorithm, and other algorithms related to solving
systems of polynomial equations. Besides the implementations in commercial computer algebra
systems (e.g., Maple, Mathematica), some other freeware packages include CoCoA (Computational
Commutative Algebra), GAP (Groups, Algorithms, Programming), Macaulay 2, Magma, or Sage.
A variety of recent books address applications of Gröbner bases and computational algebra. See for
example [1, 16, 17, 26].

12.1 Introduction to Noetherian Rings
In order to study polynomial rings, we take a step back in abstraction and present a notion that is
restrictive enough to capture the finiteness conditions that give multivariable polynomial rings some
of their valuable properties but broad enough to include other rings besides F [x1 , x2 , . . . , xn ], where
F is a field.
In this section, the ring of coefficients R always denotes a commutative ring.


12.1.1 – Noetherian Modules

Definition 12.1.1
Let (S, ≼) be a poset.

• An increasing sequence x1 ≼ x2 ≼ · · · in S is stationary if there exists some n such


that xi = xn for all i ≥ n.
• If every increasing (resp. decreasing) sequence in S is stationary, then S is said to
satisfy the ascending chain condition (resp. descending chain condition).

The ascending chain condition has an equivalent formulation.

Proposition 12.1.2
A poset (S, ≼) satisfies the ascending (resp. descending) chain condition if and only if every
nonempty subset of S has a maximal (resp. minimal) element.

Proof. (⇐=) Suppose that the poset is such that every nonempty subset of S has a maximal element. Let
x1 ≼ x2 ≼ · · · be an increasing sequence in S. The elements of this sequence form a set, which must
have a maximal element a. Suppose that xn = a. Then since a = xn ≼ xi for all i ≥ n and since a is
maximal, then xi = a for all i ≥ n. Thus, the sequence is stationary.
(=⇒) Suppose there exists a nonempty subset T of S that does not have a maximal element.
Define a sequence as follows. Let x1 be any element in T . Given a term xi of the sequence, since T
does not have a maximal element, there exists some element xi+1 ∈ T such that xi ≼ xi+1 and
xi+1 ≠ xi . This inductive definition creates an increasing sequence that is not stationary.
The proof of the equivalence for the descending chain condition is identical. 
Example 12.1.3. Consider the poset (N∗ , |) of positive integers with the partial order of divisibility.
This poset does not satisfy the ascending chain condition since, for example, 2 | 2^2 | 2^3 | · · · is
an ascending chain that is never stationary. On the other hand, (N∗ , |) satisfies the
descending chain condition. If a ∈ N∗ , then a has a finite number of divisors, so a descending chain that
contains a can decrease strictly only a finite number of times after a. Hence, a
chain containing any positive integer a must become stationary. 4
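The finiteness argument can be made concrete: each strict step down in the divisibility order removes at least one prime factor, so a strictly descending chain starting at a has at most Ω(a) + 1 terms, where Ω(a) counts prime factors with multiplicity. A small illustrative sketch (the helper names are ours):

```python
def omega(n):
    # number of prime factors of n counted with multiplicity
    count, d = 0, 2
    while d * d <= n:
        while n % d == 0:
            n //= d
            count += 1
        d += 1
    if n > 1:
        count += 1
    return count

# build one maximal strictly descending chain under divisibility from 360
chain = [360]
while chain[-1] > 1:
    n, d = chain[-1], 2
    while n % d:
        d += 1                  # smallest prime divisor of n
    chain.append(n // d)

print(chain)                    # [360, 180, 90, 45, 15, 5, 1]
print(len(chain), omega(360) + 1)   # both equal 7: the chain cannot be longer
```

Since 360 = 2^3 · 3^2 · 5 has Ω(360) = 6, no strictly descending chain from 360 has more than 7 terms.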
Properties of ascending chains or descending chains can be applied in many algebraic contexts.
However, for Noetherian rings and modules, we consider chains of submodules.

Definition 12.1.4
Let R be any ring. An R-module M is called Noetherian (resp. Artinian) if its poset of
submodules (Σ, ⊆) satisfies the ascending (resp. descending) chain condition.

Example 12.1.5. Consider the ring of integers Z as a Z-module. Submodules of Z consist of its
ideals. Since Z is a PID, every ideal has the form (a) for some nonnegative a ∈ Z. For ideal
containment, (a) ⊆ (b) if and only if b | a. Hence, given any nonzero ideal (a), the only ideals that can
contain it are generated by the divisors of a, and hence there is only a finite number of them. (An
ascending chain starting at (0) is either constantly (0) or eventually reaches a nonzero ideal, to which
this observation applies.) Thus, Z satisfies the ascending chain condition, so Z is a Noetherian module.
However, for any integer a ≥ 2, the chain of ideals
(a) ⊇ (a^2 ) ⊇ · · · ⊇ (a^k ) ⊇ · · ·
never terminates. Hence, Z is not an Artinian module over Z. 4
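The key finiteness fact in the ascending-chain argument is that only finitely many ideals of Z contain a given nonzero ideal (a), namely (d) for the divisors d of a. A quick illustrative check for a = 60:

```python
# ideals of Z containing (60) correspond to the divisors of 60
a = 60
divisors = [d for d in range(1, a + 1) if a % d == 0]
print(divisors)        # [1, 2, 3, 4, 5, 6, 10, 12, 15, 20, 30, 60]
print(len(divisors))   # 12 ideals (d) with (60) contained in (d)
```

So an ascending chain beginning at (60) can pass through at most these 12 ideals before stabilizing.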
Example 12.1.6. As a Z-module, the abelian group Q is neither Noetherian nor Artinian. From
the previous example, we already see that Q is not Artinian. However, the chain of submodules
Z ⊆ (1/2)Z ⊆ (1/4)Z ⊆ · · · ⊆ (1/2^k )Z ⊆ · · ·

is not stationary. Thus, Q, as a Z-module, does not satisfy the ascending chain condition. 4

Example 12.1.7. The previous example inspires us to find a Z-module that satisfies the descending
chain condition but not the ascending chain condition. Let p be a prime number and, for each integer
n ≥ 0, define the subgroup of Q/Z

Gn = { a/p^n + Z | 0 ≤ a < p^n } .

Thus, (Gn , +) is isomorphic to Zpn . Let G be the union of all the Gn , a Z-submodule of Q/Z often
called the Prüfer p-group. The chain of submodules

{0} = G0 ⊆ G1 ⊆ G2 ⊆ · · ·

is never stationary so G is not Noetherian. On the other hand, every strict Z-submodule of G is Gn
for some n, a module with only p^n elements, and a strict submodule of Gn is Gm for some m < n.
Hence, every descending chain of submodules of G either is constantly G or enters some finite Gn ,
after which it can decrease strictly only finitely many times. Thus, every descending chain is eventually
stationary, so G satisfies the descending chain condition and is Artinian as a Z-module. 4

Example 12.1.8. Every finite abelian group G, as a Z-module, satisfies both the ascending and
the descending chain condition. This is simply because G has a finite set of subgroups, so every
nonempty subset of Sub(G) has a maximal element. 4

Proposition 12.1.9
Let M be a module and N a submodule. M is Noetherian (resp. Artinian) if and only if
N and M/N are Noetherian (resp. Artinian).

Proof. We only provide a proof for Noetherian modules since the proof for Artinian modules is
similar.
Suppose that M is Noetherian. Any ascending chain of submodules in N is also an ascending
chain of submodules in M so is stationary. Thus, N is Noetherian. By the Fourth Isomorphism
Theorem, a chain of submodules in M/N corresponds uniquely to a chain of submodules of M
that contain N . Any such chain in M is stationary, so the chain in M/N is stationary. Thus, M/N
is Noetherian.
Conversely, suppose that N and M/N are Noetherian. Let

M0 ⊆ M1 ⊆ M2 ⊆ · · ·

be an ascending chain of submodules in M . Then the chain {Mi ∩ N }i≥0 is an ascending chain of
submodules in N , while {π(Mi )}i≥0 is an ascending chain of submodules in M/N , where π : M →
M/N is the canonical projection. Both of these chains are constant after a large enough index k.
Suppose that i ≥ k. Let a ∈ Mi+1 . Since π(Mi ) = π(Mi+1 ), then π(a) ∈ π(Mi ) so there exists
b ∈ Mi such that π(b) = π(a). Hence, π(a − b) = 0 so a − b ∈ ker π = N . Setting n = a − b, we
see that n ∈ Mi+1 ∩ N ; since Mi ∩ N = Mi+1 ∩ N , then n ∈ Mi ∩ N . In particular, both n and b are in
Mi so a = b + n ∈ Mi . Consequently, Mi = Mi+1 , which shows that the ascending chain {Mi } is
stationary. So M is Noetherian. 

Corollary 12.1.10
Let M1 , M2 , . . . , Mr be Noetherian (resp. Artinian) R-modules. The direct sum M1 ⊕M2 ⊕
· · · ⊕ Mr is a Noetherian (resp. Artinian) R-module.

Proof. (Left as an exercise for the reader. See Exercise 12.1.1.) 



As we have seen, many theorems concerning Noetherian modules also hold for Artinian modules
and vice versa. However, this is not always the case. There is no equivalent to the following
proposition for Artinian modules. It is precisely the equivalent condition in the following proposition
that makes Noetherian modules so interesting.

Proposition 12.1.11
Let R be a commutative ring. An R-module M is Noetherian if and only if every submodule
of M is finitely generated.

Proof. First, suppose that M is Noetherian. Let N be a submodule of M . Let (Σ, ⊆) be the poset
of all finitely generated submodules of N , which is nonempty since it contains {0}. Since M is
Noetherian, then by Proposition 12.1.2, Σ has a maximal element N ′ . Assume that N ′ is a strict
submodule of N . Then consider the module N ′ + Rx for some element x ∈ N − N ′ . Then N ′ + Rx
is still finitely generated and a submodule of N , which contradicts the maximality of N ′ . Thus, N ′ = N .
Conversely, suppose that every submodule of M is finitely generated. Consider an ascending
chain of submodules M1 ⊆ M2 ⊆ · · · of M . The set
N = ∪n≥1 Mn

is a submodule of M . By the hypothesis, N is finitely generated. Let {x1 , x2 , . . . , xr } be a set of


generators for N . Now suppose that xi ∈ Mni . Setting n0 = max{n1 , n2 , . . . , nr }, we see that all
xi ∈ Mn for all n ≥ n0 . Thus, Mn = Mn0 for all n ≥ n0 . Hence, the ascending chain of submodules
is stationary. 
At first pass, the reader might think that the condition “every submodule is finitely generated”
is stronger than necessary. However, it is incorrect to surmise that a module is finitely generated if
and only if every submodule is finitely generated. Example 12.1.15 below illustrates a situation in
which a module is finitely generated but some submodules are not.

12.1.2 – Noetherian Rings

Definition 12.1.12
A commutative ring R is said to be Noetherian (resp. Artinian) if it is Noetherian (resp.
Artinian) as an R-module.

Recall that when considering R as an R-module, the R-submodules of R are precisely the ideals
of R. Hence, a ring R is Noetherian (resp. Artinian) if it satisfies the ascending (resp. descending)
chain condition on ideals. By Proposition 12.1.11, a ring R is Noetherian if and only if every ideal
is finitely generated.
The following examples illustrate some of the finiteness conditions on Noetherian and Artinian
rings.
Example 12.1.13. Since a field F has only two ideals, namely (0) and F , then every chain of ideals
is stationary. Thus, every field is both Artinian and Noetherian. 4
Example 12.1.14. Every principal ideal domain (PID) R is Noetherian. This is because, by definition,
every ideal I in R is generated by a single element. Since every ideal is finitely generated, by
Proposition 12.1.11, R is Noetherian. In this light, the condition for a ring to be Noetherian
is a direct generalization of the principal ideal condition. 4
Example 12.1.15. An example of a non-Noetherian ring is the polynomial ring R = F [x1 , x2 , . . .]
in a countable number of variables, where F is a field. The ascending chain of ideals
(x1 ) ⊆ (x1 , x2 ) ⊆ (x1 , x2 , x3 ) ⊆ · · ·

never terminates so F [x1 , x2 , . . .] is not Noetherian. It is not Artinian either because (x1 ) ⊇ (x1^2 ) ⊇
· · · is a descending chain that never terminates.
Notice that the ring R itself is a finitely generated R-module, generated by 1, whereas the
submodule consisting of polynomials without a constant term is not finitely generated. Hence,
this gives an example where the R-module is finitely generated but not every submodule is finitely
generated. 4

Though Proposition 12.1.9 guarantees that ideals of a Noetherian ring must be Noetherian, an
arbitrary subring of a Noetherian ring need not be Noetherian. Consider the previous example.
Note that F [x1 , x2 , . . .] is an integral domain so we can construct its field of fractions F (x1 , x2 , . . .).
Since this is a field, it is Noetherian. Consequently, this gives an example where a subring of a
Noetherian ring is not Noetherian.
Proposition 12.1.9 has many consequences for Noetherian and Artinian rings. The following
proposition is a first one. A few other similar results appear in the exercises.

Proposition 12.1.16
Let R be a Noetherian (resp. Artinian) ring and let M be a finitely generated R-module.
Then M is a Noetherian (resp. Artinian) R-module.

Proof. If M is finitely generated by the elements x1 , x2 , . . . , xn , then there exists a surjective R-


module homomorphism π : Rn → M defined by

π(r1 , r2 , . . . , rn ) = r1 x1 + r2 x2 + · · · + rn xn .

By the First Isomorphism Theorem, M ∼= Rn / ker π. By Corollary 12.1.10, Rn is a Noetherian (resp.


Artinian) R-module and by Proposition 12.1.9, then both ker π and M ∼ = Rn / ker π are Noetherian
(resp. Artinian). 

We now consider some properties of Noetherian rings that do not hold for Artinian rings.

Proposition 12.1.17
Let R be a subring of a ring S. Suppose that R is a Noetherian ring and that S is finitely
generated as an R-module. Then S is a Noetherian ring.

Proof. By Proposition 12.1.16, S is Noetherian as an R-module. Every ideal I in S is an S-


submodule. Any S-module is also an R-module since R is a subring of S. By Proposition 12.1.11,
every R-submodule of I is finitely generated as an R-module, which means that it is certainly finitely
generated as an S-module. In particular, I itself is finitely generated as an S-module. Thus, S is a
Noetherian ring. 

For the following proposition we recall the following notation for polynomials. If p(x) ∈ R[x],
then LC(p(x)) or more simply LC(p) denotes the leading coefficient of p(x) and LT(p(x)) or LT(p)
is the leading term. So if
p(x) = an xn + · · · + a1 x + a0 with an ≠ 0,
then deg p(x) = n, LC(p) = an and LT(p) = an xn . If I is an ideal in R[x], we also define LC(I) as
the set of leading coefficients of polynomials that occur in I. It is easy to see that with I an ideal
in R[x], then LC(I) is an ideal in the coefficient ring R.

Theorem 12.1.18 (Hilbert’s Basis Theorem)


If R is a Noetherian ring, then the polynomial ring R[x] is Noetherian.

Proof. Let I be an arbitrary ideal in R[x]. Since R is Noetherian, LC(I) is finitely generated, say
by a1 , a2 , . . . , ak . Let f1 , f2 , . . . , fk be polynomials in I such that LC(fi (x)) = ai for i = 1, . . . , k.
Define mi = deg fi (x) and let m = max{m1 , m2 , . . . , mk }.
Let J = (f1 , f2 , . . . , fk ) denote the ideal in R[x] generated by these polynomials. Note that J ⊆ I.
Let f ∈ I. Then LC(f ) ∈ LC(I) so there exist r1 , r2 , . . . , rk ∈ R such that LC(f ) = r1 a1 + r2 a2 +
· · · + rk ak . If n = deg f (x) ≥ m, then the polynomial
f (x) − Σ_{i=1}^{k} ri x^{n−mi} fi (x)

has degree strictly less than deg f (x) since the powers of x and the constants ri are such that the
subtraction cancels the leading term of f (x). Notice that the polynomial on the right is an element
of J. By repeating this process, we can subtract from f (x) an element of J and obtain a polynomial
g(x) of degree strictly less than m. In other words, there exists a polynomial g(x) with deg g(x) < m
such that f (x) − g(x) ∈ J.
As R-submodules of R[x], we have shown that I = (I ∩ M ) + J, where M = R + Rx + · · · +
Rxm−1 . Now M is obviously finitely generated as an R-module so it is a Noetherian R-module by
Proposition 12.1.16. The intersection I ∩ M is a submodule of M so by Proposition 12.1.11, I ∩ M
is finitely generated, say by the set {g1 , g2 , . . . , gℓ }, as an R-module. It is clear that

{f1 , f2 , . . . , fk , g1 , g2 , . . . , gℓ }

generate I as an R[x]-module. Thus, I is finitely generated. Since I was arbitrary, again by


Proposition 12.1.11, we deduce that R[x] is a Noetherian ring. 
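The degree-reduction step in this proof can be made concrete with a toy example over R = Z (all helper names and the particular polynomials below are illustrative). With f1 = 2x^2 + 1 and f2 = 3x + 1, the leading coefficient 5 of f = 5x^3 + x equals 1·2 + 1·3, and subtracting ri x^{n−mi} fi (x) with r1 = r2 = 1 cancels the leading term:

```python
# polynomials as coefficient lists, index = degree
def mul_xk(poly, k):
    # multiply a polynomial by x^k
    return [0] * k + poly

def sub(p, q):
    # difference p - q of two coefficient lists
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [pi - qi for pi, qi in zip(p, q)]

def deg(poly):
    d = len(poly) - 1
    while d > 0 and poly[d] == 0:
        d -= 1
    return d

f1 = [1, 0, 2]     # f1 = 2x^2 + 1, leading coefficient 2, degree m1 = 2
f2 = [1, 3]        # f2 = 3x + 1,   leading coefficient 3, degree m2 = 1
f  = [0, 1, 0, 5]  # f  = 5x^3 + x, leading coefficient 5 = 1*2 + 1*3, n = 3

# f - x^(n-m1)*f1 - x^(n-m2)*f2 kills the degree-3 term
g = sub(sub(f, mul_xk(f1, 1)), mul_xk(f2, 2))
print(g, deg(g))   # [0, 0, -1, 0] 2 : the remainder -x^2 has smaller degree
```

Iterating this cancellation is exactly how the proof drives the degree of f below m.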

Hilbert’s Basis Theorem leads to the following corollaries, which are essential for the subsequent
study of multivariable polynomial rings.

Corollary 12.1.19
Let R be Noetherian. Then R[x1 , x2 , . . . , xn ] is Noetherian. More generally, every finitely
generated R-algebra is Noetherian.

Proof. The first part of the corollary follows by a repeated application of Hilbert’s Basis Theorem.
The second half of the corollary follows from Proposition 12.1.9. 

Corollary 12.1.20
Let F be a field. Every ideal in F [x1 , x2 , . . . , xn ] is finitely generated.

Though the study of solution sets to systems of polynomial equations is more general and more
involved than that of linear equations, Hilbert’s Basis Theorem, and in particular Corollary 12.1.20, provides
a key result that makes subsequent useful algorithms possible.

Exercises for Section 12.1


1. Prove Corollary 12.1.10.
2. Let V be a vector space over a field F . Prove that the following are equivalent:
(a) V satisfies the ascending chain condition;
(b) V satisfies the descending chain condition;
(c) dimF V is finite.
3. Let R = C 0 ([−1, 1], R) be the ring (with + and ×) of continuous real-valued functions on the interval
[−1, 1]. Let
Fn = { f ∈ R | f (x) = 0 for all x ∈ [−1/n, 1/n] } .

(a) Prove that Fn is an ideal for all integers n ≥ 1.


(b) Prove that the chain F1 ⊆ F2 ⊆ · · · ⊆ Fn ⊆ · · · never terminates.
(c) Deduce that C 0 ([−1, 1], R) is not a Noetherian ring.
4. Prove that if R is a Noetherian (resp. Artinian) ring and ϕ : R → S is a surjective homomorphism
onto a ring S, then S is Noetherian (resp. Artinian).
5. Let R be a commutative ring.
(a) Prove that if M is a Noetherian R-module and ϕ : M → M is an epimorphism (surjective module
homomorphism) then ϕ is an isomorphism.
(b) Prove that if M is an Artinian R-module and ϕ : M → M is a monomorphism (injective module
homomorphism) then ϕ is an isomorphism.
[Hint: Consider Ker ϕn and M/ Ker ϕn .]
6. Let R be a commutative ring. Suppose that M is completely reducible with a finite number of
components. Prove that M satisfies both the ascending and the descending chain conditions.
7. This exercise generalizes the previous one. We call a composition series of M a chain of submodules

{0} = M0 ⊆ M1 ⊆ · · · ⊆ Mn−1 ⊆ Mn = M

where Mi is a submodule of Mi+1 such that Mi+1 /Mi is an irreducible R-module. Prove that M has
a composition series if and only if M satisfies both chain conditions.
8. Let R be a Noetherian ring and let D be a multiplicatively closed subset of R that does not contain 0.
Prove that D−1 R is a Noetherian ring. [Hint: Prove that the ideals in D−1 R are of the form D−1 I,
where I is an ideal of R.]
9. Let {Ij }j∈J be an arbitrary collection of ideals in a Noetherian ring R. Prove that if I is the least
ideal containing all Ij , then there exists a finite subset {j1 , j2 , . . . , jr } ⊆ J such that

I = Ij1 + Ij2 + · · · + Ijr .

10. Let R be Noetherian. Prove that R[[x]] is Noetherian. [Hint: Modify the proof of the Hilbert Basis
Theorem but instead of LC(I), consider the ideal of coefficients of the terms of least degree of each
power series f ∈ R[[x]].]
11. An ideal I in a ring R is said to be irreducible if whenever I = I1 ∩I2 , then I = I1 or I = I2 . Prove that
in a Noetherian ring, every ideal is a finite intersection of irreducible ideals. [Hint: Assume otherwise
and consider the poset of ideals that are not finite intersections of irreducible ideals.]

12.2
Multivariable Polynomials and Affine Space
12.2.1 – Terminology for Multivariable Polynomials
We can think of the multivariable polynomial ring in two different ways.
We have encountered some theorems that affirm that if a ring R satisfies some property then the
ring R[x] satisfies that same property. For example, Theorem 6.5.5 states that if R is a UFD then
R[x] is as well; Hilbert’s Basis Theorem states that if R is Noetherian, then so is R[x]. In each case,
the immediate corollary is that R[x1 , x2 , . . . , xn ] satisfies that property. Such corollaries involve a
recursive application of the associated theorem and viewing R[x1 , . . . , xn ] as R[x1 , . . . , xn−1 ][xn ]. In
this view of the multivariable polynomials, one writes a polynomial f ∈ R[x1 , . . . , xn ] as

    f (x1 , x2 , . . . , xn ) = ∑_{i=0}^{m} gi (x1 , . . . , xn−1 ) xn^i .

In this perspective, f is viewed as a polynomial in xn with coefficients in ring R[x1 , . . . , xn−1 ].


620 CHAPTER 12. MULTIVARIABLE POLYNOMIAL RINGS

The above perspective has merit but, when F is a field, rings of the form F [x1 , x2 , . . . , xn ] have
additional desirable properties. In particular, F [x1 , x2 , . . . , xn ] is a unital associative algebra over F ,
that is, a vector space equipped with an associative multiplication that has an identity. The standard
basis of F [x1 , x2 , . . . , xn ] consists of all monomials

x1^α1 x2^α2 · · · xn^αn ,

where αi ∈ N for 1 ≤ i ≤ n. To simplify notation, we often write the above monomial as xα , where
α ∈ Nn . Hence, a polynomial f in F [x1 , x2 , . . . , xn ] is a (finite) linear combination
    f = ∑_α cα x^α ,

where cα ∈ F . The product on F [x1 , x2 , . . . , xn ] follows from properties of distributivity and the
product xα · xβ = xα+β , where the addition α + β occurs in the monoid Nn . (We point out that the
product in F [x1 , x2 , . . . , xn ] is a convolution product on Fun(Nn , F ) as described in Section 5.4.3.)
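The exponent-adding rule x^α · x^β = x^{α+β} can be checked concretely. The following sketch uses SymPy (an illustrative choice of CAS, not one prescribed by the text) to verify that multiplying monomials adds their exponent vectors in the monoid N^n:

```python
from sympy import symbols, Poly

x, y, z = symbols("x y z")

# Two monomials x^alpha and x^beta with alpha = (2, 1, 0), beta = (1, 3, 2)
m1 = Poly(x**2 * y, x, y, z)
m2 = Poly(x * y**3 * z**2, x, y, z)

alpha = m1.monoms()[0]
beta = m2.monoms()[0]

# The product is the single monomial x^(alpha + beta)
product = m1 * m2
assert product.monoms()[0] == tuple(a + b for a, b in zip(alpha, beta))
print(product.monoms()[0])  # (3, 4, 2)
```

Here `monoms()` reports each term of a polynomial as its exponent tuple α ∈ N^3.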

Definition 12.2.1
(1) The n-tuple α ∈ Nn is called the multidegree of the monomial xα . We write mdeg xα =
α.
(2) The integer α1 +α2 +· · ·+αn is called the total degree of the monomial and is denoted
|α|.

(3) The constant cα is called the coefficient of the monomial xα .


(4) The product cα xα , assuming cα ≠ 0, is called a term of the polynomial.
(5) The total degree of a nonzero polynomial f is the maximum total degree |α| over all
the terms of f .

Example 12.2.2. In R[x, y, z], consider the polynomial

f (x, y, z) = 7x2 y 3 − 2xy 3 z 2 + 2y 3 z 2 + x − 3.

It has five terms:


terms multidegree total degree
7x2 y 3 (2, 3, 0) 5
−2xy 3 z 2 (1, 3, 2) 6
2y 3 z 2 (0, 3, 2) 5
x (1, 0, 0) 1
−3 (0, 0, 0) 0

The total degree of the polynomial is 6. 4
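The bookkeeping in Example 12.2.2 can be reproduced mechanically; a minimal sketch in SymPy (again an illustrative CAS choice):

```python
from sympy import symbols, Poly

x, y, z = symbols("x y z")
f = Poly(7*x**2*y**3 - 2*x*y**3*z**2 + 2*y**3*z**2 + x - 3, x, y, z)

# monoms() lists each term's multidegree as an exponent tuple
for mdeg in f.monoms():
    print(mdeg, "total degree", sum(mdeg))

print("total degree of f:", f.total_degree())  # 6
```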

Definition 12.2.3
A polynomial in F [x1 , x2 , . . . , xn ] is called homogeneous if all of the monomials have the
same total degree. For any polynomial f , the homogeneous component of degree d is the
sum of the terms of total degree d that appear in f .

The polynomial in Example 12.2.2 is obviously not homogeneous. The homogeneous component
of degree 5 is 7x2 y 3 + 2y 3 z 2 . Note that the elementary symmetric polynomials sk (x1 , x2 , . . . , xn )
introduced in Section 11.5.1 are homogeneous of total degree k.
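Homogeneous components can be extracted by filtering terms on their total degree; a sketch (SymPy, as an illustrative CAS; the helper function is ours, not a library routine):

```python
from sympy import symbols, prod, Poly

x, y, z = symbols("x y z")
f = Poly(7*x**2*y**3 - 2*x*y**3*z**2 + 2*y**3*z**2 + x - 3, x, y, z)

def homogeneous_component(p, d):
    """Sum of the terms of p whose total degree |alpha| equals d."""
    return sum(coeff * prod(g**e for g, e in zip(p.gens, m))
               for m, coeff in p.terms() if sum(m) == d)

print(homogeneous_component(f, 5))  # 7*x**2*y**3 + 2*y**3*z**2
```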

12.2.2 – Affine Space; Affine Varieties

Multivariable polynomials serve a dual purpose. We can and do treat them purely as algebraic
objects. However, we can also consider them as functions F n → F . Let c = (c1 , c2 , . . . , cn ) ∈ F n
and f ∈ F [x1 , x2 , . . . , xn ]. In the functional perspective, as usual, f (c) denotes f evaluated at c
and is an element of F . Though F n has the structure of a vector space over F , the function
f : F n → F is not generally a linear transformation. It is not uncommon to consider the solution
set of f (x1 , x2 , . . . , xn ) = 0, but again this is generally just a subset of F n and not a subspace.
In the more algebraic perspective, the evaluation evc : F [x1 , x2 , . . . , xn ] → F defined by evc (f ) =
f (c) is a ring homomorphism. Hence, we can consider the kernel ker evc . This is an ideal in
F [x1 , x2 , . . . , xn ], the ideal of all polynomials that evaluate to 0 at c.
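The homomorphism property of evc is easy to test numerically; a minimal sketch (plain SymPy substitution, with a made-up point c and made-up polynomials):

```python
from sympy import symbols

x, y = symbols("x y")

f = x**2*y - 3*y + 1
g = x + y**3
c = {x: 2, y: -1}           # the point c = (2, -1), chosen arbitrarily

ev = lambda p: p.subs(c)    # the evaluation map ev_c

# ev_c respects the ring operations: it is a ring homomorphism
assert ev(f + g) == ev(f) + ev(g)
assert ev(f * g) == ev(f) * ev(g)
print(ev(f), ev(g))  # 0 1
```

Since ev(f) = 0 here, this particular f lies in the kernel ker evc described above.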

Definition 12.2.4
Let F be a field. The set F n is called the n-dimensional affine space over F . It is alternately
denoted by AnF .

From a set-theoretic perspective, there is no need for this terminology. However, in classical
algebraic geometry, additional geometric structure is imposed on F n and the terminology of affine
space refers to that additional structure, which differs from the vector space structure.
In the affine space, we care about solution sets to systems of polynomial equations.

Definition 12.2.5
Let F be a field and let S ⊆ F [x1 , x2 , . . . , xn ] be a subset of polynomials, not necessarily
finite. Then we define

V(S) = {c ∈ F n | f (c) = 0 for all f ∈ S}

and we call it the affine variety defined by S.

Example 12.2.6. Consider the polynomial p(x, y) = 4x2 − x4 + 4y 2 − y 4 − 3 ∈ R[x, y]. As a subset
of affine space R2 , the variety V(p) is depicted below.

Example 12.2.7. Consider the polynomial p(x, y, z) = (x2 + y 2 − z 3 )2 − (x2 + y 2 + 3z 2 ) ∈ R[x, y, z].
As a subset of affine space R3 , the affine variety V(p) is depicted below.


Example 12.2.8. In the ring R[x, y, z] consider the two polynomials f1 (x, y, z) = x2 + y 2 − z 2 and
f2 (x, y, z) = y 2 + z − 4. The affine variety V(f1 , f2 ) is depicted on the left below.


Since V(f1 , f2 ) is the set of points in R3 that satisfy both f1 (x, y, z) = 0 and f2 (x, y, z) = 0,
then V(f1 , f2 ) = V(f1 ) ∩ V(f2 ). The diagram on the right shows the varieties V(f1 ) and V(f2 )
separately to illustrate their intersection. 4

Definition 12.2.9
An affine variety V in AnF is called a hypersurface if V = V(f ), where f is an irreducible
polynomial. (If n = 2 a hypersurface is called a curve and if n = 3, a hypersurface is called
a surface.)

Example 12.2.10. As an example that illustrates the importance of the field, consider the poly-
nomial p(x, y) = x2 + y 2 − 1 ∈ F [x, y].
If F = R, then the hypersurface V(p) in R2 is the usual unit circle in the affine real plane.
If F = Q, then V(p) in Q2 consists of rational Pythagorean pairs. It is possible to show that all
solutions to x2 + y 2 − 1 = 0 in Q are of the form
    x = (a^2 − b^2)/(a^2 + b^2) and y = 2ab/(a^2 + b^2).
The unit circle in R2 might give some sense of V(p) in Q2 , but the betweenness (or continuity)
properties of Euclidean space guarantee that the unit circle in R2 has no holes. This is not the case
for geometry in Q2 .
If F = F7 , it is not particularly easy to visualize the affine variety V(p) in the affine space F7^2 .
The pairs of points that are solutions to x2 + y 2 = 1 are
(0, 1), (0, 6), (1, 0), (6, 0), (2, 2), (2, 5), (5, 2), and (5, 5).

We could depict these solutions with a diagram of points in a 7 × 7 grid, using either {0, 1, . . . , 6}
or {−3, −2, −1, 0, 1, 2, 3} as the distinct set of representatives of F7 .
If F = C, then we would need 2 C-dimensions or 4 real dimensions to fully depict p(z, w) = 0.
If we write z = x1 + iy1 and w = x2 + iy2 , then the equation z^2 + w^2 − 1 = 0 breaks into real and
imaginary parts as

    x1^2 − y1^2 + x2^2 − y2^2 − 1 = 0 and 2x1 y1 + 2x2 y2 = 0.

In this perspective, we can view (though visualizing is another story) V(p) as a variety described
by two polynomial equations in the four-dimensional real affine space R4 . 4
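The eight points over F7 listed above can be recovered by brute force, since the affine plane over F7 has only 49 points:

```python
# Enumerate the affine variety V(x^2 + y^2 - 1) over the field F_7
points = sorted((a, b) for a in range(7) for b in range(7)
                if (a * a + b * b) % 7 == 1)
print(points)
print(len(points))  # 8
```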

The construction V is in fact a function from the set of subsets of F [x1 , x2 , . . . , xn ] to the set
of subsets of F n . This function, which corresponds to finding solutions to systems of polynomials,
satisfies many ring-theoretic properties that make ring theory the best context in which to study
systems of polynomial equations.
Suppose that S ⊆ S 0 are subsets of F [x1 , x2 , . . . , xn ]. If c ∈ V(S 0 ), then f (c) = 0 for all f ∈ S 0 ,
and in particular for all f ∈ S. Thus, c ∈ V(S). This implies that the affine variety construction V
is an inclusion-reversing function
from the poset of subsets (P(F [x1 , x2 , . . . , xn ]), ⊆) to the poset of subsets (P(F n ), ⊆). In other words,

S ⊆ S 0 =⇒ V(S 0 ) ⊆ V(S). (12.1)

Given a set S ⊆ F [x1 , x2 , . . . , xn ], consider the ideal I = (S) and the variety V(I) associated to the
ideal I. From (12.1), V(I) ⊆ V(S). Now if c ∈ V(S), then f (c) = 0 for all f ∈ S. On the other
hand, every polynomial p ∈ I is of the form

    p = g1 f1 + g2 f2 + · · · + gm fm

for gi ∈ F [x1 , x2 , . . . , xn ] and fi ∈ S. Thus,

    p(c) = g1 (c)f1 (c) + g2 (c)f2 (c) + · · · + gm (c)fm (c) = 0

and so c ∈ V(I). Thus, V(S) ⊆ V(I). We have proven the following important proposition.

Proposition 12.2.11
For all subsets S ⊆ F [x1 , x2 , . . . , xn ], the affine variety V(S) is equal to V(I), where I = (S).

Hilbert’s Basis Theorem asserts that every ideal I in F [x1 , x2 , . . . , xn ] is generated by a finite
number of elements f1 , f2 , . . . , fs . Along with Proposition 12.2.11, this leads to the surprising result
that every affine variety V(S) is of the form V(f1 , f2 , . . . , fs ), where (f1 , f2 , . . . , fs ) = (S), or in
other words, that every affine variety is the solution set to a finite system of polynomials

    f1 (x1 , x2 , . . . , xn ) = 0
        . . .
    fs (x1 , x2 , . . . , xn ) = 0.

We point out that the study of affine varieties directly generalizes the introductory linear algebra
topic of solving systems of linear equations. Systems of linear equations simply involve polynomials
fi of the form
fi (x1 , x2 , . . . , xn ) = ai1 x1 + ai2 x2 + · · · + ain xn − bi ,

for scalars aij , bi ∈ F , for 1 ≤ i ≤ s and 1 ≤ j ≤ n.
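For linear polynomials, the ideal-theoretic machinery reduces to familiar row reduction. The sketch below (SymPy, with a made-up 2 × 2 system) compares Gauss-Jordan elimination with a lex Gröbner basis, previewing the algorithms developed later in this chapter:

```python
from sympy import symbols, linsolve, groebner

x, y = symbols("x y")

# A made-up linear system: x + y - 3 = 0 and x - y - 1 = 0
f1 = x + y - 3
f2 = x - y - 1

# Gauss-Jordan elimination, as packaged by SymPy
print(linsolve([f1, f2], x, y))  # {(2, 1)}

# A lex Groebner basis of the ideal (f1, f2) triangularizes the system
# exactly as row reduction does
G = groebner([f1, f2], x, y, order="lex")
print(G.exprs)  # [x - 2, y - 1]
```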


In the same way that the variety construction V(S) corresponds to all the points that are zeros
of all polynomials in S, we can reverse the way of thinking.

Definition 12.2.12
Let Z ⊆ F n be a subset of points in the affine space. Then we define

I(Z) = {f ∈ F [x1 , x2 , . . . , xn ] | f (c) = 0 for all c ∈ Z}.

The subset of polynomials I(Z) is nonempty since 0 ∈ I(Z). If f, g ∈ I(Z), then f (c) − g(c) = 0
for all c ∈ Z, so f − g ∈ I(Z). Furthermore, if f ∈ I(Z) and p ∈ F [x1 , x2 , . . . , xn ], then p(c)f (c) = 0
for all c ∈ Z, so pf ∈ I(Z). We have shown the following proposition.

Proposition 12.2.13
Let Z ⊆ F n be a subset of the affine space. The subset I(Z) is an ideal in F [x1 , x2 , . . . , xn ].

As we will see, the functions I and V share many parallel properties. A first similarity is that
I is an inclusion-reversing function from the poset of subsets (P(F n ), ⊆) to the poset of subsets
(P(F [x1 , x2 , . . . , xn ]), ⊆). In other words,

Z ⊆ Z 0 =⇒ I(Z 0 ) ⊆ I(Z).

Indeed, if f ∈ I(Z 0 ), then f (c) = 0 for all c ∈ Z 0 . In particular, f (c) = 0 for all c ∈ Z since Z ⊆ Z 0 .
Hence, if f ∈ I(Z 0 ), then f ∈ I(Z).
Since the I and V functions have opposite domains and codomains, as depicted below,

V : P(F [x1 , x2 , . . . , xn ]) → P(F n ) and I : P(F n ) → P(F [x1 , x2 , . . . , xn ]),

we might wonder if they are inverse functions. This cannot be the case because V is not injective.
We can see this from the fact that V(S) = V(I), where I is the ideal generated by S. Even if we
restrict V to the set of ideals in F [x1 , x2 , . . . , xn ], the variety function V still fails to be injective. For
example, in R[x, y] with corresponding affine space A2R , the ideals (x, y) and (xn , y m ), where n and
m are any positive integers, give the same variety, namely the origin {(0, 0)}. On the other hand,
I({(0, 0)}) = (x, y), so I(V(xn , y m )) = (x, y).
Having defined the term “affine variety,” it is important to note that not every subset of F n can
arise as an affine variety. For example, consider the set Z of integers as a subset of the one-dimensional
affine space R1 . Affine varieties in the affine space of R are solution sets to ideals of polynomials in R[x].
Since R[x] is a PID, affine varieties in R correspond to solution sets of a single polynomial. Hence,
the varieties in R are ∅ (for the polynomial 1), R (for the polynomial 0), and the finite subsets
of R. Since Z is infinite but not all of R, it is not an affine variety.
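The one-variable case is small enough to check directly; a sketch with SymPy's solveset (an illustrative choice, with a made-up cubic):

```python
from sympy import symbols, solveset, S

x = symbols("x")

# V(f) for a single f in R[x] is the finite set of real roots of f
f = x**3 - x  # = x(x - 1)(x + 1)
print(solveset(f, x, domain=S.Reals))  # {-1, 0, 1}

# The two degenerate varieties mentioned in the text:
print(solveset(S(1), x, domain=S.Reals))  # EmptySet, i.e. V(1)
print(solveset(S(0), x, domain=S.Reals))  # Reals,    i.e. V(0)
```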
As we continue to develop the algebraic-geometric structure of affine space AnF , we point out
that we are no longer interested in arbitrary subsets of F n and arbitrary subsets of F [x1 , x2 , . . . , xn ].
Instead, we care

about the correspondence between affine varieties and ideals in F [x1 , x2 , . . . , xn ]:

    ideals in F [x1 , x2 , . . . , xn ] ⇄ affine varieties in AnF (via V and I). (12.2)

12.2.3 – Useful CAS Commands


This section deals with the connection between the polynomial equations and their solution sets. If
we are studying the polynomial rings R[x, y] or R[x, y, z], it is possible to visualize the solution sets
with the following plotting commands.

Maple Function
with(plots); Imports a library of plotting commands, among which are the
following three procedures.
implicitplot Plots the solution set to an algebraic equation in two variables
in a specified domain of those two variables.
implicitplot3d Plots the solution set to an algebraic equation in three vari-
ables in a specified domain of those three variables.
intersectplot Plots the intersection of two surfaces in R3 , specified either
as function graphs, parametric surfaces, or solutions to equa-
tions.
with(PolynomialIdeals); The package PolynomialIdeals contains a data structure
and a variety of commands that are useful for manipulating
ideals in polynomial rings of multiple variables.

Exercises for Section 12.2


1. Show that there are d + 1 monomials in F [x, y] of total degree d. More generally, prove that there
are (n + d − 1 choose d) monomials in F [x1 , x2 , . . . , xn ] of total degree d. [Hint: Use the following combinatorial
argument. Consider the number of ways of putting n − 1 separators between d ones.]
2. Sketch the following affine varieties in R2 .
(a) V(x2 − y 2 )
(b) V(2x + 3y − 6, y − 2x + 1)
(c) V(x2 + 72x + 4y 2 − 36x − 9)
3. Sketch the following varieties in R3 .
(a) V(x2 + y 2 − z 2 − 1)
(b) V(x2 + y 2 − z 2 , x2 + y 2 + z 2 − 10)
4. The graph of the polar function r = (1 − cos θ) is a cardioid. Show that this cardioid is an affine
variety (by finding an equation for it).
5. Consider the flower curve traced out by the polar equation r2 = cos(4θ). Prove that this set is an
affine variety in R2 defined by the single polynomial equation (x2 + y 2 )3 − (x4 − 6x2 y 2 + y 4 ) = 0.
6. A torus in R3 of large radius R and smaller tube radius r can be parametrized by
X(u, v) = ((R + r cos u) cos v, (R + r cos u) sin v, r sin u).

Show that this torus is an affine variety.


7. Suppose that in the affine space R3 the following two systems of linear polynomial equations have the
same line as a solution set
    a1 x + b1 y + c1 z − d1 = 0          a3 x + b3 y + c3 z − d3 = 0
                                 and
    a2 x + b2 y + c2 z − d2 = 0          a4 x + b4 y + c4 z − d4 = 0.

Prove that as ideals in R[x, y, z],

(a1 x + b1 y + c1 z − d1 , a2 x + b2 y + c2 z − d2 ) = (a3 x + b3 y + c3 z − d3 , a4 x + b4 y + c4 z − d4 ).

8. Consider the system of polynomial equations


    x2 + y 2 − 5 = 0
    x2 /4 + y 2 /9 − 1 = 0.

(a) Eliminate the variable x to find a polynomial q(y) that must be 0.


(b) Prove that q(y) ∈ (x2 + y 2 − 5, x2 /4 + y 2 /9 − 1) = I.
(c) Prove that I = (x2 + y 2 − 5, q(y)).
(d) Prove that there is a polynomial p(x) such that I = (p(x), q(y)).
9. The solution set of a system of two linear equations, whose coefficients are not multiples of each other,
is a line but certainly never a point. In R[x, y, z], consider the system of polynomial equations
(x2 − 1)2 − y 2 − z 2 = 0
x2 + y 2 − 1 = 0.

Prove that the solution set of this system is precisely 4 points. (In particular, it is not a curve.)
10. Let F be any field. Consider the set of n × n matrices Mn×n (F ) as the affine space F^(n^2) .
(a) Prove that SLn (F ) is an affine variety in Mn×n (F ).
(b) Prove that the set of orthogonal matrices On (F ) is an affine variety.
(c) Find an explicit set of equations that define O3 (F ).

12.3
The Nullstellensatz
The correspondence between affine varieties and ideals in F [x1 , x2 , . . . , xn ] (12.2) is not a bijection.
The main theorems concerning this correspondence are called the Nullstellensatz, which comes in a
so-called weak form and a strong form. The German term “Nullstellensatz” literally means the Theorem
(“satz”) of the Locations (“stellen”) of Zeros (“null”). These theorems turn out to be profound
generalizations of the Fundamental Theorem of Algebra.

12.3.1 – The Nullstellensatz (Weak Form)


The following sequence of propositions deals with the different notions of finitely generated as an R-
module, finitely generated as an R-algebra, and finitely generated as a field. The relationships
between these notions imply profound consequences for the solutions to systems of polynomial
equations.

Proposition 12.3.1
Let A be a Noetherian ring. Let A ⊆ B ⊆ C be rings such that C is finitely generated
as an A-algebra and such that C is finitely generated as a B-module. Then B is finitely
generated as an A-algebra.

Proof. Suppose that u1 , u2 , . . . , um generate C as an A-algebra and that v1 , v2 , . . . , vn generate C
as a B-module. For each i with 1 ≤ i ≤ m,

    ui = ∑_{j=1}^{n} βij vj    (12.3)

for constants βij ∈ B. Furthermore, since C is a ring, the product of generators of C over B leads
to structure constants γij^k ∈ B satisfying

    vi vj = ∑_{k=1}^{n} γij^k vk .    (12.4)

Let B 0 be the A-subalgebra B 0 = A[βij , γij^k ] ⊆ B. By Corollary 12.1.19, B 0 is a Noetherian ring.
An element in C has the form p(u1 , u2 , . . . , um ), for some polynomial p ∈ A[x1 , x2 , . . . , xm ].
Substituting (12.3) in for p(u1 , u2 , . . . , um ) and repeatedly using the products in (12.4), we see that C
is generated by v1 , v2 , . . . , vn as a module (not just as an algebra) over B 0 . By Proposition 12.1.16,
C is a Noetherian module over B 0 . Since B is a B 0 submodule of C, then by Proposition 12.1.9,
B is a Noetherian B 0 -module. Consequently, by Proposition 12.1.11, B is finitely generated as a
B 0 -module, say by elements w1 , w2 , . . . , wℓ . In particular, B is generated by w1 , w2 , . . . , wℓ as a B 0 -
algebra. Hence,
    B = A[βij , γij^k , wi ]
so B is finitely generated as an A-algebra. 

Proposition 12.3.2
Let F be a field and let E be a finitely generated F -algebra. If E is a field then it is a finite
extension of F .

Proof. Suppose that the field E is given by E = F [α1 , α2 , . . . , αn ]. Assume that E is not algebraic
over F . Then at least one of the generators is transcendental over F . In fact, we can renumber
the generators of E so that for some r ≥ 1, the generators α1 , . . . , αr are algebraically independent
over F and αr+1 , . . . , αn are algebraic over K = F (α1 , α2 , . . . , αr ). Then E is a finite extension
of K, that is, a finite-dimensional vector space over K, and also finitely generated as a K-module.
Since F ⊆ K ⊆ E, by Proposition 12.3.1, K is finitely generated as an F -algebra, so in fact K =
F [β1 , β2 , . . . , βs ], where βi ∈ F (α1 , α2 , . . . , αr ).
Now each βi is of the form
fi (α1 , α2 , . . . , αr )
βi =
gi (α1 , α2 , . . . , αr )
for polynomials fi , gi ∈ F [x1 , x2 , . . . , xr ]. Recall that F [α1 , α2 , . . . , αr ] ∼= F [x1 , x2 , . . . , xr ] is a UFD.
There are a variety of ways to see that F [x1 , x2 , . . . , xr ] has an infinite number of prime (irreducible)
elements. Consequently, there exists an irreducible polynomial h that does not divide any of the gi .
By properties of addition and multiplication of fractions, every rational expression f /g ∈ K, when
in reduced form, has a denominator whose irreducible factors divide the product g1 g2 · · · gs . However,
the rational expression 1/h, which is in F (α1 , α2 , . . . , αr ), does not satisfy this property. This is a
contradiction. Therefore, E is algebraic over F . Since E is algebraic and generated over F by a
finite number of elements, then E is a finite extension of F . 

Theorem 12.3.3 (The Weak Nullstellensatz)


Let F be a field and let R be a finitely generated F -algebra. Let M be a maximal ideal of
R. Then the field R/M is a finite extension of F . In particular, if F is algebraically closed,
then R/M ∼= F.

Proof. As a quotient of R, the ring R/M is a finitely generated F -algebra, and it is a field since M
is maximal. By Proposition 12.3.2, R/M is a finite extension of F . When F is algebraically closed,
F admits no proper finite extensions, so R/M ∼= F . 

When F is algebraically closed, the Weak Nullstellensatz has many important equivalent formu-
lations.

Corollary 12.3.4
Let F be an algebraically closed field. The maximal ideals in F [x1 , x2 , . . . , xn ] are of the
form (x1 − c1 , . . . , xn − cn ) for some point (c1 , c2 , . . . , cn ) ∈ F n .

Proof. Let M be a maximal ideal of F [x1 , x2 , . . . , xn ]. Since F [x1 , x2 , . . . , xn ]/M ∼= F by the Weak
Nullstellensatz, for each xi we have xi ≡ ci (mod M ) for some ci ∈ F . Thus, xi − ci ∈ M , and so
the ideal I = (x1 − c1 , . . . , xn − cn ) satisfies I ⊆ M . However, F [x1 , x2 , . . . , xn ]/I ∼= F is a field, so
by the Fourth Isomorphism Theorem, M = I. 

Corollary 12.3.5
Let F be an algebraically closed field. Suppose that f1 , f2 , . . . , fs ∈ F [x1 , x2 , . . . , xn ] are
such that the system of equations

    f1 (x1 , x2 , . . . , xn ) = 0
        . . .                          (12.5)
    fs (x1 , x2 , . . . , xn ) = 0

has no solutions in F n . Then the ideal (f1 , f2 , . . . , fs ) is equal to F [x1 , x2 , . . . , xn ].

Proof. We prove the contrapositive. Consider the ideal I = (f1 , f2 , . . . , fs ) in F [x1 , x2 , . . . , xn ].


Suppose that I is a strict ideal. By Proposition 12.2.11 the set of solutions of (12.5) is V(I). Every
strict ideal in F [x1 , x2 , . . . , xn ] is contained in a maximal ideal and, by Corollary 12.3.4, every maximal
ideal of F [x1 , x2 , . . . , xn ] is of the form M = (x1 − c1 , . . . , xn − cn ). Consequently, if f ∈ I ⊆ M , then

f (x1 , x2 , . . . , xn ) = p1 (x1 , x2 , . . . , xn )(x1 − c1 ) + · · · + pn (x1 , x2 , . . . , xn )(xn − cn )

for some polynomials pi ∈ F [x1 , x2 , . . . , xn ]. But then f (c) = 0. Hence, c ∈ V(I) and thus the
system (12.5) has c = (c1 , c2 , . . . , cn ) as a solution. 

Another way of stating this corollary is that if F is an algebraically closed field and I is an ideal
of F [x1 , x2 , . . . , xn ], then
V(I) = ∅ =⇒ I = F [x1 , x2 , . . . , xn ].
The version of the Nullstellensatz given in Corollary 12.3.5 shows why the Nullstellensatz serves the
role of the Fundamental Theorem of Algebra for multivariable polynomials. It says that any ideal I
of polynomials that is strictly less than C[x1 , x2 , . . . , xn ] has some common zeros in Cn .
We point out that the property of the field being algebraically closed is a necessary requirement.
For example, the polynomial x2 + y 2 + 1 ∈ R[x, y] has no solutions and yet the ideal (x2 + y 2 + 1)
is not all of R[x, y].
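Corollary 12.3.5 has a computational face: over an algebraically closed field, a system has no solutions exactly when 1 lies in the ideal, which a reduced Gröbner basis detects (the basis is then {1}). A sketch in SymPy (an illustrative CAS, with a made-up inconsistent system):

```python
from sympy import symbols, groebner

x, y = symbols("x y")

# Forcing x = 1 and y = 2 makes x^2 + y^2 + 1 equal to 6, so the system
# x^2 + y^2 + 1 = 0, x - 1 = 0, y - 2 = 0 has no solutions, even over C.
G = groebner([x**2 + y**2 + 1, x - 1, y - 2], x, y, order="lex")
print(G.exprs)  # [1]: the ideal is the whole ring, so V(I) is empty

# By contrast, x^2 + y^2 + 1 alone generates a proper ideal; it has no
# real zeros, but over C it does have zeros, as the corollary predicts.
H = groebner([x**2 + y**2 + 1], x, y, order="lex")
print(H.exprs)  # [x**2 + y**2 + 1]
```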

12.3.2 – The Strong Nullstellensatz


Recall from Definition 5.5.21 that the radical of an ideal I in a commutative ring R is

    √I = {r ∈ R | r^n ∈ I for some n ∈ N∗ }.

We saw earlier that √I is another ideal. Furthermore, it is obvious that I ⊆ √I. In Exercise 5.5.36,
we showed that every ideal I satisfies √(√I) = √I. This inspires the following terminology.

Definition 12.3.6

An ideal I in a commutative ring R is called a radical ideal if √I = I.

Theorem 12.3.7 (The Strong Nullstellensatz)


Let F be an algebraically closed field and let I be an ideal in F [x1 , x2 , . . . , xn ]. Then
I(V(I)) = √I.

Proof. By Hilbert’s Basis Theorem, I = (f1 , f2 , . . . , fs ) for some finite set of polynomials fi .
We first prove √I ⊆ I(V(I)). Suppose that f ∈ √I. Then for some positive integer m, the
polynomial f^m ∈ I. Hence, f^m = p1 f1 + p2 f2 + · · · + ps fs for some polynomials pi ∈ F [x1 , x2 , . . . , xn ].
Then for any c ∈ V(I), we know that fi (c) = 0. Hence, f^m (c) = (f (c))^m = 0. Since F is an integral
domain, f (c) = 0 for all c ∈ V(I). Thus, f ∈ I(V(I)) and hence √I ⊆ I(V(I)).
We now prove the reverse containment. Let f ∈ I(V(I)) so that f vanishes at every common
zero of the polynomials f1 , f2 , . . . , fs . We must show that there exists m ∈ N∗ and polynomials
p1 , p2 , . . . , ps ∈ F [x1 , x2 , . . . , xn ] such that

f m = p1 f1 + p2 f2 + · · · + ps fs . (12.6)

Consider the ideal Ĩ = (f1 , . . . , fs , 1 − yf ) in the new ring F [x1 , . . . , xn , y]. We prove that V(Ĩ) = ∅.
Let (b1 , . . . , bn , bn+1 ) ∈ F^(n+1) .

Case 1: (b1 , . . . , bn ) ∈ V(I). Then f (b1 , . . . , bn ) = 0 and, evaluated at (b1 , . . . , bn , bn+1 ), the poly-
nomial 1 − yf is 1. In particular, (b1 , . . . , bn , bn+1 ) ∉ V(Ĩ).

Case 2: (b1 , . . . , bn ) ∉ V(I). Then for at least one of the generating polynomials fi , we must have
fi (b1 , . . . , bn ) ≠ 0. Viewing fi as a polynomial in x1 , . . . , xn , y, though a constant with respect
to y, gives fi (b1 , . . . , bn , bn+1 ) ≠ 0. This shows that (b1 , . . . , bn , bn+1 ) ∉ V(Ĩ).

The two cases cover all points in the affine space so V(Ĩ) = ∅.
By the Weak Nullstellensatz (Corollary 12.3.5), we deduce that Ĩ = F [x1 , . . . , xn , y]. In particu-
lar, 1 ∈ Ĩ. Thus, there exist polynomials q1 , . . . , qs , r ∈ F [x1 , . . . , xn , y] such that
    1 = r(x1 , . . . , xn , y)(1 − yf ) + ∑_{i=1}^{s} qi (x1 , . . . , xn , y) fi .

Considering this expression as an element in F (x1 , . . . , xn )[y], set y = 1/f . This gives the identity of
rational expressions

    1 = ∑_{i=1}^{s} qi (x1 , . . . , xn , 1/f ) fi (x1 , . . . , xn ).

Let m be the maximum power of y appearing in any polynomial qi . Then multiplying this rational
expression by f^m clears the denominators and returns an element in F [x1 , . . . , xn ], namely

    f^m = ∑_{i=1}^{s} f^m qi (x1 , . . . , xn , 1/f ) fi (x1 , . . . , xn ).

Hence, setting pi = f^m qi (x1 , . . . , xn , 1/f (x1 , . . . , xn )) establishes the desired result (12.6). Thus,
I(V(I)) ⊆ √I and the theorem follows. 

Example 12.3.8. Consider the polynomials f1 (x, y) = (x − 1)2 + y 2 − 1 and f2 (x, y) = x. The
solutions separately of these polynomials correspond respectively to a circle of radius 1 centered at
(1, 0) and the y-axis. If we consider the solution set of both polynomials, we can start with the ideal
I = (f1 , f2 ) and consider the variety V(I). Geometrically, V(I) corresponds to the intersection of the
circle described above and the y-axis. Hence, V(I) = {(0, 0)}. It is easy to see that I(V(I)) = (x, y).
On the other hand, note that f1 (x, y) = x2 − 2x + y 2 . Since x2 − 2x = (x − 2)x, we see that
I ⊆ (x, y 2 ). But x = f2 (x, y) ∈ I and y 2 = f1 (x, y) − (x − 2)f2 (x, y), so (x, y 2 ) ⊆ I. This shows that
I = (x, y 2 ). So this is the simplest expression of I. Note that this differs from (x, y) since y ∉ I.
However, this does give an example of the Strong Nullstellensatz: √I = (x, y) = I(V(I)). 4
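The computations in Example 12.3.8 can be checked with a Gröbner basis, which also answers the ideal membership questions (SymPy, as an illustrative CAS):

```python
from sympy import symbols, groebner

x, y = symbols("x y")

f1 = (x - 1)**2 + y**2 - 1
f2 = x

# The reduced lex Groebner basis exhibits the simplest generators of I
G = groebner([f1, f2], x, y, order="lex")
print(G.exprs)  # [x, y**2], i.e. I = (x, y^2)

# y^2 lies in I but y does not, so I is strictly smaller than its radical (x, y)
assert G.contains(y**2)
assert not G.contains(y)
```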

We summarize the results in this section to give a complete correspondence between affine vari-
eties and ideals.

Theorem 12.3.9 (Ideal-Variety Correspondence 1)


Let F be any field. The maps V and I,

ideals in F [x1 , x2 , . . . , xn ] affine varieties in AnF

are inclusion-reversing functions. The function I is an injection with V(I(V )) = V for all
affine varieties V in AnF .

Proof. All that is left to show is that I is injective. Let V be an affine variety V(f1 , f2 , . . . , fs ).
Suppose that f ∈ I(V ). By definition f (c) = 0 for all c ∈ V . Thus, c ∈ V(I(V )) for all c ∈ V and so
V ⊆ V(I(V )). Conversely, f1 , f2 , . . . , fs ∈ I(V ) by definition of I(V ). Thus, (f1 , f2 , . . . , fs ) ⊆ I(V ).
Since V is inclusion reversing, we deduce that V(I(V )) ⊆ V(f1 , f2 , . . . , fs ) = V . 

Theorem 12.3.10 (Ideal-Variety Correspondence 2)


Let F be an algebraically closed field. The maps V and I,

radical ideals in F [x1 , x2 , . . . , xn ] affine varieties in AnF

are inclusion-reversing bijective functions.

Proof. Let V be the affine variety V = V(f1 , f2 , . . . , fs ). Then V = V(I), where I is the ideal
I = (f1 , f2 , . . . , fs ). By the Strong Nullstellensatz, I(V ) = √I. In particular, the image of I from
the set of affine varieties in AnF is the set of radical ideals. By Theorem 12.3.9, V(I(V )) = V . Again
by the Strong Nullstellensatz, if I is a radical ideal, then I(V(I)) = √I = I. 

Together, the above theorems give strong results about affine varieties and, by extension, about
solution sets of systems of polynomial equations. These results are in some sense stronger than
properties of systems of linear equations studied in linear algebra. For example, a system of the
form
a11 x + a12 y + a13 z − b1 = 0
a21 x + a22 y + a23 z − b2 = 0

corresponds to the intersection of two planes in R3 . If the planes are in general position, their
intersection is a line L. The pair of equations does not correspond uniquely to the solution set since
there are many pairs of planes that intersect in L. According to Theorem 12.3.10, if we work over
C, though the system of equations will not uniquely correspond to the solution set, the ideal

I = (a11 x + a12 y + a13 z − b1 , a21 x + a22 y + a23 z − b2 ),

which is a radical ideal, does correspond uniquely to the affine variety (solution set) L.

Exercises for Section 12.3


1. Let F be a field and let c ∈ F n .
(a) Prove that the evaluation function, evc : F [x1 , x2 , . . . , xn ] → F defined by evc (f ) = f (c), is a
ring homomorphism.
(b) Prove that Ker(evc ) = (x1 − c1 , x2 − c2 , . . . , xn − cn ).
(c) Prove that F is algebraically closed if and only if all maximal ideals of F [x1 , x2 , . . . , xn ] are
kernels of evaluation homomorphisms.
2. Let V and W be affine varieties in AnF with V = V(I) and W = V(J) for some ideals I and J in
F [x1 , x2 , . . . , xn ].
(a) Prove that V ∩ W = V(I + J) and conclude that V ∩ W is another affine variety.
(b) Prove that V ∪ W = V(IJ) and conclude that V ∪ W is another affine variety.
(c) Show that an intersection of any collection (not necessarily finite) of affine varieties is again an
affine variety.
(d) Show that a union of a collection of affine varieties need not be an affine variety.
3. Prove that the ideal I = (x2 + y 2 − z 2 , x2 − y 2 + z 2 ) is equal to J = (x2 , y 2 − z 2 ). Using Exercise 12.3.2,
and interpreting V(J), describe what V(I) is geometrically.
4. Prove that every finite set of points in F n is an affine variety. (See Exercise 12.3.2.)
5. Prove that every affine variety is the intersection of a finite number of hypersurfaces.
6. Let R be a commutative ring. Recall that the nilradical of R is the ideal

NR = {r ∈ R | rm = 0 for some m ≥ 1}.



(a) Prove that NR/I = √I/I for any ideal I in R.
(b) Deduce that I is a radical ideal if and only if NR/I = {0}.
7. Let Z ⊆ F n be a subset of the affine space F n , where F is a field. Prove that V(I(Z)) is the smallest
(by inclusion) affine variety that contains the subset Z.

12.4
Polynomial Division; Monomial Orders
The study of systems of polynomial equations in F [x1 , x2 , . . . , xn ], where F is a field, generalizes two
areas of algebra studied in previous courses: (1) polynomial equations in one variable; (2) systems
of linear equations. Techniques used in each of these areas inspire algorithms that are relevant for
solving systems of polynomial equations.
Recall that F [x] is a Euclidean domain with the degree serving as a Euclidean function. Conse-
quently, F [x] is a PID and also a UFD. Because F [x] is a Euclidean domain, it is easy to tell when a
polynomial f (x) ∈ F [x] is in an ideal I = (p(x)) of F [x]: if and only if the remainder of f (x), when
divided by p(x), is 0. Furthermore, the Euclidean Algorithm gives a method to calculate the great-
est common divisor of two polynomials. In contrast, though the ring of multivariable polynomials
F [x1 , x2 , . . . , xn ] is a UFD (Theorem 6.5.5), for n > 1, it is not a PID and hence it is not a Euclidean
domain. Consequently, it is much harder to tell when a polynomial f ∈ F [x1 , x2 , . . . , xn ] is in an
ideal. Hence, among the problems we would like to solve in the study of F [x1 , x2 , . . . , xn ] is the
Ideal Membership Problem: how to decide whether f ∈ I.
The Gauss-Jordan elimination algorithm on a system of linear equations gives the solutions to the
system as a parametrization. Hence, as a part of solving systems of polynomial equations, we consider
the Problem of Parametrizing Varieties: Given a system of polynomial equations in x1 , x2 , . . . , xn ,
find parameters t1 , t2 , . . . , tm and rational functions g1 , g2 , . . . , gn such that

xi = gi (t1 , t2 , . . . , tm )
632 CHAPTER 12. MULTIVARIABLE POLYNOMIAL RINGS

for all (t1 , t2 , . . . , tm ) in some set, give the solutions to the system of equations. As a reverse and related
problem, we will consider the Implicitization Problem: Given a parametrization,

x1 = g1 (t1 , t2 , . . . , tm )
..
.
xn = gn (t1 , t2 , . . . , tm ),

where gi are rational functions, find polynomials f1 , f2 , . . . , fs in F [x1 , x2 , . . . , xn ] such that the set
parametrized by the gi functions is the solution set to fi (x1 , x2 , . . . , xn ) = 0 with 1 ≤ i ≤ s.
In this section and the next two, we introduce a variety of algorithms associated to multivariable
polynomial rings over a field.

12.4.1 – Monomial Orderings


Many algorithms, including polynomial division, require a choice of how to order the monomials of
a polynomial. With polynomials in one variable, the usual manner of ordering monomials is from
largest to least degree, i.e., xn ≥ xm if and only if n ≥ m. The Gauss-Jordan elimination algorithm
implicitly uses an ordering on the variables, namely

x1 > x2 > · · · > xn > 1.

In other words, the Gauss-Jordan algorithm performs certain operations first on x1 , then on x2 ,
and so forth. The choice of monomial order in the case of the single variable is natural, especially
because of the Euclidean division on polynomials. The choice of ordering variables in the Gauss-
Jordan elimination algorithm is arbitrary.
To define a partial order on monomials xα in the variables x1 , x2 , . . . , xn is tantamount to choosing
a partial order < on Nn so that xα < xβ if and only if α < β. From now on, when we discuss partial
orders on monomials, we view them equivalently as orders on Nn .
Not every partial order on the monomials of x1 , x2 , . . . , xn will be useful for algorithms. Often, the
partial order on monomials is needed to determine the leading term of a polynomial, i.e., the greatest
monomial with respect to a specified order. To ensure that an algorithm can always proceed, two
monomials should always be comparable. This means that for all α, β ∈ Nn we would like for α 4 β
or β 4 α. Hence, we will only consider total orders on Nn .
As another requirement, we need for algorithms to terminate. By Proposition 1.4.21, this re-
quirement translates into the property that with respect to 4 on Nn every nonempty subset S ⊆ Nn
contains a least element, namely that 4 is a well-ordering.
Finally, for many algorithms on monomials it turns out to be convenient that an order on the
monomials is preserved during multiplication. In other words,

xα 4 xβ =⇒ xα xγ 4 xβ xγ for all γ ∈ Nn .

These three requirements lead to the following definition.

Definition 12.4.1
A partial order 4 on Nn is called a monomial order if
(1) 4 is a total order (α 4 β or β 4 α for all α, β ∈ Nn );
(2) 4 is a well-ordering (every nonempty subset of Nn has a least element);

(3) if α 4 β then α + γ 4 β + γ for all γ ∈ Nn .

There are a variety of monomial orders commonly used in algorithms on multivariable polynomial
rings. In the following examples, we prove that the lexicographic order is a monomial order but leave
proofs for other examples to the exercises. The exercises also discuss other monomial orders.

Example 12.4.2 (Lexicographic, I). The natural numbers (N, ≤) is a totally ordered set. Sec-
tion 1.4.6 described the lexicographic order on Cartesian products of posets. The lexicographic order
≤lex on Nn is defined by α = β if and only if αi = βi for all 1 ≤ i ≤ n and

(α1 , α2 , . . . , αn ) <lex (β1 , β2 , . . . , βn ) ⇐⇒ αj < βj where j = min{i ∈ {1, . . . , n} | αi ≠ βi }.

For example, in N4 we have (2, 7, 1, 8) ≤lex (3, 1, 4, 1) and (1, 7, 5, 21) ≤lex (1, 7, 7, 2).
We show that the lexicographic order (often abbreviated to the lex order ) is a monomial order.
Let α, β ∈ Nn be distinct and let j be the least index for which αj ≠ βj . Since ≤ is a total order on
N, we deduce that αj < βj or else αj > βj . Therefore, α ≤lex β or β ≤lex α and so ≤lex is a total
order. Also, for all γ ∈ Nn , the first index in which α + γ differs from β + γ is j and since αj < βj ,
then αj + γj < βj + γj so α + γ ≤lex β + γ.
Finally, we can show that ≤lex is a well-ordering. Let S be any nonempty subset of Nn . Call
S = S0 . Recursively define the following integers and sets

ci = min{αi | α ∈ Si−1 } and Si = {α ∈ Si−1 | αi = ci }.

Assuming that Si−1 is nonempty, the integer ci exists by virtue of the well-ordering of ≤ on N and
therefore, Si exists and is nonempty. By induction, ci and Si exist for all 1 ≤ i ≤ n. By construction,

S0 ⊇ S1 ⊇ · · · ⊇ Sn

and, in fact, Si = {α ∈ S | αj = cj for 1 ≤ j ≤ i}. Consequently, Sn = {(c1 , c2 , . . . , cn )}. Now


suppose that β ∈ Si and γ ∈ Si−1 − Si . Then βj = γj for all 1 ≤ j ≤ i − 1 but βi = ci and so by
definition of ci , we must have γi > βi . Thus, β <lex γ. By induction, (c1 , c2 , . . . , cn ) <lex α for all
α ∈ S − {(c1 , c2 , . . . , cn )}. Thus, S has a least element.
Consider the polynomial f (x, y, z) = 7x2 y 3 − 2xy 3 z 2 + 2y 3 z 2 + x − 3 ∈ R[x, y, z]. With respect
to ≤lex , writing the terms of f (x, y, z) in decreasing order gives

f (x, y, z) = 7x2 y 3 − 2xy 3 z 2 + x + 2y 3 z 2 − 3.

Note that equivalently, α <lex β if and only if the leftmost nonzero entry of β − α is positive. △
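The lex comparison is easy to experiment with: Python compares tuples entry by entry from the left, which is exactly ≤lex on Nn . The following sketch (illustrative only; not from the text) sorts the exponent tuples of f (x, y, z) = 7x2 y 3 − 2xy 3 z 2 + 2y 3 z 2 + x − 3 and recovers the ordering displayed above.

```python
# Terms of f(x, y, z) stored as exponent tuple (x, y, z) -> coefficient.
f = {(2, 3, 0): 7, (1, 3, 2): -2, (0, 3, 2): 2, (1, 0, 0): 1, (0, 0, 0): -3}

# Python compares tuples entry by entry from the left: precisely the lex order.
terms_lex = sorted(f, reverse=True)

# Decreasing order: 7x^2y^3, -2xy^3z^2, x, 2y^3z^2, -3, as in the text.
assert terms_lex == [(2, 3, 0), (1, 3, 2), (1, 0, 0), (0, 3, 2), (0, 0, 0)]

# The two comparisons from the N^4 example hold as well.
assert (2, 7, 1, 8) < (3, 1, 4, 1) and (1, 7, 5, 21) < (1, 7, 7, 2)
```

Since the built-in tuple order does all the work, no custom comparison function is needed for the lex order with x1 > x2 > · · · > xn .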

Example 12.4.3 (Lexicographic, II). The previous example described the lexicographic order
on Nn but it is not as general as it could be. According to Example 12.4.2,

x1 >lex x2 >lex · · · >lex xn

because these monomials correspond to the n-tuples

(1, 0, 0, . . . , 0) >lex (0, 1, 0, . . . , 0) >lex · · · >lex (0, 0, 0, . . . , 1).

However, we can also define a lexicographic order in which the variables are ordered differently.
As a specific example, consider the lexicographic order on monomials in x, y, z such that y > z >
x. Writing the terms of the polynomial f (x, y, z) in Example 12.4.2 in decreasing order gives

f (x, y, z) = −2xy 3 z 2 + 2y 3 z 2 + 7x2 y 3 + x − 3.

Since there are n! ways of ordering the variables x1 , x2 , . . . , xn , there are n! lexicographic monomial
orders on n variables. △

Example 12.4.4 (Graded Lexicographic). Lexicographic orders are a natural order to devise
for monomials. However, it is sometimes desirable to group together monomials of the same total
degree. For example, in the expression

g(x, y, z) = (−7y 2 z 4 + 3x2 y 2 z 2 + x5 z) + (2xy 2 z 2 + 7x4 y − 3yz 4 ) + (x2 + y 2 ),

monomials of same total degree are gathered. These are the homogeneous components of g.

With respect to some order on the variables (as described in Example 12.4.3), we define the
graded lexicographic order ≤grlex on Nn by
α <grlex β ⇐⇒ |α| < |β|, or |α| = |β| and α <lex β.

In other words, the graded lexicographic (or more briefly grlex ) order first distinguishes monomials
by their total degrees and then among monomials of the same total degree, distinguishes by a lex
order.
Writing the terms of g(x, y, z) in decreasing grlex order with x > z > y gives

g(x, y, z) = x5 z + 3x2 y 2 z 2 − 7y 2 z 4 + 7x4 y + 2xy 2 z 2 − 3yz 4 + x2 + y 2 . △
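A grlex comparison can be sketched as a sort key that puts total degree first and then falls back to lex. The check below (illustrative only; the variable order x > z > y is encoded by writing exponent tuples as (x-exp, z-exp, y-exp)) reproduces the ordering of the terms of g just displayed.

```python
def grlex_key(alpha):
    # Compare by total degree first, then lexicographically.
    return (sum(alpha), alpha)

# Exponent tuples of g in the variable order x > z > y: (x-exp, z-exp, y-exp).
g_terms = [(5, 1, 0), (2, 2, 2), (0, 4, 2), (4, 0, 1),
           (1, 2, 2), (0, 4, 1), (2, 0, 0), (0, 0, 2)]

terms_grlex = sorted(g_terms, key=grlex_key, reverse=True)

# x^5 z, 3x^2y^2z^2, -7y^2z^4, 7x^4y, 2xy^2z^2, -3yz^4, x^2, y^2
assert terms_grlex == [(5, 1, 0), (2, 2, 2), (0, 4, 2),
                       (4, 0, 1), (1, 2, 2), (0, 4, 1),
                       (2, 0, 0), (0, 0, 2)]
```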

Example 12.4.5 (Graded Reverse Lexicographic). With respect to some order on the vari-
ables, the graded reverse lexicographic order ≤grevlex on Nn is defined by α <grevlex β if and only if
α1 + α2 + · · · + αn < β1 + β2 + · · · + βn , or α1 + α2 + · · · + αn = β1 + β2 + · · · + βn and for the least
xi such that αi ≠ βi , we have βi < αi .
Reverse lexicographic order corresponds to reversing both the order on the variables and the
partial order of comparison for the powers. For example, assuming that x1 > x2 > x3 > x4 , over
N4 , we have
α = (3, 1, 2, 3) <grevlex (0, 5, 1, 3) = β
because the rightmost entry in which α and β differ—the third entry—has β3 < α3 . So the graded
reverse lexicographic (or more briefly grevlex ) order distinguishes between monomials first by total
degree and then by a reverse lexicographic comparison.
As a specific example, note that
yz 4 >grevlex xy 2 z 2
because they have the same total degree but, starting from the smallest variable (y), the power on
y of yz 4 , namely 1, is less than the power on y of xy 2 z 2 .
Writing the terms of the polynomial g(x, y, z) in decreasing grevlex order with x > z > y gives

g(x, y, z) = x5 z + 3x2 y 2 z 2 − 7y 2 z 4 + 7x4 y − 3yz 4 + 2xy 2 z 2 + x2 + y 2 .

Similarly, writing the terms of g(x, y, z) in decreasing grevlex order with y > z > x gives

g(x, y, z) = −7y 2 z 4 + 3x2 y 2 z 2 + x5 z − 3yz 4 + 2xy 2 z 2 + 7x4 y + y 2 + x2 . △
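Grevlex is also conveniently encoded as a sort key: compare total degrees, then compare the reversed, negated exponent tuples lexicographically, so that among monomials of equal degree the one with the smaller power on the least variable comes out larger. The sketch below (illustrative only) reproduces the x > z > y ordering of the terms of g displayed earlier in this example.

```python
def grevlex_key(alpha):
    # Total degree first; ties broken by the negated, reversed exponent tuple,
    # so among equal degrees the monomial with the smaller power on the least
    # variable is the larger one.
    return (sum(alpha), tuple(-a for a in reversed(alpha)))

# Exponent tuples of g in the variable order x > z > y: (x-exp, z-exp, y-exp).
g_terms = [(5, 1, 0), (2, 2, 2), (0, 4, 2), (4, 0, 1),
           (1, 2, 2), (0, 4, 1), (2, 0, 0), (0, 0, 2)]

terms_grevlex = sorted(g_terms, key=grevlex_key, reverse=True)

# x^5 z, 3x^2y^2z^2, -7y^2z^4, 7x^4y, -3yz^4, 2xy^2z^2, x^2, y^2
assert terms_grevlex == [(5, 1, 0), (2, 2, 2), (0, 4, 2),
                         (4, 0, 1), (0, 4, 1), (1, 2, 2),
                         (2, 0, 0), (0, 0, 2)]

# yz^4 >grevlex xy^2z^2, as claimed above.
assert grevlex_key((0, 4, 1)) > grevlex_key((1, 2, 2))
```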

As we saw in the above examples, since a monomial order ≤ is a total order, we can follow
the habit with polynomials of a single variable of writing the terms of a multivariable polynomial
in decreasing order with respect to ≤. It is useful to have a notation for the leading term of a
polynomial with respect to a given monomial order.

Definition 12.4.6
Let 4 be a fixed monomial order on Nn . Let p ∈ F [x1 , x2 , . . . , xn ] and suppose that aα xα
is the term of p with the largest (with respect to 4) multidegree. Then

(1) the multidegree of p is mdeg p = α;


(2) the leading term of p is LT(p) = aα xα ;
(3) the leading coefficient of p is LC(p) = aα ;

(4) the leading monomial of p is LM(p) = xα .

Definition 12.4.1(3) leads to the important result that is useful for many algorithms.

Proposition 12.4.7
Let f, g ∈ F [x1 , x2 , . . . , xn ]. With respect to any monomial order,
(1) mdeg(f g) = mdeg(f ) + mdeg(g);

(2) mdeg(f + g) ≤ max(mdeg(f ), mdeg(g)).

Proof. (Left as an exercise for the reader. See Exercise 12.4.6.) 
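Both parts of the proposition are easy to spot-check numerically. The sketch below (an illustration, not the requested proof; all function names are mine) stores polynomials as exponent-tuple → coefficient dictionaries and uses Python's tuple comparison as the lex order.

```python
def multiply(p, q):
    """Product of polynomials stored as {exponent tuple: coefficient}."""
    out = {}
    for a, c in p.items():
        for b, d in q.items():
            g = tuple(x + y for x, y in zip(a, b))
            out[g] = out.get(g, 0) + c * d
    return {k: v for k, v in out.items() if v != 0}

def add(p, q):
    """Sum of two polynomials in the same representation."""
    out = dict(p)
    for a, c in q.items():
        out[a] = out.get(a, 0) + c
    return {k: v for k, v in out.items() if v != 0}

def mdeg(p):
    return max(p)   # Python tuple comparison = lex order with x1 > x2 > ...

f = {(2, 1): 1, (0, 1): 1}    # x^2 y + y
g = {(1, 2): 1, (0, 0): -1}   # x y^2 - 1

# (1) mdeg(fg) = mdeg(f) + mdeg(g)
assert mdeg(multiply(f, g)) == tuple(a + b for a, b in zip(mdeg(f), mdeg(g)))
# (2) mdeg(f + g) <= max(mdeg(f), mdeg(g))
assert mdeg(add(f, g)) <= max(mdeg(f), mdeg(g))
```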

12.4.2 – A Multivariable Polynomial Division Algorithm


Polynomial division in F [x] is such that for all p(x), a(x) ∈ F [x] with a(x) ≠ 0, there exist
polynomials q(x) and r(x) such that

p(x) = a(x)q(x) + r(x) where r(x) = 0 or deg r(x) < deg a(x).

Recall that q(x) is called the quotient and r(x) is the remainder.
From the perspective of ideals in F [x], polynomial division corresponds to finding an element
r(x) with r(x) = 0 or deg r(x) < deg a(x) such that p(x) − r(x) ∈ (a(x)). This shows that the ideal
membership problem in F [x] is trivial. Every ideal I in F [x] is principal, so I = (a(x)) for some a(x).
Thus, p(x) ∈ I if and only if the remainder of p(x) when divided by a(x) is 0.
In the context of a multivariable polynomial ring F [x1 , x2 , . . . , xn ], ideals are no longer neces-
sarily principal. However, by Hilbert’s Basis Theorem, every ideal I is finitely generated with say
I = (a1 , a2 , . . . , as ). So a multivariable polynomial division should allow for multiple divisors. An
algorithm for such a division should take a polynomial f , a list of polynomials a1 , a2 , . . . , as and
return a list q1 , q2 , . . . , qs and a polynomial r such that

f = a1 q1 + a2 q2 + · · · + as qs + r

and no term of r is divisible by LT(ai ) for any 1 ≤ i ≤ s. Note that the reference to a leading term
means that this division algorithm must be done in reference to a specific monomial order 4.
The following algorithm implements a multivariable polynomial division.

Algorithm 12.4.1: MultiPolyDivision(f, (a1 , a2 , . . . , as ))

g←f
r←0
for j ← 1 to s
  do qj ← 0
while g ≠ 0
  do i ← 1
     while i ≤ s and g ≠ 0
       do if LM(ai ) | LM(g)
            then qi ← qi + LT(g)/LT(ai )
                 g ← g − (LT(g)/LT(ai ))ai
                 i←1
            else i ← i + 1
     if i = s + 1
       then r ← r + LT(g)
            g ← g − LT(g)
return (r, (q1 , q2 , . . . , qs ))

Every time g is replaced by g − (LT(g)/LT(ai ))ai , the leading term of (LT(g)/LT(ai ))ai is LT(g)
so the difference polynomial g − (LT(g)/LT(ai ))ai removes the leading term of g. Therefore, each
time g is changed, either g becomes 0 or
 
LT( g − (LT(g)/LT(ai ))ai ) ≺ LT(g).

If instead LT(g) is not divisible by any LT(ai ), then in the conditional statement “if i = s + 1”
the leading term of g is passed to the remainder r. Then again, the new leading term of g is strictly
less. Consequently, through each iteration of the outermost while loop, the leading terms of g
create a strictly decreasing sequence of monomials, starting with LT(f ). Since a monomial order
is a well-ordering, the sequence of leading monomials terminates by Proposition 1.4.21. Thus, the
algorithm terminates.
It is not hard to check that the identity g = a1 q1 + a2 q2 + · · · + as qs + r is preserved through
each while loop. Consequently, f − r = a1 q1 + a2 q2 + · · · + as qs at the end of the algorithm. These
remarks prove the following proposition.

Proposition 12.4.8
The algorithm MultiPolyDivision terminates. No term of r is divisible by any of the
leading terms LT(ai ). Furthermore, r = 0 or mdeg(r) 4 mdeg(f ).

Definition 12.4.9
We call r the remainder of f divided by the s-tuple G = (a1 , a2 , . . . , as ) and we denote r
by rem (f, G).

It is important to notice that the algorithm MultiPolyDivision depends not only on the
monomial order chosen, but also on the order of the polynomials in the list (a1 , a2 , . . . , as ). The
algorithm always tries to divide g by a1 and once the leading term of g is not divisible by the leading
term of a1 , attempts to divide g by a2 , and so forth.
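A minimal Python sketch of Algorithm 12.4.1 under the lex order may make the control flow concrete (the dictionary representation and function names are my own, not the text's). Dividing 2x2 y + 3xy 2 + 4y 2 by the pair (x2 − xy + 1, y 2 − 1) with x > y reproduces q1 = 2y, q2 = 5x + 4, and r = 5x − 2y + 4.

```python
from fractions import Fraction

def lt(p):
    """Leading exponent and coefficient of a nonzero polynomial under lex order.

    A polynomial is a dict {exponent tuple: coefficient}; Python's tuple
    comparison is the lex order with x1 > x2 > ... > xn.
    """
    alpha = max(p)
    return alpha, p[alpha]

def multi_poly_division(f, divisors):
    """Return (r, quotients) with f = sum(a_i q_i) + r, mirroring Algorithm 12.4.1."""
    g = {a: Fraction(c) for a, c in f.items()}
    r = {}
    qs = [dict() for _ in divisors]
    while g:
        alpha, c = lt(g)
        for i, a in enumerate(divisors):
            beta, d = lt(a)
            if all(x >= y for x, y in zip(alpha, beta)):      # LM(a_i) | LM(g)
                gamma = tuple(x - y for x, y in zip(alpha, beta))
                coeff = c / d
                qs[i][gamma] = qs[i].get(gamma, 0) + coeff
                for b, e in a.items():                        # g <- g - coeff * x^gamma * a_i
                    key = tuple(x + y for x, y in zip(gamma, b))
                    g[key] = g.get(key, 0) - coeff * e
                    if g[key] == 0:
                        del g[key]
                break                                          # restart the scan at a_1
        else:                                                  # no LT(a_i) divides LT(g)
            r[alpha] = r.get(alpha, 0) + c                     # pass LT(g) to the remainder
            del g[alpha]
    return r, qs

# f = 2x^2y + 3xy^2 + 4y^2, a1 = x^2 - xy + 1, a2 = y^2 - 1, lex with x > y.
f = {(2, 1): 2, (1, 2): 3, (0, 2): 4}
a1 = {(2, 0): 1, (1, 1): -1, (0, 0): 1}
a2 = {(0, 2): 1, (0, 0): -1}
r, (q1, q2) = multi_poly_division(f, (a1, a2))
assert q1 == {(0, 1): 2}                        # q1 = 2y
assert q2 == {(1, 0): 5, (0, 0): 4}             # q2 = 5x + 4
assert r == {(1, 0): 5, (0, 1): -2, (0, 0): 4}  # r = 5x - 2y + 4
```

The `for`/`break`/`else` structure mirrors the algorithm's resetting of i to 1: after every successful reduction the scan restarts at a1 , so the order of the divisors matters.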
Example 12.4.10. Consider f (x, y) = 2x2 y + 3xy 2 + 4y 2 ∈ R[x, y] and let I = (x2 − xy + 1, y 2 − 1).
We set a1 = x2 −xy+1 and a2 = y 2 −1. Let us use the lexicographic order with x > y. Implementing
the above division algorithm and keeping track of terms in a vein similar to polynomial long division,
we get the following calculation.

q1 = 2y, q2 = 5x + 4:

g = 2x2 y + 3xy 2 + 4y 2
  − (2x2 y − 2xy 2 + 2y)       [2y · a1 ]
g = 5xy 2 + 4y 2 − 2y
  − (5xy 2 − 5x)               [5x · a2 ]
g = 5x + 4y 2 − 2y             5x → r
g = 4y 2 − 2y
  − (4y 2 − 4)                 [4 · a2 ]
g = −2y + 4                    −2y → r, then 4 → r
g=0                            r = 5x − 2y + 4

The order in which terms get added to q1 , q2 , and r is: (1) 2y is added to q1 because LT(a1 ) = x2
divides 2x2 y; (2) 5x is added to q2 because LT(a1 ) = x2 does not divide 5xy 2 but LT(a2 ) = y 2 does;
(3) the term 5x moves over to the remainder column because it is not divisible by either LT(a1 ) or

LT(a2 ); (4) 4 is added to q2 because LT(a2 ) divides 4y 2 ; (5) after that none of the terms in g are
divisible by leading terms of a1 or a2 so the rest moves over to r.
The result of this calculation is that

2x2 y + 3xy 2 + 4y 2 = (2y)(x2 − xy + 1) + (5x + 4)(y 2 − 1) + 5x − 2y + 4 (12.7)

and we observe that no term of r = 5x − 2y + 4 is divisible by either LT(a1 ) = x2 or LT(a2 ) = y 2 . △

As the following example shows, the monomial order changes the result of the polynomial division
algorithm.
Example 12.4.11. Consider the same polynomial and the same ideal as in the previous example
but use the lexicographic order with y > x. The polynomial division algorithm would be the
following. (Note that we continue to list the terms of polynomials in decreasing order with respect
to the monomial order.)

q1 = −3y − 5x, q2 = 4:

g = 3xy 2 + 4y 2 + 2x2 y
  − (3xy 2 − 3x2 y − 3y)       [−3y · a1 ]
g = 4y 2 + 5x2 y + 3y
  − (4y 2 − 4)                 [4 · a2 ]
g = 5x2 y + 3y + 4
  − (5x2 y − 5x3 − 5x)         [−5x · a1 ]
g = 3y + 5x3 + 5x + 4          all terms → r
g=0                            r = 3y + 5x3 + 5x + 4

In the last stage, we passed all the terms of the intermediate polynomial g to r because none of them
are divisible by LM(a1 ) = xy or LM(a2 ) = y 2 . Interestingly enough, the term 5x3 is divisible by
x2 , but, in the lex order with y > x, the term x2 in a1 is not the leading term.
This division algorithm leads to the polynomial division of

3xy 2 + 4y 2 + 2x2 y = (−3y − 5x)(−xy + x2 + 1) + 4(y 2 − 1) + 3y + 5x3 + 5x + 4. (12.8)


△
As a final example, we illustrate how the order in which we list the generators a1 , a2 , . . . , as of I
affects the outcome of this algorithm.
Example 12.4.12. Use the same polynomials, the same ideal I, and lex order with y > x as in
Example 12.4.11 but set a1 = y 2 − 1 and a2 = −xy + x2 + 1. The division algorithm gives:

q1 = 3x + 4, q2 = −2x:

g = 3xy 2 + 4y 2 + 2x2 y
  − (3xy 2 − 3x)               [3x · a1 ]
g = 4y 2 + 2x2 y + 3x
  − (4y 2 − 4)                 [4 · a1 ]
g = 2x2 y + 3x + 4
  − (2x2 y − 2x3 − 2x)         [−2x · a2 ]
g = 2x3 + 5x + 4               all terms → r
g=0                            r = 2x3 + 5x + 4

The result of this calculation is that

3xy 2 + 4y 2 + 2x2 y = (3x + 4)(y 2 − 1) + (−2x)(−xy + x2 + 1) + 2x3 + 5x + 4. △

It is essential to notice that the remainder r is a different polynomial in each of the above three
cases. In other words, the division algorithm depends both on the monomial order chosen for the
division algorithm and on the order in which the generators of the ideal are listed for the purposes
of the algorithm.
This nonuniqueness of the remainder makes the Ideal Membership Problem more challenging
than in the case of the Euclidean domain F [x]. Subtracting (12.8) from (12.7) and dividing by 5
gives the combination

(x + y)(x2 − xy + 1) + x(y 2 − 1) = x3 + y.

In particular, x3 + y is in the ideal (x2 − xy + 1, y 2 − 1). However, with respect to the lex order
with y > x, no term of y + x3 is divisible by the leading term of either a1 = −xy + x2 + 1 or
a2 = y 2 − 1. Consequently, a polynomial p may lie in an ideal I even though its remainder upon
division by a generating set of I is nonzero.
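That x3 + y really is the stated combination can be double-checked by brute-force expansion; the sketch below (dictionary-based polynomial arithmetic; all names are mine, not the text's) verifies the identity.

```python
def multiply(p, q):
    """Product of polynomials stored as {exponent tuple: coefficient}."""
    out = {}
    for a, c in p.items():
        for b, d in q.items():
            g = tuple(x + y for x, y in zip(a, b))
            out[g] = out.get(g, 0) + c * d
    return out

def add(p, q):
    """Sum of two polynomials, dropping cancelled terms."""
    out = dict(p)
    for a, c in q.items():
        out[a] = out.get(a, 0) + c
    return {a: c for a, c in out.items() if c != 0}

# (x + y)(x^2 - xy + 1) + x(y^2 - 1)
lhs = add(multiply({(1, 0): 1, (0, 1): 1}, {(2, 0): 1, (1, 1): -1, (0, 0): 1}),
          multiply({(1, 0): 1}, {(0, 2): 1, (0, 0): -1}))
assert lhs == {(3, 0): 1, (0, 1): 1}   # x^3 + y
```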

12.4.3 – Useful CAS Commands


Many computer algebra systems implement a multivariable polynomial division algorithm.

Maple Function
with(Groebner); Imports a library of commands for ideals of multivariable polynomial
rings, including polynomial division and Gröbner bases. The following
command is in the Groebner package.
NormalForm Multivariable polynomial division. The command NormalForm(f,G,T),
where f is a polynomial, G is a list of polynomials, and T is a mono-
mial order, implements the multivariable polynomial division algorithm
described in this section.
Maple has commands to define monomial orderings necessary for these
algorithms. Consult Maple help files to see how to define them.

Exercises for Section 12.4


1. Consider the polynomial f (x, y) = 5x4 y + 7xy 3 − x3 y 2 + 2y 3 − 4xy 3 ∈ R[x, y]. Write the terms in
decreasing order with respect to the following monomial orders: (a) lex x > y; (b) lex y > x; (c) grlex
y > x; (d) grevlex x > y.
2. Consider the polynomial f (x, y) = x2 + xy + y 2 + x3 + x2 y + xy 2 + y 3 ∈ R[x, y]. Write the terms in
decreasing order with respect to the following monomial orders: (a) lex x > y; (b) lex y > x; (c) grlex
x > y; (d) grlex y > x; (e) grevlex x > y.
3. Consider the polynomial f (x, y, z) = 7x2 y 2 + x3 − z 3 + 2xyz 2 − 2yz 3 ∈ R[x, y, z]. Write the terms in
decreasing order with respect to the following monomial orders: (a) lex x > y > z; (b) lex z > y > x;
(c) grlex y > x > z; (d) grevlex x > z > y.
4. Prove that for any ordering of the variables, the graded lexicographic order is a monomial order on
Nn .
5. Prove that for any ordering of the variables, the graded reverse lexicographic order is a monomial
order on Nn .
6. Let f, g ∈ F [x1 , x2 , . . . , xn ] and let ≤ be a monomial order.
(a) Prove that mdeg(f g) = mdeg(f ) + mdeg(g), where the addition is in Nn .
(b) Prove that mdeg(f + g) ≤ max(mdeg(f ), mdeg(g)).

7. Let w ∈ (Q>0 )n . Define the weighted lexicographic (briefly wlex) order on Nn , weighted by w, as
α <w β ⇐⇒ w · α < w · β, or w · α = w · β and α <lex β,

where the lexicographic order is with respect to some order on the variables x1 , x2 , . . . , xn .
(a) Prove that for any weighted vector w and any order on the variables, the order ≤wlex is a
monomial order.
(b) Prove that with w = (1, 1, . . . , 1), we recover the graded lexicographic order.
8. Let w = (1, 2, 3) and consider the w-weighted lexicographic order defined in Exercise 12.4.7.
9. Let w ∈ (R>0 )n such that w1 , w2 , . . . , wn are linearly independent over Q. Define the weighted
lexicographic order on Nn weighted by w, written ≤w , as

α <w β ⇐⇒ w · α < w · β.

(a) Prove that ≤w is a monomial order.


(b) Using the fact that {1, √2} is linearly independent over Q, setting w = (1, √2), write the terms of
f (x, y) = 5x4 y + 7xy 3 − x3 y 2 + 2y 3 − 4xy 3 in decreasing ≤w order.
10. Let M be an n × n matrix of nonnegative integers that is invertible as a matrix in Mn (Q).
Define the M -matrix order ≤M on Nn by

α <M β ⇐⇒ M α <lex M β,

where by M α we consider matrix multiplication with α viewed as a column vector.


(a) Prove that for any such M , the M -matrix order is a monomial order.
(b) Prove that the lexicographic order with x1 > x2 > · · · > xn is the matrix order with matrix I.
(c) Prove that the lexicographic order with x1 < x2 < · · · < xn is the matrix order with the matrix
of 0s but with 1s on the opposite diagonal.
(d) Find a matrix M that corresponds to the graded lexicographic order with x1 > x2 > · · · > xn .
11. Let ≤ be a monomial order.
(a) Let m be a monomial. Prove that LT(mf ) = mLT(f ) for all f ∈ F [x1 , x2 , . . . , xn ].
(b) Prove or disprove that LT(f g) = LT(f ) · LT(g) for all f, g ∈ F [x1 , x2 , . . . , xn ].
12. Using the lexicographic order with x > y, perform the polynomial division algorithm of f (x, y) =
x2 y + xy 2 by a1 = x2 + y and a2 = xy − 2 to find the quotients q1 and q2 , and the remainder
rem (f, (a1 , a2 )).
13. Using the lexicographic order with y > x, perform the polynomial division algorithm of f (x, y) =
4xy 3 − 2x3 y + 5xy by a1 = x2 y + 2x − 1 and a2 = y 2 + x2 − 5 to find the quotients q1 and q2 , and the
remainder rem (f, (a1 , a2 )).
14. Using the graded lexicographic order with x > y, perform the polynomial division algorithm of
f (x, y) = 4xy 3 − 2x3 y + 5xy by a1 = x2 y + 2x − 1 and a2 = y 2 + x2 − 5 to find the quotients
q1 and q2 , and the remainder rem (f, (a1 , a2 )).
15. Perform the polynomial division algorithm of f (x, y, z) = x4 + y 4 + z 4 by a1 = x2 + 2xy − z and
a2 = 2xz − z 2 + 1 to find the quotients q1 and q2 , and the remainder rem (f, (a1 , a2 )),
(a) using the graded lexicographic order with x > y > z;
(b) using the lexicographic order y > z > x.
16. Using lexicographic order with y > z > x, find the remainder of the polynomial division of f (x, y, z) =
4xy 2 − 3xyz + 7yz 2 by a1 = y 2 − x, a2 = y 2 z − x2 , and a3 = z 2 − 3y + 1.
17. Consider the parametric curve ~r(t) = (cos t, sin t, sin(2t)) with t ∈ [0, 2π].
(a) Prove that the image of the parametric curve is an affine variety in R3 by showing that it is
exactly V(x2 + y 2 − 1, z − 2xy).
(b) Using the lexicographic order with z > y > x, perform the division algorithm on f (x, y, z) =
x3 + y 3 + z 3 by a1 = x2 + y 2 − 1 and a2 = z − 2xy.
(c) Using the lexicographic order with x > y > z, perform the division algorithm on f (x, y, z) =
x3 + y 3 + z 3 by a1 = x2 + y 2 − 1 and a2 = z − 2xy.

18. The image of the parametric curve ~r(t) = (t, t2 , t3 ) with t ∈ R is called a twisted cubic.
(a) Prove that the twisted cubic is an affine variety by explicitly showing that it is V(y − x2 , z − x3 ).
(b) Let f ∈ R[x, y, z]. Prove that the result of the division algorithm of f by a1 = y − x2 and
a2 = z − x3 using the lexicographic order with z > y > x gives a polynomial r(x).
19. Consider the circle C in the yz-plane of radius 1 and center (y, z) = (2, 0).
(a) Show that C is an affine variety C = V(x, (y − 2)2 + z 2 − 1) in R3 .
(b) Let f (x, y, z) = (x2 + y 2 + z 2 − 5)2 − 16(1 − z 2 ). Prove that V(f ) is the torus obtained by
rotating around the z-axis the circle in the xz-plane of radius 1 and center (2, 0, 0).
(c) Show that I = (x, (y − 2)2 + z 2 − 1) is a radical ideal.
(d) Show from geometric reasoning that f ∈ I.
(e) Find q1 and q2 such that f = xq1 + ((y − 2)2 + z 2 − 1)q2 .

12.5
Gröbner Bases
The previous section concluded with the observation that x3 + y ∈ (x2 − xy + 1, y 2 − 1) but that,
using the lexicographic order with y > x, the polynomial y +x3 is its own remainder when divided by
the pair (−xy + x2 + 1, y 2 − 1). Hence, the polynomial division algorithm was generally not sufficient
to solve the ideal membership problem.
Example 12.5.1. As a more striking example, consider the ideal I = (xy 2 + 1, x2 y − 1) in C[x, y].
The polynomial x + y is in I because
x + y = x(xy 2 + 1) − y(x2 y − 1).
However, x + y is its own remainder when divided by the pair (xy 2 + 1, x2 y − 1) with respect to any
monomial order. (See Proposition 12.5.5.) In Exercise 12.5.14, we show that I = (x + y, y 3 − 1).
Now x + y, xy 2 + 1, and x2 y − 1 divided by the pair (x + y, y 3 − 1) and with the lex order with x > y
all have a remainder of 0. △
The above examples illustrate that the generating set (basis) of I affects the output of the
multivariable polynomial division algorithm. We are led to think that there may be a basis that is
better than others. The problem with some bases is that the leading terms of the generators might
not divide the leading terms of all polynomials in the ideal. This section studies this interplay more
closely and shows that there always exists a “better” generating set for an ideal.

12.5.1 – Monomial Ideals


Before we can introduce Gröbner bases, we define monomial ideals.

Definition 12.5.2
An ideal I ⊆ F [x1 , x2 , . . . , xn ] is called a monomial ideal if there is a subset A ⊆ Nn such
that I = (xα | α ∈ A).

For example, I = (x3 yz, xy 2 , z 4 ) is a monomial ideal in F [x, y, z]. Since an ideal is closed under
addition, monomial ideals do not consist of just monomials. The definition makes no assumption
that the set A is finite. Hilbert’s Basis Theorem tells us that any ideal I, including monomial
ideals, is generated by a finite number of polynomials but it is not obvious that a monomial ideal is
generated by a finite number of monomials. We need to characterize monomial ideals.
As a point of notation, we will generally denote a finite list of multidegrees by α(1), α(2), . . . , α(s)
to distinguish from the indices of each n-tuple in Nn . Thus, for each i with 1 ≤ i ≤ s, we have
α(i) = (α(i)1 , α(i)2 , . . . , α(i)n ).

Figure 12.1: Monomials in (x2 y 3 , x5 y) — the lattice points (α1 , α2 ) lying componentwise above (2, 3) or (5, 1).

Proposition 12.5.3
Let I = (xα | α ∈ A) be a monomial ideal in F [x1 , x2 , . . . , xn ]. Then f ∈ I if and only if
every term of f is divisible by some monomial xα with α ∈ A.

Proof. The (⇐=) direction is obvious by properties of ideals.


(=⇒) Conversely, let f ∈ I. Then f = h1 xα(1) + h2 xα(2) + · · · + hs xα(s) for some polynomials
hi ∈ F [x1 , x2 , . . . , xn ]. In the right-hand side of this expression for f , every term (before like terms
are collected) is divisible by some monomial xα with α ∈ A. After collecting like terms, f must still
have this property. 

Proposition 12.5.4 (Dickson’s Lemma)


Let A be a subset of Nn . There exists a finite subset A0 ⊆ A such that the monomial ideal
I = (xα | α ∈ A) is equal to (xα | α ∈ A0 ).

Proof. By Hilbert’s Basis Theorem, and in particular Corollary 12.1.20, I is finitely generated with
I = (f1 , f2 , . . . , fr ) for some polynomials fi ∈ I. Let S = {xα | α ∈ A0 } be the set of all monomials
occurring in the polynomials fi with 1 ≤ i ≤ r. Thus, I ⊆ (S). Furthermore, the set A0 is finite
since each polynomial consists of a finite number of terms. By Proposition 12.5.3, each monomial in
S is in the monomial ideal I. Thus, S ⊆ I and therefore (S) ⊆ I. Hence, I = (S), so I is generated
by a finite number of monomials occurring in S. 
Dickson’s Lemma and Proposition 12.5.3 show that every monomial ideal I in F [x1 , x2 , . . . , xn ]
consists of polynomials whose terms are multiples of a certain finite set of monomials. For monomial
ideals in a polynomial ring of two variables, this lends itself well to a visual diagram. For example,
I = (x2 y 3 , x5 y) consists of all polynomials whose monomials are in the shaded area in Figure 12.1.
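Proposition 12.5.3 gives an immediate membership test for monomial ideals: check that every term of f is divisible by some generating monomial, i.e., that its exponent tuple dominates some generator's componentwise. A sketch for the ideal (x2 y 3 , x5 y) (illustrative only; the function name is mine):

```python
def in_monomial_ideal(f, generators):
    """Test of Proposition 12.5.3: every term of f divisible by some generator monomial."""
    return all(any(all(a >= b for a, b in zip(alpha, beta)) for beta in generators)
               for alpha in f)

gens = [(2, 3), (5, 1)]                                     # the ideal (x^2 y^3, x^5 y)
assert in_monomial_ideal({(3, 4): 1, (6, 1): -2}, gens)     # x^3 y^4 - 2x^6 y is in I
assert not in_monomial_ideal({(3, 4): 1, (4, 2): 1}, gens)  # x^4 y^2 lies in neither region
```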
Dickson’s Lemma can be proved without reference to Hilbert’s Basis Theorem. However, because
of the order of our presentation, Dickson’s Lemma follows immediately from it. Surprisingly, this
lemma leads to a simpler characterization of monomial ideals.

Proposition 12.5.5
Let 4 be a partial order on Nn that satisfies
(1) 4 is a total order;
(2) if α 4 β, then α + γ 4 β + γ for all γ ∈ Nn .
Then 4 is a well-ordering if and only if 0 4 α for all α ∈ Nn .

Proof. Suppose that 4 is a well-ordering. Let δ be the least element in Nn . Assume that δ is not
the 0 element in Nn . Then δ ≺ 0. However, adding δ to both sides of the inequality gives 2δ ≺ δ. This
contradicts the minimality of δ. We conclude that δ = 0.
Conversely, suppose that 0 4 α for all α ∈ Nn . By (2), from 0 4 α we obtain β 4 α + β for all
β ∈ Nn . In other words, xβ divides xα+β . Let A ⊆ Nn be any nonempty subset. By Dickson’s Lemma,
the ideal I = (xα | α ∈ A) is equal to (xα(1) , xα(2) , . . . , xα(s) ) for some finite list α(1), α(2), . . . , α(s) ∈ A.
By Proposition 12.5.3, every element of A is 4-greater than or equal to some α(i). By (1), 4 is a total
order, so {α(1), α(2), . . . , α(s)} has a least element δ ∈ A. Hence, all elements α ∈ A satisfy δ 4 α.
Therefore, every nonempty subset A of Nn has a least element, which proves that 4 is a well-ordering. 

12.5.2 – Gröbner Bases


Fix a monomial order on Nn . For an ideal I ⊆ F [x1 , x2 , . . . , xn ], define LT(I) as the set of all
leading terms of elements in I. In the motivating example to this section, we pointed out that an
undesirable property of some generating sets for ideals is that there are elements in the ideal whose
leading term is not divisible by a leading term of a generating element. We phrase the desirable
property positively as follows.

Definition 12.5.6
Fix a monomial order 4 on Nn . A finite subset G = {g1 , g2 , . . . , gs } of an ideal I ⊆
F [x1 , x2 , . . . , xn ] is a Gröbner basis of I with respect to 4 if, as ideals,

(LT(g1 ), LT(g2 ), . . . , LT(gs )) = (LT(I)),

where (LT(I)) is the monomial ideal generated by the leading terms of elements in I.

In general, if I = (f1 , f2 , . . . , fr ), then (LT(f1 ), LT(f2 ), . . . , LT(fr )) ⊆ (LT(I)). In Example 12.5.1, with I = (xy 2 + 1, x2 y − 1) and the lexicographic order with x > y, we see that
x ∉ (LT(f1 ), LT(f2 )) = (xy 2 , x2 y), whereas x ∈ (LT(I)) because x + y ∈ I and LT(x + y) = x.
This gives an example where the reverse inclusion does not hold.

Proposition 12.5.7
Fix a monomial order ≼. Every ideal I in F [x1 , x2 , . . . , xn ] other than {0} has a Gröbner
basis. Furthermore, if G = {g1 , g2 , . . . , gs } is a Gröbner basis of I, then I = (g1 , g2 , . . . , gs ).

Proof. By Dickson's Lemma, (LT(I)) is generated by a finite collection of monomials. Let G =
{g1 , g2 , . . . , gs } be elements of I such that (LT(I)) = (LT(g1 ), LT(g2 ), . . . , LT(gs )). By definition, this set
G is a Gröbner basis of I.
Since gi ∈ I, then (g1 , g2 , . . . , gs ) ⊆ I. We show the reverse inclusion. Let f ∈ I. Multivariable
polynomial division of f by the s-tuple (g1 , g2 , . . . , gs ) with respect to ≼ gives

f = g1 q1 + g2 q2 + · · · + gs qs + r,

for some qi ∈ F [x1 , x2 , . . . , xn ], where the remainder r satisfies r = 0 or else no term of r is divisible
by any LT(gi ). Note that r ∈ I since r is a linear combination of f, g1 , g2 , . . . , gs , which are all in
I. Assume that r ≠ 0. Then since LT(r) ∈ (LT(I)), there exists a gi ∈ G such that LT(gi ) divides
LT(r). This contradicts the fact that r is the remainder from the polynomial division. Hence, r = 0
and thus f ∈ (g1 , g2 , . . . , gs ). This proves the reverse inclusion, so I = (g1 , g2 , . . . , gs ).
We underscore that since the leading term function depends on a monomial order, a Gröbner
basis of an ideal only has meaning in reference to a given monomial order. Furthermore, there is no
claim that Gröbner bases are unique, even with respect to a given monomial order.
We motivated the definition of Gröbner bases by hoping to avoid the undesirable behavior of
polynomial division as illustrated in Example 12.5.1. We can now prove that this behavior does not
occur when a polynomial is divided by the elements of a Gröbner basis.
12.5. GRÖBNER BASES 643

Proposition 12.5.8
Fix a monomial order. Let G = {g1 , g2 , . . . , gs } be a Gröbner basis of an ideal I ⊆ F [x1 , x2 , . . . , xn ] and let
f be a polynomial. There exists a unique r ∈ F [x1 , x2 , . . . , xn ] such that f = g + r with
g ∈ I and such that no term of r is divisible by any monomial LT(gi ).

Proof. Let G = {g1 , g2 , . . . , gs }. The existence of r with the stated property follows from the
procedure of polynomial division: the algorithm returns r, q1 , q2 , . . . , qs ∈ F [x1 , x2 , . . . , xn ] such that

f = q1 g1 + q2 g2 + · · · + qs gs + r

and the linear combination q1 g1 + q2 g2 + · · · + qs gs is in I.
Suppose that f = h1 + r1 = h2 + r2 with h1 , h2 ∈ I and that no term of r1 or of r2 is divisible by
any LT(gi ). Then r1 − r2 = h2 − h1 ∈ I. Furthermore, the set of monomials of r1 − r2 is a subset of
the union of the monomials of r1 and the monomials of r2 . Hence, no term of r1 − r2 is divisible
by any LT(gi ). Since G is a Gröbner basis of I, then (LT(I)) = (LT(g1 ), LT(g2 ), . . . , LT(gs )). In
particular, since r1 − r2 ∈ I, then r1 − r2 = 0 or LT(r1 − r2 ) ∈ (LT(I)). However, no term of r1 − r2
is divisible by any LT(gi ), so LT(r1 − r2 ) ∉ (LT(I)). This leaves the only possibility that r1 − r2 = 0. The
uniqueness of r follows.

Definition 12.5.9
The unique polynomial r in Proposition 12.5.8 is called the normal form of f by the Gröbner
basis G.

Because the r described in Proposition 12.5.8 is unique, the remainder produced by the polynomial division
algorithm is independent of the order chosen for the elements of G. From a notational standpoint,
when G is a Gröbner basis, rem (f, G) is well-defined without specifying an order on the elements of
G. Furthermore, because of Proposition 12.5.8, Gröbner bases solve the Ideal Membership Problem
in the following sense.

Corollary 12.5.10
Fix a monomial order ≼. Let I be an ideal in F [x1 , x2 , . . . , xn ]. Then f ∈ I if and only if
the remainder of f is 0 when divided by a Gröbner basis G of I with respect to ≼.
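Corollary 12.5.10 turns ideal membership into a finite computation. The following pure-Python sketch is our own encoding, not from the text: a polynomial is a dict mapping exponent tuples to coefficients, lex order is tuple comparison, and the loop mirrors the multivariable division algorithm. It certifies that x + y⁴ lies in the ideal I = (xy² + 1, x²y − 1) of Example 12.5.1 by dividing by the Gröbner basis {x + y, y³ − 1}:

```python
from fractions import Fraction

# Polynomials stored as {exponent tuple: coefficient}; for lex order the
# leading monomial is simply the largest key under tuple comparison.
def leading(p):
    e = max(p)
    return e, p[e]

def divides(d, e):
    return all(x <= y for x, y in zip(d, e))

def remainder(f, divisors):
    """Multivariate division of f by a list of divisors; returns the remainder."""
    p, r = dict(f), {}
    while p:
        e, c = leading(p)
        for g in divisors:
            ge, gc = leading(g)
            if divides(ge, e):
                # cancel LT(p): p <- p - (c/gc) * x^(e-ge) * g
                shift = tuple(x - y for x, y in zip(e, ge))
                factor = Fraction(c) / Fraction(gc)
                for me, mc in g.items():
                    ne = tuple(x + y for x, y in zip(shift, me))
                    p[ne] = p.get(ne, 0) - factor * mc
                    if p[ne] == 0:
                        del p[ne]
                break
        else:
            r[e] = c          # no LT(g) divides LT(p): move term to remainder
            del p[e]
    return r

# The Groebner basis {x + y, y^3 - 1} of I = (xy^2 + 1, x^2y - 1), lex x > y:
G = [{(1, 0): 1, (0, 1): 1},      # x + y
     {(0, 3): 1, (0, 0): -1}]     # y^3 - 1
f = {(1, 0): 1, (0, 4): 1}        # f = x + y^4 lies in I
assert remainder(f, G) == {}      # zero remainder certifies membership
```

A nonzero remainder would certify non-membership, which is what makes rem (f, G) a decision procedure.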

Having proved the existence and some first nice properties of Gröbner bases, we are missing
an essential component to make these results practical: (1) How can we tell if a generating set of
an ideal is a Gröbner basis? (2) Given a set of generators for an ideal I, how can we find a Gröbner
basis of I? We finish the section by addressing the first question but must leave the second question
for the following section.

12.5.3 – A Test for When a Generating Set is a Gröbner Basis


Example 12.5.1 showed that {xy 2 + 1, x2 y − 1} is not a Gröbner basis of the ideal I = (xy 2 + 1, x2 y − 1)
by pointing out that x + y ∈ I but neither x nor y is divisible by xy 2 or x2 y. We found the "simpler"
polynomial x + y as the combination x(xy 2 + 1) − y(x2 y − 1), in which the multipliers are chosen so that
the leading terms cancel. We formalize this method to simplify polynomials.
If xα and xβ are two monomials, we call the least common multiple and the greatest common
divisor of these two monomials

lcm(xα , xβ ) = x1^max(α1 ,β1 ) x2^max(α2 ,β2 ) · · · xn^max(αn ,βn )

and

gcd(xα , xβ ) = x1^min(α1 ,β1 ) x2^min(α2 ,β2 ) · · · xn^min(αn ,βn ) .
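On exponent vectors these operations are simply componentwise maxima and minima, as a quick sketch confirms (the function names are ours):

```python
# Monomials as exponent vectors: lcm and gcd are componentwise max and min.
def mono_lcm(a, b):
    return tuple(max(x, y) for x, y in zip(a, b))

def mono_gcd(a, b):
    return tuple(min(x, y) for x, y in zip(a, b))

a, b = (3, 1, 1), (1, 2, 0)              # x^3yz and xy^2 in F[x, y, z]
assert mono_lcm(a, b) == (3, 2, 1)       # lcm = x^3y^2z
assert mono_gcd(a, b) == (1, 1, 0)       # gcd = xy
# x^a * x^b = lcm(x^a, x^b) * gcd(x^a, x^b), exponent by exponent:
assert tuple(i + j for i, j in zip(a, b)) == \
       tuple(i + j for i, j in zip(mono_lcm(a, b), mono_gcd(a, b)))
```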

By usual properties of minimum and maximum, xα xβ = lcm(xα , xβ ) gcd(xα , xβ ). For any two
polynomials f, g ∈ F [x1 , x2 , . . . , xn ], the greatest common divisor gcd(LM(f ), LM(g)) divides both

LT(f ) and LT(g). Furthermore,

LT( (LT(g)/gcd(LM(f ), LM(g))) f ) = (LT(g)/gcd(LM(f ), LM(g))) LT(f ) = LC(f )LC(g) lcm(LM(f ), LM(g))
  = (LT(f )/gcd(LM(f ), LM(g))) LT(g) = LT( (LT(f )/gcd(LM(f ), LM(g))) g ).

Definition 12.5.11
Fix a monomial order on Nn . The S-polynomial of f, g ∈ F [x1 , x2 , . . . , xn ] is

S(f, g) = (LT(g)/gcd(LM(f ), LM(g))) f − (LT(f )/gcd(LM(f ), LM(g))) g.        (12.9)

By construction, S(f, g) is in any ideal that contains both f and g. Also, since both polynomials
in the difference in (12.9) have the same leading term, the difference cancels out these leading terms.
Hence, the terms of S(f, g) come from only the nonleading terms of f and g.
Example 12.5.12. Let f = 2x3 yz − 3x2 z 2 + 7yz and g = xy 2 + 7xyz 2 − 2 in R[x, y, z]. First
suppose that we use the lexicographic order with x > y > z. We note that gcd(LM(f ), LM(g)) =
gcd(x3 yz, xy 2 ) = xy. Then

S(f, g) = (y)f − (2x2 z)g = −14x3 yz 3 − 3x2 yz 2 + 4x2 z + 7y 2 z.

Suppose now that we use the graded lexicographic order with x > y > z. With this monomial order,
gcd(LM(f ), LM(g)) = gcd(x3 yz, xyz 2 ) = xyz. Then

S(f, g) = (7z)f − (2x2 )g = −2x3 y 2 − 21x2 z 3 + 49yz 2 + 4x2 . △
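Formula (12.9) is mechanical enough to script. This sketch uses our own dict-based encoding of polynomials (lex order with x > y > z given by tuple comparison, names `leading`, `add_scaled`, and `s_poly` are ours) and recomputes the first S-polynomial of Example 12.5.12:

```python
# Polynomials as {exponent tuple: coefficient}; lex leading term = max key.
def leading(p):
    e = max(p)
    return e, p[e]

def add_scaled(acc, coeff, shift, p):
    # acc += coeff * x^shift * p, dropping terms that cancel to 0
    for me, mc in p.items():
        ne = tuple(x + y for x, y in zip(shift, me))
        acc[ne] = acc.get(ne, 0) + coeff * mc
        if acc[ne] == 0:
            del acc[ne]

def s_poly(f, g):
    """S(f, g) = (LT(g)/gcd) f - (LT(f)/gcd) g, as in (12.9)."""
    (fe, fc), (ge, gc) = leading(f), leading(g)
    d = tuple(min(x, y) for x, y in zip(fe, ge))   # gcd(LM(f), LM(g))
    acc = {}
    add_scaled(acc, gc, tuple(x - y for x, y in zip(ge, d)), f)
    add_scaled(acc, -fc, tuple(x - y for x, y in zip(fe, d)), g)
    return acc

# f = 2x^3yz - 3x^2z^2 + 7yz and g = xy^2 + 7xyz^2 - 2:
f = {(3, 1, 1): 2, (2, 0, 2): -3, (0, 1, 1): 7}
g = {(1, 2, 0): 1, (1, 1, 2): 7, (0, 0, 0): -2}
# S(f, g) = -14x^3yz^3 - 3x^2yz^2 + 4x^2z + 7y^2z, as in the example
assert s_poly(f, g) == {(3, 1, 3): -14, (2, 1, 2): -3, (2, 0, 1): 4, (0, 2, 1): 7}
```

Swapping the roles of f and g negates the result, since S(g, f ) = −S(f, g) under definition (12.9).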

In the next section, the S-polynomials will play a key role in finding a Gröbner basis of an ideal.
However, they also give us a characterization for when a given set of polynomials {g1 , g2 , . . . , gs } is
a Gröbner basis of the ideal (g1 , g2 , . . . , gs ). We conclude this section with a proof and examples of
this characterization.

Lemma 12.5.13
Fix a monomial order. Let f1 , f2 , . . . , fs ∈ F [x1 , x2 , . . . , xn ] with mdeg fi = δ for all i and
let c1 , c2 , . . . , cs ∈ F . If

mdeg(c1 f1 + c2 f2 + · · · + cs fs ) ≺ δ,

then c1 f1 + c2 f2 + · · · + cs fs is an F -linear combination of the s(s − 1)/2 polynomials S(fi , fj ) with 1 ≤ i <
j ≤ s. Furthermore, mdeg S(fi , fj ) ≺ δ for all 1 ≤ i < j ≤ s.

Proof. The hypotheses require that the leading monomial of each fi is the same, namely xδ . Since mdeg(c1 f1 +
c2 f2 + · · · + cs fs ) ≺ δ, the coefficients of xδ must cancel:

c1 LC(f1 ) + c2 LC(f2 ) + · · · + cs LC(fs ) = 0.        (12.10)

Furthermore, since all the fi have the same leading monomial, gcd(LM(fi ), LM(fj )) = xδ and

LT(fi )/gcd(LM(fi ), LM(fj )) = LC(fi )xδ /xδ = LC(fi ).

Thus, the S-polynomial of a pair is S(fi , fj ) = LC(fj )fi − LC(fi )fj . Set

fi′ = fi /LC(fi )    and    pij = S(fi , fj )/(LC(fi )LC(fj )) = S(fi′ , fj′ ).

Then we have pij = fi′ − fj′ . Let us also write more simply pi = pi,i+1 , so that fi′ = pi + fi+1′ . With
this property, we have

c1 f1 + c2 f2 + · · · + cs fs
  = c1 LC(f1 )f1′ + c2 LC(f2 )f2′ + · · · + cs−1 LC(fs−1 )fs−1′ + cs LC(fs )fs′
  = c1 LC(f1 )(p1 + f2′ ) + c2 LC(f2 )f2′ + · · · + cs−1 LC(fs−1 )fs−1′ + cs LC(fs )fs′
  = c1 LC(f1 )p1 + (c1 LC(f1 ) + c2 LC(f2 ))f2′ + · · · + cs−1 LC(fs−1 )fs−1′ + cs LC(fs )fs′
  = c1 LC(f1 )p1 + (c1 LC(f1 ) + c2 LC(f2 ))(p2 + f3′ ) + · · · + cs−1 LC(fs−1 )fs−1′ + cs LC(fs )fs′
  ⋮
  = c1 LC(f1 )p1 + (c1 LC(f1 ) + c2 LC(f2 ))p2 + · · · + (c1 LC(f1 ) + · · · + cs−1 LC(fs−1 ))ps−1
    + (c1 LC(f1 ) + c2 LC(f2 ) + · · · + cs LC(fs ))fs′ .
By (12.10), the last term is 0. Since pi = S(fi , fi+1 )/(LC(fi )LC(fi+1 )), the linear combination of
polynomials c1 f1 + c2 f2 + · · · + cs fs is an F -linear combination of the S-polynomials S(fi , fj ), and
more precisely of the s − 1 S-polynomials, S(fi , fi+1 ) for 1 ≤ i ≤ s − 1. 

Theorem 12.5.14 (Buchberger’s Criterion)


Let I be an ideal of F [x1 , x2 , . . . , xn ] and let ≼ be a monomial order. A generating set
G = {g1 , g2 , . . . , gs } of I is a Gröbner basis of I if and only if, for all i ≠ j,

rem (S(gi , gj ), G) = 0,

regardless of the order chosen for the elements in G.

Proof. Suppose that G is a Gröbner basis of I. By definition, S(gi , gj ) ∈ I for all i ≠ j. By
Corollary 12.5.10, the remainder of S(gi , gj ) when divided by G is 0.
We now prove the converse. Suppose that G = {g1 , g2 , . . . , gs } is a generating set of an ideal I
and suppose that each S(gi , gj ) has a remainder of 0 when divided by G (with respect to some order). In
order to show that G is a Gröbner basis of I, we must show the containment of monomial
ideals (LT(I)) ⊆ (LT(g1 ), LT(g2 ), . . . , LT(gs )). This is equivalent to proving that for all f ∈ I, the
leading term LT(f ) is divisible by LT(gi ) for some i.
Let f ∈ I. Now f = q1 g1 + q2 g2 + · · · + qs gs for some polynomials qi ∈ F [x1 , x2 , . . . , xn ]. The
polynomials qi that express f as a linear combination of the gi are not unique. The
strategy involves finding a linear combination for f that involves the fewest cancellations of leading
terms upon addition of the polynomials. Consider the set

{ max(mdeg(q1 g1 ), . . . , mdeg(qs gs )) | f = q1 g1 + q2 g2 + · · · + qs gs }.
Since a monomial order is a well-ordering, this set has a least element δ. Let q1 , q2 , . . . , qs ∈
F [x1 , x2 , . . . , xn ] be specific polynomials such that
f = q1 g1 + q2 g2 + · · · + qs gs and δ = max(mdeg(q1 g1 ), . . . , mdeg(qs gs )). (12.11)
By Proposition 12.4.7, the multidegree of f satisfies
mdeg(f ) = mdeg(q1 g1 + q2 g2 + · · · + qs gs ) ≼ max(mdeg(q1 g1 ), . . . , mdeg(qs gs )) = δ.
We prove by contradiction that mdeg(f ) = δ. Assume that mdeg(f ) ≺ δ. Then the addition
of the products qi gi must cancel some leading terms of the qi gi . Without loss of generality (after
reordering), suppose that mdeg(qi gi ) = δ for 1 ≤ i ≤ t and that mdeg(qi gi ) ≺ δ for t + 1 ≤ i ≤ s.
We decompose the combination for f further into

f = Σ_{i=1}^{t} LT(qi )gi + Σ_{i=1}^{t} (qi − LT(qi ))gi + Σ_{i=t+1}^{s} qi gi .        (12.12)

Now since mdeg(qi − LT(qi )) ≺ mdeg(qi ), then mdeg((qi − LT(qi ))gi ) ≺ mdeg(qi gi ) = δ. We now
consider only the first sum in (12.12), starting with two observations: mdeg(LT(qi )gi ) = δ
for all 1 ≤ i ≤ t, and the assumption that mdeg f ≺ δ forces mdeg(Σ_{i=1}^{t} LT(qi )gi ) ≺ δ, so the first sum in (12.12) satisfies
the conditions of Lemma 12.5.13. Consequently, there exist constants aij ∈ F such that

Σ_{i=1}^{t} LT(qi )gi = Σ_{1≤i<j≤t} aij S(LT(qi )gi , LT(qj )gj ).        (12.13)

We now relate the S-polynomials in (12.13) to the S-polynomials S(gi , gj ). Since mdeg LT(qi )gi =
mdeg LT(qj )gj = δ, then

S(LT(qi )gi , LT(qj )gj ) = LC(qj gj )LT(qi )gi − LC(qi gi )LT(qj )gj ,

and mdeg S(LT(qi )gi , LT(qj )gj ) ≺ δ. Note that xδ = LM(gi )LM(qi ). We further have

S(LT(qi )gi , LT(qj )gj )
  = LC(qj gj )LC(qi ) (xδ /LM(gi )) gi − LC(qi gi )LC(qj ) (xδ /LM(gj )) gj
  = LC(qi )LC(qj ) ( (LT(gj )xδ /(LM(gj )LM(gi ))) gi − (LT(gi )xδ /(LM(gi )LM(gj ))) gj )
  = LC(qi )LC(qj ) (xδ /(LM(gi )LM(gj ))) (LT(gj )gi − LT(gi )gj )
  = LC(qi )LC(qj ) (xδ /lcm(LM(gi ), LM(gj ))) ( (LT(gj )/gcd(LM(gi ), LM(gj ))) gi − (LT(gi )/gcd(LM(gi ), LM(gj ))) gj )
  = LC(qi )LC(qj ) (xδ /lcm(LM(gi ), LM(gj ))) S(gi , gj ).

By the hypothesis, each S(gi , gj ) has a remainder of 0 when divided by {g1 , g2 , . . . , gs } listed in
any order, so there exist polynomials bijk ∈ F [x1 , x2 , . . . , xn ] such that for all i and j
with 1 ≤ i < j ≤ t,

S(gi , gj ) = Σ_{k=1}^{s} bijk gk .

The polynomials bijk arise from the division algorithm, and by that algorithm, mdeg(bijk gk ) ≼
mdeg S(gi , gj ) for all i, j, k with 1 ≤ i < j ≤ t and 1 ≤ k ≤ s. Consequently,

mdeg( (xδ /lcm(LM(gi ), LM(gj ))) bijk gk ) ≼ mdeg( (xδ /lcm(LM(gi ), LM(gj ))) S(gi , gj ) ) ≺ δ,        (12.14)

where the last strict inequality holds because (xδ /lcm(LM(gi ), LM(gj ))) S(gi , gj ) is, up to the nonzero scalar
LC(qi )LC(qj ), the S-polynomial S(LT(qi )gi , LT(qj )gj ), whose multidegree is strictly less than δ. Substituting into (12.13),
Σ_{i=1}^{t} LT(qi )gi = Σ_{1≤i<j≤t} aij LC(qi )LC(qj ) (xδ /lcm(LM(gi ), LM(gj ))) ( Σ_{k=1}^{s} bijk gk )
                    = Σ_{k=1}^{s} ( Σ_{1≤i<j≤t} aij LC(qi )LC(qj ) (xδ /lcm(LM(gi ), LM(gj ))) bijk ) gk ,

and we denote by qk′ the coefficient polynomial of gk so that this expression becomes q1′ g1 + q2′ g2 + · · · + qs′ gs .
By (12.14), mdeg(qk′ gk ) ≺ δ. Combining this result with the decomposition of f in (12.12)
produces another linear combination f = q1′′ g1 + q2′′ g2 + · · · + qs′′ gs in which mdeg(qi′′ gi ) ≺ δ. This
contradicts the minimality in the definition of δ. Thus, the assumption that mdeg(f ) ≺ δ is false, so we
conclude that mdeg f = δ.

Returning to (12.11) but with the knowledge that mdeg f = δ, we have LM(f ) = LM(qi gi ) =
LM(qi )LM(gi ) for some i. Then we deduce that the leading term LT(f ) is divisible by some
LT(gi ). Thus, LT(f ) ∈ (LT(g1 ), LT(g2 ), . . . , LT(gs )). Since f was arbitrary in I, we deduce that
(LT(I)) ⊆ (LT(g1 ), LT(g2 ), . . . , LT(gs )) and hence these monomial ideals are equal. This proves that
{g1 , g2 , . . . , gs } is a Gröbner basis of I. 

Example 12.5.15. Consider Example 12.5.1 that motivated the section. We pointed out during
the course of this section that {xy 2 + 1, x2 y − 1} is not a Gröbner basis of I = (xy 2 + 1, x2 y − 1).
In that example, we also contended that I = (x + y, y 3 − 1). Let us use the lexicographic order with
x > y. For this pair, there is only one S-polynomial, namely

S(x + y, y 3 − 1) = y 3 (x + y) − x(y 3 − 1) = x + y 4 .

Upon division by the set G = {x + y, y 3 − 1}, this S-polynomial becomes

S(x + y, y 3 − 1) = 1(x + y) + y(y 3 − 1) + 0.

Since the remainder is 0, by Buchberger’s Criterion, we deduce that {x + y, y 3 − 1} is a Gröbner


basis of I with respect to the given monomial order.
We emphasize that whether a generating set is a Gröbner basis depends on the monomial order.
For example, consider the same ideal and the same set G but use the lexicographic order with y > x.
Then the S-polynomial of the two generators is

S(y + x, y 3 − 1) = y 2 (y + x) − (y 3 − 1) = y 2 x + 1.

Upon division by the set G = {x + y, y 3 − 1}, this S-polynomial becomes

y 2 x + 1 = (xy − x2 )(y + x) + 0(y 3 − 1) + x3 + 1.

Since the remainder is not 0, G is not a Gröbner basis of I with respect to the lex order with
y > x. △
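Both halves of Example 12.5.15 can be replayed mechanically. In the sketch below (our own encoding: dict polynomials, lex order via tuple comparison), the variable priority is fixed by which coordinate comes first in the exponent tuples, so reordering the coordinates switches from lex with x > y to lex with y > x:

```python
from fractions import Fraction

# Dict polynomials {exponent tuple: coefficient}; lex order is tuple
# comparison, so variable priority = coordinate order in the tuples.
def leading(p):
    e = max(p)
    return e, p[e]

def divides(d, e):
    return all(x <= y for x, y in zip(d, e))

def remainder(f, divisors):
    p, r = dict(f), {}
    while p:
        e, c = leading(p)
        for g in divisors:
            ge, gc = leading(g)
            if divides(ge, e):
                shift = tuple(x - y for x, y in zip(e, ge))
                factor = Fraction(c) / Fraction(gc)
                for me, mc in g.items():
                    ne = tuple(x + y for x, y in zip(shift, me))
                    p[ne] = p.get(ne, 0) - factor * mc
                    if p[ne] == 0:
                        del p[ne]
                break
        else:
            r[e] = c
            del p[e]
    return r

def s_poly(f, g):
    (fe, fc), (ge, gc) = leading(f), leading(g)
    d = tuple(min(x, y) for x, y in zip(fe, ge))
    acc = {}
    for coeff, shift, q in ((gc, tuple(x - y for x, y in zip(ge, d)), f),
                            (-fc, tuple(x - y for x, y in zip(fe, d)), g)):
        for me, mc in q.items():
            ne = tuple(x + y for x, y in zip(shift, me))
            acc[ne] = acc.get(ne, 0) + coeff * mc
            if acc[ne] == 0:
                del acc[ne]
    return acc

# Exponents as (deg_x, deg_y), i.e. lex with x > y: G = {x + y, y^3 - 1}.
G = [{(1, 0): 1, (0, 1): 1}, {(0, 3): 1, (0, 0): -1}]
assert remainder(s_poly(G[0], G[1]), G) == {}     # criterion satisfied

# Exponents as (deg_y, deg_x), i.e. lex with y > x: the same polynomials.
H = [{(1, 0): 1, (0, 1): 1}, {(3, 0): 1, (0, 0): -1}]
assert remainder(s_poly(H[0], H[1]), H) == {(0, 3): 1, (0, 0): 1}  # x^3 + 1
```

Under the first ordering the single S-polynomial reduces to 0, so Buchberger's Criterion certifies the basis; under the second it leaves the remainder x³ + 1, matching the example.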

Exercises for Section 12.5


1. Plot the diagram similar to Figure 12.1 for the monomial ideal I = (y 7 , x2 y 3 , x5 y) in R[x, y].
2. Let α, β, γ ∈ Nn . Prove that I = (xα + xβ , xγ + xβ , xα + xγ ) is a monomial ideal in R[x1 , x2 , . . . , xn ].
Conclude that not every generating set of a monomial ideal is a set of monomials.
3. A generating set {xα(1) , . . . , xα(s) } is called a minimal basis of a monomial ideal I if xα(i) is not
divisible by any xα(j) with i 6= j. Prove that every monomial ideal has a unique minimal basis.
4. Let ≼ be a monomial order on F [x1 , x2 , . . . , xn ] with n ≥ 2 and suppose that α, β ∈ Nn .
(a) Prove that if xα divides xβ , then α ≼ β.
(b) Prove that the converse of the implication in part (a) is not true. In other words, find
α, β ∈ Nn such that α ≼ β but xα does not divide xβ .
[This shows that the partial order of divisibility is a strict subset of every monomial order.]
5. Let I be a monomial ideal in F [x1 , x2 , . . . , xn ]. Prove that F [x1 , x2 , . . . , xn ]/I, equipped with addition
and multiplication by elements in F , is a vector space with basis {xα + I | xα ∉ I}.
6. Let F be a field and let I be a monomial ideal in F [x, y].
(a) Prove that F [x, y]/I is a finite-dimensional vector space over F if and only if I contains xm and
y n for some positive integers m and n. (See Exercise 12.5.5.)
(b) Without loss of generality, suppose that n ≥ m. Prove that there are (n choose m) monomial ideals I
such that xm ∈ I and y n ∈ I.
7. Let I = (x3 , y 3 , z 3 , xyz) in F [x, y, z]. Find dimF F [x, y, z]/I. (See Exercise 12.5.5.)
8. Let I1 , I2 , . . . , It be monomial ideals in F [x1 , x2 , . . . , xn ]. Let Sj be the set of monomials occurring in
polynomials in Ij . Prove that I1 ∩ I2 ∩ · · · ∩ It is the ideal of polynomials consisting of monomials in
S1 ∩ S2 ∩ · · · ∩ St . Conclude that I1 ∩ I2 ∩ · · · ∩ It is again a monomial ideal.

9. In light of Exercise 12.5.8, using a diagram as in Figure 12.1, find a minimal generating set for
(a) the intersection (x8 y, x3 y 4 , xy 6 ) ∩ (x6 y 2 , y 7 );
(b) the product ideal (x8 y, x3 y 4 , xy 6 )(x6 y 2 , y 7 );
(c) the sum ideal (x8 y, x3 y 4 , xy 6 ) + (x6 y 2 , y 7 ).
10. In the polynomial ring F [x, y, z], let I1 = (x3 y 2 z, xyz 4 ) and I2 = (y 3 z, xy 2 z 3 ). Find a minimal
generating set for: a) I1 + I2 ; b) I1 I2 ; c) I1 ∩ I2 .
11. Let I = (xα | α ∈ A) be a monomial ideal in F [x1 , x2 , . . . , xn ], where A is a subset of Nn . Prove that
the radical √I is the monomial ideal √I = (xs(α) | α ∈ A), where

s(α) = (s1 , s2 , . . . , sn )    where    si = 1 if αi ≥ 1, and si = 0 if αi = 0.

12. Calculate the S-polynomials of the following pairs, with respect to the stated monomial order.
(a) a1 = x3 + y 3 − 3xy and a2 = x4 + y 4 − 1 with lexicographic order with x > y.
(b) a1 = 2xy 3 − 7y 3 + x2 and a2 = x3 − 5xy with lexicographic order with x > y.
(c) a1 = 2xy 3 − 7y 3 + x2 and a2 = x3 − 5xy with grlex order with x > y.
(d) a1 = 2xy 3 − 7y 3 + x2 and a2 = x3 − 5xy with grevlex order with x > y.
13. Calculate the S-polynomials of the following pairs, with respect to the stated monomial order.
(a) a1 = 3x2 y + 2xyz 2 and a2 = x3 yz − 4xy 2 with lex order with x > y > z.
(b) a1 = 3x2 y + 2xyz 2 and a2 = x3 yz − 4xy 2 with grlex order with x > y > z.
14. Let F be any field. Show that (xy 2 + 1, x2 y − 1) = (x + y, y 3 − 1) as ideals in F [x, y].
15. In Example 12.5.15 we showed that {x + y, y 3 − 1} is a Gröbner basis of (x + y, y 3 − 1).
(a) Show that y 3 (x + y) − x(y 3 − 1) = 1(x + y) + y(y 3 − 1).
(b) Proposition 12.5.8 shows that when G = {g1 , g2 , . . . , gs } is a Gröbner basis of the ideal (G), then
the remainder of f divided by G is unique (regardless of the order taken for the elements in G).
Use the first part of this exercise to show that the quotients q1 , q2 , . . . , qs in the division are
not unique (and hence depend on the order chosen for the elements of G).
16. Let G be a Gröbner basis of an ideal in F [x1 , x2 , . . . , xn ]. Use Proposition 12.5.8 to prove the following
results about remainders.
(a) rem (f + g, G) = rem (f, G) + rem (g, G) for all f, g ∈ F [x1 , x2 , . . . , xn ].
(b) rem (f g, G) = rem (rem (f, G) · rem (g, G) , G) for all f, g ∈ F [x1 , x2 , . . . , xn ].
17. Let f1 = xyz 2 + 3xz − 7, f2 = x3 − 2y 2 z + x, and f3 = 2xy + z 3 in Q[x, y, z]. Consider the ideal
I = (f1 , f2 , f3 ).
(a) Using the lex order with x > y > z, find some f ∈ I such that LT(f ) ∉ (LT(f1 ), LT(f2 ), LT(f3 )).
(b) Using the lex order with z > y > x, find some f ∈ I such that LT(f ) ∉ (LT(f1 ), LT(f2 ), LT(f3 )).
18. Suppose that I is a principal ideal in F [x1 , x2 , . . . , xn ]. Show that a set {g1 , g2 , . . . , gs } ⊆ I such that
one of the gi generates I is a Gröbner basis of I.
19. Let I be an ideal in F [x1 , x2 , . . . , xn ]. Prove that a set {g1 , g2 , . . . , gs } ⊆ I is a Gröbner basis of I if
and only if for all f ∈ I, there exists i ∈ {1, 2, . . . , s} such that LT(gi ) divides LT(f ).
20. Consider the polynomials f1 = x2 − y and f2 = x3 − z, along with the ideal I = (f1 , f2 ) in R[x, y, z].
(a) Prove that {f1 , f2 } is a Gröbner basis of I with respect to the order lex with y > z > x.
(b) Prove that {f1 , f2 } is not a Gröbner basis of I with respect to the order lex with x > y > z.
21. Consider the polynomials f1 = x2 + y 3 − 2y and f2 = y 4 − 2y 2 + 1, along with the ideal I = (f1 , f2 )
in R[x, y].
(a) Prove that {f1 , f2 } is a Gröbner basis of I with respect to the order lex with x > y.
(b) Prove that {f1 , f2 } is not a Gröbner basis of I with respect to the order grlex with x > y.
22. Consider the polynomials f1 = xy − xz and f2 = xz − yz, along with the ideal I = (f1 , f2 ) in R[x, y, z].
Use the lex monomial ordering with x > y > z.
(a) Show that {f1 , f2 } is not a Gröbner basis of I.
(b) Let f3 = rem (S(f1 , f2 ), (f1 , f2 )). Show that {f1 , f2 , f3 } is a Gröbner basis of I.

12.6 Buchberger's Algorithm
Proposition 12.5.7 affirms that every nonzero ideal I in F [x1 , x2 , . . . , xn ] has a Gröbner basis. The proof
(in the order presented in this book) relied on Dickson’s Lemma, which relied on Hilbert’s Basis
Theorem, the proof of which was not constructive. Consequently, the proof of the existence of a
Gröbner basis offered no algorithm to construct a basis. This section introduces such an algorithm
along with a few refinements to the concept of a Gröbner basis.

12.6.1 – Buchberger’s Algorithm


Buchberger's Criterion (Theorem 12.5.14) suggests a strategy to find a Gröbner basis of an ideal.
Fix a monomial order ≼. Let G = {g1 , g2 , . . . , gs } be a set in F [x1 , x2 , . . . , xn ] and let I be the
ideal generated by G. If G is not a Gröbner basis of I, then (LT(G)) ⊊ (LT(I)). Hence, there exists
f ∈ I such that LT(f ) ∉ (LT(G)). According to Buchberger's Criterion, G is not a Gröbner basis if
and only if rem (S(gi , gj ), G) ≠ 0 for some i ≠ j. Then G′ = G ∪ {rem (S(gi , gj ), G)} is such that
I = (G′ ) but (LT(G)) ⊊ (LT(G′ )). The strategy to find a Gröbner basis is to adjoin to G enough
polynomials so that (LT(G)) = (LT(I)). This inspires the following algorithm.

Algorithm 12.6.1: GroebnerBasis(G)

go ← true
while go
  do go ← false
     n ← |G|
     for i ← 1 to n − 1
       do for j ← i + 1 to n
            do if rem (S(gi , gj ), G) ≠ 0
                 then G ← G ∪ {rem (S(gi , gj ), G)}
                      go ← true
return (G)

The algorithm takes a set (or an s-tuple) G = {g1 , g2 , . . . , gs } of polynomials and repeatedly
adjoins nonzero polynomials of the form rem (S(gi , gj ), G). The algorithm terminates when
rem (S(gi , gj ), G) = 0 for all gi , gj ∈ G. By Buchberger's Criterion, the output of this algorithm is
a Gröbner basis of the ideal (G). The algorithm does in fact terminate by virtue of the following
proposition.
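The loop just described fits in a few dozen lines of pure Python. The sketch below is our own minimal implementation (dict polynomials, lex order via tuple comparison), not the book's pseudocode verbatim; it runs the algorithm on the ideal (xy² + 1, x²y − 1) of Example 12.5.1 and then verifies both Buchberger's Criterion and the membership of x + y:

```python
from fractions import Fraction
from itertools import combinations

# Dict polynomials {exponent tuple: coefficient}; lex order = tuple order.
def leading(p):
    e = max(p)
    return e, p[e]

def divides(d, e):
    return all(x <= y for x, y in zip(d, e))

def remainder(f, divisors):
    p, r = dict(f), {}
    while p:
        e, c = leading(p)
        for g in divisors:
            ge, gc = leading(g)
            if divides(ge, e):
                shift = tuple(x - y for x, y in zip(e, ge))
                factor = Fraction(c) / Fraction(gc)
                for me, mc in g.items():
                    ne = tuple(x + y for x, y in zip(shift, me))
                    p[ne] = p.get(ne, 0) - factor * mc
                    if p[ne] == 0:
                        del p[ne]
                break
        else:
            r[e] = c
            del p[e]
    return r

def s_poly(f, g):
    (fe, fc), (ge, gc) = leading(f), leading(g)
    d = tuple(min(x, y) for x, y in zip(fe, ge))
    acc = {}
    for coeff, shift, q in ((gc, tuple(x - y for x, y in zip(ge, d)), f),
                            (-fc, tuple(x - y for x, y in zip(fe, d)), g)):
        for me, mc in q.items():
            ne = tuple(x + y for x, y in zip(shift, me))
            acc[ne] = acc.get(ne, 0) + coeff * mc
            if acc[ne] == 0:
                del acc[ne]
    return acc

def buchberger(gens):
    """Naive Buchberger loop: adjoin nonzero remainders of S-polynomials."""
    G = [dict(g) for g in gens]
    changed = True
    while changed:
        changed = False
        for f, g in combinations(list(G), 2):
            r = remainder(s_poly(f, g), G)
            if r:                      # nonzero remainder: enlarge G
                G.append(r)
                changed = True
                break
    return G

# I = (xy^2 + 1, x^2 y - 1) from Example 12.5.1, lex with x > y:
G = buchberger([{(1, 2): 1, (0, 0): 1}, {(2, 1): 1, (0, 0): -1}])
# All S-polynomials now reduce to 0 (Buchberger's Criterion) ...
assert all(remainder(s_poly(f, g), G) == {} for f, g in combinations(G, 2))
# ... and x + y, which exposed the original generators, reduces to 0:
assert remainder({(1, 0): 1, (0, 1): 1}, G) == {}
```

Appending each nonzero remainder strictly enlarges (LT(G)), so termination follows from the proposition below.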

Proposition 12.6.1
Suppose that G is a generating subset of an ideal I in F [x1 , x2 , . . . , xn ] and suppose that

G = G0 ⊆ G1 ⊆ G2 ⊆ · · ·

is a chain of subsets such that the set difference Gi − Gi−1 = {rem (S(a, b), Gi−1 )} for
some a, b ∈ Gi−1 with rem (S(a, b), Gi−1 ) ≠ 0. Then this chain of subsets terminates.
Furthermore, the maximal element of this chain is a Gröbner basis of I.

Proof. Since S(a, b) ∈ (Gi−1 ) for any a, b ∈ Gi−1 , the ideal generated by Gi−1 is equal to the ideal
generated by Gi . Hence, (Gi ) = I for all i.

Since the S-polynomial S(a, b) in Gi − Gi−1 does not have a remainder of 0 when divided by
Gi−1 , then LT(rem (S(a, b), Gi−1 )) ∉ (LT(Gi−1 )). Thus, (LT(Gi−1 )) is a strict subset of the monomial
ideal (LT(Gi )). Since F [x1 , x2 , . . . , xn ] is Noetherian, the chain of monomial ideals

(LT(G0 )) ⊆ (LT(G1 )) ⊆ (LT(G2 )) ⊆ · · ·

terminates, say at (LT(Gt )). Consequently, Gt is such that rem (S(a, b), Gt ) = 0 for all a, b ∈ Gt .
By Buchberger’s Criterion, this means that Gt is a Gröbner basis of I. 

The GroebnerBasis algorithm provides a first method for finding a Gröbner basis of an ideal.
We will discuss natural improvements later, but we present a few examples first.
Example 12.6.2. Consider the ring R[x, y] and use the graded lexicographic order with x > y.
Consider the polynomials g1 = xy 2 − 3x2 and g2 = 2y 3 − 5x. Start with G = {g1 , g2 } and follow
the steps of the algorithm. This first time through, the while loop has only one calculation in the
double for loop:
S(g1 , g2 ) = 2yg1 − xg2 = −6x2 y + 5x2 ,
which is its own remainder when divided by (g1 , g2 ). Therefore, we set g3 = −6yx2 + 5x2 and replace
G with {g1 , g2 , g3 }. The go variable was changed to true so we do another iteration of the while
loop.
At the next pass through the while loop, we can calculate S(g1 , g2 ), but it will have a remainder
of 0 when divided by {g1 , g2 , g3 } (in any order) since S(g1 , g2 ) = g3 . Next, we calculate
S(g1 , g3 ) = −6xg1 − yg3 = 18x3 − 5x2 y    and    rem (18x3 − 5x2 y, {g1 , g2 , g3 }) = 18x3 − (25/6)x2 ,

so we set g4 = 18x3 − (25/6)x2 and replace G with {g1 , g2 , g3 , g4 }. Then we calculate

S(g2 , g3 ) = −6x2 g2 − 2y 2 g3 = −10x2 y 2 + 30x3 .

However, the remainder of this polynomial when divided by {g1 , g2 , g3 , g4 } is 0.


The go variable was set to true so we go through the while loop one more time. However, this
time all the S-polynomials S(gi , gj ) with 1 ≤ i < j ≤ 4 will have a remainder of 0 when divided by
G. Hence, by Buchberger's Criterion,

{xy 2 − 3x2 , 2y 3 − 5x, −6x2 y + 5x2 , 18x3 − (25/6)x2 }

is a Gröbner basis of (xy 2 − 3x2 , 2y 3 − 5x). △

Example 12.6.3. Consider the ring R[x, y, z] and use the lexicographic order with x > y > z.
Consider the polynomials g1 = xyz 2 − 3y 2 z and g2 = 2x2 z + 4xy + 1. Start with G = {g1 , g2 } and
follow the algorithm. The first time through, the while loop has only one calculation in the double
for loop:

rem (S(g1 , g2 ), G) = rem (−10xy 2 z − yz, G) = −10xy 2 z − yz,


so set g3 = −10xy 2 z − yz and replace G with G = {g1 , g2 , g3 }. This finishes the while loop but the
go variable has switched to true so we continue.
This time, the nested for loops run through three calculations. The first is

rem (S(g1 , g2 ), G) = rem (−10xy 2 z − yz, G) = 0,




since G now includes −10xy 2 z − yz. So G does not change. Next, we calculate

rem (S(g1 , g3 ), G) = rem (30y 3 z + yz 2 , G) = 30y 3 z + yz 2 .




Since this is nonzero, we set g4 = 30y 3 z + yz 2 and replace G with G = {g1 , g2 , g3 , g4 }. Then we
calculate

rem (S(g2 , g3 ), G) = rem (−40xy 3 + 2xyz − 10y 2 , G) = −40xy 3 + 2xyz − 10y 2 .

Since this is nonzero, we set g5 = −40xy 3 + 2xyz − 10y 2 and replace G with G = {g1 , g2 , g3 , g4 , g5 }.
The go variable had been switched to true so we repeat the while loop again.
This time, as we run through the 10 combinations of the nested for loops, all S-polynomials
S(gi , gj ) with 1 ≤ i < j ≤ 5 have a remainder of 0 when divided by G. Thus, {g1 , g2 , g3 , g4 , g5 } is a
Gröbner basis of the ideal (g1 , g2 ). △
As presented so far, there is some inefficiency in the pseudocode given above for Buchberger's
Algorithm: it calculates the S-polynomial S(gi , gj ) for the same pair (i, j) more than
once. First, if rem (S(gi , gj ), G) = 0 at some stage, it will remain 0 at a later stage when G is a
larger set of polynomials. Second, if gk = rem (S(gi , gj ), G) ≠ 0 at some stage, then at a later stage
the set G will contain gk and hence rem (S(gi , gj ), G) will be 0 at the later stage. This inefficiency
can be remedied as follows.

Algorithm 12.6.2: GroebnerBasis2(G)

go ← true
jstart ← 1
while go
  do go ← false
     n ← |G|
     for i ← 1 to n − 1
       do for j ← max(i + 1, jstart) to n
            do if rem (S(gi , gj ), G) ≠ 0
                 then G ← G ∪ {rem (S(gi , gj ), G)}
                      go ← true
     jstart ← n + 1
return (G)

12.6.2 – Reduced Gröbner Basis


We can improve upon Buchberger’s Algorithm by considering some simplifications to the polynomials
in the Gröbner basis for an ideal.

Proposition 12.6.4
Suppose that I ⊆ F [x1 , x2 , . . . , xn ] is an ideal with I = (a1 , a2 , . . . , as ). Then I =
(ā1 , a2 , . . . , as ), where
ā1 = rem (a1 , {a2 , . . . , as }) .
Furthermore, if {a1 , a2 , . . . , as } is a Gröbner basis of I, then so is {ā1 , a2 , . . . , as }.

Proof. There exist qi ∈ F [x1 , x2 , . . . , xn ] such that

a1 = q2 a2 + · · · + qs as + ā1    and equivalently    ā1 = a1 − q2 a2 − · · · − qs as .

Then (ā1 , a2 , . . . , as ) ⊆ (a1 , a2 , . . . , as ) and the reverse inclusion also holds.
Since it is always true that (LT(ā1 ), LT(a2 ), . . . , LT(as )) ⊆ (LT(I)) whenever I = (ā1 , a2 , . . . , as ),
the set {ā1 , a2 , . . . , as } is a Gröbner basis if and only if (LT(I)) ⊆ (LT(ā1 ), LT(a2 ), . . . , LT(as )).
The second part of the proposition is trivial if LM(ā1 ) = LM(a1 ). Now suppose that LM(ā1 ) ≠
LM(a1 ). This occurs if and only if some LM(ai ) with 2 ≤ i ≤ s divides LM(a1 ). However, in that case

(LT(a1 ), LT(a2 ), . . . , LT(as )) = (LT(a2 ), . . . , LT(as )).

Consequently, since {a1 , a2 , . . . , as } is a Gröbner basis,

(LT(I)) = (LT(a1 ), LT(a2 ), . . . , LT(as )) = (LT(a2 ), . . . , LT(as )) ⊆ (LT(ā1 ), LT(a2 ), . . . , LT(as ))

and thus {ā1 , a2 , . . . , as } is a Gröbner basis.

If ā1 = 0, then the set {a2 , . . . , as } is a generating set of I, and it is a Gröbner basis if {a1 , a2 , . . . , as }
is a Gröbner basis.
Proposition 12.6.4 inspires the following refinement to a Gröbner basis.

Definition 12.6.5
A Gröbner basis G is called reduced if
(1) LC(g) = 1 for all g ∈ G;
(2) g = rem (g, G − {g}) for all g ∈ G.

Part (2) of Definition 12.6.5 can be restated to say that for all g ∈ G, no term of g is divisible
by LM(g′ ) for any g′ ∈ G − {g}.
It is not hard to check that, except for needing to divide each element of the Gröbner basis by
its leading coefficient, the Gröbner bases in Examples 12.6.2 and 12.6.3 are reduced.
Proposition 12.6.4 allows for further simplifications during Buchberger’s Algorithm, as depicted
in the following example.
Example 12.6.6. Set a1 = xy 2 + 3y − 2 and a2 = x2 y + x + 1 and consider the ideal I = (a1 , a2 ) in
R[x, y]. We choose the lexicographic monomial order with x > y. We start with the generating set
G = {a1 , a2 }. Before beginning Buchberger’s Algorithm, we observe that rem (a1 , {a2 }) = a1 and
rem (a2 , {a1 }) = a2 . Hence, no term of ai is divisible by LM(aj ) for 1 ≤ i, j ≤ 2 with i ≠ j.
Proceeding with Buchberger’s algorithm, the first step is to calculate
rem (S(a1 , a2 ), {a1 , a2 }) = rem (2xy − 2x − y, {a1 , a2 }) = 2xy − 2x − y.
Since this is not 0, we can set a3 = (1/2)(2xy − 2x − y) = xy − x − (1/2)y, where we divided by the leading
coefficient of the S-polynomial. Joining a3 to G gives G = {a1 , a2 , a3 }. This time, we observe that
LM(a3 ) | LT(a1 ) and also LM(a3 ) | LT(a2 ). Consequently, we can replace a1 , a2 , and a3 with some
remainders. In each row below, we replace ai with the multiple of rem (ai , G − {ai }) that is monic.
(Be aware that, in doing these repeated polynomial divisions, G is changing at each row.)
                a1                           a2                                         a3
Initially       xy 2 + 3y − 2                x2 y + x + 1                               xy − x − (1/2)y
Replace a1      x + (1/2)y 2 + (7/2)y − 2    x2 y + x + 1                               xy − x − (1/2)y
Replace a2      x + (1/2)y 2 + (7/2)y − 2    y 5 + 14y 4 + 41y 3 − 58y 2 + 2y + 12      xy − x − (1/2)y
Replace a3      x + (1/2)y 2 + (7/2)y − 2    y 5 + 14y 4 + 41y 3 − 58y 2 + 2y + 12      y 3 + 6y 2 − 10y + 4
Replace a2      x + (1/2)y 2 + (7/2)y − 2    0                                          y 3 + 6y 2 − 10y + 4

At this stage, G = {x + (1/2)y 2 + (7/2)y − 2, y 3 + 6y 2 − 10y + 4}. Since no term of either polynomial is divisible
by the leading term of the other, each is its own remainder when divided by the other. We can now
proceed with Buchberger's Algorithm and calculate the S-polynomial of these two. However,
 
rem (S(a1 , a3 ), {a1 , a3 }) = rem (−6xy 2 + 10xy − 4x + (1/2)y 5 + (7/2)y 4 − 2y 3 , {a1 , a3 }) = 0.
By Buchberger’s Criterion, we conclude that
 
{x + (1/2)y 2 + (7/2)y − 2, y 3 + 6y 2 − 10y + 4}

is a Gröbner basis of I, and it is easy to observe that it is a reduced Gröbner basis. △
We can obtain a reduced Gröbner basis from a generating set of an ideal with the following two
algorithms. First, we implement an algorithm that replaces a generating set A of some ideal with
another generating set A′ such that p = rem(p, A′ − {p}) for all p ∈ A′.
12.6. BUCHBERGER’S ALGORITHM 653
Algorithm 12.6.3: ReduceSet(A)

go ← true
while go
  do go ← false
     for i ← 1 to |A|
       do p ← rem(ai, A − {ai})
          if p = 0
            then A ← A − {ai}
            else A ← (A − {ai}) ∪ {p}
                 if LM(p) ≠ LM(ai)
                   then go ← true
return (A)
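The pseudocode above can be transcribed almost line for line into Python using SymPy's reduced routine for multivariable polynomial division (SymPy is assumed available; the function name reduce_set and the list-based representation of A are our own choices, not from the text):

```python
from sympy import symbols, reduced, LM

x, y = symbols('x y')

def reduce_set(A, gens, order='lex'):
    """Transcription of ReduceSet: repeatedly replace each element
    by its remainder modulo the others, dropping elements that reduce to 0."""
    A = list(A)
    go = True
    while go:
        go = False
        i = 0
        while i < len(A):
            others = A[:i] + A[i + 1:]
            if others:
                _, p = reduced(A[i], others, *gens, order=order)
            else:
                p = A[i]
            if p == 0:
                A.pop(i)              # a_i reduced to 0: remove it from A
            else:
                if LM(p, *gens, order=order) != LM(A[i], *gens, order=order):
                    go = True         # a leading monomial changed: loop again
                A[i] = p
                i += 1
    return A

print(reduce_set([x**2 + y, y], (x, y)))
```

On the input {x² + y, y} the function replaces x² + y by its remainder x² and then stops, since no leading monomial changes in the following pass.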
In many programming languages, it is natural or convenient to represent a finite set as a finite
sequence. In order to prove that the above algorithm terminates, we will say that an s-tuple
ℓ ∈ F[x1, x2, …, xn]^s represents the finite set A ⊆ F[x1, x2, …, xn] if |A| = s and ℓ = (a1, a2, …, as)
and A = {a1, a2, …, as}. Initially, A is represented by an s-tuple ℓ0. As the algorithm proceeds,
the set A is represented by a sequence (ℓj)j≥0 of s-tuples with

ℓj = (aj,1, aj,2, …, aj,s)

with the following refinement: if ever p = 0, replace the corresponding aj,i with 0. This keeps each
ℓj an s-tuple of polynomials even though a polynomial that becomes 0 is removed from the set A. Note
that ℓ_sk represents A after the kth time through the while loop.

Proposition 12.6.7
Fix a monomial order ≼. For any finite subset A ⊆ F[x1, x2, …, xn], the algorithm
ReduceSet applied to A terminates. Furthermore, at the end of the algorithm, p =
rem(p, A − {p}) for all p ∈ A.

Proof. The algorithm ReduceSet terminates if and only if there exists some k ∈ N such that
LM(a(k+1)s,i) = LM(aks,i) for all 1 ≤ i ≤ s. We need to prove that this condition occurs and that
when it does occur, the resulting set A has the property that p = rem(p, A − {p}) for all p ∈ A.
Since a(k+1)s,i is the remainder of aks,i when divided by some ordered (s − 1)-tuple of polynomials,
by Proposition 12.4.8, either a(k+1)s,i = 0 or mdeg a(k+1)s,i ≼ mdeg aks,i. Therefore, for each i
(corresponding to each polynomial ai) the sequence of monomials is decreasing:

LM(ai) = LM(a0,i) ≽ LM(a1,i) ≽ LM(a2,i) ≽ · · ·

Since the monomial order is a well-ordering, each of these chains terminates. Hence, for each
i, the set
Si = {mdeg(aj,i) | aj,i ≠ 0 and j ≥ 0}
is finite. Consequently, there exists a K such that LM(aks,i) = LM(aKs,i) for all k ≥ K. Thus, at
the (K + 1)th time through the while loop the algorithm will terminate.
Finally, after the last iteration k of the while loop, since LM(aks,i) = LM(a(k−1)s,i) for all
1 ≤ i ≤ s, no term of aks,i is divisible by LM(aks,i′) for i ≠ i′. Hence,

aks,i = rem(aks,i, {aks,i′ | i′ ≠ i}).

The proposition follows. □


654 CHAPTER 12. MULTIVARIABLE POLYNOMIAL RINGS

An algorithm that produces a reduced Gröbner basis of an ideal given a generating set of that
ideal simply needs to apply the ReduceSet procedure to the generating set G at the end or
possibly at other appropriate places in an implementation of Buchberger’s Algorithm. Inserting the
ReduceSet procedure at other places in the Buchberger Algorithm may reduce the size of G in
the middle of the algorithm, allowing for fewer calculations of S-polynomials. Though there are a
number of choices for how to specifically implement the procedure, Example 12.6.6 implemented the
following algorithm.

Algorithm 12.6.4: ReducedGroebnerBasis(G)

G ← ReduceSet(G)
while ∃{g1, g2} ⊆ G (rem(S(g1, g2), G) ≠ 0)
  do G ← ReduceSet(G ∪ {rem(S(g1, g2), G)})
return (G)

Reduced Gröbner bases are desirable not just for their simplicity; they also enjoy the following
nice property.

Proposition 12.6.8
Let I be a nontrivial ideal in F[x1, x2, …, xn]. For a given monomial order ≼, there exists
a unique reduced Gröbner basis of I.

Proof. By Proposition 12.5.7 every ideal has a Gröbner basis G. The ReduceSet algorithm termi-
nates after a finite number of steps. Applied to G and dividing by leading coefficients if necessary,
the result is a reduced Gröbner basis of I. Hence, every ideal has a reduced Gröbner basis.
To prove uniqueness, suppose that G and G′ are two reduced Gröbner bases of I. Suppose that
G = {g1, g2, …, gs} and G′ = {g1′, g2′, …, gt′}. The sets of leading monomials LM(G) and LM(G′)
are both generating sets of the monomial ideal (LT(I)). Furthermore, since G is a reduced Gröbner
basis, for any i, the leading monomial LM(gi) is not divisible by LM(gj) for any j ≠ i. Thus, LM(G)
is a minimal basis of (LT(I)), as defined in Exercise 12.5.3. Similarly, LM(G′) is a minimal basis
of (LT(I)). In Exercise 12.5.3, we showed that every monomial ideal has a unique minimal basis.
Thus, as sets of monomials, LM(G) = LM(G′). In particular, s = t and, possibly after reordering,
the sets G and G′ are such that LT(gi) = LM(gi) = LM(gi′) = LT(gi′) for all 1 ≤ i ≤ s.
For any i, consider gi − gi′. Since gi − gi′ ∈ I and G is a Gröbner basis of I, the remainder
rem(gi − gi′, G) = 0. In the difference gi − gi′, the leading terms cancel. However, since G and
G′ are reduced, none of the nonleading terms in gi or in gi′ is divisible by any monomial in
LT(G) = LT(G′). Hence,

gi − gi′ = rem(gi − gi′, G) = 0.

Hence, we conclude that G = G′. □

Gröbner bases solved the problem of ideal membership: if I is an ideal and G a Gröbner basis of
I with respect to some monomial order ≼, then f ∈ I if and only if rem(f, G) = 0. The existence and
uniqueness of a reduced Gröbner basis for an ideal offers a computational solution to an otherwise
pesky problem: how to tell whether I = (f1, f2, …, fs) is equal to the ideal I′ = (f1′, f2′, …, ft′).

Corollary 12.6.9
Fix a monomial order ≼ on Nⁿ. Two ideals I and I′ in F[x1, x2, …, xn] are equal if and
only if they have the same reduced Gröbner basis.
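Corollary 12.6.9 is easy to observe experimentally. In the SymPy sketch below (SymPy assumed available, not part of the text), two different generating sets of the same ideal produce identical reduced Gröbner bases:

```python
from sympy import symbols, groebner

x, y = symbols('x y')

# Two different generating sets of the same ideal:
# x**2 - y = (x**2 - 1) - (y - 1), so both generate (x**2 - 1, y - 1).
G1 = groebner([x**2 - y, y - 1], x, y, order='lex')
G2 = groebner([x**2 - 1, y - 1], x, y, order='lex')

# The reduced Groebner bases coincide.
print(G1.exprs, G2.exprs)
```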

12.6.3 – Useful CAS Commands


Because of their importance in polynomial algebra and its many applications, many computer
algebra systems implement algorithms to calculate Gröbner bases of an ideal. The following
commands are in Maple’s Groebner package.

Maple Function
SPolynomial   The command SPolynomial(a,b,T), where a and b are polynomials and T
              is a monomial order, calculates the S-polynomial of a and b with respect to T.
Basis         The command Basis(A,T), where A is a list of polynomials and T is a
              monomial order, calculates a Gröbner basis of the ideal (A) that is nearly
              reduced. In Maple’s implementation, if the polynomials in A have rational
              coefficients, then Basis returns a set of polynomials A′ with integer
              coefficients such that {p/LC(p) | p ∈ A′} is a reduced Gröbner basis of (A).
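Other computer algebra systems offer analogous commands. For instance, the open-source SymPy library (assumed available; not mentioned in the text) plays the role of Maple's Basis command and returns a reduced monic Gröbner basis directly:

```python
from sympy import symbols, groebner

x, y = symbols('x y')

# Analogue of Maple's Basis(A, T): choose the monomial order by name
# ('lex', 'grlex', 'grevlex', ...).
G = groebner([x*y - 1, y**2 - 1], x, y, order='lex')
print(G.exprs)
```

For this small ideal, the reduced Gröbner basis with respect to lex with x > y is {x − y, y² − 1}.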

Exercises for Section 12.6


1. Suppose that S(f, g) ≠ 0.
(a) Prove that mdeg S(f, g) ≺ γ, where x^γ = lcm(LM(f ), LM(g)).
(b) Conclude that the monomial ideal (LM(f ), LM(g)) is a strict subset of the ideal

(LM(f ), LM(g), LM(S(f, g))).

(c) Conclude also that if h = rem (S(f, g), (f, g)) ≠ 0, then (LM(f ), LM(g)) is a strict subset of the
ideal (LM(f ), LM(g), LM(h)).
2. Using the algorithm GroebnerBasis, find the Gröbner basis for the ideal in Example 12.6.2 using
the lexicographic order with x > y.
3. By implementing various reductions, say using the ReducedGroebnerBasis algorithm, find the
reduced Gröbner basis for the ideal in Example 12.6.2 using the lexicographic order with x > y.
4. Find the reduced Gröbner basis of the ideal (xz² − 2y² + 5, xy − 3z − 1) with respect to the lexicographic
order with x > y > z.
5. Find the reduced Gröbner basis of the ideal (xz² − 2y² + 5, xy − 3z − 1) with respect to the lexicographic
order with z > x > y.
6. Consider the polynomial ring F5[x, y]. Find the reduced Gröbner basis of the ideal (2xy + 3xy² +
1, 2x² + xy³ + 4) with respect to:
(a) the lexicographic order with x > y.
(b) the graded lexicographic order with x > y.
7. A Gröbner basis G of an ideal I is called minimal if (a) LC(g) = 1 for all g ∈ G; and (b) for all g ∈ G,
the monomial LM(g) is not in the monomial ideal (LT(G − {g})). Prove that G is a minimal Gröbner
basis if and only if LC(g) = 1 for all g ∈ G and no proper subset of G is a Gröbner basis of I.
8. Prove that G is a minimal Gröbner basis if and only if LC(g) = 1 for all g ∈ G and LT(G) is a minimal
basis of the monomial ideal (LT(G)). [See Exercises 12.6.7 and 12.5.3.]
9. Prove that the reduced Gröbner basis of an ideal I is a minimal Gröbner basis. [See Exercise 12.6.7.]
10. In this exercise, we consider the Gröbner basis corresponding to a system of linear equations. Let F
be a field and n a positive integer. Consider the system of m linear equations

a11 x1 + a12 x2 + · · · + a1n xn − b1 = 0
a21 x1 + a22 x2 + · · · + a2n xn − b2 = 0
        ⋮
am1 x1 + am2 x2 + · · · + amn xn − bm = 0.

Call fi = ai1 x1 + ai2 x2 + · · · + ain xn − bi and consider the ideal I = (f1, f2, …, fm). Set gi to be the
polynomial corresponding to the ith row after the Gauss-Jordan elimination. (Some of the gi may be
the 0 polynomial.)

(a) Show that I = (g1 , g2 , . . . , gm ).


(b) Show that {g1 , g2 , . . . , gm } (with any 0 polynomials removed) is the reduced Gröbner basis of I
with respect to the lexicographic order with x1 > x2 > · · · > xn .
11. Consider the ideal I = (y − x³, z − x⁴) in R[x, y, z].
(a) Show that the reduced Gröbner basis of I with respect to the lexicographic order with y > z > x
consists of exactly 2 polynomials (and give these polynomials).
(b) Show that the reduced Gröbner basis of I with respect to the lexicographic order with x > y > z
consists of exactly 5 polynomials (and give these polynomials).
12. Let F be any field.
(a) Prove that there exists only one monomial order on the polynomial ring F [x].
(b) Prove that the reduced Gröbner basis of the ideal (f (x), g(x)) is the singleton set consisting of
the monic greatest common divisor of f (x) and g(x).
(c) Explain how the steps of Buchberger’s Algorithm match up or compare to the steps of the
Euclidean Algorithm.
13. Recall that a matrix A ∈ M2×2(F) is called orthogonal if AAᵀ = I. Setting

A = [ x  y ]
    [ z  w ],

the matrix A is orthogonal if and only if a1 = x² + y² − 1, a2 = xz + yw, and a3 = z² + w² − 1 are
all 0. Find, by hand, the reduced Gröbner basis of the ideal I = (a1, a2, a3) in F[x, y, z, w].

12.7 Applications of Gröbner Bases
The previous sections of this chapter developed the theory of rings of multivariable polynomials over a
field. We started from the abstract perspective of Noetherian rings and established Hilbert’s Basis
Theorem, a corollary of which is that F[x1, x2, …, xn], where F is a field, is Noetherian. We
found that an algorithm for dividing polynomials in a multivariable context does not by itself solve the
ideal membership problem. This is resolved when we use a Gröbner basis as a generating set of an
ideal. With Buchberger’s Algorithm and its variants, we can: (1) find a Gröbner basis G of an ideal
I = (S), such that S ⊆ G; (2) find the reduced Gröbner basis G of an ideal with respect to any
monomial order.
With the theory of Gröbner bases at our disposal, along with algorithms to compute them, we
now turn to applications that become computationally possible.
In the examples that follow, all the computations are performed using a computer algebra system.
(In some computer algebra systems, when a generating set of a polynomial ideal involves polynomials
with only rational coefficients, the implementation of Buchberger’s Algorithm provides a Gröbner
basis that is reduced, except that the polynomials are not necessarily monic but scaled by a factor so
that the coefficients are integers. Consequently, we sometimes call a Gröbner basis G reduced if one
can obtain a reduced Gröbner basis by dividing each polynomial g ∈ G by its leading coefficient.)

12.7.1 – Ideal Membership


As we introduced Gröbner bases, we pointed out that when an ideal I in F [x1 , x2 , . . . , xn ] is generated
by {f1 , f2 , . . . , fr } it is possible that f ∈ I even though the remainder rem (f, (f1 , f2 , . . . , fr )) is not
0. Proposition 12.5.8 affirms that if G is a Gröbner basis of I with respect to a monomial order 4,
then f ∈ I if and only if rem (f, G) = 0.

Figure 12.2: Intersection of two ellipses

Example 12.7.1. Consider the ideal I = (x³ + yz, 2xy + xz) in R[x, y, z] and consider the polynomial
f = 2x³y − yz². We propose to work with the lexicographic monomial order with x > y > z. We
can calculate that

rem(f, (x³ + yz, 2xy + xz)) = −2y²z − yz².

However, the set {x³ + yz, 2xy + xz} is not a Gröbner basis of I with respect to ≤lex. Therefore, we
cannot conclude one way or the other whether f ∈ I. The reduced Gröbner basis of I with respect
to ≤lex is G = {x³ + yz, 2xy + xz, 2y²z + yz²}. Another calculation shows that rem(f, G) = 0.
Consequently, we deduce that f ∈ I. The polynomial division of f by G gives

f = 2y(x³ + yz) + 0(2xy + xz) − (2y²z + yz²). △
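A sketch of this membership test in SymPy (assumed available): the contains method of a computed Gröbner basis performs exactly the reduction rem(f, G) and checks whether the remainder is 0.

```python
from sympy import symbols, groebner

x, y, z = symbols('x y z')

f = 2*x**3*y - y*z**2
G = groebner([x**3 + y*z, 2*x*y + x*z], x, y, z, order='lex')

# f reduces to 0 modulo the Groebner basis, so f lies in I...
print(G.contains(f))
# ...while, for instance, x does not.
print(G.contains(x))
```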

12.7.2 – Solving Systems of Polynomial Equations


Solving a system S of polynomial equations amounts to finding the variety V(S). As with systems
of linear equations, one possible goal involves providing a convenient way to describe the variety.
If the variety consists of a finite set of points, we may wish to find all of those points. Otherwise,
we may hope to find a parametrization of the variety. Gröbner bases, and in particular reduced
Gröbner bases, help with this problem since they express the generators of an ideal in simplest
form.
Example 12.7.2. As a first simple example, consider the system of equations

x² + 9y² − 9 = 0
25x² + 4y² − 100 = 0.

Call a1 = x² + 9y² − 9 and a2 = 25x² + 4y² − 100 and let I be the ideal (a1, a2) in the polynomial ring
R[x, y]. From the perspective of varieties, the solution to this system consists of the intersection of
two ellipses in the affine space R². (See Figure 12.2.) The reduced Gröbner basis of I with respect
to the lexicographic order with x > y is

G = { x² − 864/221 , y² − 125/221 }.

Since V(I) = V(G), the intersection of these two ellipses corresponds to the four points

( ±√(864/221), ±√(125/221) ),

taking all four choices of signs. △

Figure 12.3: Curves for Example 12.7.3

Example 12.7.3. Consider the system of equations

x³ + y³ − 8/3 = 0
x² + y² − 20/9 = 0                                                          (12.15)

as a subset of R². The solution of this system is a variety that corresponds to the intersection of the
cubic a1 = x³ + y³ − 8/3 = 0 and the circle a2 = x² + y² − 20/9 = 0. Figure 12.3 shows these two
curves and we observe that this variety consists of exactly four points. The reduced Gröbner basis
(scaled to clear denominators) of I = (a1, a2) with respect to ≤lex with x > y is

G = {729y⁶ − 2430y⁴ − 1944y³ + 5400y² − 1408, 352x + 405y⁵ + 486y⁴ − 450y³ − 1620y² + 352y}.

We know that I = (G). This Gröbner basis has a polynomial whose terms involve only y.
Hence, all solutions (x, y) to the system of equations must satisfy

729y⁶ − 2430y⁴ − 1944y³ + 5400y² − 1408 = 0
⇐⇒ (3y)⁶ − 30(3y)⁴ − 72(3y)³ + 600(3y)² − 1408 = 0
⇐⇒ (3y − 2)(3y − 4)((3y)⁴ + 6(3y)³ − 2(3y)² − 132(3y) − 176) = 0.

We see that two of the solutions to (12.15) have y = 2/3 and y = 4/3. It is not hard to check that the
polynomial f(z) = z⁴ + 6z³ − 2z² − 132z − 176 is irreducible over Q and has two real roots and
two complex roots. We can use Newton’s method to find the real roots of f(z) numerically or we
could use the Cardano-Ferrari method to solve a quartic explicitly. Once we find the four real roots of
729y⁶ − 2430y⁴ − 1944y³ + 5400y² − 1408 = 0, the corresponding x of the solutions are obtained
from the second polynomial in our Gröbner basis, namely,

x = −(1/352)(405y⁵ + 486y⁴ − 450y³ − 1620y² + 352y).

We find that the rational solutions are (2/3, 4/3) and (4/3, 2/3). There are two other points with non-
rational coordinates that are approximately (1.4071, −0.4922) and (−0.4922, 1.4071). △
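The claims in this example can be spot-checked in SymPy (assumed available); the auxiliary symbol w stands for 3y in the factorization step:

```python
from sympy import symbols, groebner, Rational, expand

x, y, w = symbols('x y w')

a1 = x**3 + y**3 - Rational(8, 3)
a2 = x**2 + y**2 - Rational(20, 9)
G = groebner([a1, a2], x, y, order='lex')

# The scaled sextic in y alone belongs to the ideal.
sextic = 729*y**6 - 2430*y**4 - 1944*y**3 + 5400*y**2 - 1408
print(G.contains(sextic))

# Check the factorization used in the text (with w = 3y).
lhs = (w - 2)*(w - 4)*(w**4 + 6*w**3 - 2*w**2 - 132*w - 176)
print(expand(lhs - (w**6 - 30*w**4 - 72*w**3 + 600*w**2 - 1408)))
```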
Example 12.7.4. The reader might suspect that the coefficients in the system of equations in Ex-
ample 12.7.3 were chosen so that (12.15) had two points in the solution with rational coordinates.
In this example, we change the problem to make it more general and we ask a different question.
Consider instead the system of equations

x³ + y³ − a = 0
x² + y² − b = 0,                                                            (12.16)

where a and b are unspecified parameters. Changing the parameter a modifies the shape of the cubic
curve. Curves of the form x³ + y³ = a all look similar to the cubic curve in Figure 12.3, with an
asymptote of x + y = 0, but having an x-intercept of (∛a, 0) and a y-intercept of (0, ∛a). On the
other hand, changing b simply affects the radius of the circle.
By looking at the graphs of the cubic and the circle, it appears that for a fixed a, there exist b1
and b2 such that

• for b ∈ [0, b1 ) the system (12.16) has no solutions;

• at b = b1 the system (12.16) has exactly two solutions (and the circle has the same tangent
line as the cubic at the points of intersection);

• for b ∈ (b1 , b2 ) the system (12.16) has exactly four solutions;

• at b = b2 the system (12.16) has exactly three solutions (two regular intersections and one
intersection point at which the circle and the cubic curve have the same tangent line);

• for b ∈ (b2 , ∞) the system (12.16) has exactly two solutions.

We propose to find these values b1 and b2.


The circle and the cubic curve will have a common tangent line at points that solve (12.16) and
such that the gradient of x³ + y³ − a is parallel to the gradient of x² + y² − b. This leads to a system
of four polynomial equations, namely

x³ + y³ − a = 0
x² + y² − b = 0
3x² − 2λx = 0                                                               (12.17)
3y² − 2λy = 0.

The system (12.17) corresponds to a variety in the affine space R⁵, described by an ideal in
R[x, y, a, b, λ]. We use the lexicographic order with x > y > λ > a > b. This has the effect of
attempting to eliminate x, then y, then λ, and so forth. With respect to this monomial order, the
reduced Gröbner basis of the ideal corresponding to (12.17) is

G = {2a⁴ − 3a²b³ + b⁶, 2bλ − 3a, 4λa³ − 9a²b² + 3b⁵, −27a²b + 8a²λ² + 9b⁴,
     −81a² + 16aλ³ + 27b³, 3a²y + 3ab² − 3yb³ − 2λa², 9ba + 6aλy − 9yb² − 4aλ²,
     16λ³y − 54ay − 18aλ + 27b², 3y² − 2λy, 8λ²y − 9by + 9xb − 9a,
     −b² + ax + ay, −3b + 2λy + 2λx, 2λy + 3x² − 3b}.                       (12.18)
The first polynomial listed in G is 2a⁴ − 3a²b³ + b⁶ = (b³ − a²)(b³ − 2a²). Since this polynomial must
be 0 when the circle and the cubic have common tangents, this occurs when b = ∛(a²) or b = ∛(2a²).
From our intuition from the graphs of the curves, we conclude that these are precisely the values of
b1 and b2 described above.
Note that the second polynomial in G is 2bλ − 3a. If we assume that b ≠ 0, then 2bλ − 3a = 0
implies that λ = 3a/(2b). Furthermore, it is easy to check that under the assumption that λ = 3a/(2b), the
third, fourth, and fifth polynomials given in G are multiples of 2a⁴ − 3a²b³ + b⁶. △
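The factorization of the first element of (12.18), and the consequence λ = 3a/(2b) of the second, can be checked quickly in SymPy (assumed available; the symbol name lam stands for λ):

```python
from sympy import symbols, expand, solve

lam, a, b = symbols('lam a b')

# The first element of the reduced Groebner basis (12.18) factors:
p = 2*a**4 - 3*a**2*b**3 + b**6
print(expand((b**3 - a**2)*(b**3 - 2*a**2) - p))

# The second element, 2*b*lam - 3*a, pins down lambda when b != 0.
print(solve(2*b*lam - 3*a, lam))
```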

Example 12.7.5. We revisit Example 12.7.3 once more to illustrate yet another strategy. Recall
that our approach in Example 12.7.3 allowed us to easily find two rational solutions to (12.15) but
we “gave up” on searching for an exact solution since it appeared to involve the roots of a quartic.
We now take a different approach that is possible by virtue of the fact that the polynomials in
(12.15) are symmetric. This means that if (x0 , y0 ) is a solution to the system, then (y0 , x0 ) is also
a solution.
In Section 11.5.1, we encountered the elementary symmetric polynomials. For the case of two
variables, there are only two elementary symmetric polynomials, namely s1 = x + y and s2 = xy.

In light of this, consider the system of equations

x³ + y³ − 8/3 = 0
x² + y² − 20/9 = 0                                                          (12.19)
x + y − s1 = 0
xy − s2 = 0

whose solution is a variety in R⁴ corresponding to an ideal I in the polynomial ring R[x, y, s1, s2].
Using a computer algebra system, we find that with respect to the lexicographic order with x > y >
s2 > s1 the reduced Gröbner basis for this ideal is

G = {3s1³ − 20s1 + 16, 18s2 − 9s1² + 20, 18y² − 18ys1 + 9s1² − 20, x + y − s1}.

Since V(I) = V(G), the system (12.19) is equivalent to

3s1³ − 20s1 + 16 = 0
18s2 − 9s1² + 20 = 0                                                        (12.20)
18y² − 18ys1 + 9s1² − 20 = 0
x + y − s1 = 0.

Note that 3s1³ − 20s1 + 16 = (s1 − 2)(3s1² + 6s1 − 8), so the first equation in (12.20) implies that s1 is
2, −1 − (1/3)√33, or −1 + (1/3)√33.
3 3
From here, we will be able to deduce s2, y, and then x. If in fact we only want to find the solutions
(x, y) of intersection, the system (12.20) allows us to avoid calculating s2.

Case 1: s1 = 2. Then from the third equation in (12.20), y solves 18y² − 36y + 16 = 0. Solving
this quadratic gives y = 2/3 or 4/3. From the fourth equation in (12.20), we see that x = s1 − y,
which leads to two solutions for (x, y), namely (2/3, 4/3) and (4/3, 2/3).

Case 2: s1 = −1 − (1/3)√33. Then y solves 18y² + (18 + 6√33)y + (22 + 6√33) = 0. However, this
quadratic polynomial has a negative discriminant so has no real roots.

Case 3: s1 = −1 + (1/3)√33. Then y solves 18y² + (18 − 6√33)y + (22 − 6√33) = 0. This gives us the
explicit roots for y:

(1/36)( −(18 − 6√33) ± √( (18 − 6√33)² − 4·18(22 − 6√33) ) ) = (1/6)( −3 + √33 ± √(−2 + 6√33) ).

This in turn leads to the remaining two real solutions of (12.15),

( (1/6)(−3 + √33 + √(−2 + 6√33)), (1/6)(−3 + √33 − √(−2 + 6√33)) )  and
( (1/6)(−3 + √33 − √(−2 + 6√33)), (1/6)(−3 + √33 + √(−2 + 6√33)) ). △
6 6

Example 12.7.6. Suppose that numbers a, b, and c solve the following system of equations

a + b + c = 10
a² + b² + c² = 13                                                           (12.21)
a³ + b³ + c³ = 20.

Theorem 11.5.3 affirms that every symmetric polynomial in the variables x1, x2, …, xn can be ex-
pressed as g(s1, s2, …, sn), where g is a polynomial in n variables and si is the ith elementary symmetric
polynomial in x1, x2, …, xn. The system of equations involves symmetric polynomials of different
degrees. We might ask the question whether this is enough information to determine a⁴ + b⁴ + c⁴,
another symmetric polynomial in a, b, c, even without determining a, b, and c. We can address this
question by considering the system of polynomials

a + b + c − 10 = 0
a² + b² + c² − 13 = 0                                                       (12.22)
a³ + b³ + c³ − 20 = 0
a⁴ + b⁴ + c⁴ − t = 0.

Determining the value of a⁴ + b⁴ + c⁴ corresponds to finding the value of t. Calculating the reduced
Gröbner basis of

{a + b + c − 10, a² + b² + c² − 13, a³ + b³ + c³ − 20, a⁴ + b⁴ + c⁴ − t}

with respect to the lexicographic order with a > b > c > t amounts to attempting to successively
eliminate the variables a, b, c, and t. This reduced Gröbner basis is

G = {6t − 4307, −650 + 261c − 60c² + 6c³, 87 − 20c − 20b + 2cb + 2c² + 2b², a + b + c − 10}.

The variety corresponding to (12.22) is equal to the variety V(G). Hence, whenever a, b, c solve
(12.21), then t solves 6t − 4307 = 0. Thus, we conclude that

a⁴ + b⁴ + c⁴ = 4307/6. △

Example 12.7.7. Recall the concept of eigenvalues and eigenvectors associated to an n × n matrix
A ∈ Mn (F ). An element λ ∈ F , or possibly in a field extension of F , is called an eigenvalue of A
if there exists a nonzero vector ~v such that A~v = λ~v . Such a vector ~v is called an eigenvector of
eigenvalue λ. Eigenvalues are obtained as the roots of the characteristic polynomial of A, namely
det(xI − A).
From the perspective of systems of polynomial equations, if A is a given matrix, then the equation
A~v − λ~v = ~0 consists of a system of n equations in the n + 1 variables λ, v1, v2, …, vn. These equations
are not linear but quadratic, since the ith equation has the term λvi.
As a specific example, consider the matrix

A = [ −5   4   4 ]
    [ 12  −8  −6 ]
    [−24  17  15 ].

The equation A~v = λ~v is tantamount to the system

−5v1 + 4v2 + 4v3 − λv1 = 0
12v1 − 8v2 − 6v3 − λv2 = 0
−24v1 + 17v2 + 15v3 − λv3 = 0.

We use the lexicographic monomial order with v1 > v2 > v3 > λ. Whether by hand or using a
computer algebra system, we find that the reduced Gröbner basis associated to the ideal generated
by these three polynomials in R[v1, v2, v3, λ] is {g1, g2, g3}, where

g1 = λ³v3 − 2λ²v3 − 5λv3 + 6v3
g2 = 5v2 + 2λ²v3 − 3λv3 − 9v3
g3 = 60v1 + 17λ²v3 − 23λv3 − 114v3.


This Gröbner basis gives us a good understanding of the solution to this system but from a direction
different from that usually presented in linear algebra. Notice that

g1 = 0        v3(λ³ − 2λ² − 5λ + 6) = 0
g2 = 0   =⇒   v2 = (1/5)(−2λ² + 3λ + 9)v3                                   (12.23)
g3 = 0        v1 = (1/60)(−17λ² + 23λ + 114)v3.

In this system, we see that solutions to the eigenvalue/eigenvector problem come from v3 = 0 or
λ³ − 2λ² − 5λ + 6 = 0. Note that if v3 = 0, then the other two equations give v2 = v1 = 0. Hence, we
find that ~0 is a solution regardless of λ. The equation in λ is (λ + 2)(λ − 1)(λ − 3) = 0, so the matrix
has three distinct eigenvalues: −2, 1, and 3. However, our approach using Gröbner bases gives us
much more information than just the eigenvalues. The second and third equations in (12.23) give
formulas for the coordinates of the eigenvectors, namely

~v = t ( (1/60)(−17λ² + 23λ + 114), (1/5)(−2λ² + 3λ + 9), 1 )

whenever λ is −2, 1, or 3. △
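The eigenvalue computation can be cross-checked against standard linear algebra routines; a SymPy sketch (library assumed available):

```python
from sympy import Matrix

A = Matrix([[-5, 4, 4],
            [12, -8, -6],
            [-24, 17, 15]])

# The three distinct eigenvalues found via the Groebner basis.
print(A.eigenvals())

# The eigenvector formula (12.23) at lambda = 1 gives v proportional to (2, 2, 1).
v = Matrix([2, 2, 1])
print(A * v)
```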

12.7.3 – Implicitization Problem


In a number of situations, a curve or a surface in Rⁿ may be naturally described by a parametrization,
rather than with a system of polynomial equations. However, a subset of the affine space Fⁿ, where
F is a field, is not a variety unless it is the zero set of some set of polynomials. As described at
the beginning of Section 12.4, given a parametrized subset Z of Fⁿ, the problem of finding a subset
S ⊆ F[x1, x2, …, xn] such that Z = V(S) is called the implicitization problem.
Given a parametrization

x1 = f1(t1, t2, …, tm)/g1(t1, t2, …, tm)
        ⋮
xn = fn(t1, t2, …, tm)/gn(t1, t2, …, tm),

where the fi and gi are polynomials, so that each coordinate is a rational function of the parameters,
Gröbner bases provide a strategy to solve the implicitization problem. Consider the system of
polynomial equations

x1 g1(t1, t2, …, tm) − f1(t1, t2, …, tm) = 0
        ⋮
xn gn(t1, t2, …, tm) − fn(t1, t2, …, tm) = 0

in the polynomial ring F[x1, …, xn, t1, …, tm]. Calculating the reduced Gröbner basis of the set of
these polynomials with the lexicographic order t1 > · · · > tm > x1 > · · · > xn has the effect of finding
a system of polynomials that eliminate the variables in the order t1 > · · · > tm > x1 > · · · > xn.
Hence, if the parametrized set Z is a variety, we suspect that the set S of polynomials that define it
will appear in the Gröbner basis as the polynomials that do not involve the parametrizing variables
t1, t2, …, tm.

Example 12.7.8. Consider the curve Z in R³ given by the parametrization

x(t) = t²
y(t) = t³ − t
z(t) = t³ − 3t

with t ∈ R. In order to express this as a variety, consider the system of equations

x − t² = 0
y − t³ + t = 0                                                              (12.24)
z − t³ + 3t = 0

in R[x, y, z, t] along with the ideal I generated by these three polynomials. The reduced Gröbner
basis of I with respect to the lexicographic order with t > x > y > z consists of the three polynomials

p1 = 2t − y + z
p2 = 4x − y² + 2yz − z²
p3 = y³ − 3y²z + 3yz² − 12y − z³ + 4z.

Hence, the variety in R⁴ corresponding to (12.24) solves p2 = 0 and p3 = 0. We conclude that the
curve Z in R³ is the variety V(p2, p3) and, furthermore, the parameter that gives a particular point
(x, y, z) is given by t = (1/2)(y − z). △
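A SymPy sketch (assumed available) of this elimination for the curve Z:

```python
from sympy import symbols, groebner, expand

t, x, y, z = symbols('t x y z')

G = groebner([x - t**2, y - t**3 + t, z - t**3 + 3*t],
             t, x, y, z, order='lex')

p2 = 4*x - y**2 + 2*y*z - z**2
p3 = y**3 - 3*y**2*z + 3*y*z**2 - 12*y - z**3 + 4*z
print(G.contains(p2), G.contains(p3))

# Sanity check: the parametrization satisfies p2 = p3 = 0 identically.
subs = {x: t**2, y: t**3 - t, z: t**3 - 3*t}
print(expand(p2.subs(subs)), expand(p3.subs(subs)))
```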

When parametrizing curves or surfaces, it is often convenient to use the sine or cosine function.
For example, a simple parametrization of the unit circle is ~x : [0, 2π) → R² with ~x(t) = (cos t, sin t).
However,

~r(u) = ( (1 − u²)/(1 + u²), 2u/(1 + u²) )  with u ∈ R                      (12.25)

also parametrizes the unit circle, though it misses the point (−1, 0). The vector function ~r(u) gives
the second intersection of the circle with the line through (−1, 0) and (0, u). The point (−1, 0) arises
as the limit of ~r(u) as u → ∞.

[Figure: construction of the rational parametrization ~r(u) of the unit circle.]

The parametrization (12.25) corresponds to the equations

x = (1 − u²)/(1 + u²)   and   y = 2u/(1 + u²),

that is,

(1 + u²)x − (1 − u²) = 0   and   (1 + u²)y − 2u = 0.

Now consider the ideal I = ((1 + u²)x − (1 − u²), (1 + u²)y − 2u) in R[x, y, u]. Calculating the
reduced Gröbner basis of I with respect to the lexicographic order with u > x > y has the effect of
attempting to eliminate the variable u. This basis is

{x² + y² − 1, uy − 1 + x, ux − y + u}.

The first polynomial in this basis gives the equation of the unit circle. So we might conclude that
(12.25) corresponds to the variety expressed by the single equation x² + y² − 1 = 0. However, we
need to take care to consider the meaning of the remaining two equations. The last two equations
should be used to solve for the parameter u. We get two solutions

u = (1 − x)/y   and   u = y/(1 + x).

It is important to consider what these equations mean when u is not defined. The first, yu = 1 − x,
does not allow us to solve for u if y = 0. The second, u(1 + x) = y, does not allow us to solve for
u when x = −1. Hence, for any point on the variety V(x² + y² − 1), it is possible to solve for u,
except over the subvariety V(x + 1, y). We conclude that the parametrization in (12.25) gives the
unit circle except for the point (−1, 0).

Figure 12.4: Lissajous figure
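The same elimination in SymPy (assumed available) recovers the circle equation and checks that the rational parametrization lands on it:

```python
from sympy import symbols, groebner, simplify

u, x, y = symbols('u x y')

G = groebner([(1 + u**2)*x - (1 - u**2), (1 + u**2)*y - 2*u],
             u, x, y, order='lex')

# Eliminating u recovers the equation of the unit circle.
print(G.contains(x**2 + y**2 - 1))

# The parametrization indeed satisfies x^2 + y^2 = 1.
expr = ((1 - u**2)/(1 + u**2))**2 + (2*u/(1 + u**2))**2 - 1
print(simplify(expr))
```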

Example 12.7.9. As a last example, consider the Lissajous curve Z depicted in Figure 12.4 and
parametrized by

~r(t) = (cos 3t, sin 2t)  for t ∈ [0, 2π].

In order to use the above strategy to find an equation that gives the Lissajous figure as a variety, we
first must write the functions cos 3t and sin 2t as polynomials in cos t and sin t. It is easy to show
that cos 3t = 4cos³t − 3cos t and that sin 2t = 2 sin t cos t. Following the above strategy, we now
replace cos t with (1 − u²)/(1 + u²) and sin t with 2u/(1 + u²), with the assumption that u ∈ R. This
gives the parametrization

x = 4((1 − u²)/(1 + u²))³ − 3((1 − u²)/(1 + u²))
y = 2(2u/(1 + u²))((1 − u²)/(1 + u²)).

From the above discussion of this alternative parametrization for the unit circle, we expect this
parametrization to miss the point on Z corresponding to t = π. The reduced Gröbner basis of

I = ( x(1 + u²)³ − 4(1 − u²)³ + 3(1 − u²)(1 + u²)², y(1 + u²)² − 4u(1 − u²) )

with respect to the lexicographic order with u > x > y is

{9y² − 4x² − 24y⁴ + 4x⁴ + 16y⁶,
 −12y² + 16y⁵u − 8y²x² + 16y⁴ − 16y³u + 8y²x + 3yu + 2x + 6x² − 8x³,
 8y⁴ − 2x³ + 4y²x − 6y² − xyu + 4y³xu + 2x²,
 3y + 8y⁴u − 4yx² − 4y³ − 8xy²u − 6y²u + 2xu + 2x²u,
 10y²u + yu² − 4xu + 4xy − 8y⁴u + 4yx² + 8xy²u − 7y + 8y³,
 xu² − 1 + 4yu + u² + x}.

The first equation in the above list defines a variety V(9y² − 4x² − 24y⁴ + 4x⁴ + 16y⁶) such that
Z is a subset of this variety (and in fact is equal to this variety). △
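Since the final implicit equation is stated without the intermediate computation, a quick numerical spot-check in SymPy (assumed available) is reassuring: the point (cos 3t, sin 2t) satisfies the first basis polynomial at sample parameter values.

```python
from sympy import symbols, cos, sin, Rational

x, y, t = symbols('x y t')

# The implicit equation of the Lissajous curve from Example 12.7.9.
f = 9*y**2 - 4*x**2 - 24*y**4 + 4*x**4 + 16*y**6

# Evaluate f at (cos 3t, sin 2t) for a few sample parameters.
for t0 in (Rational(1, 3), Rational(7, 10), 2):
    val = f.subs({x: cos(3*t0), y: sin(2*t0)}).evalf()
    print(val)
```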

Exercises for Section 12.7

In the following exercises, it is expected that the reader will obtain a relevant Gröbner basis using a computer
algebra system.
1. Consider the ideal I = (xy + z², x² − 3xz + 2y) in the ring R[x, y, z]. Decide if the following polynomials
are in the ideal.
(a) 9x³y − 3x³z + 2x²y − 4y²
(b) 2xy + 4yz + x³ + 9yx²
2. Consider the ideal I = (x³ + xy² + 2x, 3x² + y³ − 1) in the ring R[x, y]. Decide if the following
polynomials are in the ideal.
(a) y³ − 3y² − 7
(b) 3x⁵ + 3x³y² − x³ − 2xy³ − xy²
3. Solve the system of equations

x^2 + y^2 + z^2 = 9
x^2 − 2y^2 = 1
x − z^2 = 2.

4. Recall that a critical point of a function f (x, y) is a point (x, y) such that the gradient ∇f = (fx , fy )
is undefined or zero. Find the critical points of f (x, y) = x^3 + 3xy^2 + 2y^3 − x.
5. We propose to explicitly solve the system of equations

x^3 + xy^2 + 2x = 0
3x^2 + y^3 − 1 = 0.

(a) Find the reduced Gröbner basis G with respect to the lexicographic order with x > y.
(b) Notice that G contains a polynomial of degree 6 in just the variable y. Show that this sextic
polynomial factors into two cubics over Q.
(c) Use methods developed elsewhere in this textbook to get the explicit real roots.
(d) Use this information to find all the solutions to the above system of equations. [Hint: There are
exactly three real roots.]
6. Find the intersection of the sphere x^2 + y^2 + z^2 = 4 with the ellipsoid x^2 + y^2 + z^2/9 = 1. Use a reduced
Gröbner basis calculation to find parametrizations of the two components of this intersection. Explain
your choice of monomial order.
7. Find the intersection of the sphere x^2 + y^2 + z^2 = 4 with the ellipsoid x^2 + y^2/4 + z^2/9 = 1. Use a reduced
Gröbner basis calculation to find parametrizations of the two components of this intersection. Explain
your choice of monomial order.
8. Continue working with Example 12.7.4.
(a) Explicitly find the points (x, y) of intersection of the cubic curve and the circle that have the
same tangent lines. (These points should be given in terms of a and b.)

(b) Explain from the polynomials in the reduced Gröbner basis (12.18) why b1 = ∛(a^2) has two
solution points and b2 = ∛(2a^2) has only one solution point.
(c) Explain geometrically why these results make sense.
9. Consider the following strategy using systems of polynomial equations to find the tangent line to a
curve at a point. Suppose that a curve Z in R2 is defined by a polynomial equation f (x, y) = 0. The
tangent line is the solution to a polynomial p = ax + by + c = 0 such that the gradient (with respect
to x and y) satisfies ∇f = λ∇p. Hence, a point (x0 , y0 ) on Z has the tangent line ax + by + c = 0 if

f (x0 , y0 ) = 0
ax0 + by0 + c = 0
fx (x0 , y0 ) = λ px (x0 , y0 )
fy (x0 , y0 ) = λ py (x0 , y0 ).
Explain why it is useful to calculate the Gröbner basis with the lexicographic order of λ > a > b >
c > x0 > y0 to find the coefficients a, b, and c implicitly in terms of x0 , y0 . Apply this strategy to the
parabola y = x2 + 1 and show that the resulting tangent line is what we expect from calculus.

10. Use Gröbner bases to find the formula for the tangent line to the curve y^2 = x^3 − x at any given point
(x0 , y0 ). [See Exercise 12.7.9.]
11. Explain a method using Gröbner bases to find the equation of the tangent plane to a surface f (x, y, z) =
0 at a given point (x0 , y0 , z0 ). Demonstrate this method on the surface x^3 + y^3 + z^3 − 4 = 0.
12. Find the distance from the point (5, 3) to the parabola y = x^2 using polynomial equations. [Hint: Use
the polynomial equations y − x^2 and (x − 5)^2 + (y − 3)^2 − d^2 , as well as two equations using gradients
that express the condition that at a point (x, y) where two curves intersect, those curves have the
same tangent line. This becomes a system of equations in 4 variables, x, y, λ, d. Find a Gröbner basis
that includes an equation only in d. Solve this equation to find the desired distance.]
13. Use Gröbner basis techniques to find the lines that are bitangent to the parabolas y = x^2 and
2y = x^2 − 4x + 8.
14. Use Gröbner basis techniques to find the inverse of a 2 × 2 matrix. In other words, recover when a
matrix is invertible and obtain formulas for the entries of an inverse matrix.
15. Write x^5 + y^5 + z^5 as a polynomial of the elementary symmetric polynomials s1 , s2 , s3 and explain
your method.
16. A torus of ring radius R = 2 and cross-section radius r = 1 can be parametrized by

X(u, v) = ((2 + cos u) cos v, (2 + cos u) sin v, sin u),

for (u, v) ∈ [0, 2π]^2 . Show that this torus is a variety in R3 and give an explicit equation for the torus
using the techniques described in Section 12.7.3.
17. A space cardioid is the curve parametrized by

~r(t) = ((1 + cos t) cos t, (1 + cos t) sin t, sin t) for t ∈ [0, 2π].

Express the space cardioid as a variety in R3 and fully justify your result.

12.8
A Brief Introduction to Algebraic Geometry
12.8.1 – What Is Algebraic Geometry?
Geometry comes in a number of flavors.
Euclidean geometry deals with points, lines, triangles, circles, polygons, conics, rays, planes,
spheres, and various other subsets of points in Rn and concerns itself with results that can be proven
from Euclid’s five famous postulates. Though Euclidean geometry stands as a great achievement
of antiquity, the postulates and common notions in Euclid’s Elements involved some assumptions
that required clarification (e.g., axioms of betweenness). The work of those who proposed systems
of axioms that removed the original logical gaps still fell under the label of Euclidean geometry.
When Lobachevsky and Bolyai discovered non-Euclidean geometry, they discarded Euclid’s controversial
Fifth Postulate (Parallel Postulate) and proved that there exist consistent geometries that
have an alternative postulate. Nonetheless, classical non-Euclidean geometry continued to see a sim-
ilar proof style as Euclidean geometry. Along the way and subsequently, other types of geometries
emerged: projective geometry, finite geometries, inversive geometry, transformational geometry, etc.
Cartesian coordinates introduced an alternative approach to the objects of interest in Euclidean
geometry. Using Cartesian coordinates, along with algebra or analysis, to establish geometric results
became known as analytic geometry. In contrast, geometry that did not employ coordinates became
first known as pure geometry, and later as synthetic geometry.
Analytic geometry opened a door to differential geometry. In broad strokes, differential geometry
studies that which can be known about sets of points (often considered as subsets in Rn but not
necessarily) using calculus and analysis. It is not possible to “do calculus” on an arbitrary set of
points, so differential geometry concerns itself with differentiable manifolds, sets that can be locally

parametrized by functions that are in some sense differentiable. In particular, curves and surfaces are
one- and two-dimensional manifolds. Riemannian geometry is a subbranch of differential geometry
in which one studies contexts where it is possible to “do calculus” and have a concept of metric (or
distance). The objects of study in Riemannian geometry are called Riemannian manifolds.
Algebraic geometry also descends from analytic geometry but in a different way. Instead of
starting from parametrized surfaces, algebraic geometry starts with algebraic varieties, which are by
nature defined in relation to some set of variables. At its beginning, the purpose of this flavor of
geometry is to study properties of algebraic varieties (and the more general object of schemes) using
techniques and theory from algebra, and in particular ring theory.
The problems that motivate investigations in algebraic geometry are not unlike those in differ-
ential geometry. On the one side, one studies local properties of varieties, namely properties of the
variety that have meaning in arbitrarily small neighborhoods of the variety around a given point.
On the other side lie global properties, properties that are true of the variety as a whole.
Commutative algebra is a branch of algebra that narrows its focus to the study of commutative
rings and modules thereof. Since algebraic varieties are zero sets of ideals in the commutative
ring F [x1 , x2 , . . . , xn ], commutative algebra is an essential support to algebraic geometry. Concepts
developed in commutative algebra—such as localizations, valuations, graded rings, regular rings,
flatness, derivations, and many others—have consequences for geometric information. Conversely,
geometric concepts such as tangent spaces, singularities, coordinate functions, incidence, geometric
classification problems, and so on motivated research in commutative algebra.
The field of algebraic number theory is not unlike algebraic geometry in the perspective of using
modern algebra to study number theory. However, since rings of numbers are generally commuta-
tive, algebraic number theory also borrows heavily from commutative algebra. Consequently, these
three branches—commutative algebra, algebraic geometry, and algebraic number theory—developed
together. Hence, many concepts that arise naturally in on field have an interpretation for objects of
study in one of the other two.
Algebraic geometry is a vast field of mathematics. Consequently, just as with some of the other
“brief introduction” sections in this book, this section only intends to whet the reader’s appetite
with a few introductory concepts. The literature offers a number of excellent books on algebraic
geometry: [6, 22, 37, 38, 44].

12.8.2 – The Zariski Topology


One of the key concepts in Euclidean geometry is that of distance between points. However, more
general than the notion of distance is that of “nearness” developed in point set topology.
Example 12.8.1 (Euclidean Topology). Consider Rn and let d(P, Q) be the distance between two
points P, Q ∈ Rn . Recall that the open ball of radius r > 0 around a point P ∈ Rn is the set

Br (P ) = {Q ∈ Rn | d(P, Q) < r}.

A subset U ⊆ Rn is called open if for all p ∈ U , there exists r > 0 such that Br (p) ⊆ U . Similarly,
a subset F ⊆ Rn is called closed if its complement Rn − F is an open subset.
Notice that Rn and ∅ are both open and closed. Furthermore, open sets satisfy the following two
properties. (1) If U1 and U2 are open, then U1 ∩ U2 is open. (2) If {Ui }i∈I is a collection of open
sets, then the union ⋃_{i∈I} Ui is again open.
Let D be an open subset of Rn . Recall that a function f : D → Rm is called continuous at ~c
if for all ε > 0, there exists δ > 0, such that k~x − ~ck < δ implies kf (~x) − f (~c)k < ε. We also call
f continuous if it is continuous at all points ~c ∈ D. Using open balls, we can restate the definition
of continuity to say that for all ε > 0, there exists δ > 0 such that f (Bδ (~c)) ⊆ Bε (f (~c)). This
restatement also implies that Bδ (~c) ⊆ f −1 (Bε (f (~c))). Finally, this latter statement is also equivalent to
saying that for all open sets U 0 ⊆ Rm , the set f −1 (U 0 ) is open. 4

The properties of open sets in Rn and the last equivalent formulation of continuity turn out to
lead to a wealth of concepts. As with many structures in algebra, it is natural to label this collection
of useful properties and to study them independently from a specific instance.

Definition 12.8.2
A topological space is a pair (X, τ ), where X is a set and τ is a subset of the power set
P(X) such that
(1) X ∈ τ and ∅ ∈ τ ;
(2) the intersection of any two subsets U1 , U2 ∈ τ is again in τ ;
(3) the union of any collection {Ui }i∈I of subsets in τ is again in τ .

The set of subsets τ is called the topology of this topological space. A set U ∈ τ is called
open in the τ topology and a set F ⊆ X is called closed if its complement X − F is open.

These definitions justify calling the motivating Example 12.8.1 the Euclidean topology. Further-
more, in the Euclidean topology on R, the common definition of open or closed as applied to an
interval is precisely when it is open or closed in the Euclidean topology.
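On a finite set, the three axioms of Definition 12.8.2 can be checked exhaustively; the sketch below (the helper name `is_topology` is ours) tests one valid and one invalid candidate topology on X = {1, 2, 3}:

```python
from itertools import combinations

def is_topology(X, tau):
    # Brute-force check of Definition 12.8.2 for a finite set X.
    tau = {frozenset(U) for U in tau}
    if frozenset(X) not in tau or frozenset() not in tau:
        return False                    # axiom (1)
    if any(U & V not in tau for U, V in combinations(tau, 2)):
        return False                    # axiom (2): pairwise intersections
    # Axiom (3): for a finite tau, closure under pairwise unions suffices.
    return all(U | V in tau for U, V in combinations(tau, 2))

X = {1, 2, 3}
tau_good = [set(), {1}, {1, 2}, {1, 2, 3}]
tau_bad = [set(), {1}, {2}, {1, 2, 3}]   # missing the union {1} ∪ {2}
print(is_topology(X, tau_good))  # True
print(is_topology(X, tau_bad))   # False
```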
By induction, the axiom (2) of intersections implies that the intersection of a finite number of
open sets is again open. However, it is not true that the intersection of an arbitrary collection of
open sets is open. For example,
⋂_{n=1}^{∞} (−1/n, 1 + 1/n) = [0, 1],
which is not an open set in R, even though (−1/n, 1 + 1/n) is an open interval for all n ∈ N∗ .

Definition 12.8.3
Let (X, τ ) be a topological space and let x ∈ X. A neighborhood of x is any set V ⊆ X
such that there exists an open set U ∈ τ such that x ∈ U ⊆ V .

Using the Euclidean topology as inspiration, a neighborhood of x ∈ Rn is any set that contains
an open ball Bδ (x) for some δ > 0. Consequently, if we use the intuitive notion of “near x” to mean
less than some positive distance δ away from x, then a neighborhood of x is any set that contains
the set of points that are near x. The concept of neighborhood of a point makes precise this intuitive
notion of nearness.
Just like in algebraic structures, one does not typically care about arbitrary functions from one
topological space to another. Instead, we consider functions that “preserve the structure.”

Definition 12.8.4
Let (X, τ ) and (Y, τ 0 ) be two topological spaces. A function f : X → Y is called continuous
if f −1 (U ) ∈ τ for all U ∈ τ 0 .

The term continuous is inspired by, and directly generalizes, the same term in analysis. This
definition can also be restated to say that f is continuous if and only if for all x ∈ X and for all
neighborhoods V of f (x), the set f −1 (V ) is a neighborhood of x. In this intuitive sense, continuous
functions preserve nearness.
Example 12.8.5 (Finite Complement). Let X be any set and consider the subset τ ⊆ P(X)
defined as
τ = {A ∈ P(X) | X − A is finite} ∪ {X, ∅}.
By definition, X and ∅ are in τ . For A, B ∈ τ , by De Morgan’s laws, X − (A ∩ B) = (X − A) ∪ (X − B).
Since X − A and X − B are both finite, their union is finite, so A ∩ B ∈ τ . Let {Ai }i∈I be a collection
of sets in τ . Then

X − ⋃_{i∈I} Ai = ⋂_{i∈I} (X − Ai ).

If i0 ∈ I, then X − Ai0 is finite and so the above intersection must be finite with a cardinality less
than or equal to |X − Ai0 |. Thus, the union of {Ai }i∈I is again in τ .

We have shown that τ is a topology. This is called the finite complement topology on any set X.
The open subsets in this topology are ∅ and any set whose complement is finite. 4
Example 12.8.6 (Restricted Topology). Let (X, τ ) be a topological space and let Y ⊆ X be any
subset. Define
τ |Y = {V ∈ P(Y ) | ∃U ∈ τ, V = Y ∩ U }.
It is obvious that both ∅ and Y are in τ |Y . Let V1 , V2 ∈ τ |Y . Then V1 = Y ∩ U1 and V2 = Y ∩ U2
for some U1 , U2 ∈ τ . By associativity and idempotence, V1 ∩ V2 = Y ∩ (U1 ∩ U2 ), so V1 ∩ V2 ∈ τ |Y .
Furthermore, if {Vi }i∈I is a collection of sets in τ |Y with Vi = Y ∩ Ui for i ∈ I, then

⋃_{i∈I} Vi = ⋃_{i∈I} (Y ∩ Ui ) = Y ∩ (⋃_{i∈I} Ui ),

which is in τ |Y because the union ⋃_{i∈I} Ui ∈ τ .
This establishes that τ |Y is a topology on Y , called the restriction of τ to Y or the subset topology.
A set that is “closed in Y ” is a subset F ⊆ Y whose complement in Y is open in Y , namely
Y − F = Y ∩ U where U ∈ τ . Hence, F = Y ∩ (X − (Y ∩ U )). Since F ⊆ Y ,

F = Y ∩ ((X − Y ) ∪ (X − U )) = (Y ∩ (X − Y )) ∪ (Y ∩ (X − U )) = Y ∩ (X − U ).

Hence, a subset F of Y is closed in the restricted topology if and only if F = Y ∩ F 0 , where F 0 is
a subset of X that is closed in τ . 4

Proposition 12.8.7
Let (X, τ ) be a topological space.
(1) Both X and ∅ are closed.

(2) If F1 and F2 are closed sets, then F1 ∪ F2 is closed.

(3) If {Fi }i∈I is a collection of closed sets in X, then the intersection ⋂_{i∈I} Fi is closed.

Furthermore, if Γ is a collection of subsets of X satisfying the above three axioms, then
τ = {X − F | F ∈ Γ} is a topology.

Proof. All three parts follow from the definition of open sets and De Morgan’s laws. 
Proposition 12.8.7 makes it possible to define a topology on a set by specifying closed subsets
instead of the open sets.
Let K be a field and consider the affine space AnK . In Exercise 12.3.2 we proved that the union
of two varieties is again a variety and that the intersection of an arbitrary collection of varieties is
again a variety. Furthermore, K n = V(0) and ∅ = V(1). Thus, affine varieties satisfy the three
conditions in Proposition 12.8.7.

Definition 12.8.8
The Zariski topology on AnK is the topology in which the closed sets are precisely affine
varieties in AnK .

In fact, it is the Zariski topology that motivates the alternate notation AnK in contrast to the
vector space notation K n .
Example 12.8.9. Let K be a field and consider the Zariski topology on A1K . An affine variety in
A1K consists of a solution set of a polynomial in K[x]. However, a nonzero polynomial can have only
a finite number of roots, while V(0) = A1K . Conversely, for any finite subset of K, there exists a
polynomial that has precisely that finite set as roots. Consequently, the closed sets in the Zariski
topology on A1K are A1K itself and the finite sets of points. Thus, this topology is the finite
complement topology. 4
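Over K = R this can be seen computationally: a finite set of points is the zero set of a single polynomial, and a CAS recovers exactly that finite set of roots. A SymPy sketch (the chosen points {1, 2, 5} are ours):

```python
from sympy import symbols, expand, real_roots

x = symbols('x')

# The finite subset {1, 2, 5} of R is the zero set of one polynomial...
p = expand((x - 1) * (x - 2) * (x - 5))

# ...and the variety V(p) recovers exactly that finite set.
print(real_roots(p))  # [1, 2, 5]
```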

Definition 12.8.10
Let (X, τ ) be a topological space. A nonempty subset Y ⊆ X is irreducible if it cannot be
expressed as the union Y = Y1 ∪ Y2 of two proper subsets Y1 and Y2 that are closed in τ |Y .

In every topology (X, τ ), the singleton sets {x} are irreducible sets. Without much intuition,
we might suspect (erroneously) that the singleton sets are the only irreducible sets in a topology.
Indeed, this is the case for the Euclidean topology on R. Suppose that Y contains two distinct points
a, b with a < b. Set Y1 = Y ∩ (−∞, (a + b)/2] and Y2 = Y ∩ [(a + b)/2, +∞). By Example 12.8.6, Y1
and Y2 are closed in Y . Furthermore, both subsets are proper because b ∉ Y1 and a ∉ Y2 .
However, in the finite complement topology τF C on R, the whole set R is itself irreducible. The
proper closed subsets of R in τF C are the finite subsets. Since R is not the union of two finite subsets, R is
irreducible in τF C . In fact, a set Y is irreducible in the finite complement topology if and only if Y
is a singleton set or infinite.
In the Zariski topology, the concept of irreducible has a ring theoretic interpretation.

Proposition 12.8.11
Let V be an affine variety in AnK (assuming the Zariski topology). Then V is irreducible if
and only if I(V ) is a prime ideal.

Proof. By Exercise 12.3.2, V(IJ) = V(I) ∪ V(J) for two ideals I, J ⊆ K[x1 , x2 , . . . , xn ] and I(V1 ∪
V2 ) = I(V1 ) ∩ I(V2 ) for any two affine varieties V1 , V2 ⊆ AnK .
Suppose that V is not irreducible. Then V = V1 ∪ V2 , where V1 and V2 are affine varieties with
V1 ⊊ V and V2 ⊊ V . Setting I1 = I(V1 ) and I2 = I(V2 ), we have I = I(V ) = I1 ∩ I2 and also I ⊊ I1
and I ⊊ I2 . However, since I1 I2 ⊆ I and I1 ⊈ I and I2 ⊈ I, then I(V ) is not prime.
Conversely, suppose that I(V ) is not a prime ideal. Then there exist ideals I1 and I2 such that
I1 I2 ⊆ I(V ) but I1 ⊈ I(V ) and I2 ⊈ I(V ). Then V(I) ⊆ V(I1 I2 ) = V(I1 ) ∪ V(I2 ) so in particular,

V = (V ∩ V(I1 )) ∪ (V ∩ V(I2 )),

where V ∩ V(Ii ) is closed in V . But V(Ii ) ⊉ V(I) = V , so V(Ii ) ∩ V ≠ V , which implies that
V ∩ V(I1 ) and V ∩ V(I2 ) are proper subsets. Thus, V is not irreducible. 

It is often possible to study properties of a geometric object from functions on that object.
Polynomial functions in the ring K[x1 , x2 , . . . , xn ], when evaluated on an affine variety V , form a
ring of polynomial functions ψ : V → K. However, we consider two such polynomial functions ψ
and φ equivalent if φ(c) = ψ(c) for all c ∈ V . This condition is tantamount to φ(x) − ψ(x) ∈ I(V ),
which in turn is the same as saying that φ(x) and ψ(x) define the same class in K[x1 , x2 , . . . , xn ]/I(V ).

Definition 12.8.12
Let V be an affine variety in AnK . The coordinate ring of V , denoted K[V ], is the quotient
ring K[x1 , x2 , . . . , xn ]/I(V ).

Some properties of affine varieties are readily apparent in the coordinate ring. If V is a vari-
ety consisting of a single point, then K[V ] is a field. By the strong form of the Nullstellensatz,
when K is algebraically closed, a variety has K[V ] = K if and only if V consists of a point. By
Proposition 12.8.11, V is an irreducible affine variety if and only if K[V ] is an integral domain.

12.8.3 – The Preference of Ideals


Affine varieties are natural objects to study. The ideal-variety correspondence makes ring theory
a powerful tool to study solution sets of such systems of equations. An ideal I defines an affine
variety V(I). However, some information is lost in the V function so the ideal itself carries more
information than its corresponding variety.

For example, consider the intersection of the varieties V(y − x^2 ) and V(y − 2x + 1). Note that
y = 2x − 1 is the tangent line to y = x^2 at (1, 1). From a set-theoretic perspective, the intersection
is the point {(1, 1)}. Consider the ideals I = (y − x^2 ) and J = (y − 2x + 1) in R[x, y]. Under the
ideal-variety correspondence, the intersection of the varieties satisfies

V(I + J) = V(I) ∩ V(J).

Consider the ideal I + J instead of the set-theoretic intersection of the varieties. This sum ideal
is I + J = (y − 2x + 1, (x − 1)^2 ). In fact, {y − 2x + 1, (x − 1)^2 } is a Gröbner basis of I + J with
respect to the lexicographic order with y > x. Furthermore, using this Gröbner basis, we can show
that neither x − 1 nor y − 1 is in I + J. By the strong form of the Nullstellensatz, we know that
I(V(I + J)) = √(I + J) = (x − 1, y − 1). The ideal I + J is a proper ideal of √(I + J) and in this
distinction carries more information.
We interpret the generators {y − 2x + 1, (x − 1)^2 } by saying that the intersection of the parabola
y − x^2 = 0 and the line y − 2x + 1 = 0 is a “point with a tangent vector.” The quotient ring
K[x, y]/(I + J) is isomorphic to K[x]/((x − 1)^2 ). In Example 5.6.6, we observed that writing elements
in R = K[x]/((x − 1)^2 ) as a + b(x − 1) shows that elements in R add and multiply as the tangent
lines of representative sum and product functions. Consequently, we can interpret the information
in K[x]/((x − 1)^2 ) as containing not just the possible values of a function at a point but something
akin to the value and the derivative of a function at a point.
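This “value and derivative” arithmetic can be made concrete. In the sketch below (the class name `Jet` and the choice of Q as base field are ours), classes a + b(x − 1) are stored as pairs, and the relation (x − 1)^2 = 0 forces the product rule:

```python
from fractions import Fraction

class Jet:
    """A class a + b(x - 1) in K[x]/((x - 1)^2), stored as the pair (a, b)."""

    def __init__(self, a, b):
        self.a, self.b = Fraction(a), Fraction(b)

    def __add__(self, other):
        return Jet(self.a + other.a, self.b + other.b)

    def __mul__(self, other):
        # (a1 + b1 e)(a2 + b2 e) = a1 a2 + (a1 b2 + a2 b1) e  since e^2 = 0,
        # which is exactly the product rule on (value, derivative) pairs.
        return Jet(self.a * other.a, self.a * other.b + self.b * other.a)

    def __repr__(self):
        return f"{self.a} + {self.b}(x-1)"

xbar = Jet(1, 1)      # the class of x itself: value 1, derivative 1 at x = 1
print(xbar * xbar)    # 1 + 2(x-1)
```

Squaring the class of x yields 1 + 2(x − 1): the value of x^2 at the point 1 together with its derivative 2 there, matching the interpretation above.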
This observation (along with considerably more algebraic machinery) leads to a developed concept
of the Zariski tangent space to a variety at a point.

12.8.4 – The Prime Spectrum


In the philosophy of algebraic geometry, the inclination to remain closer to rings makes it desirable
to find some way to associate a topology to other rings besides just the few special polynomial rings
K[x1 , x2 , . . . , xn ]. The answer is to consider the set of prime ideals as the set of points.

Definition 12.8.13
Let R be a commutative ring. The prime spectrum of R, denoted Spec R, is the set of prime
ideals of R.

Definition 12.8.14
For any subset S ⊆ R, define the variety in Spec R associated to S as V (S) = {P ∈
Spec R | S ⊆ P }.

Proposition 12.8.15
Let R be a commutative ring. Then varieties in Spec R satisfy the following properties.

(1) If I is the ideal generated by S, then V (S) = V (I) = V (√I).
(2) Both the empty set and all of Spec R are varieties: V (0) = Spec R and V (1) = ∅.
(3) If {Si }i∈I is a collection of subsets in R, then V (⋃_{i∈I} Si ) = ⋂_{i∈I} V (Si ).

(4) V (I ∩ J) = V (I) ∪ V (J) for any two ideals I and J in R.

Proof. (Left as an exercise for the reader. See Exercise 12.8.4.) 

This proposition shows that the collection of varieties V (S) form the closed sets of a topology
on Spec R.

Definition 12.8.16
The topology defined on Spec R by taking the varieties V (S) as the closed subsets is called
the Zariski topology on Spec R.

The Zariski topology on the affine space AnK and on Spec R use the same name because they are
defined in such a similar manner. Suppose that K is algebraically closed and let R = K[x1 , x2 , . . . , xn ].
Then the points in K n correspond to maximal ideals in R. These maximal ideals are contained in
Spec R, but Spec R includes many more elements, namely all prime ideals of R. The prime ideals
of K[x1 , x2 , . . . , xn ] correspond to irreducible varieties in AnK . Consequently, there is a bijective
correspondence between the elements in Spec R and the irreducible affine varieties in AnK .
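As a concrete non-polynomial example, the prime ideals of Z/nZ can be found by brute force straight from the definitions (the helper names below are ours); for n = 12 the spectrum consists of exactly two points, the ideals (2) and (3):

```python
def ideals(n):
    # Every ideal of Z/nZ is (d) = the set of multiples of a divisor d of n.
    return {d: {x for x in range(n) if x % d == 0}
            for d in range(1, n + 1) if n % d == 0}

def is_prime_ideal(P, n):
    # P is prime when it is proper and ab in P forces a in P or b in P.
    if len(P) == n:
        return False
    return all(a in P or b in P
               for a in range(n) for b in range(n) if (a * b) % n in P)

n = 12
spec = sorted(d for d, P in ideals(n).items() if is_prime_ideal(P, n))
print(spec)  # [2, 3]: the two points of Spec(Z/12Z)
```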

Exercises for Section 12.8


1. Let (X, τ ) be a topological space. Show that any nonempty open subset of an irreducible set is
irreducible.
2. Show that f (x, y) = x^2 + (y^2 − 1)^2 is an irreducible polynomial in R[x, y] but that V(f ) is not an
irreducible variety in A2R .
3. Let V1 and V2 be two affine varieties in AnK such that V1 ⊆ V2 . Prove that K[V2 ] is a subring of K[V1 ].

4. Prove Proposition 12.8.15.



5. Prove that if I is a proper ideal in C[x1 , x2 , . . . , xn ], then √I is the intersection of all maximal ideals
containing I.
6. Let R and S be commutative rings and let ϕ : R → S be a homomorphism. Prove that if Q ∈ Spec(S),
then ϕ−1 (Q) ∈ Spec(R). Prove that the resulting function ϕ−1 : Spec(S) → Spec(R) is a continuous
function.
7. Let K be a field and let V be an affine variety in AnK . Prove that V can be written as a finite union
of irreducible varieties V = V1 ∪ V2 ∪ · · · ∪ Vs .

12.9
Projects
Project I. Jordan Canonical Form and Gröbner Bases. Repeat the calculations in Ex-
ample 12.7.7 but with different matrices with different Jordan canonical forms. Do some
calculations with matrices that have nontrivial Jordan canonical blocks and matrices with
eigenspaces that are more than one-dimensional. From these calculations, deduce some pat-
terns that connect the Jordan canonical form and the generalized eigenspaces with the result
of the Gröbner basis calculations. Prove as much as you can pertaining to this connection.
Project II. Euclidean Geometry and Gröbner Bases. Use Gröbner bases methods to ap-
proach standard concepts in Euclidean geometry. For example, given two points, can you
find the equation for the bisector line? Given the coordinates of a triangle, can you give a
method to find the center and the radius of the circumscribed circle or the inscribed circle to
the triangle? Can you prove certain results simply by doing a Gröbner basis computation? For example,
can you prove that if a circle C1 with center A intersects a circle C2 with center B in two points
D and E, then the line AB is perpendicular to DE? What other calculations and proofs can
you obtain?
Project III. Tangency and Gröbner Bases. Consider curves in the plane, curves in space,
or surfaces in space. Section 12.7 hinted at methods for finding tangent lines to curves in
the plane using Gröbner bases. Clearly explain this strategy and see if you can extend it with
examples or with theory to more general situations such as curves in space or surfaces in space.
In some examples, match the results obtained using Gröbner bases to methods that employ

calculus. If you focus on curves in the plane, can you connect the results from Gröbner bases
techniques to the formula of slope arising from implicit differentiation?
Project IV. Distance between Lines. Use Gröbner bases techniques to obtain a formula of
the distance between two nonintersecting lines in R3 . If there are three lines in R3 , no two of
which intersect, there exists a sphere of least radius that is tangent to all three lines. Can you
find such a sphere for some interesting examples? Can you generalize your examples? Is this
sphere always unique?
Project V. Solving Nonlinear Recursive Functions. Consider a recurrence relation on F = Fp
defined by
an+k = h(an+k−1 , . . . , ak+1 , ak ) for all k ≥ 1,
for some h ∈ Fp [x1 , x2 , . . . , xn ]. A sequence satisfying the recurrence relation is completely
defined once we specify values for ai with 1 ≤ i ≤ n. Suppose that for a sequence (an )n≥1 ,
we know the terms an+1 , an+2 , . . . but not the initial values ai with 1 ≤ i ≤ n. Show how
Gröbner bases techniques allow us to solve for a1 , a2 , . . . , an , knowing subsequent terms in the
sequence. Give some interesting explicit examples.
Project VI. Envelopes of Families of Curves. Suppose that f (λ, x, y) ∈ R[λ, x, y] represents
a family of curves Cλ in R2 defined by f (λ, x, y) = 0. An envelope of a family of curves is
curve Γ such that for P ∈ Γ, there exists λ such that P ∈ Cλ and the tangent line to Γ at P
is the same as the tangent line to Cλ at P .
In [56], the author shows that the family of curves in R2 parametrized by (x, y) = (t cos α, (1 −
t) sin α) with parameter α and t ∈ R, has an envelope parametrized by (cos^3 u, sin^3 u). Note
that this particular example of a family of curves can be rewritten as the lines

2λ(1 + λ^2 )x + (1 − λ^4 )y − (1 − λ^2 )^2 = 0,

turning this example into a family of curves described by a polynomial equation.


Discuss a strategy to find the envelope of a family of curves in R2 using Gröbner bases. Show
how the strategy works with the above example. Illustrate the strategy on at least two other
simple examples.
13. Categories

Throughout this book, we studied algebra with an emphasis on the concept of an algebraic structure.
As mentioned in the preface, the term “algebraic structure,” though not uncommon, does not have
a mathematically precise definition that would make it possible to say whether something is an
algebraic structure or not. Categories formalize and generalize what we previously called (for lack
of a better term) an algebraic structure.
Categories take one step further in abstraction. The objects of interest are no longer a single
group or a single ring, but the class of all groups or the class of all commutative rings. Consequently,
the theory of categories underscores the unity between different structures of interest in mathematics.
For the purposes of this textbook, this chapter serves as a culmination point. However, the theory
of categories is a rich theory with applications in every branch of mathematics. Consequently, this
chapter only offers an introduction that draws from and generalizes many of the constructions in
this textbook. Section 13.1 introduces the concept of a category and presents many examples. Then
Section 13.2 defines functors and shows examples again taken from earlier parts of this book.
For further reading, we suggest the classic text [43].
Categories do not stand as the be-all and end-all of mathematical reasoning. Though categories
provide a consistent framework to develop theory for each new mathematical context, each category
presents its own intrinsically interesting theorems and areas of investigation (e.g., the Jordan-Hölder
program, the classification of finite simple groups, the study of divisibility in rings, the Jordan
canonical form, the Fundamental Theorem of Algebra, and countless other results).

13.1
Introduction to Categories
13.1.1 – Axioms for Categories

Definition 13.1.1 (Part 1)


A category C consists of the following data:
(1) a class of objects Ob(C);
(2) a class of arrows (also called morphisms) Arr(C). Each arrow f ∈ Arr(C) has a
domain in Ob(C) and a codomain in Ob(C). We write f : X → Y to refer to an arrow
f with domain X and codomain Y and we say that the arrow f goes from X to Y .
The class of morphisms from X to Y is denoted by HomC (X, Y ) (or by Hom(X, Y )
if the category is understood from context).
(3) a composition operation ◦ that defines a mapping

◦ : Hom(Y, Z) × Hom(X, Y ) −→ Hom(X, Z)
(g, f ) ↦−→ g ◦ f

for any three objects X, Y, Z ∈ Ob(C).


Definition 13.1.2 (Part 2)


The data of a category satisfy the following axioms:
(1) (Identity) For each object X of Ob(C) there is a morphism idX ∈ Hom(X, X) such
that

(left-identity) f ◦ idX = f ∀f ∈ Hom(X, Y ) and


(right-identity) idX ◦g = g ∀g ∈ Hom(Y, X).

(2) (Associativity) For all objects X, Y, Z, W in Ob(C) and all arrows f : X → Y , g : Y → Z,
and h : Z → W , we have (h ◦ g) ◦ f = h ◦ (g ◦ f ).

Many authors denote a specific category by a boldfaced code or acronym that evokes the English
terminology that designates that category. This textbook follows this habit of notation. However,
these codes or acronyms are not universally standard.
Example 13.1.3 (Sets). In the category Set of sets, the objects consist of sets and the arrows
consist of functions between sets. In fact, the data of a category appears modeled after sets and
functions between sets, including the notion of composition of functions and an identity function
on each set. The fact that function composition is associative in the sense required by the category
axioms requires a small proof given in Proposition 1.1.15.
We point out one minor technicality with functions to or from the empty set. Recall that a
function f : A → B is defined as a relation from A to B such that for all a ∈ A, there exists a
unique b ∈ B with f (a) = b. For all sets X, this definition allows for one and only one function
f : ∅ → X, namely the empty function or the empty relation. On the other hand, because of this
definition, if X is nonempty, there exist no functions f : X → ∅, not even the empty function.
Hence, Hom(X, ∅) = ∅. 4

Example 13.1.4 (Groups). In the category Grp of groups, the objects consist of all groups (a
set G along with a binary operation ∗ on G satisfying the group axioms) and the arrows consist of
group homomorphisms. It is easy to check that these collections of objects and of arrows constitute
a category. 4

Example 13.1.5 (Rings). In the category Ring of rings, the objects consist of all rings and the
arrows consist of ring homomorphisms. 4

Taking a cue from set theory and group theory, we label the following properties of arrows.

Definition 13.1.6
In a category, an arrow/morphism f : X → Y is called
(1) an isomorphism or invertible if there exists a morphism g : Y → X such that f ◦ g =
idY and g ◦ f = idX ; the arrow g is called an inverse of f ;
(2) an endomorphism if X = Y , and the class of endomorphisms on X is denoted by
End(X);

(3) an automorphism if it is an isomorphism and an endomorphism, and the class of


automorphisms is denoted by Aut(X).

As examples, recall that in the category of sets, we called an isomorphism between sets a bijection
and, in set theory terms, an automorphism is a permutation.
These initial examples begin to illustrate how the formalisms of the various algebraic structures
that formed the beginning of this textbook fall under the consistent framework of categories. In
the terminology for categories, the reader should recognize the terms morphism (shortened from
homomorphism), isomorphism, endomorphism, and automorphism.

Proposition 13.1.7
If f : X → Y is an invertible arrow, then it has a unique inverse arrow.

Proof. Suppose that g1 and g2 are two inverses of the arrow f . Then

g2 = idX ◦g2 = (g1 ◦ f ) ◦ g2 = g1 ◦ (f ◦ g2 ) = g1 ◦ idY = g1 . 

We call this unique inverse the inverse of f and denote it as f −1 .


The alternate term “arrow” is sometimes preferable to “morphism” in order to step back from
the strict algebraic context. Furthermore, the term “arrow” suggests a graphical depiction of
the relationships that morphisms establish between objects in a category. For example, the composition
operation can be depicted by the diagram

          f        g
     X -----> Y -----> Z
     |                 ^
     +----- g ◦ f -----+
which exhibits all relevant domains and codomains.


In a diagram of arrows, a sequence of consecutive arrows (the codomain of one arrow is the
domain of the next) is called a path. A diagram of arrows is called commutative if any two paths
starting from one object X and ending at an object Y correspond to equal functions in Hom(X, Y ).
For example, the axiom that the composition of arrows is associative can be restated by saying that
the following diagram commutes.

     +----- g ◦ f -----+
     |                 v
     X --f--> Y --g--> Z --h--> W
              |                 ^
              +----- h ◦ g -----+

Arrow diagrams can effectively depict many situations in category theory and, at times, even
offer visual proofs of certain relationships.

Definition 13.1.8
A category B is called a subcategory of a category C if every object in Ob(B) is in Ob(C)
and if for any two objects X and Y in Ob(B) every arrow f : X → Y in the category B is
also an arrow in the category C. A subcategory is called a full subcategory if HomB (X, Y ) =
HomC (X, Y ) for all objects X, Y in Ob(B).

The category AbGrp of abelian groups is a full subcategory of Grp. It is important to note
that though an abelian group has more axioms (requirements) than an arbitrary group, no extra
data is necessary to describe an abelian group. Furthermore, if G and H are abelian groups and if
ϕ : G → H is a group homomorphism, then ϕ is a homomorphism of abelian groups.
Consider the category RingId in which the objects are rings with an identity 1 ≠ 0 but in
which the arrows are ring homomorphisms ϕ : R → S such that ϕ(1R ) = 1S . The class of objects
Ob(RingId) is a subclass of Ob(Ring) but the class of arrows in RingId is a strict subclass of
Arr(Ring). (The ring homomorphism ϕ : Z → Z/6Z defined by ϕ(n) = 3̄n̄ does not map 1 to 1.)
Hence, RingId is a subcategory of Ring but not a full subcategory.
In contrast, consider the category Set∗ of pointed sets. The objects of Set∗ consist of a pair
(X, x0 ) where X is a set and x0 is a selected element in X. A morphism in Set∗ from (X, x0 )

to (Y, y0 ) is defined as a function f : X → Y between sets such that f (x0 ) = y0 . The category
Set∗ is not a subcategory of Set for two reasons. First, the data of a pointed set consists of more
information than just a set; the object (X, x0 ) such that x0 ∈ X is not a set. Second, though part
of the data of an arrow from a pointed set (X, x0 ) to (Y, y0 ) consists of a function f : X → Y , not
every arrow in HomSet (X, Y ) is in HomSet∗ (X, Y ).

13.1.2 – A Comment on the Set Theoretic Aspects of Category Theory


The reader should observe that in the axioms for categories, we avoided calling the objects of a
category “the set of objects” or the “set” of morphisms. Instead, we used a more generic term of
class.
To see the purpose for this, consider the category of sets. The class of sets does not itself
form a set; in other words, Ob(Set) is not a set. The Zermelo-Fraenkel (and Choice) axioms (ZFC)
of set theory disallow a universal set for two reasons. First, in ZFC, the existence of a set of all sets
U would permit the definition of a paradoxical set V = {x ∈ U | x ∉ x}. The question “Is V ∈ V ?”
is a version of Russell’s paradox: If V ∈ V , then V ∉ V and vice versa, which is a contradiction.
Second, one of the axioms in ZFC is that every set has a power set. If U is the set of all sets, then
P(U ) = U , but this contradicts Cantor’s Theorem: |P(S)| > |S| for all sets S.
Using the more general concept of a class avoids these logical pitfalls but removes some structure
of sets. In light of this issue, we classify categories according to the following labels.

Definition 13.1.9
A category C is called a small category if Ob(C) and Arr(C) are sets and is called large
otherwise. A large category C is called locally small if for any two objects X and Y , the
set of arrows HomC (X, Y ) is a set.

The category Set is a large category but is locally small; in fact, for any two sets X and Y , each
function f : X → Y corresponds to a subset of X × Y , so Hom(X, Y ) is a subset of P(X × Y ).
Questions about whether a category is large or small or locally small often involve challenging
questions in the foundations of set theory, which are beyond the scope of this text.

13.1.3 – Examples of Categories


There is no better way to communicate how many different types of categories arise naturally in
mathematical investigation except to give a number of examples.

Example 13.1.10 (Vector Spaces). Let F be a field. Vector spaces over F form a category VecF
where the objects are vector spaces and the arrows of VecF are linear transformations between
vector spaces. The terms isomorphisms, endomorphisms, and automorphisms in general categories
are consistent with the terminology from linear algebra. 4

Example 13.1.11 (Posets). Posets form a category Poset. Objects consist of posets, i.e., a pair
(S, ≼) where S is a set and ≼ is a partial order on S. (See Definition 1.4.1.) The arrows Arr(Poset)
consist of monotonic functions between posets. (See Definition 1.4.17.) 4

Example 13.1.12 (Left R-Modules). Let R be a ring. Left R-modules form a category denoted
by LModR . The arrows are left R-module homomorphisms. Note that if R and S are nonisomorphic
rings, then the categories of left R-modules and of left S-modules are distinct. Not only might
the objects be different but the homomorphisms satisfy different rules in that they are linear with
respect to different scalars. 4

Example 13.1.13 (Group Actions). Let G be a group. Group actions can be viewed as a cate-
gory SetG in which an object of Ob(SetG ) is a set S along with a pairing G × S → S satisfying the
axioms of group actions (Definition 8.1.1). Morphisms of group actions are G-invariant functions or
homomorphisms of G-sets, defined in Definition 8.1.16. 4

The following three examples discuss categories occurring regularly in calculus and analysis.
Though it is simple to define these categories, the study of their properties constitutes a fundamental
theme for branches of analysis and topology.
Example 13.1.14. Consider the category whose objects consist of open subsets of R and whose
arrows consist of continuous functions between open subsets of R. Much of the study of continuous
functions occurs in this category. In order for this data to form a category, the composition of two
continuous functions must be continuous, which is a nontrivial theorem. When we study differen-
tiable functions, we work in the subcategory consisting again of open subsets of R but in which the
morphisms consist of continuously differentiable functions between open subsets of R. 4
Example 13.1.15 (Metric Spaces). At an abstract level, much of geometry deals with a set
equipped with a notion of distance. Recall the notion of a metric space from Section 3.9.1. The
category MetSp of metric spaces consists of objects that are metric spaces and the arrows are
isometries between metric spaces. 4
Example 13.1.16 (Topological Spaces). Section 12.8.2 introduced the concept of a topological
space as background behind the Zariski topology. Topological spaces along with continuous functions
between them as the arrows form a category, often labeled Top. As a point of terminology, in
topology, an isomorphism between two topological spaces is called a homeomorphism. 4
The subsequent examples of categories illustrate the flexibility in what a category can be.
Example 13.1.17 (Empty). The empty category, sometimes denoted 0, is the category consisting
of no objects and no arrows. 4
Example 13.1.18 (Category of a Poset). Every poset is itself a category in the following sense.
Let S = (S, ≼) be a poset. The objects Ob(S) consist of the elements of the set S and there exists a
single arrow x → y between two elements x, y ∈ S if and only if x ≼ y. In other words, HomS (x, y)
is the empty set if x ⋠ y but HomS (x, y) contains a single arrow whenever x ≼ y. 4
Example 13.1.19 (Ordinal Numbers). Suppose that we denote n = {1, 2, . . . , n}. Under the
category of a poset as described in the previous example, the sets 1, 2, . . ., and N are categories
when equipped with the usual inequality ≤ partial order. For example, 5 = {1, 2, 3, 4, 5} is a category
with 5 objects and a unique arrow f : a → b in Hom(a, b) if a ≤ b. For each a ∈ n, the unique arrow
in Hom(a, a) is the identity arrow. 4
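The two preceding examples can be checked mechanically. The following Python sketch (illustrative; the function names are invented here) models the poset ({1, . . . , 5}, ≤) as a category and verifies that every object has an identity arrow and that composable pairs always compose, which is exactly reflexivity and transitivity of ≤.

```python
# The poset ({1,...,5}, <=) viewed as a category: Hom(a, b) contains one
# arrow exactly when a <= b, and composition mirrors transitivity.
S = range(1, 6)

def hom(a, b):
    """Return the unique arrow a -> b when a <= b, else no arrow."""
    return [("arrow", a, b)] if a <= b else []

# Identity arrows exist: Hom(a, a) is nonempty for every a (reflexivity).
assert all(hom(a, a) for a in S)

# Composition is total on composable pairs because <= is transitive:
# arrows a -> b and b -> c guarantee that the arrow a -> c exists.
for a in S:
    for b in S:
        for c in S:
            if hom(a, b) and hom(b, c):
                assert hom(a, c)
```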

Definition 13.1.20
A category C is called discrete if for any two objects X and Y in Ob(C),
                    { {idX }   if X = Y,
    HomC (X, Y ) =  {
                    { ∅        if X ≠ Y.

Example 13.1.21 (Directed Graphs). Every directed graph defines a category in the following
sense. Recall Definition 10.11.1 for a directed graph, also called a quiver. In the category Cat(Q)
associated to a directed graph Q = (V, E, h, t) the objects are the vertices (elements of V ) and the
arrows consist of all the paths in Q. Recall that the paths consist of stationary paths ev , one for
each vertex v ∈ V (these are the identity morphisms) and all sequences of arrows strung together
head to tail. For example, the category associated to the following directed graph

              Y
            ^   \
          f/     \g
          /       v
         X ---h---> Z

has three objects, namely X, Y , and Z and exactly 7 arrows, namely eX , eY , eZ , f, g, h, gf . The
existence of the composition gf is implied by the axioms for categories so from the given diagram
we assume that h and gf are distinct arrows with gf not explicitly pictured, except as a path of two
directed edges. If an arrow has an inverse, it is not uncommon to depict an arrow f and its inverse
f −1 as a double arrow edge. 4

Example 13.1.22. Let F be a field. Consider the category C of vector spaces over a field F
equipped with a bilinear form. The objects of C are obvious from our description: a pair (V, ⟨ , ⟩),
where V is a vector space over the field F and where ⟨ , ⟩ : V × V → F is a bilinear form. However,
the arrows of this category are not explicitly defined. Though the definition of a category does not
impose how to define the arrows of C, it is natural to define the morphisms as follows. A morphism
from (V, ⟨ , ⟩V ) to (W, ⟨ , ⟩W ) is a linear transformation T : V → W such that

⟨v1 , v2 ⟩V = ⟨T (v1 ), T (v2 )⟩W for all v1 , v2 ∈ V.

As a particular example, if · is the dot product on Rn , the automorphisms T of (Rn , ·) are precisely
the orthogonal linear transformations on Rn . If the matrix of T with respect to the standard basis
is A, then
v1 · v2 = T (v1 ) · T (v2 ) ⇐⇒ [v1 ]⊤ [v2 ] = [v1 ]⊤ A⊤ A[v2 ]

for all v1 , v2 ∈ Rn . This implies that A⊤ A = I, which is the definition of an orthogonal matrix. 4
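The orthogonality computation above can be verified numerically. This Python sketch (an illustration with arbitrarily chosen data) checks that a 2×2 rotation matrix A satisfies AᵀA = I and therefore preserves the dot product.

```python
import math

# A rotation matrix is orthogonal: its columns are orthonormal.
theta = 0.7  # arbitrary angle
A = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]

def matvec(M, v):
    return [sum(M[i][k] * v[k] for k in range(2)) for i in range(2)]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

v1, v2 = [1.0, 2.0], [-3.0, 0.5]
# <v1, v2> = <A v1, A v2> up to floating-point error.
assert abs(dot(v1, v2) - dot(matvec(A, v1), matvec(A, v2))) < 1e-12

# A^T A = I entrywise: (A^T A)_{ij} = sum_k A_{ki} A_{kj}.
AtA = [[sum(A[k][i] * A[k][j] for k in range(2)) for j in range(2)]
       for i in range(2)]
I = [[1.0, 0.0], [0.0, 1.0]]
assert all(abs(AtA[i][j] - I[i][j]) < 1e-12 for i in range(2) for j in range(2))
```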

Example 13.1.23 (Morphisms of a Category). Let C be any category. Define the category
Mor(C) of morphisms of C as follows. The objects in Mor(C) are the arrows of C, i.e., Ob(Mor(C)) =
Arr(C). An arrow ϕ in Mor(C) from f : A → B to g : C → D consists of a pair (ϕdom : A →
C, ϕcod : B → D) that make the following diagram commutative.

           ϕdom
      A ---------> C
      |            |
    f |            | g
      v            v
      B ---------> D
           ϕcod

In other words, ϕcod ◦ f = g ◦ ϕdom . We leave it as an exercise to prove that Mor(C) is a category,
namely, that this definition of arrows on Mor(C) implies the existence of an identity arrow and a
composition operation that is associative. (See Exercise 13.1.23.) 4

13.1.4 – Monic, Epic, Initial, Terminal

Definition 13.1.24
An arrow a : X → Y in a category C is called
(1) monic if whenever two arrows f, f ′ : U → X satisfy a ◦ f = a ◦ f ′ , then f = f ′ ;

(2) epic if whenever two arrows g, g ′ : Y → Z satisfy g ◦ a = g ′ ◦ a, then g = g ′ .

In other words, an arrow is monic when it is left cancellable and it is epic when it is right
cancellable. The etymology of each of these adjectives comes from the Greek prefixes mono-,
which means “alone” or “single,” and epi-, which means “over” or “onto.” The reason for the
use of these adjectives comes from the following characterization of monic and epic arrows in the
category of sets.

Proposition 13.1.25
In Set, monic morphisms are precisely the injective functions and epic morphisms are precisely
the surjective functions.

Proof. First suppose that m : X → Y is a monic arrow in Set. Let x, x′ ∈ X with m(x) = m(x′ ).
Consider a set U = {1} with a single element and the two functions f, f ′ : U → X such that
f (1) = x and f ′ (1) = x′ . Clearly m ◦ f = m ◦ f ′ since both functions send 1 to m(x) = m(x′ ). Since m
is monic, then f = f ′ and hence x = f (1) = f ′ (1) = x′ . Thus, m is injective. Conversely, suppose
that m : X → Y is an injective function. Consider two functions f, f ′ : U → X such that m ◦ f = m ◦ f ′ .
Then m(f (u)) = m(f ′ (u)) for all u ∈ U . Since m is injective, we deduce that f (u) = f ′ (u) for all
u ∈ U , which means that f = f ′ . Thus, m is monic.
For epics, first suppose that e : X → Y is an epic arrow in Set. Consider the two functions
g, g ′ : Y → {1, 2} defined as g(y) = 1 and

    g ′ (y) = { 1   if y ∈ Im e,
             { 2   if y ∉ Im e.

By construction, g ◦ e(x) = g ′ ◦ e(x) for all x ∈ X, so g ◦ e = g ′ ◦ e. Since e is epic, then g = g ′ .
This is only possible if Im e = Y , which means that e is surjective. Conversely, suppose that e is a
surjective function and that g, g ′ : Y → Z satisfy g ◦ e(x) = g ′ ◦ e(x) for all x ∈ X. Let y ∈ Y be
arbitrary and let x ∈ X such that e(x) = y. Then g ◦ e(x) = g ′ ◦ e(x) implies that g(y) = g ′ (y).
Thus, g = g ′ . Hence, e is epic. 
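The proposition can also be confirmed by exhaustive search on small finite sets. The following Python sketch (illustrative; the set sizes are arbitrary) enumerates all functions m : X → Y and checks that m is injective exactly when it is left cancellable with respect to every pair of functions U → X.

```python
from itertools import product

# Small finite sets; functions are encoded as tuples of values indexed by
# the domain elements 0, 1, ...
X, U, Y = [0, 1], [0, 1, 2], ["a", "b", "c"]

def all_functions(dom, cod):
    """Every function dom -> cod, encoded as a tuple of values."""
    return list(product(cod, repeat=len(dom)))

for m in all_functions(X, Y):
    injective = len(set(m)) == len(m)
    # m is monic iff m.f = m.f' forces f = f' for all f, f' : U -> X.
    monic = all(
        f == f2
        for f in all_functions(U, X)
        for f2 in all_functions(U, X)
        if tuple(m[f[u]] for u in U) == tuple(m[f2[u]] for u in U)
    )
    assert injective == monic
```

The dual check (surjective iff right cancellable) works the same way, quantifying over functions Y → Z instead.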

It is not uncommon to use the symbol a : X ↪ Y in diagrams to indicate a monic arrow.


Especially in the context of algebraic structures, we sometimes call a monic arrow a monomorphism.
In the same way, it is not uncommon to use the symbol a : X ↠ Y in diagrams to indicate an epic
arrow. In the context of algebraic structures, we sometimes call an epic arrow an epimorphism.

Definition 13.1.26
Let C be a category.
(1) An object I in Ob(C) is called initial if for each object X in Ob(C), there exists
exactly one morphism f : I → X.

(2) An object T in Ob(C) is called terminal if for each object X in Ob(C), there exists
exactly one morphism g : X → T .

Example 13.1.27. As mentioned in Example 13.1.3, there exists exactly one function ∅ → X for
all sets X. Hence, ∅ is an initial object in Set. If a set A contains at least one element a, then
g, g ′ : A → {1, 2} that satisfy g(a) = 1 and g ′ (a) = 2 present two distinct functions with domain A.
Hence, ∅ is the only initial object in Set.
For terminal objects, note that ∅ cannot be terminal since Hom(X, ∅) = ∅ if X is not empty.
However, all singleton sets {a} are terminal since Hom(X, {a}) consists of the constant function
f (x) = a whenever X ≠ ∅ and Hom(∅, {a}) contains only the empty function. Conversely, if A contains more
than one element, then Hom(X, A) contains at least two constant functions. Thus, the terminal
objects in Set are precisely the singleton sets. 4
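The counting behind this example is simple for finite sets: |Hom(X, T )| = |T |^|X|, so a singleton codomain yields exactly one function from every domain, including the empty one. The sketch below (illustrative) confirms this by enumeration.

```python
from itertools import product

# Count functions between finite sets of the given sizes by enumeration:
# a function dom -> cod is a tuple of |dom| choices from cod.
def num_functions(dom_size, cod_size):
    return len(list(product(range(cod_size), repeat=dom_size)))

singleton = 1
for dom_size in range(4):          # includes the empty set X = {}
    assert num_functions(dom_size, singleton) == 1  # T is terminal

# A two-element codomain is not terminal: several functions exist.
assert num_functions(3, 2) == 8
```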

Example 13.1.28. As another example, consider the category ID of integral domains, which is a
full subcategory of Ring. In ID, the ring (Z, +, ×) is an initial object. Let R be an integral domain
and suppose that ϕ : Z → R is a ring homomorphism. By properties of ring homomorphisms,
ϕ(0) = 0. By Exercise 5.4.19, ϕ(1) = 1R . But then, for all positive n, ϕ(n) = n · 1R , and for all
negative n, ϕ(n) = −(|n| · 1R ). Thus, there exists a unique ring homomorphism from Z to R. 4

Exercises for Section 13.1


1. Explain why Grp is not a subcategory of Set.
2. Prove that Mor(C) as defined in Example 13.1.23 is a category.
3. Prove that commutative rings CRing form a full subcategory of Ring. Prove also that the category
Field of fields is a full subcategory of Ring.
4. Let C have objects that are open sets in R and morphisms that are differentiable functions between
open sets. Prove that C is a category.
5. Let F be a field. Define SubVecF as consisting of objects that are a pair (U, V ) where V is a
vector space over F and U is a subspace of V , and arrows T : (U1 , V1 ) → (U2 , V2 ) such that T is a linear
transformation T : V1 → V2 satisfying T (U1 ) ⊆ U2 .
(a) Prove that SubVecF is a category.
(b) Describe isomorphisms and automorphisms in this category.
6. Show that a category of exactly one object is equivalent to a monoid.
7. Prove that the composition of two monic arrows is again a monic arrow. Also prove that the compo-
sition of two epic arrows is again epic.
8. In each of the following categories, prove that an arrow is monic if and only if it is an injective
homomorphism: (a) Grp; (b) Ring; (c) LModR .
9. In each of the following two categories, determine the initial and terminal objects, if they exist: (a)
LModR ; (b) Set∗ .
10. Using a directed graph notation, list all categories with (a) 3 arrows; (b) 4 arrows.
11. Show that in the category Set∗ of pointed sets, all singleton sets are both initial and terminal objects.
12. Let Q = (V, E, h, t) be a directed graph (or quiver). In the category Cat(Q), describe necessary and
sufficient conditions for: (a) objects (vertices) to be initial; (b) objects to be terminal; (c) arrows to
be monic; (d) arrows to be epic.
13. Let Fieldp be the category of fields of characteristic p (where p is a prime or 0). Show that Fp is the
unique initial object of Fieldp if p is a prime. Show that Q is the unique initial object in Field0 .

13.2
Functors
There are many constructions in mathematics in which we associate one object to another object
for the purposes of studying properties of the first object. These objects will almost always exist in
the context of certain categories. This general principle of mapping an object in one category to an
object in another is codified with the concept of functors.

13.2.1 – Definition of a Functor

Definition 13.2.1
Let A and B be two categories. A covariant functor from A to B is a rule F : A → B that
to each object X in Ob(A) associates a unique object F (X) in Ob(B), and to each arrow
f : X → Y in Arr(A) associates a unique arrow F (f ) : F (X) → F (Y ) such that
(1) (identity) F (idX ) = idF (X) for each object X in Ob(A);
(2) (composition) if f : X → Y and g : Y → Z are two arrows in Arr(A), then

F (g ◦ f ) = F (g) ◦ F (f ).

Example 13.2.2. Consider the category Set of sets. The power set rule is a covariant functor
P : Set → Set that to each set X associates the power set P(X) and to each function f : X → Y ,
associates the function P(f ) : P(X) → P(Y ) defined by
P(f )(A) = {f (a) ∈ Y | a ∈ A}
for all A ∈ P(X). In this specific instance, P(f )(A) is often written more simply just as f (A). It
is easy to see that P(idX ) = idP(X) . Furthermore, if f : X → Y and g : Y → Z are functions and
A ⊆ X, then
P(g ◦ f )(A) = {g ◦ f (a) ∈ Z | a ∈ A} = P(g)({f (a) ∈ Y | a ∈ A}) = (P(g) ◦ P(f ))(A). 4
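The functor axioms for P can be tested directly on a small set. In the Python sketch below (illustrative; the sets and maps are invented), P lifts a function given as a dictionary, and both axioms are checked on every subset of X.

```python
from itertools import chain, combinations

def power_set(X):
    """All subsets of a finite set X, as frozensets."""
    X = list(X)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(X, r) for r in range(len(X) + 1))]

def P(f):
    """Lift f : X -> Y (a dict) to P(f) : P(X) -> P(Y), A |-> f(A)."""
    return lambda A: frozenset(f[a] for a in A)

X = {1, 2, 3}
f = {1: "a", 2: "a", 3: "b"}         # f : X -> Y
g = {"a": 10, "b": 20}               # g : Y -> Z
g_after_f = {x: g[f[x]] for x in X}  # g . f

# Functor axioms, checked on every subset A of X:
identity = {x: x for x in X}
for A in power_set(X):
    assert P(identity)(A) == A                  # P(id_X) = id_P(X)
    assert P(g_after_f)(A) == P(g)(P(f)(A))     # P(g . f) = P(g) . P(f)
```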

Example 13.2.3. Consider the category Grp of groups. Suppose we simply ignored the group
structure and only considered the set theoretic structure of groups and homomorphisms. This
mental process is called the forgetful functor from Grp to Set, which forgets the group structure
and only remembers the set structure.
The forgetful functor exists from many algebraic structures to Set. 4
The label of covariant means that the functor preserves the direction of arrows when mapping
from one category to another. Plenty of constructions are similar but reverse the order. These are
called contravariant functors.

Definition 13.2.4
Let A and B be two categories. A contravariant functor from A to B is a rule F : A → B
that to each object X in Ob(A) associates a unique object F (X) in Ob(B), and to each
arrow f : X → Y in Arr(A) associates a unique arrow F (f ) : F (Y ) → F (X) such that

(1) F (idX ) = idF (X) for each object X in Ob(A);


(2) if f : X → Y and g : Y → Z are two arrows in Arr(A), then

F (g ◦ f ) = F (f ) ◦ F (g).

Example 13.2.5. Let V and W be vector spaces over a field F and let T : V → W be a linear
transformation. Recall that the dual of a vector space V is the vector space V ∗ of linear transfor-
mations HomF (V, F ). The dual of T is the linear transformation T ∗ : W ∗ → V ∗ that for all µ ∈ W ∗
returns the element T ∗ (µ) ∈ V ∗ , which is defined by
T ∗ (µ)(v) = µ(T (v)) for all v ∈ V.

It is an easy proof to show that T ∗ is indeed a linear transformation. Consider the identity function
idV : V → V . For any functional λ ∈ V ∗ ,
id∗V (λ)(v) = λ(idV (v)) = λ(v)
so id∗V (λ) = λ and hence id∗V = idV ∗ . If S : U → V and T : V → W are linear transformations, then
for all u ∈ U and all µ ∈ W ∗ ,
(T ◦ S)∗ (µ)(u) = µ((T ◦ S)(u)) = (µ ◦ T )(S(u)) = T ∗ (µ)(S(u))
= S ∗ (T ∗ (µ))(u) = (S ∗ ◦ T ∗ )(µ)(u).
Thus, (T ◦ S)∗ = S ∗ ◦ T ∗ . This shows that taking the dual of a vector space is a contravariant
functor VecF → VecF . 4
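In coordinates, dualization is the familiar transpose: if S and T are represented by matrices B and A (an assumption of this illustration; the text works basis-free), then (T ◦ S)∗ = S∗ ◦ T∗ becomes (AB)ᵀ = BᵀAᵀ. The sketch below verifies this reversal of order on concrete 2×2 matrices.

```python
def matmul(P, Q):
    """Product of two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(M):
    return [list(col) for col in zip(*M)]

A = [[1, 2], [3, 4]]   # matrix of T
B = [[0, 1], [5, -2]]  # matrix of S

# Contravariance in coordinates: the order of composition reverses.
assert transpose(matmul(A, B)) == matmul(transpose(B), transpose(A))
```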

Proposition 13.2.6
A functor, whether covariant or contravariant, from a category A to a category B transforms
an isomorphism in A to an isomorphism in B.

Proof. We prove the case for covariant functors since the proof for contravariant functors is similar.
Let F : A → B be a covariant functor and let f : X → Y be an isomorphism in A with inverse
g : Y → X. By definition of the inverse, g ◦ f = idX and f ◦ g = idY . Using both axioms for
functors, we deduce that
F (g) ◦ F (f ) = F (g ◦ f ) = F (idX ) = idF (X) ,
F (f ) ◦ F (g) = F (f ◦ g) = F (idY ) = idF (Y ) .
Thus, F (f ) : F (X) → F (Y ) is an isomorphism. 
With the definition of functors between categories, the class of all categories is itself a category.
More precisely, CatCo, in which the objects are categories and the arrows are covariant functors, is a
category, as is CatCon, in which the objects are categories and the arrows are contravariant functors.
Applying categorical terms to CatCo or CatCon, a covariant (resp. contravariant) functor F :
A → B between categories is called an isomorphism if there exists a covariant (resp. contravariant)
functor G : B → A such that G(F (X)) = X for all X ∈ Ob(A), G(F (f )) = f for all f ∈ Arr(A),
F (G(Y )) = Y for all Y ∈ Ob(B), F (G(h)) = h for all h ∈ Arr(B). If there exists an isomorphism
between two categories, we say that the categories are isomorphic.

13.2.2 – Further Examples of Functors


We revisit a number of constructions encountered throughout this book to check whether they
correspond to functors.
Example 13.2.7. The group of units of a ring is a construction that for any ring with an identity
returns a group. This has the smell of a functor but we need to check details. One issue is that
the ring needs an identity. So we define a category RingW1 of rings with an identity in which the
morphisms are ring homomorphisms ϕ : R → S such that ϕ(1R ) = 1S . The process of defining
the group of units is a functor U : RingW1 → Grp in which U (R) is the multiplicative group of
units in R. Note that if r1 r2 = 1R , then ϕ(r1 )ϕ(r2 ) = 1S , so units in R are mapped to units in
S. Consequently, if ϕ : R → S is a ring homomorphism, then U (ϕ) : U (R) → U (S) is the group
homomorphism with U (ϕ)(r) = ϕ(r). It is easy to prove that this mapping satisfies the axioms of a
functor. 4
Example 13.2.8. Consider the process that takes a vector space V over a field F and returns the
general linear group GLF (V ). This might also look like a functor because to each vector space
it associates a group. However, we need to consider how to map linear transformations to group
homomorphisms.
           g
      V ------> V
      |         |
    T |         | T
      v         v
      W ------> W
           ?

In the above diagram, given a linear transformation T : V → W and an invertible linear transfor-
mation g : V → V , there is not a natural way to define an invertible linear transformation W → W
based on T and g. In the diagram, there is no natural function from W to V , especially if T is not
an isomorphism.
With group homomorphisms, it is always possible to define a trivial homomorphism from one
group to another. In the same vein, we could try to construct a functor (that would not be too
interesting) by defining GL(T ) : GL(V ) → GL(W ) as the group homomorphism with GL(T )(g) = 1
in GL(W ) for all linear transformations T and all g ∈ GL(V ). However, this violates the first axiom
of functors that requires that GL(idV ) = idGL(V ) , so that GL(idV )(g) = g instead of 1 in GL(V ).
Hence, the process of constructing the general linear group is not a functor. 4

Example 13.2.9. Fix a positive integer n. In contrast to the previous example, consider the rule
of taking a ring R and returning the matrix ring Mn (R) as described in Section 5.3. For any ring
homomorphism ϕ : R → S define Mn (ϕ) : Mn (R) → Mn (S) by

                  ⎛ ϕ(a11 )  ϕ(a12 )  · · ·  ϕ(a1n ) ⎞
                  ⎜ ϕ(a21 )  ϕ(a22 )  · · ·  ϕ(a2n ) ⎟
    Mn (ϕ)(A) =   ⎜    ..       ..      ..      ..   ⎟
                  ⎝ ϕ(an1 )  ϕ(an2 )  · · ·  ϕ(ann ) ⎠

for all matrices A = (aij ). We need to prove first that Mn (ϕ) is a ring homomorphism. It is easy to
see that
Mn (ϕ)(A + B) = Mn (ϕ)(A) + Mn (ϕ)(B).
Suppose that C = AB, where A, B ∈ Mn (R); then the (i, j)th entry of C is

    cij = ∑_{k=1}^{n} aik bkj .

Thus, the (i, j)th entry of Mn (ϕ)(C) is

    ϕ(cij ) = ϕ( ∑_{k=1}^{n} aik bkj ) = ∑_{k=1}^{n} ϕ(aik )ϕ(bkj ),

which is the (i, j)th entry of Mn (ϕ)(A)Mn (ϕ)(B). Thus, Mn (ϕ) is a ring homomorphism. This
rule will also satisfy the two axioms of a functor so that the process Mn is a covariant functor from
Ring to Ring. For n ≥ 2, this functor is not an isomorphism of categories since Mn (R) is not
commutative for any ring with |R| ≥ 2. 4
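The functor Mn can be tested concretely with the reduction homomorphism ϕ : Z → Z/6Z. The Python sketch below (illustrative; the matrices are chosen arbitrarily) applies ϕ entrywise and checks that Mn (ϕ) preserves matrix multiplication.

```python
# The reduction homomorphism phi : Z -> Z/6Z, with Z/6Z modeled as {0,...,5}.
def phi(a):
    return a % 6

def Mn_phi(A):
    """Apply phi to every entry of the matrix A."""
    return [[phi(a) for a in row] for row in A]

def matmul(P, Q, reduce=lambda x: x):
    """2x2 matrix product, optionally reducing each entry (e.g., mod 6)."""
    return [[reduce(sum(P[i][k] * Q[k][j] for k in range(2)))
             for j in range(2)] for i in range(2)]

A = [[7, 2], [3, 11]]
B = [[1, 5], [4, 6]]

# M_n(phi) is a ring homomorphism: reducing then multiplying (mod 6)
# agrees with multiplying over Z and then reducing.
assert matmul(Mn_phi(A), Mn_phi(B), reduce=phi) == Mn_phi(matmul(A, B))
```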

Example 13.2.10. A group G can be viewed as a category of one object O in which every arrow
is invertible. The elements of G are the arrows of the category and the composition of the arrows
gives the group operation. Let K be a field and consider a covariant functor F : G → VectK . The
functor F maps the object O to a vector space V = F (O) over K and every group element, which we view
as an arrow g : O → O, is mapped to a linear transformation F (g) : V → V . Since functors map
isomorphisms to isomorphisms, then F (g) is an invertible linear transformation. Thus, a functor
from G to VectK is precisely a representation of G, as discussed in Section 8.6. 4

Example 13.2.11. Let E be a field and consider the category SubField(E) of subfields of E,
which is a category by virtue of the poset structure by containment. For a subfield K of E, consider
the rule of constructing the group Aut(E/K) of automorphisms of E that fix K. An arrow L → K
in SubField(E) corresponds to the containment L ⊆ K. For each arrow L ⊆ K, we can define the
injection morphism ϕL⊆K : Aut(E/K) → Aut(E/L) because every automorphism of E that fixes
K also fixes L.
For each field extension K ⊆ E in SubField(E), the group homomorphism ϕK⊆K :
Aut(E/K) → Aut(E/K) is the identity. This is the first axiom of functors. The second axiom
of functors also holds since the composition of injective group homomorphisms is another injective
group homomorphism. Hence, the rule that constructs the automorphism group F(K) = Aut(E/K)
is a contravariant functor from SubField(E) to Grp. 4

Definition 13.2.12
Let A, B, and C be categories. Suppose that for any two objects A in A and B in B, a rule
F returns an object F (A, B) in C such that F (A, ) is a functor from B to C and F ( , B) is
a functor from A to C. Such a rule is called a bifunctor from A × B to C.

Example 13.2.13. Let R be a ring. By Proposition 10.4.11, for any left R-modules M and N , the
set HomR (M, N ) of left R-module homomorphisms from M to N is another left R-module. We will
show that the rule HomR ( , ) is a bifunctor that is covariant in the second entry and contravariant
in the first.
For a fixed left R-module M , consider the functor FM (X) = HomR (M, X) for any left R-module
X. If ϕ : X → X ′ is a module homomorphism, then we define

    FM (ϕ) : HomR (M, X) −→ HomR (M, X ′ )
                        g ↦−→ ϕ ◦ g.

It is easy to verify the identity and composition axioms for a covariant functor.
On the other hand, for a fixed left R-module N , consider the functor F N (Y ) = HomR (Y, N ) for
any left R-module Y . If ψ : Y → Y ′ is a module homomorphism, then we define

    F N (ψ) : HomR (Y ′ , N ) −→ HomR (Y, N )
                           f ↦−→ f ◦ ψ.

Again, it is easy to verify the identity and composition axioms for a contravariant functor. 4

13.2.3 – The Category of Functors


Let A and B be two categories. We can view the covariant (resp. contravariant) functors as a
category CoFunc(A, B) (resp. ConFunc(A, B)). The morphisms in this category, which are also
called natural transformations, are defined as follows. If F, G : A → B are two functors, then a
morphism between functors H : F → G is a rule that to each object X of A associates a morphism
HX : F (X) → G(X) such that for any arrow f : X → Y , the following diagram is commutative:

              HX
      F (X) ------> G(X)
        |             |
  F (f )|             | G(f )
        v             v
      F (Y ) ------> G(Y )
              HY

In this way, we can speak of an isomorphism of functors.

Example 13.2.14. Let Q = (V, E, h, t) be a directed graph and consider the associated category
Cat(Q) as described in Example 13.1.21. Consider the category C of covariant functors from Cat(Q)
to the category of vector spaces VectK , where K is a field. We will show that the category C is
isomorphic as categories to LModK[Q] , the category of modules over the path algebra K[Q].
The data for a functor F : Cat(Q) → VectK gives a K-vector space F (v) for every v ∈ V
and gives a linear transformation F (a) : F (t(a)) → F (h(a)) for each directed edge a ∈ E. By
Proposition 10.11.4, this is precisely the data for a left K[Q]-module. Now let F and G be two
functors Cat(Q) → VectK . In the category of functors, a morphism f from F to G is a rule that
to each vertex v ∈ V (the objects in Cat(Q)) associates a linear transformation fv : F (v) → G(v)
such that for each directed edge a ∈ E, the following diagram is commutative:

                ft(a)
    F (t(a)) ──────────→ G(t(a))
       │                    │
     F (a)                G(a)
       ↓                    ↓
    F (h(a)) ──────────→ G(h(a))
                fh(a)
13.2. FUNCTORS 687

By Proposition 10.11.6, this is precisely the data of a homomorphism between K[Q]-modules. Con-
sequently, there is a category isomorphism between the category of functors from Cat(Q) to VectK
and LModK[Q] . △
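The naturality condition in this example can be checked concretely. The sketch below is a hypothetical encoding, not data from the text: a quiver with a single arrow a : v → w, two functors into VectK given by 2 × 2 matrices, and a verification of the commutative square fh(a) ◦ F (a) = G(a) ◦ ft(a).

```python
# Hypothetical encoding of the naturality square for a quiver with one
# arrow a : v -> w.  F(a), G(a), f_v, f_w are 2x2 matrices over K = Q;
# all concrete values are illustrative assumptions.

def matmul(A, B):
    """Multiply matrices represented as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

Fa = [[1, 2], [0, 1]]    # F(a) : F(v) -> F(w), both K^2
Ga = [[1, 2], [0, 1]]    # G(a) : G(v) -> G(w); here G(a) = F(a)

f_v = [[2, 0], [0, 2]]   # f_v : F(v) -> G(v), scaling by 2
f_w = [[2, 0], [0, 2]]   # f_w : F(w) -> G(w), scaling by 2

# Naturality: f_w . F(a) == G(a) . f_v.  The square commutes here
# because scalar matrices commute with every matrix.
assert matmul(f_w, Fa) == matmul(Ga, f_v)
```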

Exercises for Section 13.2


1. Prove that the rule that takes a ring (R, +, ×) and returns the group (R, +) is a functor from Ring →
AbGrp (where AbGrp is the category of all abelian groups).
2. Let ID be the category of integral domains (a full subcategory of rings) and let Field be the category
of fields. Let R be an integral domain. Prove that the process F (R) = D−1 R, where D = R − {0},
which gives the field of fractions of R, is a functor F : ID → Field.
3. Prove that the rule of taking a commutative ring R and returning the new commutative ring R[x] is
a functor CRing → CRing.
4. Consider the rule that for each group G returns the set Sub(G) of subgroups of G. Note that Sub(G)
is a poset under the partial order ≤ given by subgroup containment. Prove that this rule is a functor
from Grp → Poset. Clearly explain how this functor maps group homomorphisms to monotonic
functions.
5. Prove that the Grothendieck construction described in Subsection 3.11.3 is a functor from the category
of monoids to the category of groups.
6. Consider the category Grp of groups.
(a) Consider the construction which to each group G associates its center Z(G). Show that this does
not define a functor Grp → AbGrp.
(b) Consider the construction F which to each group G associates F (G) = G/G′ , where G′ is the
commutator subgroup. Prove that this is a functor and describe what this rule should do to group
homomorphisms.
7. Consider the rule F that to each ring R returns the set of ideals. Prove that F : Ring → Set
is a contravariant functor, where for all ring homomorphisms ϕ : R → S, the functor morphism is
F (ϕ) = ϕ−1 : F (S) → F (R).
8. Show that the direct sum is a bifunctor Grp × Grp → Grp.
9. Explain clearly how the functor to construct matrix rings, described in Example 13.2.9, is a bifunctor
N∗ × Ring → Ring.
10. Let FinGrp be the category of finite groups. Prove that the process of constructing a group ring is
a bifunctor CRing × FinGrp → Ring.
11. Let R be a commutative ring. Prove that the tensor product operation on R-modules is a bifunctor
ModR × ModR → ModR .
12. Prove that the category of functors from 2 to Grp is isomorphic to the category of morphisms of
groups.
13. Prove that Spec, the process of passing to the prime spectrum of a commutative ring, is a contravariant
functor from CRing to Top. [Hint: Find the proposition(s) and exercise(s) in this text that establish
this.]
A. Appendices

A.1 – The Algebra of Complex Numbers
A.1.1 – Complex Numbers
When studying solutions to polynomials, one quickly encounters equations that do not have real
roots. One of the simplest examples is the equation x2 + 1 = 0. Since the square of every real
number is nonnegative, there exists no real number such that x2 = −1. The complex numbers begin
with “imagining” that there exists a number i such that i2 = −1. Historically, the strangeness of the
mental leap led to calling i the imaginary unit. Algebraists then assumed that for all other algebraic
properties, i interacted with the real numbers just like any other real number. The powers of the
imaginary unit i are
i¹ = i,   i² = −1,   i³ = −i,   i⁴ = 1,   i⁵ = i, . . .
and so forth.
An expression of the form bi, where b is a real number, is called an imaginary number, and a
complex number is an expression of the form a + bi, where a, b ∈ R. The set of complex numbers is
denoted by C. It is not uncommon to denote a complex variable by a letter z. If z = a + bi, we call
a the real part of z, denoted a = Re(z), and we call b the imaginary part of z, denoted b = Im(z).
With this definition, every quadratic equation has a root. For example, applied to 2x² + 5x + 4 = 0,
the quadratic formula gives the following solutions:

    x = (−b ± √(b² − 4ac)) / (2a) = (−5 ± √(25 − 32)) / 4 = −5/4 ± (√7/4) i.
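The worked example can be checked with Python's built-in complex arithmetic; the variable names below are illustrative.

```python
# Checking the worked example 2x^2 + 5x + 4 = 0 with Python's
# complex arithmetic.
import cmath

a, b, c = 2, 5, 4
disc = cmath.sqrt(b * b - 4 * a * c)      # sqrt(-7) = i*sqrt(7)
r1 = (-b + disc) / (2 * a)                # -5/4 + (sqrt(7)/4) i
r2 = (-b - disc) / (2 * a)                # -5/4 - (sqrt(7)/4) i

# Both values satisfy the original equation (up to rounding error).
for r in (r1, r2):
    assert abs(2 * r * r + 5 * r + 4) < 1e-12
assert abs(r1 - complex(-5 / 4, 7 ** 0.5 / 4)) < 1e-12
```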
The quadratic formula shows that whenever a + bi is a root of an equation with real coefficients,
then a − bi is also a root. So in some sense, these two complex numbers are closely related. If
z = a + bi, we call the number a − bi the conjugate of z and denote it by z̄.
Since a complex number involves two independent real numbers, C is usually depicted by a
Cartesian plane with Re(z) on the x-axis and Im(z) on the y-axis. This is called the complex plane.
Figure A.1 shows a few complex numbers.
Just as polar coordinates are useful for analytic geometry, so they are also useful in the study of
complex numbers. The absolute value |z|, also called the modulus, of a complex number z = a + bi
is the distance r from the origin to z, namely |z| = r = √(a² + b²). The argument of z is the angle θ
of polar coordinates of the point (a, b). Using polar coordinates of a complex number, we write

    z = r(cos θ + i sin θ),    (A.1)

with r ≥ 0. As with polar coordinates, though we typically consider θ ∈ [0, 2π), the argument θ can
be any real number.
For example, the absolute value and the argument of 3 + 2i are

    |3 + 2i| = √(3² + 2²) = √13   and   θ = tan⁻¹(2/3).

The absolute value and the argument of −12 + 5i are

    |−12 + 5i| = √(12² + 5²) = 13   and   θ = tan⁻¹(−5/12) + π.
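Both examples can be reproduced with `abs()` and `cmath.phase()`, which returns the argument in (−π, π].

```python
# The modulus and argument examples above, computed with abs() and
# cmath.phase() (argument returned in the interval (-pi, pi]).
import cmath
import math

z1 = 3 + 2j
assert abs(abs(z1) - math.sqrt(13)) < 1e-12
assert abs(cmath.phase(z1) - math.atan(2 / 3)) < 1e-12

z2 = -12 + 5j
assert abs(abs(z2) - 13) < 1e-12
# Second quadrant: the argument is arctan(-5/12) + pi.
assert abs(cmath.phase(z2) - (math.atan(-5 / 12) + math.pi)) < 1e-12
```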


[Figure A.1: Cartesian representation of C. The figure plots the points 1 + 3i, −π + 1.4i, and −5/4 − (7/4)i in the complex plane, with Re(z) on the horizontal axis and Im(z) on the vertical axis.]

Without discussing the issue of convergence of series, using the power series of known functions,
observe that

    e^{iθ} = Σ_{k=0}^{∞} (iθ)^k / k!  =  Σ_{k=0}^{∞} i^k θ^k / k!
           = Σ_{k even} (−1)^{k/2} θ^k / k!  +  i Σ_{k odd} (−1)^{(k−1)/2} θ^k / k!
           = Σ_{n=0}^{∞} (−1)^n θ^{2n} / (2n)!  +  i Σ_{n=0}^{∞} (−1)^n θ^{2n+1} / (2n+1)!
           = cos θ + i sin θ.

Consequently, it is common to write the polar form (A.1) of a complex number as

    z = r e^{iθ},

where r = |z| and θ is the argument of z. A few examples of complex numbers in polar form are
−1 = e^{−iπ} and i = e^{iπ/2}.
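Euler's formula and the sample polar forms are easy to check numerically; the loop values below are arbitrary test angles.

```python
# Numerically checking Euler's formula e^{i theta} = cos(theta) + i sin(theta)
# and the sample polar forms -1 = e^{-i pi} and i = e^{i pi/2}.
import cmath
import math

for theta in (0.0, 0.7, math.pi / 2, math.pi, 5.0):
    assert abs(cmath.exp(1j * theta)
               - complex(math.cos(theta), math.sin(theta))) < 1e-12

assert abs(cmath.exp(-1j * math.pi) - (-1)) < 1e-12
assert abs(cmath.exp(1j * math.pi / 2) - 1j) < 1e-12
```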

A.1.2 – Operations in C
The addition of two complex numbers is defined as

    (a + bi) + (c + di) := (a + c) + (b + d)i.

Since the real part and the imaginary part act as x-coordinates and y-coordinates for vectors in
R2 and since the addition of complex numbers is done component-wise, the addition of complex
numbers is identical to the addition of vectors in R2 . The subtraction of two complex numbers is
(a + bi) − (c + di) = (a − c) + (b − d)i.
The product of two complex numbers is expressed in Cartesian coordinates as

    (a + bi)(c + di) := ac + adi + bci + bd(−1) = (ac − bd) + (ad + bc)i.

With this definition, if z = a + bi, then

    z z̄ = (a + bi)(a − bi) = a² + b² = |z|².



Besides this identity, the formula for multiplication does not readily appear to have an interesting
geometric interpretation. However, the polar coordinate expression of complex numbers leads
immediately to an interpretation of the product. We have

    (r₁ e^{iθ₁})(r₂ e^{iθ₂}) = r₁ r₂ e^{i(θ₁ + θ₂)},

so the product of two complex numbers z₁ and z₂ has absolute value |z₁ z₂| = |z₁||z₂| and has an
argument that is the sum of the arguments of z₁ and z₂.
For division of two complex numbers, let z₁ = a + bi = r₁ e^{iθ₁} and z₂ = c + di = r₂ e^{iθ₂} ≠ 0 be
two complex numbers. Then the two expressions of division are

    z₁/z₂ = (a + bi)/(c + di) = (a + bi)(c − di)/(c² + d²) = (r₁/r₂) e^{i(θ₁ − θ₂)}.
Using the complex conjugate, we can express the inverse of a complex number as

    z⁻¹ = 1/z = z̄/(z z̄) = z̄/|z|².
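These multiplicative properties and the conjugate formula for the inverse can be verified on sample points (the points themselves are arbitrary).

```python
# Checking |z1 z2| = |z1||z2|, that arguments add modulo 2 pi, and the
# conjugate formula z^{-1} = conj(z)/|z|^2.  Sample points are arbitrary.
import cmath
import math

z1, z2 = 3 + 2j, -1 + 4j
assert abs(abs(z1 * z2) - abs(z1) * abs(z2)) < 1e-12

# arg(z1 z2) - arg(z1) - arg(z2) should be a multiple of 2 pi.
diff = cmath.phase(z1 * z2) - cmath.phase(z1) - cmath.phase(z2)
mod = diff % (2 * math.pi)
assert mod < 1e-9 or abs(mod - 2 * math.pi) < 1e-9

z = 2 - 5j
assert abs(1 / z - z.conjugate() / abs(z) ** 2) < 1e-12
```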
From the multiplication operation, if z = r e^{iθ}, where r ≥ 0 and θ is an angle, then for all integers
n ∈ Z, the powers of z are zⁿ = rⁿ e^{inθ}. The argument θ is equivalent to any angle θ + 2πk for any
k ∈ Z. Consequently, all of the following complex numbers

    r^{1/n} e^{i(θ + 2πk)/n}   for k = 0, 1, 2, . . . , n − 1,    (A.2)

have an nth power equal to z = r e^{iθ}. Consequently, the n numbers in (A.2) are the nth roots of z.
As one example, consider the cube roots of i. We write i = e^{iπ/2}, so the cube roots of i are

    e^{iπ/6} = √3/2 + (1/2)i,   e^{i(π/6 + 2π/3)} = e^{5iπ/6} = −√3/2 + (1/2)i,   e^{i(π/6 + 4π/3)} = e^{3iπ/2} = −i.
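Formula (A.2) translates directly into a short function; the helper name `nth_roots` is illustrative. Applying it to i reproduces the three cube roots above.

```python
# The n nth roots r^{1/n} e^{i(theta + 2 pi k)/n} of z = r e^{i theta},
# as in formula (A.2), checked against the cube roots of i.
import cmath
import math

def nth_roots(z, n):
    """Return the n complex nth roots of z (illustrative helper name)."""
    r, theta = abs(z), cmath.phase(z)
    return [r ** (1 / n) * cmath.exp(1j * (theta + 2 * math.pi * k) / n)
            for k in range(n)]

roots = nth_roots(1j, 3)
assert all(abs(w ** 3 - 1j) < 1e-12 for w in roots)

# k = 0, 1, 2 give e^{i pi/6}, e^{5i pi/6}, e^{3i pi/2} = -i.
expected = [complex(math.sqrt(3) / 2, 1 / 2),
            complex(-math.sqrt(3) / 2, 1 / 2),
            -1j]
assert all(abs(w - e) < 1e-12 for w, e in zip(roots, expected))
```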
As another example, we calculate the square roots of z = 1 + 3i from the polar form

    z = √10 · e^{i tan⁻¹(3)}.

Hence, the square roots of z are

    10^{1/4} e^{i tan⁻¹(3)/2}   and   10^{1/4} e^{i(tan⁻¹(3)/2 + π)} = −10^{1/4} e^{i tan⁻¹(3)/2}.
Combining various trigonometric identities, we get

    cos(½ tan⁻¹ r) = √( ½ (1 + 1/√(1 + r²)) )   and   sin(½ tan⁻¹ r) = √( ½ (1 − 1/√(1 + r²)) ).

Then the two square roots of 1 + 3i are

    ±10^{1/4} ( √( ½ (1 + 1/√(1 + 3²)) ) + i √( ½ (1 − 1/√(1 + 3²)) ) ) = ± (1/√2) ( √(√10 + 1) + i √(√10 − 1) ).
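The closed form can be verified numerically against direct squaring and against `cmath.sqrt`, which returns the root with nonnegative real part.

```python
# Verifying the closed form for the square roots of z = 1 + 3i:
# +/- (1/sqrt(2)) (sqrt(sqrt(10) + 1) + i sqrt(sqrt(10) - 1)).
import cmath
import math

z = 1 + 3j
w = (math.sqrt(math.sqrt(10) + 1)
     + 1j * math.sqrt(math.sqrt(10) - 1)) / math.sqrt(2)

assert abs(w * w - z) < 1e-12
assert abs((-w) * (-w) - z) < 1e-12
# cmath.sqrt returns the principal root (nonnegative real part), i.e. w.
assert abs(cmath.sqrt(z) - w) < 1e-9
```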

A.2 – Lists of Groups
This section provides various lists of groups according to a classifying property.

A.2.1 – Groups Classified by Order with |G| ≤ 24


The following table lists all the groups of order at most 24 and indicates where the group (or family of
groups) first appears in the text.

Order Abelian Y/N Groups and Notes


1 Abelian {1}
2 Abelian Z2 (see Examples 3.2.15 and 3.8.2 for Zn )
3 Abelian Z3
4 Abelian Z4 , Z2 ⊕ Z2 (≅ V4 , called the Klein-4 group)
5 Abelian Z5
6 Abelian Z6
Nonabelian D3 (see Section 3.1 for Dn )
7 Abelian Z7
8 Abelian Z8 , Z4 ⊕ Z2 , Z2 ⊕ Z2 ⊕ Z2
Nonabelian D4 , Q8 (see Example 3.3.12)
9 Abelian Z9 , Z3 ⊕ Z3
10 Abelian Z10
Nonabelian D5
11 Abelian Z11
12 Abelian Z12 , Z6 ⊕ Z2
Nonabelian D6 , A4 (see Example 3.5.6 for An ), Z3 o Z4 (Exercise 4.3.15)
13 Abelian Z13
14 Abelian Z14
Nonabelian D7
15 Abelian Z15
16 Abelian Z16 , Z8 ⊕ Z2 , Z4 ⊕ Z4 , Z4 ⊕ Z2 ⊕ Z2 , Z2 ⊕ Z2 ⊕ Z2 ⊕ Z2
Nonabelian D4 ⊕ Z2 , Q8 ⊕ Z2 , D8 , QD16 (Exercise 3.8.9),
modular group M16 (Exercise 4.3.17), (Z4 ⊕ Z2 ) o Z2 , Z4 o Z4
(D4 ⊕ Z4 )/h(r2 , z 2 )i, generalized quaternion Q16
17 Abelian Z17
18 Abelian Z18 , Z6 ⊕ Z3
Nonabelian D9 , D3 ⊕ Z3 , (Z3 ⊕ Z3 ) o Z2
19 Abelian Z19
20 Abelian Z20 , Z10 ⊕ Z2
Nonabelian D10 , Z5 o Z4 , F20 (Exercise 4.3.16)
21 Abelian Z21
Nonabelian Z7 o Z3 (G2 in Example 3.8.6)
22 Abelian Z22
Nonabelian D11
23 Abelian Z23
24 Abelian Z24 , Z12 ⊕ Z2 , Z6 ⊕ Z2 ⊕ Z2
Nonabelian S4 , D12 , D3 ⊕ Z4 , D4 ⊕ Z3 , Q8 ⊕ Z3 , D6 ⊕ Z2 , A4 ⊕ Z2 ,
(Z3 o Z4 ) ⊕ Z2 , Z3 o D4 , Z3 o Z8 , SL2 (F3 )
ha, b, c | a6 = b2 = c2 = abc = 1i

Useful classification results that support the above table are


• groups of order 4 (Example 3.3.10);
• groups of order 8 (Example 3.3.11);
• groups of order p, where p is prime (Proposition 4.1.14): Zp ;
• groups of order 2p, where p is prime (Exercise 4.1.35): Z2p and Dp ;
• Fundamental Theorem of Finitely Generated Abelian Groups: All abelian groups of a given
order are determined by Theorem 4.5.11 (and Theorem 4.5.18).
List of Notations

Symbol Explanation and page reference


BASIC SETS
∅ the empty set
N the set of nonnegative integers
Z the set of integers
Q the set of rational numbers
R the set of real numbers
C the set of complex numbers

S∗ where S is any of the above sets, S − {0}
Q≥0 , R≥0 the set of nonnegative rational (respectively real) numbers
[a, b] the interval of real numbers x satisfying a ≤ x ≤ b
Fun(A, B) the set of functions from A to B
idS the identity function on a set S, i.e., idS (s) = s for all s ∈ S
C⁰(I, R) the set of continuous real-valued functions from the interval I
Cⁿ(I, R) the set of real-valued functions from the interval I whose first n derivatives are continuous
SET THEORY
x ∈ A x is an element of the set A, 1
x ∉ A x is not an element of the set A, 1
A⊆B A is a subset of B, 3
A ⊊ B A is a proper subset of B, 3
A ⊈ B A is not a subset of B, 3
A∪B the union of A and B, 3
A∩B the intersection of A and B, 4
⋃_{i∈I} Ai the union of an arbitrary collection of sets Ai , indexed by the set I, 6
⋂_{i∈I} Ai the intersection of an arbitrary collection of sets Ai , indexed by the set I, 6
A̅ the complement of A, 4
A − B the set difference of B from A, 4
A △ B the symmetric difference of A and B, 4
P(S) the power set of a set S, 6
f :A→B f is a function from a set A to a set B, 7
g◦f the composition of the function g with the function f , 8
f⁻¹ the inverse function of a bijective function f , 9
|A| the cardinality (number of elements if A is finite) of a set, 10
A×B Cartesian product of sets A and B, 14
[a]∼ the ∼-equivalence class of a, 22


S/ ∼ the quotient set of S by ∼, the set of ∼ equivalence classes in S, 22


≼ (generic) partial order symbol, 28
≼T the partial order symbol restricted to a subset, 31
glb(A) the greatest lower bound of a subset A, 31
lub(A) the least upper bound of a subset A, 31
NUMBER THEORY
a|b a divides b, 44, 267
gcd(a, b) the greatest common divisor of a and b, 46
lcm(a, b) the least common multiple of a and b, 49
ordp (n) the order of the prime p in the integer n, 51, 295
φ(n) Euler’s totient function (Euler’s φ-function), 51
a ≡ b (mod n) a is congruent to b modulo n, 54
Z/nZ the set of congruence classes in modular arithmetic modulo n, 55
ā the congruence class of a (in Z/nZ), 54
U (n) the set of invertible elements in Z/nZ, 57
Fp the finite field on p (prime) elements, 59
LINEAR ALGEBRA
Span(S) the subspace of a vector space spanned by the set S
[v]B the column vector of coordinates of a vector v in a finite-dimensional
vector space V with respect to an ordered basis B
[T ]_B^{B′} the matrix of a linear transformation T : V → W with respect to an
ordered basis B of V and an ordered basis B′ of W
SPECIFIC GROUPS
Dn the dihedral group on n elements, 79
GLn (F ) the general linear group on n × n matrices with coefficients in the field
F , 79
Zn the cyclic group on n elements, 81
V4 the Klein-4 group, isomorphic to Z2 ⊕ Z2 , 88
Q8 the quaternion group, 89
Sn the symmetric group on n elements, 91
An the alternating group on n elements, 101
SLn (F ) the special linear group on n × n matrices with coefficients in the field
F , 102
GROUP THEORY
|g| the order of a group element, 84
G⊕H the direct sum group of the groups G and H, 80
inv(σ) the number of inversions of the permutation σ, 97
H≤G H is a subgroup of the group G, 101
Z(G) the center of the group G, 103
CA (G) the centralizer of the subset A in the group G, 103
NA (G) the normalizer of the subset A in the group G, 104
hSi the subgroup generated by the subset S, 105
Tor(G) the torsion subgroup of an abelian group G, 107

Ker ϕ the kernel of a homomorphism (groups, rings), 115, 235


Im ϕ the image of a homomorphism (groups, rings), 115, 235
Aut(G) the automorphism group of a group G, 119
Inn(G) the group of inner automorphisms of G, 123
hS | Ri group presentation with generators in S and relations R, 124
(Q, Σ, T ) a state machine triple, 155
|G : H| index of the subgroup H in G, number of left cosets of H, 164
HK product of subgroups HK = {hk | h ∈ H and k ∈ K} 167
N EG N is a normal subgroup of G, 171
G/N the quotient group of G by a normal subgroup N , 179
G′ the commutator subgroup of a group G, 185
Sylp (G) the set of Sylow p-subgroups of a group G, 409
np (G) the number of Sylow p-subgroups of a group G, 409
[x, y] the commutator of two elements x, y ∈ G, 434
[H, K] the commutator of two subgroups H, K ≤ G, 434
G′ the commutator subgroup of a group G, 434
H oϕ K the semidirect product of H with K by ϕ, 446
L oρ K the wreath product of K on L by ρ, 451
RING THEORY
char(R) the characteristic of a ring, 209
R⊕S the direct sum of rings R and S, 210
U (R) the group of units in a ring R, 210
N (R) the subset of nilpotent elements in a ring R, 211
C(R) the center of a ring R, 213
R[S] the ring generated by the ring and elements in the set S, 216
R[x] the ring of polynomials in x with coefficients in the ring R, 218
R(x) the ring of rational expressions in x with coefficients in the ring R, 278
deg p(x) the degree of the polynomial p(x), 218
LT(p(x)) the leading term of the polynomial p(x), 218
LC(p(x)) the leading coefficient of the polynomial p(x), 218
R[x1 , x2 , . . . , xn ] the multivariable polynomial ring with coefficients in the ring R and
with variables x1 , x2 , . . . , xn , 221
R[G] the group ring of the ring R and group G, 221
R[[x]] the ring of formal power series with coefficients in a ring R, 224
R((x)) the ring of formal Laurent series with coefficients in a ring R, 240, 280
Mn (R) the ring of n × n matrices with entries in the ring R, 225
GLn (R) the group of units U (Mn (R)), where R is a ring, 227
det A the determinant of a matrix A, 227
Supp(f ) the support of a function into an abelian group, 237
Funfs (S, R) the set of functions from S to R of finite support, 237
RA, AR, RAR the left (resp. right, two-sided) ideal generated by A, 243
(r1 , r2 , . . . , rn ) the ideal in a ring generated by the elements r1 , r2 , . . . , rn , 242
I +J the sum of ideals I and J, 245
696 APPENDIX A. APPENDICES

IJ the product of ideals I and J, 245



√I the radical of the ideal I, 246
NR the nilradical of the commutative ring R, 249
(I : J) the fraction ideal of ideals I by J, 246
R/I the quotient ring of the ring R by the ideal I, 250
a ≃ b a and b are associates in a commutative ring, 268
N (α) the norm of a ring element: definition, 270; of Hamilton’s quaternions,
212; quadratic norms, 270
D−1 R the ring of fractions of R with denominators D, 276
OK the ring of algebraic integers in a field extension K of Q, 312
FIELD THEORY
F (α) the least field extension of a field F that includes an element α, 322
[K : F ] the degree of the extension of K over F , [K : F ] = dimF K, 324
K/F K is a field extension of F , 329
mα,F (x) the minimal polynomial of α over F , 329
Alg(L/F ) the set of subfields of L algebraic over F , 335
K1 K2 the composite field of K1 and K2 , 335
ζn the complex root of unity e2πi/n , 356
Φn (x) the nth cyclotomic polynomial, 357
µ(n) the Möbius function on positive integers, 360
F̄ the algebraic closure of a field F , 369
Q̄ the field of algebraic numbers, 311
Fq the finite field of q = pn elements, 374
Dx (p(x)) the derivative polynomial of p(x), 371
σp the Frobenius endomorphism on a field of characteristic p, 373
GROUP ACTIONS
g · x, gx the action of the group element g on an element x, 380
G X the group G acts on the set X, 380
Gx the stabilizer in G of the element x, 391
ᵍX the subset of X fixed by the element g, 393
G{B} , G(B) the setwise and pointwise stabilizers of subset B ⊆ X, 397
VECTOR SPACES
HomF (V, W ) the set (vector space) of all F -linear transformations from V to W , 482

V ∗ the dual of a vector space V , 483
End(V ) the set of endomorphisms on V , i.e., linear transformations V → V ,
485
GL(V ) the general linear group of the vector space V , 483
MODULES
Ann(I) the annihilator of the ideal I, 492
Ann(N ) the annihilator of the submodule N , 492
Tor(M ) the subset of torsion elements in a module M , 493
HomR (M, N ) the set of R-module homomorphisms from M to N , 499
EndR (M ) the set of R-module endomorphisms from M to itself, 499

Ca(x) the companion matrix associated to the polynomial a(x), 526


Jλ,m the m × m Jordan block matrix of eigenvalue λ, 533
GALOIS THEORY
Aut(K/F ) the group of automorphisms of a field K that fix a subfield F , 557
Gal(K/F ) the Galois group of a Galois field extension, 562
Fix(K, H) the subfield of K fixed by the subgroup H ≤ Aut(K), 560
NK/F (α) the norm of α from K to F , 573
TrK/F (α) the trace of α from K to F , 574
MULTIVARIABLE POLYNOMIALS
mdeg xα the multidegree of the monomial x1^{α1} x2^{α2} · · · xn^{αn} , 620
V(S) the affine variety of the subset S ⊆ F [x1 , x2 , . . . , xn ], 621
I(Z) the ideal of polynomials vanishing on a set Z ⊆ F n , 624
LT(p) the leading term of p ∈ F [x1 , x2 , . . . , xn ] with respect to some monomial
order, 634
LC(p) the leading coefficient of p ∈ F [x1 , x2 , . . . , xn ] with respect to some
monomial order, 634
LM(p) the leading monomial of p ∈ F [x1 , x2 , . . . , xn ] with respect to some
monomial order, 634
rem (f, G) the remainder of f when divided by the s-tuple of polynomials G, 636
S(f, g) the S-polynomial of f, g ∈ F [x1 , x2 , . . . , xn ] with respect to some monomial order, 644
CATEGORIES
Ob(C) the collection of objects in the category C, 675
Arr(C) the collection of arrows in the category C, 675
HomC (X, Y ) the set of arrows (morphisms) from X to Y , 675
Bibliography

[1] William W. Adams and Philippe Loustaunau. An Introduction to Gröbner Bases, volume 3 of
Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, 2005.
[2] George E. Andrews. The Theory of Partitions. Encyclopedia of Mathematics and its Applica-
tions. Cambridge University Press, Cambridge, U.K., 1998.
[3] Vladimir I. Arnold. Ordinary Differential Equations. MIT Press, Cambridge, MA, 1973.
[4] Michael Aschbacher and Stephen D. Smith. The Classification of Quasithin Groups. I. Struc-
ture of Strongly Quasithin K-groups, volume 111 of Mathematical Surveys and Monographs.
American Mathematical Society, Providence, RI, 2004.
[5] Michael Aschbacher and Stephen D. Smith. The Classification of Quasithin Groups. II. Main
Theorems: The Classification of Simple QTKE-groups, volume 112 of Mathematical Surveys
and Monographs. American Mathematical Society, Providence, RI, 2004.
[6] Michael F. Atiyah and Ian G. MacDonald. Introduction to Commutative Algebra. Perseus
Books, Reading, MA, 1969.
[7] Michael J. Bardzell and Kathleen M. Shannon. The PascGalois triangle: A tool for visualizing
abstract algebra. In Allen C. Hibbard and Ellen J. Maycock, editors, Innovations in Teach-
ing Abstract Algebra, number 60 in MAA Notes, pages 115–123. Mathematical Association of
America, Washington, DC, 2002.
[8] Nathan Bliss, Ben Fulan, Stephen Lovett, and Jeff Sommars. Strong divisibility, cyclotomic
polynomials, and iterated polynomials. The American Mathematical Monthly, 120(6):519–536,
2013.
[9] N. N. Bogolyubov, G. K. Mikhailov, and A. P. Yushkevich, editors. Euler and Modern Science.
The MAA Tercentenary Euler Celebration. Mathematical Association of America, Washington,
DC, 2007.
[10] William W. Boone. The word problem. Proceedings of the National Academy of Sciences,
44:1061–1065, 1958.
[11] William E. Boyce and Richard C. DiPrima. Elementary Differential Equations. John Wiley
and Sons, Inc., New York, 9th edition, 2009.
[12] William Burnside. Theory of Groups of Finite Order. Cambridge University Press, Cambridge,
2nd edition, 1911.
[13] John Clough. A rudimentary geometric model for contextual transposition and inversion. Jour-
nal of Music Theory, 42(2):297–306, 1998.
[14] John Conway, Rob Curtis, Simon Norton, Richard Parker, and Robert Wilson. Atlas of Finite
Simple Groups. Oxford University Press, Oxford, 1986.
[15] David A. Cox. Galois Theory. John Wiley & Sons, Inc., Hoboken, NJ, 2004.
[16] David A. Cox, John B. Little, and Donal O’Shea. Ideals, Varieties, and Algorithms. Springer-
Verlag, New York, 2nd edition, 1997.
[17] David A. Cox, John B. Little, and Donal O’Shea. Using Algebraic Geometry, volume 185 of
Graduate Texts in Mathematics. Springer-Verlag, New York, 2nd edition, 2005.


[18] Alissa S. Crans, Thomas M. Fiore, and Ramon Satyendra. Musical actions of a dihedral group.
The American Mathematical Monthly, 116(6):479–495, 2009.

[19] Richard Dedekind. Über die Theorie der ganzen algebraischen Zahlen. Friedr. Vieweg & Sohn,
Braunschweig, 1964. With a foreword by B. van der Waerden.

[20] John D. Dixon and Brian Mortimer. Permutation Groups, volume 163 of Graduate Texts in
Mathematics. Springer-Verlag, New York, 1996.

[21] Murray Eisenberg. Axiomatic Theory of Sets and Classes. Holt, Rinehart and Winston, New
York, 1971.

[22] David Eisenbud. Commutative Algebra with a View Toward Algebraic Geometry, volume 150
of Graduate Texts in Mathematics. Springer-Verlag, New York, 1995.

[23] Walter Feit and John G. Thompson. Solvability of groups of odd order. Pacific Journal of
Mathematics, 13:775–1029, 1963.

[24] Allen Forte. The Structure of Atonal Music. Yale University Press, New Haven, CT, 1973.

[25] Abraham A. Fraenkel and Yehoshua Bar-Hillel. Foundations of Set Theory. North-Holland
Publishing Company, Amsterdam, 1958.

[26] Ralf Fröberg. An Introduction to Gröbner Bases. Wiley, New York, 1997.

[27] William Fulton. Young Tableaux: With Applications to Representation Theory and Geome-
try, volume 35 of London Mathematical Society Student Texts. Cambridge University Press,
Cambridge, U.K., 1996.

[28] Carl Friedrich Gauss. Disquisitiones Arithmeticae. Springer-Verlag, New York, 1986. Translated
and with a preface by Arthur A. Clarke, Revised by William C. Waterhouse, Cornelius Greither,
and A. W. Grootendorst and with a preface by Waterhouse.

[29] Aleksandr Gelfond. Sur le septième problème de Hilbert. Bulletin de l’Académie des Sciences
de l’URSS, 4:623–634, 1934.

[30] Daniel Gorenstein. The classification of the finite simple groups. I. Simple groups and local analysis.
Bulletin of the AMS. New Series, 1(1):43–199, 1979.

[31] Daniel Gorenstein. Classifying the finite simple groups. Bulletin of the AMS, 14(1):1–98, 1986.

[32] Branko Grünbaum and Geoffrey Shephard. Tilings and Patterns. W.H. Freeman, New York,
1990.

[33] Paul R. Halmos. Naive Set Theory. D. Van Nostrand, Princeton, NJ, 1960.

[34] Ján Haluska. The Mathematical Theory of Tone Systems. CRC Press, New York, 2003.

[35] G. H. Hardy and E. M. Wright. An Introduction to the Theory of Numbers. Oxford University
Press, New York, 6th edition, 2008.

[36] Glyn Harman. Prime-Detecting Sieves. Princeton University Press, Princeton, NJ, 2007.

[37] Joe Harris. Algebraic Geometry: A First Course, volume 133 of Graduate Texts in Mathematics.
Springer-Verlag, New York, 1992.

[38] Robin Hartshorne. Algebraic Geometry, volume 52 of Graduate Texts in Mathematics. Springer-
Verlag, New York, 1977.

[39] Horst Herrlich. Axiom of Choice. Springer-Verlag, Berlin, 2006.



[40] James E. Humphreys. Reflection Groups and Coxeter Groups. Cambridge University Press,
Cambridge, U.K., 1992.
[41] T. Y. Lam and K. H. Leung. On the cyclotomic polynomial φpq (x). The American Mathematical
Monthly, 103(7):562–564, 1996.
[42] I. G. Macdonald. Symmetric Functions and Hall Polynomials. Oxford Mathematical Mono-
graphs. Oxford University Press, New York, 1999.
[43] Saunders MacLane. Categories for the Working Mathematician. Graduate Texts in Mathemat-
ics. Springer-Verlag, New York, 1971.
[44] Hideyuki Matsumura. Commutative Ring Theory, volume 8 of Cambridge Studies in Advanced
Mathematics. Cambridge University Press, New York, 1986.
[45] James McKay. Another proof of Cauchy’s group theorem. The American Mathematical Monthly,
66(2):119, 1959.
[46] Richard A. Mollins. Algebraic Number Theory. Chapman & Hall, Boca Raton, FL, 1999.
[47] J. Donald Monk. Introduction to Set Theory. McGraw-Hill, New York, 1969.
[48] Patrick Morandi. Field and Galois Theory. Graduate Texts in Mathematics. Springer-Verlag,
New York, 1996.
[49] Peter M. Neumann. A lemma that is not Burnside’s. Mathematical Scientist, 4(2):133–141,
1979.
[50] P. S. Novikov. On the algorithmic unsolvability of the word problem in group theory. Proceedings
of the Steklov Institute of Mathematics, 44:1–143, 1955. In Russian.
[51] Graham Priest. An Introduction to Non-Classical Logic. Cambridge University Press, Cam-
bridge, U.K., 2001.
[52] R. Remmert. The fundamental theorem of algebra. In Numbers, volume 123 of Graduate Texts
in Mathematics, chapter 4. Springer-Verlag, New York, 1990.
[53] Kenneth H. Rosen. Elementary Number Theory and Its Applications. Addison Wesley, New
York, 5th edition, 2005.
[54] R. L. Roth. On extensions of Q by square roots. The American Mathematical Monthly,
78(4):392–393, 1971.
[55] Herman Rubin and Jean E. Rubin. Equivalents of the Axiom of Choice. North-Holland Pub-
lishing, Amsterdam, 1963.
[56] John R. Rutter. Geometry of Curves. Chapman and Hall/CRC, Boca Raton, FL, 2000.
[57] C. Edward Sandifer. How Euler Did It. The MAA Tercentenary Euler Celebration. Mathemat-
ical Association of America, Washington, DC, 2007.
[58] Stewart Shapiro. Philosophy of Mathematics: Structure and Ontology. Oxford University Press,
Oxford, U.K., 1997.
[59] Robert R. Stoll. Introduction to Set Theory and Logic. W.H. Freeman, San Francisco, 1963.
[60] C. L. F. von Lindemann. Über die Zahl π. Mathematische Annalen, 20:213–225, 1882.
[61] Robert A. Wilson. The Finite Simple Groups, volume 251 of Graduate Texts in Mathematics.
Springer-Verlag, New York, 2009.
[62] Martin M. Zuckerman. Sets and Transfinite Numbers. Macmillan Publishing, New York, 1974.
ABSTRACT ALGEBRA
STRUCTURES AND APPLICATIONS

Abstract Algebra: Structures and Applications helps you understand the abstraction of modern algebra. It
emphasizes the more general concept of an algebraic structure while simultaneously covering applications.

The book presents the core topics of structures in a consistent order:

• Definition of structure
• Motivation
• Examples
• General properties
• Important objects
• Description
• Subobjects
• Morphisms
• Subclasses
• Quotient objects
• Action structures
• Applications

The text uses the general concept of an algebraic structure as a unifying principle and introduces other
algebraic structures besides the three standard ones (groups, rings, and fields). Examples, exercises,
investigative projects, and entire sections illustrate how abstract algebra is applied to areas of science and
other branches of mathematics.

Features
• Emphasizes the general concept of an algebraic structure as a unifying principle instead of just focusing
on groups, rings, and fields
• Describes the application of algebra in numerous fields, such as cryptography and geometry
• Includes brief introductions to other branches of algebra that encourage you to investigate further
• Provides standard exercises as well as project ideas that challenge you to write investigative or
expository mathematical papers
• Contains many examples that illustrate useful strategies for solving the exercises

STEPHEN LOVETT