IntroCompOrg Preview
Organization
with x86-64 Assembly Language & GNU/Linux
Robert G. Plantz
Sonoma State University
bob.cs.sonoma.edu
December 2013
Copyright Notice
Copyright ©2008, ©2009, ©2010, ©2011, ©2012, ©2013 by Robert G. Plantz. All rights reserved.
The author has used his best efforts in preparing this book. The author makes no warranty of any kind,
expressed or implied, with regard to the programs or the documentation contained in this book. The author
shall not be liable in any event from incidental or consequential damages in connection with, or arising out of,
the furnishing, performance, or use of these programs.
All products or services mentioned in this book are the trademarks or service marks of their respective
companies or organizations. Eclipse is a trademark of Eclipse Foundation, Inc.
Preface xiv
1 Introduction 1
1.1 Computer Subsystems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 How the Subsystems Interact . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
3 Computer Arithmetic 30
3.1 Addition and Subtraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.2 Arithmetic Errors — Unsigned Integers . . . . . . . . . . . . . . . . . . . . . 36
3.3 Arithmetic Errors — Signed Integers . . . . . . . . . . . . . . . . . . . . . . . 37
3.4 Overflow and Signed Decimal Integers . . . . . . . . . . . . . . . . . . . . . . 42
3.4.1 The Meaning of CF and OF . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.5 C/C++ Basic Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.5.1 C/C++ Shift Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.5.2 C/C++ Bit Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.5.3 C/C++ Data Type Conversions . . . . . . . . . . . . . . . . . . . . . . . 53
3.6 Other Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.6.1 BCD Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.6.2 Gray Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4 Logic Gates 61
4.1 Boolean Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.2 Canonical (Standard) Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.3 Boolean Function Minimization . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.3.1 Minimization Using Algebraic Manipulations . . . . . . . . . . . . . . 68
4.3.2 Minimization Using Graphic Tools . . . . . . . . . . . . . . . . . . . . . 70
4.4 Crash Course in Electronics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
4.4.1 Power Supplies and Batteries . . . . . . . . . . . . . . . . . . . . . . . . 78
4.4.2 Resistors, Capacitors, and Inductors . . . . . . . . . . . . . . . . . . . 78
CONTENTS ii
5 Logic Circuits 91
5.1 Combinational Logic Circuits . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
5.1.1 Adder Circuits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
5.1.2 Ripple-Carry Addition/Subtraction Circuits . . . . . . . . . . . . . . . 94
5.1.3 Decoders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.1.4 Multiplexers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
5.2 Programmable Logic Devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
5.2.1 Programmable Logic Array (PLA) . . . . . . . . . . . . . . . . . . . . . 101
5.2.2 Read Only Memory (ROM) . . . . . . . . . . . . . . . . . . . . . . . . . 102
5.2.3 Programmable Array Logic (PAL) . . . . . . . . . . . . . . . . . . . . . 103
5.3 Sequential Logic Circuits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
5.3.1 Clock Pulses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
5.3.2 Latches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
5.3.3 Flip-Flops . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
5.4 Designing Sequential Circuits . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
5.5 Memory Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
5.5.1 Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
5.5.2 Shift Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
5.5.3 Static Random Access Memory (SRAM) . . . . . . . . . . . . . . . . . 124
5.5.4 Dynamic Random Access Memory (DRAM) . . . . . . . . . . . . . . . 126
5.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
16 Input/Output 394
16.1 Memory Timing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 394
16.2 I/O Device Timing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395
16.3 Bus Timing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395
16.4 I/O Interfacing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 396
16.5 I/O Ports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397
16.6 Programming Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 398
16.7 Interrupt-Driven I/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 409
16.8 I/O Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 409
16.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 410
Bibliography 539
Index 540
List of Figures
9.1 Assembler listing file for the function shown in Listing 9.7. . . . . . . . . . 223
9.2 General format of instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
9.3 REX prefix byte. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
9.4 ModRM byte. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
9.5 SIB byte. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
9.6 Machine code for the mov from a register to a register instruction. . . . . . 227
9.7 Machine code for the mov immediate data to a register instruction. . . . . 228
9.8 Machine code for the add immediate data to the A register . . . . . . . . . 229
9.9 Machine code for the add immediate data to a register . . . . . . . . . . . . 229
9.10 Machine code for the add immediate data to a register instruction. . . . . 230
9.11 Machine code for the add register to register instruction. . . . . . . . . . . 230
11.1 Arguments and local variables in the stack frame, sumInts function. . . . 272
11.2 Arguments 7 – 9 are passed on the stack to the sumNine function. . . . . . 277
11.3 Arguments and local variables in the stack frame, sumNine function. . . . 278
11.4 Overall layout of the stack frame. . . . . . . . . . . . . . . . . . . . . . . . . . 282
11.5 Calling function’s stack frame, 32-bit mode. . . . . . . . . . . . . . . . . . . . 286
13.1 Memory allocation for the variables x and y from the C program in Listing 13.6. . . . 335
7.1 Effect on other bits in a register when less than 64 bits are changed. . . . 157
LIST OF TABLES x
12.1 Bit patterns (in binary) of the ASCII numerals and the corresponding 32-bit ints. . . . 308
12.2 Register usage for the mul instruction. . . . . . . . . . . . . . . . . . . . . . . 309
12.3 Register usage for the div instruction. . . . . . . . . . . . . . . . . . . . . . . 315
12.4 Instructions to set up the dividend registers. . . . . . . . . . . . . . . . . . . 317
LISTINGS xii
This book introduces the concepts of how computer hardware works from a programmer's
point of view. A programmer's job is to design a sequence of instructions that
will cause the hardware to perform operations that solve a problem. This book looks at
these instructions by exploring how C/C++ language constructs are implemented at the
instruction set architecture level.
The specific architecture presented in this book is the x86-64 that has evolved over
the years from the Intel 8086 processor. The GNU programming environment is used,
and the operating system kernel is Linux.
The basic guidelines I followed in creating this book are:
• One should avoid writing in assembly language except when absolutely necessary.
It may seem strange that I would recommend against assembly language programming
in a book largely devoted to the subject. Well, C was developed in the early 1970s specifically
for low-level programming. C code is much easier to write and to maintain than assembly
language. C compilers have evolved to a point where they produce better machine code
than all but the best assembly language programmers can. In addition, hardware
performance has improved to the point that there is seldom any significant advantage in writing
the most efficient machine code. In short, it is hardly ever worth the effort to write in
assembly language.
You might well ask why you should study assembly language, given that I think you
should avoid writing in it. I believe very strongly that the best programmers have a
good understanding of how computer hardware works. I think this principle holds in
most fields: the best drivers understand how automobiles work; the best musicians
understand how their instrument works; etc.
So this is not a book on how to write programs in assembly language. Most of the
programs you will be asked to write will be in assembly language, but they are very
simple programs intended to illustrate the concepts. I believe that this book will help
you to become a better programmer in any programming language, even if you never
write another line of assembly language.
Two issues arise immediately when studying assembly language:
• I/O interaction with a user through even the keyboard and screen is a very complex
problem, well beyond the programming expertise of a beginner.
• There is an almost endless variety of instructions that can be used.
PREFACE xv
There are several ways to deal with these problems in a textbook. Some books use a
simple operating system for I/O, e.g., MS-DOS. Others provide libraries of I/O functions
that are specific for the examples in the book. Several textbooks deal with the instruction
set issue by presenting a simplified “idealized” architecture with a small number of
instructions that is intended to illustrate the concepts.
In keeping with the “real world” criterion of this book, it deals with these two issues
by:
1. showing you how to call the I/O functions already available in the C Standard
Library, and
Assumed Background
You should have taken an introductory class in programming, preferably in C, C++, or
Java. The high-level language used in this book is C; however, all the C programming
is simple. I am confident that the C programming examples in Chapters 2 and 3 will
provide sufficient C programming concepts to make the rest of the book very usable,
regardless of the language you learned in your introductory class.
I believe that more experienced programmers who wish to write for the x86-64
architecture can also benefit from reading this book. In principle, these programmers
can learn everything they need to know from reading the appropriate manuals. However,
I have found that it is usually helpful to have an overview of a new architecture before
tackling the manuals. This book should provide that overview. In this sense, I believe
that this book can provide a good “introduction” to using the manuals.
Additional Resources
I maintain additional resources related to this book, including a list of errata, on my website,
bob.cs.sonoma.edu. I welcome your feedback ([email protected]), especially any
errors or confusing writing that you see in the book. I use such feedback, mostly from
students, to constantly improve the book.
Development Environment
Most developers use an Integrated Development Environment (IDE), which hides the
process of building a program from source code. In this book we use the component
programs individually so that you can see what is taking place.
The examples in this book were compiled or assembled on a computer running Ubuntu
12.04. The development programs used were:
• gcc version 4.7.0
• as version 2.22
In most cases compilation was done with no optimization (-O0) because the goal is to
study concepts, not create the most efficient code.
The examples should work in any x86_64 GNU development environment with gcc
and as (binutils) installed. However, the machine code generated by the compiler
may differ depending on its specific configuration and version. You will begin looking at
compiler-generated assembly language in Chapter 7. What you see in your environment
may differ from the examples in this book, but the differences should be consistent as
you continue through the rest of the book.
You should also keep in mind that the programs used for development may have bugs.
Yes, nobody is perfect. For example, when I upgraded my Ubuntu system from 9.04 to
9.10, the GNU assembler was upgraded from 2.19 to 2.20. The newer version had a bug
that caused the line numbering in a particular listing file to start from 0 instead of 1. (It
affected the C source code in Listing 7.6 on page 160; the numbers have been corrected
in this listing.) Fortunately, this bug did not affect the quality of the final program, but it
could cause some confusion to the programmer.
Bit-level logical and shift operations are covered in Chapter 12. The multiplication
and division instructions are also discussed.
Arrays and structs are discussed in Chapter 13. This chapter includes a discussion
of how simple C++ objects are implemented at both the C and the assembly language
level.
Until this point in the book we have been using integers. In Chapter 14 we introduce
formats for storing fractional values, including some IEEE 754 formats. In 64-bit mode
the gcc compiler uses SSE2 instructions for floating point, but x87 instructions are used
in 32-bit mode. The chapter gives an introduction to both instruction sets.
Exceptions and interrupts are discussed in Chapter 15. Chapter 16 is an introduction
to hardware level I/O. Since most students will never do I/O at this level, this is another
chapter that could be skipped.
A summary of the instructions used in this book is provided in Appendix A.5. At this
point, there is only a list of the instructions. Eventually, there will be a description of
each of them.
Appendix B is a highly simplified discussion of the fundamental concepts of the make
facility.
Appendix C provides a very brief tutorial on using gdb for assembly language pro-
grams.
Appendix D gives a very brief introduction to the gcc syntax for embedding assembly
language in a C function.
Almost all the solutions to the chapter exercises are provided in Appendix E. These
can be useful for students who wish to use the exercises for self study; if you find
yourself getting stuck on a problem, peek at the solution for some hints. Instructors are
encouraged to discuss these solutions with their students. There is much to be learned
from looking at another person’s solution and thinking about how you might do it better.
The Bibliography lists a small fraction of the many books I have consulted when
learning this material. I urge you to look at this list of books. I believe that you will want
at least some of them in your reference library.
Suggested Usage
• Our course at Sonoma State University covers each chapter approximately in the
book’s order. The programming exercises in Chapters 2 and 3 get the students
used to using the lab right from the beginning of the course. Hardware simulators
are used in the lab for Chapters 4 and 5.
Acknowledgements
I would like to thank the many students who have taken assembly language from me.
They have asked many questions that caused me to think about the subject and how I
can better explain it. They are the main reason I have written this book.
Three students deserve special thanks: David Tran, Zack Gold, and Jim O’Hara. They
used this book in a class taught by Mike Lyle at Santa Rosa Junior College, David in
Fall 2010, Zack in Fall 2011, and Jim in Fall 2013. All three caught many of my typos
and errors and gave me many helpful suggestions for clarifying my writing. I am very
grateful for their careful reading of the book and the time they spent providing me with
comments. It is definitely a better book as a result of their diligence.
I wish to thank Richard Gordon, Lynn Stauffer, Allan B. Cruse, Michael Lyle, Suzanne
Rivoire, and Tia Watts for their thorough proofreading and critique of the previous
versions of this book. By teaching from this book they have caught many of my errors
and provided many excellent suggestions for clarifying the presentation.
I appreciate the work of those who volunteer their time to develop and maintain the
software I used to create this book: GNU, Linux, LaTeX 2ε, etc.
In addition, I would like to thank my partner, João Barretto, for encouraging me to
write this book and putting up with my many hours spent at my computer.
Finally, I am sure there are typos and errors left in this book, even with all the
feedback I have received from students and colleagues and my efforts to correct what
they found. But I hope it is in good enough shape that you will find reading the book
relatively comfortable and that it will provide you some insight into how computers are
organized.
Chapter 1
Introduction
Unlike most assembly language books, this one does not emphasize writing programs
in assembly language. Higher-level languages, e.g., C, C++, Java, are much better for
that. You should avoid writing in assembly language whenever possible.
You may wonder why you should study assembly language at all. The usual reasons
given are:
1. Assembly language is more efficient. This does not always hold. Modern compilers
are excellent at optimizing the machine code that is generated. Only a very good
assembly language programmer can do better, and only in some situations. Assem-
bly language programming is very tedious, even for the best programmers. Hence,
it is very expensive. The possible gains in efficiency are seldom worth the added
expense.
2. There are situations where it must be used. This is more difficult to evaluate. How
do you know whether assembly language is required or not?
Both these reasons presuppose that you know the assembly language equivalent of
the translation that your compiler does. Otherwise, you would have no way of deciding
whether you can write a more efficient program in assembly language, and you would
not know the machine level limitations of your higher-level language. So this book
begins with the fundamental high-level language concepts and “looks under the hood”
to see how they are implemented at the assembly language level.
There is a more important reason for reading this book. The interface to the hardware
from a programmer’s view is the instruction set architecture (ISA). This book is a
description of the ISA of the x86 architecture as it is used by the C/C++ programming
languages. Higher-level languages tend to hide the ISA from the programmer, but good
programmers need to understand it. This understanding is bound to make you a better
programmer, even if you never write a single assembly language statement after reading
this book.
Some of you will enjoy assembly language programming and wish to carry on. If
your interests take you into systems programming, e.g., writing parts of an operating
system, writing a compiler, or even designing another higher-level language, an under-
standing of assembly language is required. There are many challenging opportunities in
programming embedded systems, and much of the work in this area demands at least
an understanding of the ISA. This book serves as an introduction to assembly language
programming and prepares you to move on to the intermediate and advanced levels.
In his book The Design and Evolution of C++ [32] Bjarne Stroustrup nicely lists the
purposes of a programming language:
This book assumes that you are familiar with these programming concepts in C, C++,
and/or Java.
1.1. COMPUTER SUBSYSTEMS 3
Figure 1.1: Subsystems of a computer. The CPU, Memory, and I/O subsystems communicate
with one another via the three buses (Data Bus, Address Bus, and Control Bus).
Central Processing Unit (CPU): Controls most of the activities of the computer, performs
the arithmetic and logical operations, and contains a small amount of very
fast memory.
Memory: Provides storage for the instructions for the CPU and the data they manipulate.
Input/Output (I/O): Communicates with the outside world and with mass storage devices
(e.g., disks).
When you create a new program, you use an editor program to write your new
program in a high-level language, for example, C, C++, or Java. The editor program sees
the source code for your new program as data, which is typically stored in a file on the
disk. Then you use a compiler program to translate the high-level language statements
into machine instructions that are stored in a disk file. Just as with the editor program,
the compiler program sees both your source code and the resulting machine code as
data.
When it comes time to execute the program, the instructions are read from the ma-
chine code disk file into memory. At this point, the program is a sequence of instructions
stored in memory. Most programs include some constant data that are also stored in
memory. The CPU executes the program by fetching each instruction from memory and
executing it. The data are also fetched as needed by the program.
This computer model — both the program instructions and data are stored in a
memory unit that is separate from the processing unit — is referred to as the von
Neumann architecture. It was described in 1945 by John von Neumann [35], although
other computer science pioneers of the day were working with the same concepts. This
is in contrast to a fixed-program computer, e.g., a calculator. A compiler illustrates one
of the benefits of the von Neumann architecture. It is a program that treats the source
file as data, which it translates into an executable binary file that is also treated as data.
But the executable binary file can also be run as a program.
1.2. HOW THE SUBSYSTEMS INTERACT 4
A downside of the von Neumann architecture is that a program can be written to view
itself as data, thus enabling a self-modifying program. GNU/Linux, like most modern,
general purpose operating systems, prohibits applications from modifying themselves.
Most programs also access I/O devices, and each access must also be programmed.
I/O devices vary widely. Some are meant to interact with humans, for example, a
keyboard, a mouse, a screen. Others are meant for machine readable I/O. For example,
a program can store a file on a disk or read a file from a network. These devices all
have very different behavior, and their timing characteristics differ drastically from one
another. Since I/O device programming is difficult, and every program makes use of
them, the software to handle I/O devices is included in the operating system. GNU/Linux
provides a rich set of functions that an applications programmer can use to perform I/O
actions, and we will call upon these services of GNU/Linux to perform our I/O operations.
Before tackling I/O programming, you need to gain a thorough understanding of how
the CPU executes programs and interacts with memory.
The goal of this book is to study how programs are executed by the computer. We will
focus on how the program and data are stored in memory and how the CPU executes
instructions. We leave I/O programming to more advanced books.
instructed to read a piece of data from an input device, the particular device is specified
on the address bus and a “read” signal is placed on the control bus. The device responds
by placing the data item on the data bus. And the CPU can send data to an output device
by placing the data item on the data bus, specifying the device on the address bus, and
placing a “write” signal on the control bus. Since the timing of various I/O devices varies
drastically from CPU and memory timing, special programming techniques must be
used. Chapter 16 provides an introduction to I/O programming techniques.
These few paragraphs are intended to provide you with a very general overview of how
computer hardware works. The rest of the book will explore many of these concepts in
more depth. Most of the discussion is at the ISA level, but we will also take a peek at the
hardware implementation. In Chapter 4 we will even look at some transistor circuits.
The goal of the book is to provide you with an introduction to computer architecture as
seen from a software point of view.
Chapter 2
In this chapter, we begin exploring how data is encoded for storage in memory and write
some programs in C to explore these concepts. One way to look at a modern computer
is that it is made up of:
• Billions of two-state switches. Each of the switches is always in one state or the
other, and it stays in that state until the control unit changes its state or the power
is turned off.
• A control unit that can:
– Detect the state of each switch.
– Change the state of that switch and/or other switches.
There is also provision for communicating with the world outside the computer — input
and output.
Decimal digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
Two number systems are useful when talking about the states of switches — the binary
system, which is based on two,
Binary digits: 0, 1
and the hexadecimal system, which is based on sixteen.
Hexadecimal digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f
2.1. BITS AND GROUPS OF BITS 7
When it is not clear from the context, we will indicate the base of a number in
this text with a subscript. For example, $100_{10}$ is written in decimal, $100_{16}$ is written in
hexadecimal, and $100_2$ is written in binary.
Hexadecimal digits are especially convenient when we need to specify the state of a
group of, say, 16 or 32 switches. In place of each group of four bits, we can write one
hexadecimal digit. For example,
and
$0000\ 0001\ 0010\ 0011\ 1010\ 1011\ 1100\ 1101_2 = 0123\ \mathrm{abcd}_{16}$ (2.2)
A single bit has limited usefulness when we want to store data. We usually need to
use a group of bits to store a data item. This grouping of bits is so common that most
modern computers only allow a program to access bits in groups of eight. Each of these
groups is called a byte.
byte: A contiguous group of bits, usually eight.
2.2. MATHEMATICAL EQUIVALENCE OF BINARY AND DECIMAL 8
Historically, the number of bits in a byte has varied depending on the hardware and the
operating system. For example, the CDC 6000 series of scientific mainframe computers
used a six-bit byte. Nearly everyone uses “byte” to mean eight bits today.
Another important reason to learn hexadecimal is that the programming language
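may not allow you to specify a value in binary. Prefixing a number with 0x (zero, lowercase
ex) in C/C++ means that the number is expressed in hexadecimal. The C and C++ standards
in use when this book was written have no syntax for writing a number in binary. (The 0b
prefix is a gcc extension that was standardized only later, in C++14 and C23.) The syntax for specifying bit patterns in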
C/C++ is shown in Table 2.2. (The 32-bit pattern for the decimal value 123 will become
clear after you read Sections 2.2 and 2.3.) Although the GNU assembler, as, includes a
notation for specifying bit patterns in binary, it is usually more convenient to use the
C/C++ notation.
Table 2.2: C/C++ syntax for specifying literal numbers. Octal bits grouped by three for
readability.
$1 \times 100 + 2 \times 10 + 3 \times 1$ (2.3)
or
$1 \times 10^2 + 2 \times 10^1 + 3 \times 10^0$ (2.4)
The right-most digit (3 in Equation 2.4) is the least significant digit because it “counts”
the least in the total value of this number. The left-most digit (1 in this example) is the
most significant digit because it “counts” the most in the total value of this number.
The base or radix of the decimal number system is ten. There are ten symbols for
representing the digits: 0, 1, . . . , 9. Moving a digit one place to the left increases its
value by a factor of ten, and moving it one place to the right decreases its value by a
factor of ten. The positional notation generalizes to any radix, r:
where there are n digits in the number and each $d_i = 0, 1, \ldots, r-1$. The radix in the
binary number system is 2, so there are only two symbols for representing the digits: $d_i = 0, 1$.
We can specialize Equation 2.5 for the binary number system as
$1 \times 2^7 + 0 \times 2^6 + 1 \times 2^5 + 0 \times 2^4 + 0 \times 2^3 + 1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0$ (2.7)
This example illustrates the method for converting a number from the binary number
system to the decimal number system. It is stated in Algorithm 2.1.
Be careful to distinguish the binary number system from writing the state of a bit in
binary. Each switch in the computer can be represented by a bit (binary digit), but the
entity that it represents may not even be a number, much less a number in the binary
number system. For example, the bit pattern 0011 0010 represents the character “2” in
the ASCII code for characters. But in the binary number system $0011\ 0010_2 = 50_{10}$.
See Exercises 2-8 and 2-9 for converting hexadecimal to decimal.
$(N/4) + \frac{r_1}{2} = d_{n-1} \times 2^{n-3} + d_{n-2} \times 2^{n-4} + \ldots + d_1 \times 2^{-1}$ (2.12)
From Equation 2.12 we see that $d_1 = r_1$. It follows that the binary representation of a
number can be produced from right (low-order bit) to left (high-order bit) by applying
the algorithm shown in Algorithm 2.2.
Example 2-a
Convert $123_{10}$ to binary.
Solution:
123 ÷ 2 = 61 + 1/2 ⇒ $d_0 = 1$
61 ÷ 2 = 30 + 1/2 ⇒ $d_1 = 1$
30 ÷ 2 = 15 + 0/2 ⇒ $d_2 = 0$
15 ÷ 2 = 7 + 1/2 ⇒ $d_3 = 1$
7 ÷ 2 = 3 + 1/2 ⇒ $d_4 = 1$
3 ÷ 2 = 1 + 1/2 ⇒ $d_5 = 1$
1 ÷ 2 = 0 + 1/2 ⇒ $d_6 = 1$
0 ÷ 2 = 0 + 0/2 ⇒ $d_7 = 0$
So
$123_{10} = d_7 d_6 d_5 d_4 d_3 d_2 d_1 d_0$
$= 01111011_2$
$= \mathrm{7b}_{16}$
There are times in some programs when it is more natural to specify a bit pattern
rather than a decimal number. We have seen that it is possible to easily convert between
the number bases, so you could convert the bit pattern to a decimal value, then use that.
It is usually much easier to think of the bits in groups of four, then convert the pattern
to hexadecimal.
For example, if your algorithm required the use of zeros alternating with ones:
0101 0101 0101 0101 0101 0101 0101 0101
this can be converted to the decimal value
1431655765
or the hexadecimal value (shown here in C/C++ syntax)
0x55555555
Once you have memorized Table 2.1, it is clearly much easier to work with hexadecimal
for bit patterns.
The discussion in these two sections has dealt only with unsigned integers. The
representation of signed integers depends upon some architectural features of the CPU
and will be discussed in Chapter 3 when we discuss computer arithmetic.
2.4. MEMORY — A PLACE TO STORE DATA (AND OTHER THINGS) 11
memory[123]
as specifying the 124th byte in memory. (Don’t forget that array indexing starts with 0.)
We generally do not use array notation and simply use the index number, calling it the
address or location of the byte.
The address of a particular byte never changes. That is, the 957th byte from the
beginning of memory will always remain the 957th byte. However, the state of each of
the bits — either 0 or 1 — in any given byte can be changed.
Computer scientists typically express the address of each byte in memory in hexadec-
imal. So we would say that the 957th byte is at address 0x3bc.
From the discussion of hexadecimal in Section 2.1 (page 6) we can see that the first
sixteen bytes in memory have the addresses 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, and f.
Using the notation
address: contents (bit-pattern-at-the-address)
we show the (possible) contents (the state of the bits) of each of the first sixteen bytes
of memory in Figure 2.1.
Figure 2.1: Possible contents of the first sixteen bytes of memory; addresses shown in
hexadecimal, contents shown in binary. Note that the addresses are shown
as 32-bit values. (The contents shown here are arbitrary.)
The state of each bit is indicated by a binary digit (bit) and is arbitrary in Figure 2.1.
The bits have been grouped by four for readability. The grouping of the memory bits also
shows that we can use two hexadecimal digits to indicate the state of the bits in each
byte, as shown in Figure 2.2. For example, the contents of memory location 0000000b
are 3c. That means the eight bits that make up the twelfth byte in memory are set to
the bit pattern 0011 1100.
Once a bit (switch) in memory is set to either zero or one, it stays in that state until
the control unit actively changes it or the power is turned off. There is an exception.
Computers also contain memory in which the bits are permanently set. Such memory is
called Read Only Memory or ROM.
Figure 2.2: Repeat of Figure 2.1 with contents shown in hex. Two hexadecimal charac-
ters are required to specify one byte.
Read Only Memory (ROM) : Each bit is permanently set to either zero or one. The
control unit can read the state of each bit but cannot change it.
You have probably heard the term “RAM” used for memory that can be changed by
the control unit. RAM stands for Random Access Memory. The terminology used here is
inconsistent. “Random access” means that it takes the same amount of time to access
any byte in the memory. This is in contrast to memory that is sequentially accessible,
e.g., tape. The length of time it takes to access a byte on tape depends upon the physical
location of the byte with respect to the current tape position.
Random Access Memory (RAM) : The control unit can read the state of each bit and
can change it.
A bit can be used to store data. For example, we could use a single bit to indicate
whether a student passes a course or not. We might use 0 for “not passed” and 1 for
“passed.” A single bit allows only two possible values of a data item. We cannot, for
example, use a single bit to store a course letter grade: A, B, C, D, or F.
How many bits would we need to store a letter grade? Consider all possible combina-
tions of two bits:
00
01
10
11
Since there are only four possible bit combinations, we cannot represent all five letter
grades with only two bits. Let’s add another bit and look at all possible bit combinations:
000
001
010
011
100
101
110
111
There are eight possible bit patterns, which is more than sufficient to store any one of
the five letter grades. For example, we may choose to use the code
2.5. USING C PROGRAMS TO EXPLORE DATA FORMATS 13
We will use the C programming language to illustrate these concepts because it takes
care of the memory allocation problem, yet still allows us to get reasonably close to
Hello, world.
If there are additional arguments, the format string must specify how each of these
arguments is to be converted for display. This is accomplished by inserting a conversion
code within the format string at the point where the argument value is to be displayed.
Each conversion code is introduced by the ’%’ character. For example, Listing 2.1 shows
how to display both an int variable and a float variable.
1 /*
2 * intAndFloat.c
3 * Using printf to display an integer and a float.
4 * Bob Plantz - 4 June 2009
5 */
6 #include <stdio.h>
7
8 int main(void)
9 {
10 int anInt = 19088743;
11 float aFloat = 19088.743;
12
15 return 0;
16 }
Listing 2.1: Using printf to display numbers.
Compiling and running the program in Listing 2.1 on my computer gave (user input
is boldface):
1 The text string is a null-terminated array of characters as described in Section 2.7 (page 21). This is not
This is not a book about how to use the GNU development environment, so I usually do not
show the compile command. I am showing it here to help get you started. You should use
the man gcc command to learn about the command line options.
Some common conversion codes are d or i for integer, f for float, and x for hexadeci-
mal. The conversion codes may include other characters to specify properties like the
field width of the display, whether the value is left or right justified within the field, etc.
We will not cover the details here. You should read man page 3 for printf to learn more.
scanf is used to read from the keyboard. The format string typically includes only
conversion codes that specify how to convert each value as it is entered from the
keyboard and stored in the following arguments. Since the values will be stored in
variables, it is necessary to pass the address of the variable to scanf. For example, we
can store keyboard-entered values in x (an int variable) and y (a float variable) thusly
scanf("%i %f", &x, &y);
The use of printf and scanf is illustrated in the C program in Listing 2.2, which
will allow us to explore the mathematical equivalence of the decimal and hexadecimal
number systems.
1 /*
2 * echoDecHex.c
3 * Asks user to enter a number in decimal and one
4 * in hexadecimal then echoes both in both bases
5 * Bob Plantz - 4 June 2009
6 */
7
8 #include <stdio.h>
9
10 int main(void)
11 {
12 int x;
13 unsigned int y;
14
15 while(1)
16 {
17 printf("Enter a decimal integer (0 to quit): ");
18 scanf("%i", &x);
19 if (x == 0) break;
20
29 printf("End of program.\n");
30
31 return 0;
32 }
Listing 2.2: C program showing the mathematical equivalence of the decimal and hex-
adecimal number systems.
• The “%#010x” conversion code is more interesting. (If you are at a computer,
read section 3 of the man page for printf as you follow through this description.)
The basic conversion is specified by the “x” character; it causes the value to be
displayed in hexadecimal. The “#” character causes an “alternate form” to be used
for the display, which is the C syntax for hexadecimal numbers; that is, the value
is prefaced by 0x when it is displayed. The ‘0’ character immediately after the ‘#’
character causes ‘0’ to be used as the fill character. The number “10” causes the
display to occupy at least ten characters (the field width).
• Look carefully at the output from this program above. The bit patterns used to
store the data input by the user, shown in hexadecimal, show that the unsigned
ints are stored in the binary number system (see Section 2.2, page 8 and Section
2.3, page 9). That is, $123_{10}$ is stored as $0000007b_{16}$.
The program in Listing 2.2 demonstrates a very important concept — hexadecimal
is used as a human convenience for stating bit patterns. A number is not inherently
2.6. EXAMINING MEMORY WITH GDB 17
The “-g” option is required. It tells the compiler to include debugger informa-
tion in the executable program.
The li command lists ten lines of source code. The display ends with the (gdb)
prompt. Pushing the return key will repeat the previous command, and li is
smart enough to display the next (up to) ten lines.
(gdb) br 13
Breakpoint 1 at 0x40050b: file intAndFloat.c, line 13.
I set a breakpoint at line 13. When the program is executing, if it ever gets
to this statement, execution will pause before the statement is executed, and
control will return to gdb.
(gdb) run
Starting program: /home/bob/intAndFloat
The run command causes the program to start execution from the beginning.
When it reaches our breakpoint, control returns to gdb.
The print command displays the value currently stored in the named variable.
There is a round off error in the float value. As mentioned above, this will be
explained in Chapter 14.
Defaults for format and size letters are those previously used.
Default count is 1. Default address is following last thing printed
with this command or "print".
The x command is used to examine memory. Its help message is very brief,
but it tells you everything you need to know.
(gdb) x/1dw 0x7fffffffe058
0x7fffffffe058: 19088743
(gdb) x/1fw 0x7fffffffe05c
0x7fffffffe05c: 19088.7422
The x command can be used to display the values in their stored data type.
(gdb) x/1xw 0x7fffffffe058
0x7fffffffe058: 0x01234567
(gdb) x/4xb 0x7fffffffe058
0x7fffffffe058: 0x67 0x45 0x23 0x01
The display of the aFloat variable in hexadecimal simply looks wrong. This is
due to the storage format of floats, which is very different from ints. It will
be explained in Chapter 14.
The byte by byte display of the aFloat variable in hexadecimal also shows
that it is stored in little endian order.
(gdb) cont
Continuing.
The integer is 19088743 and the float is 19088.742188
[Inferior 1 (process 3221) exited normally]
(gdb) q
bob$
Finally, I continue to the end of the program. Notice that gdb is still running
and I have to quit the gdb program.
This example illustrates a property of the x86 processors. Data is stored in memory
with the least significant byte in the lowest-numbered address. This is called little endian
storage. Look again at the display of the four bytes beginning at 0x7fffffffe058 above.
We can rearrange this display to show the bit patterns at each of the four locations:
7fffffffe058: 67
7fffffffe059: 45
7fffffffe05a: 23
7fffffffe05b: 01
Yet when we look at the entire 32-bit value in hexadecimal the bytes seem to be arranged
in the proper order:
7fffffffe058: 01234567
When we examine memory one byte at a time, each byte is displayed in numerically
ascending addresses. At first glance, the value appears to be stored backwards.
We should note here that many processors, e.g., the PowerPC architecture, use big
endian storage. As the name suggests, the most significant (“biggest”) byte is stored
in the first (lowest-numbered) memory address. If we ran the program above on a big
endian computer, we would see (assuming the variable is located at the same address):
2.7. ASCII CHARACTER CODE 21
Generally, you do not need to worry about endianness in a program. It becomes a concern
when data is stored as one data type, then accessed as another.
When translating either of these statements into machine code, the compiler must do
two things:
• store each of the characters in a location in memory where the control unit can
access them, and
Table 2.3: ASCII code for representing characters. The bit patterns (bit pat.) are shown
in hexadecimal.
had been received incorrectly. Of course, if two bits had been incorrectly received, the
error would pass undetected, but the chances of this double error are remarkably small.
Modern communication systems are much more reliable, and parity is seldom used
when sending individual bytes.
In some environments the high-order bit is used to provide a code for special characters. A
little thought will show you that even all eight bits will not support all languages, e.g., Greek,
Russian, Chinese. The Unicode character coding has recently been adopted to support
documents that use other characters. Java uses Unicode, and C libraries that support
Unicode are also available.
A computer system that uses an ASCII video system (most modern computers) can be
programmed to send a byte to the screen. The video system interprets the bit pattern as
an ASCII code (from Table 2.3) and displays the corresponding character on the screen.
2.8. WRITE AND READ FUNCTIONS 23
Getting back to the text string, “Hello world\n”, the compiler would store this as
a constant char array. There must be a way to specify the length of the array. In a
C-style string this is accomplished by using the sentinel character NUL at the end of
the string. So the compiler must allocate thirteen bytes for this string. An example of
how this string is stored in memory is shown in Figure 2.3. Notice that C uses the
LF character as a single newline character even though the C syntax requires that the
programmer write two characters — ’\n’. The area of memory shown includes the
three bytes immediately following the text string.
Address Contents
4004a1: 48
4004a2: 65
4004a3: 6c
4004a4: 6c
4004a5: 6f
4004a6: 20
4004a7: 77
4004a8: 6f
4004a9: 72
4004aa: 6c
4004ab: 64
4004ac: 0a
4004ad: 00
4004ae: 25
4004af: 73
4004b0: 00
Figure 2.3: A text string stored in memory by a C compiler, including three “garbage”
bytes after the string. Values are shown in hexadecimal. A different compi-
lation will likely place the string in a different memory location.
In Pascal the length of the string is specified by the first byte in the string. It is taken to be
an 8-bit unsigned integer. So C-style strings are typically processed by sentinel-controlled
loops, and count-controlled string processing loops are more common in Pascal.
The C++ string class has additional features, but the actual text string is stored as a C-style
text string within the C++ string instance.
This byte is initialized to the bit pattern $41_{16}$ (’A’ from Table 2.3). The write function is
invoked to display the character on the screen. The arguments to write are:
1. STDOUT_FILENO is defined in the system header file, unistd.h. It is the GNU/Linux
file descriptor for standard out (usually the screen). GNU/Linux sees all devices as
files. When a program is started the operating system opens a path to standard out
and assigns it as file descriptor number 1.
2. &aLetter is a memory address. The sequence of one-byte bit patterns starting at
this address will be sent to standard out.
3. 1 (one) is the number of bytes that will be sent (to standard out) as a result of this
call to write.
The program returns a 0 to the operating system.
/*
 * oneChar.c
 * Writes a single character on the screen.
 * Bob Plantz - 4 June 2009
 */

#include <unistd.h>

int main(void)
{
    char aLetter = 'A';
    write(STDOUT_FILENO, &aLetter, 1);  // STDOUT_FILENO is
                                        // defined in unistd.h
    return 0;
}
Listing 2.3: Displaying a single character using C.
Now let’s consider a program that echoes each character entered from the keyboard.
We will allocate a single char variable, read one character into the variable, and then
echo the character for the user with a message. The program will repeat this sequence
one character at a time until the user hits the return key. The program is shown in
Listing 2.4.
1 /*
2 * echoChar1.c
3 * Echoes a character entered by the user.
4 * Bob Plantz - 4 June 2009
5 */
6
7 #include <unistd.h>
8
9 int main(void)
10 {
11 char aLetter;
12
18 return 0;
19 }
Listing 2.4: Echoing characters entered from the keyboard.
Again, the program correctly echoes the first character, but the two characters bc
remain in the input line buffer. When echoChar1 terminates, the shell program reads
the remaining characters from the line buffer and interprets them as a command. In
this case, bc is a program, so the shell executes that program.
An important point of the program in Listing 2.4 is to illustrate the simplistic behavior
of the write and read functions. They work at a very low level. It is your responsibility
to design your program to interpret each byte that is written to the screen or read from
the keyboard.
2.9 Exercises
2-1 (§2.1) Express the following bit patterns in hexadecimal.
a) 83af c) aaaa
b) 9001 d) 5555
2-3 (§2.1) How many bits are represented by each of the following?
a) ffffffff d) $1111_{16}$
b) 7fff58b7def0 e) $00000000_2$
c) $1111_2$ f) $00000000_{16}$
2-4 (§2.1) How many hexadecimal digits are required to represent each of the following?
2-5 (§2.2) Referring to Equation 2.5, what are the values of r, n and each $d_i$ for the
decimal number 29458254? The hexadecimal number 29458254?
2-6 (§2.2) Convert the following 8-bit numbers to decimal by hand:
a) 10101010 e) 10000000
b) 01010101 f) 01100011
c) 11110000 g) 01111011
d) 00001111 h) 11111111
a) 1010101111001101 e) 1000000000000000
b) 0001001000110100 f) 0000010000000000
c) 1111111011011100 g) 1111111111111111
d) 0000011111010000 h) 0011000000111001
2-8 (§2.2) In Section 2.2 we developed an algorithm for converting from binary to
decimal. Develop a similar algorithm for converting from hexadecimal to decimal.
Use your new algorithm to convert the following 8-bit numbers to decimal by hand:
a) a0 e) 64
b) 50 f) 0c
c) ff g) 11
d) 89 h) c8
2-9 (§2.2) In Section 2.2 we developed an algorithm for converting from binary to
decimal. Develop a similar algorithm for converting from hexadecimal to decimal.
Use your new algorithm to convert the following 16-bit numbers to decimal by
hand:
a) a000 e) 8888
b) ffff f) 0190
c) 0400 g) abcd
d) 1111 h) 5555
2-10 (§2.3) Convert the following unsigned, decimal integers to 8-bit hexadecimal
representation.
a) 100 e) 255
b) 123 f) 16
c) 10 g) 32
d) 88 h) 128
2-11 (§2.3) Convert the following unsigned, decimal integers to 16-bit hexadecimal
representation.
a) 1024 e) 256
b) 1000 f) 65535
c) 32768 g) 2005
d) 32767 h) 43981
2-12 (§2.3) Invent a code that would allow us to store letter grades with plus or minus.
That is, the grades A, A-, B+, B, B-, . . . , D, D-, F. How many bits are required for
your code?
2-13 (§2.3) We have shown how to write only the first sixteen addresses in hexadecimal
in Figure 2.1. How would you write the address of the seventeenth byte (byte
number sixteen) in hexadecimal? Hint: If we started with zero in the decimal
number system we would use a ‘9’ to represent the tenth item. How would you
represent the eleventh item in the decimal system?
2-14 (§2.3) Redo the table in Figure 2.2 such that it shows the memory contents in
decimal.
2-15 (§2.3) Redo the table in Figure 2.2 such that it shows each of the sixteen bytes
containing its byte number. That is, byte number 0 contains zero, number 1 contains
one, etc. Show the contents in binary.
2-16 (§2.3) Redo the table in Figure 2.2 such that it shows each of the sixteen bytes
containing its byte number. That is, byte number 0 contains zero, number 1 contains
one, etc. Show the contents in hexadecimal.
2-17 (§2.4) You want to allocate an area in memory for storing any number between 0
and 4,000,000,000. This memory area will start at location 0x2fffeb96. Give the
addresses of each byte of memory that will be required.
2-18 (§2.4) You want to allocate an area in memory for storing an array of 30 bytes. The
first byte will have the value 0x00 stored in it, the second 0x01, the third 0x02, etc.
This memory area will start at location 0x001000. Show what this area of memory
looks like.
2-19 (§2.4) In Section 2.4 we invented a binary code for representing letter grades.
Referring to that code, express each of the grades as an 8-bit unsigned decimal
integer.
2-20 (§2.5) Enter the program in Listing 2.2 and check your answers for Exercise 2-6.
Note that printf and scanf do not have a conversion for binary. Check the answers
in hexadecimal.
2-21 (§2.5) Enter the program in Listing 2.2 and check your answers for Exercise 2-7.
Note that printf and scanf do not have a conversion for binary. Check the answers
in hexadecimal.
2-22 (§2.5) Enter the program in Listing 2.2 and check your answers for Exercise 2-8.
2-23 (§2.5) Enter the program in Listing 2.2 and check your answers for Exercise 2-9.
2-24 (§2.5) Enter the program in Listing 2.2 and check your answers for Exercise 2-10.
2-25 (§2.5) Enter the program in Listing 2.2 and check your answers for Exercise 2-11.
2-26 (§2.5) Modify the program in Listing 2.2 so that it also displays the addresses of
the x and y variables. Note that addresses are typically displayed in hexadecimal.
How many bytes does the compiler allocate for each of the ints?
2-27 (§2.6) Enter the program in Listing 2.1. Follow through the program with gdb as in
the example in Section 2.6. Using the numbers you get, explain where the variables
anInt and aFloat are stored in memory and what is stored in each location.
2-28 (§2.7) Write a program in C that creates a display similar to Figure 2.3. Hints: use
a char* variable to process the string one character at a time; use %08x to format
the display of the address.
2-29 (§2.6) Enter the program in Listing 2.4. Explain why there seems to be an extra
prompt in the program. Set breakpoints at both the read statement and at the
following write statement. Examine the contents of the aLetter variable before
the read and after it. Notice that the behavior of gdb seems very strange when
dealing with the read statement. Explain the behavior. Hint: Both gdb and the
program you are debugging use the same keyboard for input.
2-30 (§2.8) Modify the program in Listing 2.4 so that it prompts the user to enter an
entire line, reads the line, then echoes the entire line. Read only one byte at a time
from the keyboard.
2-31 (§2.8) This is similar to Exercise 2-30 except that when the newline character is
read from the keyboard (and stored in memory), the program replaces the newline
character with a NUL character. The program has now read a line from the keyboard
and stored it as a C-style text string. If your algorithm is correct, you will be able to
read the text string using the read low-level function and display it with the printf
library function thusly (assuming the variable where the string is stored is named
theString),
printf("%s\n", theString);
and have only one newline. Notice that this program discards the newline generated
when the user hits the return key. This is the same behavior you would see if you
used
scanf("%s", theString);
in C, or
cin >> theString;
in C++ to read the input text from the keyboard.
2-32 (§2.8) Write a C program that prompts the user to enter a line of text on the
keyboard then echoes the entire line. The program should continue echoing each
line until the user responds to the prompt by not entering any text and hitting
the return key. Your program should have two functions, writeStr and readLn, in
addition to the main function. The text string itself should be stored in a char array
in main. Both functions should operate on NUL-terminated text strings.
• writeStr takes one argument, a pointer to the string to be displayed and it
returns the number of characters actually displayed. It uses the write system
call function to write characters to the screen.
• readLn takes two arguments, one that points to the char array where the
characters are to be stored and one that specifies the maximum number of
characters to store in the char array. Additional keystrokes entered by the user
should be read from the OS input buffer and discarded. readLn should return
the number of characters actually stored in the char array. readLn should not
store the newline character (’\n’). It uses the read system call function to read
characters from the keyboard.
Chapter 3
Computer Arithmetic
We next turn our attention to a code for storing decimal integers. Since all storage in a
computer is by means of on/off switches, we cannot simply store integers as decimal
digits. Exercises 3-1 and 3-2 should convince you that it will take some thought to come
up with a good code that uses simple on/off switches to represent decimal numbers.
Another very important issue when talking about computer arithmetic was pointed
out in Section 2.3 (page 9). Namely, the programmer must decide how many bits will be
used for storing the numbers before performing any arithmetic operations. This raises
the possibility that some results will not fit into the allocated number of bits. As you will
see in Section 9.2 (page 214), the computer hardware provides for this possibility with
the Carry Flag (CF) and Overflow Flag (OF) in the rflags register located in the CPU.
Depending on what you intend the bit patterns to represent, either the Carry Flag or
the Overflow Flag (not both) will indicate the correctness of the result. However, most
high level languages, including C and C++, do not check the CF and OF after performing
arithmetic operations.
  11     ← carries
   67    ← x
 + 79    ← y
   46    ← sum
We start by working from the right, adding the two decimal digits in the ones place. 7 +
9 exceeds 10 by 6. We show this by placing a 6 in the ones place in the sum and carrying
a 1 to the tens place. Next we add the three decimal digits in the tens place, 1 (the carry
into the tens place from the ones place) + 6 + 7. The sum of these three digits exceeds
10 by 4, which we show by placing a 4 in the tens place in the sum and recording the
fact that there is an ultimate carry of one. Recall that we had decided to use only two
1 Most computer architectures provide arithmetic operations in other number systems, but these are
3.1. ADDITION AND SUBTRACTION 31
digits, so there is no hundreds place. Using the notation of Equation 2.5 (page 8), we
describe addition of two decimal integers in Algorithm 3.1.
Notice that:
• Algorithm 3.1 works because we use a positional notation when writing numbers —
a digit one place to the left counts ten times more.
• Carry from the current position one place to the left is always 0 or 1.
• The reason we use 10 in the / and % operations is that there are exactly ten digits
in the decimal number system : 0, 1, 2, . . . , 9.
• Since we are working in an N-digit system, we must restrict our result to N digits.
The final carry (0 or 1) must be stated in addition to the N-digit result.
By changing “10” to “2” we get Algorithm 3.2 for addition in the binary number
system. The only difference is that a digit one place to the left counts two times more.
Example 3-a
ones place:
sum0 = (1 + 1) % 2 = 0
carry = (1 + 1) / 2 = 1
twos place:
sum1 = (1 + 1 + 0) % 2 = 0
carry = (1 + 1 + 0) / 2 = 1
fours place:
sum2 = (1 + 0 + 1) % 2 = 0
carry = (1 + 0 + 1) / 2 = 1
eights place:
sum3 = (1 + 1 + 1) % 2 = 1
carry = (1 + 1 + 1) / 2 = 1
sixteens place:
sum4 = (1 + 0 + 0) % 2 = 1
carry = (1 + 0 + 0) / 2 = 0
thirty-twos place:
sum5 = (0 + 1 + 0) % 2 = 1
carry = (0 + 1 + 0) / 2 = 0
sixty-fours place:
sum6 = (0 + 0 + 1) % 2 = 1
carry = (0 + 0 + 1) / 2 = 0
one hundred twenty-eights place:
sum7 = (0 + 1 + 0) % 2 = 1
carry = (0 + 1 + 0) / 2 = 0
In this eight-bit example the result is 1111 1000, and there is no carry beyond the eight
bits. The lack of carry is recorded in the rflags register by setting the CF bit to zero.
It should not surprise you that this algorithm also works for hexadecimal. In fact, it
works for any radix, as shown in Algorithm 3.3.
For hexadecimal:
• A digit one place to the left counts sixteen times more.
• We use 16 in the / and % operations because there are sixteen digits in the hex-
adecimal number system: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f.
Addition in hexadecimal brings up a notational issue. For example,
d + 9 = ?? Oops, how do we write this?
$1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 1 \times 2^0$ (3.1)
This is easily converted to decimal by simply working out the arithmetic in decimal:
$1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 1 \times 2^0 = 8 + 0 + 2 + 1 = 11$ (3.2)
From Table 2.1 on page 7 we see that $1011_2 = b_{16}$, and we conclude that $b_{16} = 11_{10}$. We
can add a “decimal” column to the table, giving Table 3.1.
Table 3.1: Correspondence between binary, hexadecimal, and unsigned decimal values
for the hexadecimal digits.
Example 3-b
Compute the sum of x = 0xabcd and y = 0x6089.
Solution:
  1 011    ← carries
    abcd   ← x
  + 6089   ← y
    0c56   ← sum
Now we can see how Algorithm 3.3 with radix = 16 was applied in order to add the
hexadecimal numbers, abcd and 6089. Having memorized Table 3.1, we will convert
between hexadecimal and decimal “in our heads.”
ones place:
sum0 = (d + 9) % 16 = 6
carry = (d + 9) / 16 = 1
sixteens place:
sum1 = (1 + c + 8) % 16 = 5
carry = (1 + c + 8) / 16 = 1
two hundred fifty-sixes place:
sum2 = (1 + b + 0) % 16 = c
carry = (1 + b + 0) / 16 = 0
four thousand ninety-sixes place:
sum3 = (0 + a + 6) % 16 = 0
carry = (0 + a + 6) / 16 = 1
This four-digit example has an ultimate carry of 1, which is recorded in the rflags
register by setting the CF to one. The arithmetic was performed by first converting
each digit to decimal. It is then a simple matter to convert each decimal value back to
hexadecimal (see Table 3.1) to express the final answer in hexadecimal.
Let us now turn to the subtraction operation. As you recall from subtraction in the
decimal number system, you must sometimes borrow from the next higher-order digit
in the minuend. This is shown in Algorithm 3.4.
Example 3-c
Subtract y = 10101011 from x = 11001101.
Solution:
ones place:
difference0 = 1 - 1 = 0
twos place:
Borrow from the fours place in the minuend.
The borrow becomes 2 in the twos place.
difference1 = 2 - 1 = 1
fours place:
Since we borrowed 1 from here, the minuend has a 0 left.
difference2 = 0 - 0 = 0
eights place:
difference3 = 1 - 1 = 0
sixteens place:
difference4 = 0 - 0 = 0
thirty-twos place:
3.2. ARITHMETIC ERRORS — UNSIGNED INTEGERS 36
This, of course, also works for hexadecimal, but remember that a digit one place to
the left counts sixteen times more. For example, consider x = 0x6089 and y = 0xab5d:
  1101     ← borrows
   6089    ← x
 − ab5d    ← y
   b52c    ← difference
Notice in this second example that we had to borrow from “beyond the width” of
the two values. That is, the two values are each sixteen bits wide, and the result must
also be sixteen bits. Whether there is borrow “from outside” to the high-order digit is
recorded in the CF of the rflags register whenever a subtract operation is performed:
• no borrow from outside → CF = 0
• borrow from outside → CF = 1
Another way to state this for unsigned numbers:
• if the subtrahend is equal to or less than the minuend the CF is set to zero
• if the subtrahend is larger than the minuend the CF bit is set to one
and CF = 0.
So far, the binary number system looks reasonable. Let’s try two larger four-bit
numbers:
3.3. ARITHMETIC ERRORS — SIGNED INTEGERS 37
and CF = 1. The result, 2, is arithmetically incorrect. The problem here is that the
addition has produced carry beyond the fourth bit. Since this is not taken into account
in the result, the answer is wrong.
Now consider subtraction of the two numbers:
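$0100_2 = 0 \times 2^3 + 1 \times 2^2 + 0 \times 2^1 + 0 \times 2^0 = 4_{10}$
$-\,1110_2 = 1 \times 2^3 + 1 \times 2^2 + 1 \times 2^1 + 0 \times 2^0 = -14_{10}$
$0110_2 = 0 \times 2^3 + 1 \times 2^2 + 1 \times 2^1 + 0 \times 2^0 = 6_{10}$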
and CF = 1.
The result, 6, is arithmetically incorrect. The problem in this case is that the subtrac-
tion has had to borrow from beyond the fourth bit. Since this is not taken into account
in the result, the answer is wrong.
From the discussion in Section 3.1 (page 30) you should be able to convince yourself
that these four-bit arithmetic examples generalize to any size arithmetic performed by
the computer. After adding two numbers, the Carry Flag will always be set to zero if
there is no ultimate carry, or it will be set to one if there is ultimate carry. Subtraction
will set the Carry Flag to zero if no borrow from the “outside” is required, or one if
borrow is required. These examples illustrate the principle:
• When adding or subtracting two unsigned integers, the result is arithmetically
correct if and only if the Carry Flag (CF) is set to zero.
It is important to realize that the CF and OF bits in the rflags register are always
set to the appropriate value, 0 or 1, each time an addition or subtraction is performed
by the CPU. In particular, the CPU will not ignore the CF when there is no carry, it will
actively set the CF to zero.
The result, -4, is arithmetically incorrect. We should note here that the problem is
the way in which the computer does addition — it performs binary addition on the bit
patterns that in themselves have no inherent meaning. There are computers that use
this particular code for storing signed decimal integers. They have a special “signed
add” instruction. By the way, notice that such computers have both a +0 and a -0!
Most computers, including the x86, use another code for representing signed decimal
integers — the two’s complement code. To see how this code works, we start with an
example using the decimal number system.
Say that you have a cassette player and wish to represent both positive and negative
positions on the tape. It would make sense to somehow fast-forward the tape to its
center and call that point “zero.” Most cassette players have a four decimal digit counter
that represents tape position. The counter, of course, does not give actual tape position,
but a “coded” representation of the tape position. Since we wish to call the center of
the tape “zero,” we push the counter reset button to set it to 0000.
Now, moving the tape forward — the positive direction — will cause the counter to
increment. And moving the tape backward — the negative direction — will cause the
counter to decrement. In particular, if we start at zero and move to “+1” the “code” on
the tape counter will show 0001. On the other hand, if we start at zero and move to “-1”
the “code” on the tape counter will show 9999.
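The behavior of the tape counter can be imitated with modular arithmetic. This is a sketch only; the names tape_code and tape_add are invented for the illustration, and the counter is assumed to show exactly four decimal digits.

```c
#define COUNTER_MODULUS 10000  /* four decimal digits: 0000 through 9999 */

/* Return the four-digit counter reading for a signed tape position. */
int tape_code(int position)
{
    int code = position % COUNTER_MODULUS;   /* may be negative in C */
    if (code < 0) {
        code += COUNTER_MODULUS;             /* moving back from 0000 wraps to 9999 */
    }
    return code;
}

/* Add two counter readings the way the counter does: only four digits survive. */
int tape_add(int code_a, int code_b)
{
    return (code_a + code_b) % COUNTER_MODULUS;
}
```

With these definitions tape_code(-1) is 9999, and tape_add(tape_code(2), tape_code(-2)) is 0, because the fifth digit of 0002 + 9998 = 10000 does not fit in the counter.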
Using our tape code system to perform the arithmetic in the previous example — (+2)
+ (-2):
• When adding two signed integers in the two’s complement notation, carry is irrele-
vant.
The two’s complement code uses this pattern for representing signed decimal integers
in bit patterns. The correspondence between signed decimal (two’s complement),
hexadecimal, and binary for four-bit values is shown in Table 3.2.
or
$-2^{(4-1)} \le x \le +(2^{(4-1)} - 1)$    (3.4)
$x + (-x) = 2^n$    (3.6)
Notice that $2^n$ written in binary is “1” followed by n zeros. That is, it requires n+1 bits
to represent. Another way of saying this is, “in the n-bit two’s complement code adding
a number to its negative produces n zeros and carry.”
We now derive a method for computing the negative of a number in the two’s com-
plement code. Solving Equation 3.6 for −x, we get:
$-x = 2^n - x$    (3.7)
For example, if we wish to compute -1 in binary (in the two’s complement code) in 8
bits, we perform the arithmetic:
or in hexadecimal:
$-1_{16} = 100_{16} - 01_{16} = \mathrm{ff}_{16}$    (3.9)
This subtraction is error prone, so let’s perform a few algebraic manipulations on
Equation 3.7, which defines the negation operation. First, we subtract one from both
sides:
$-x - 1 = 2^n - x - 1$    (3.10)
Rearranging a little:
$-x - 1 = 2^n - 1 - x = (2^n - 1) - x$    (3.11)
Now, consider the quantity $(2^n - 1)$. Since $2^n$ is written in binary as one (1) followed
by n zeros, $(2^n - 1)$ is written as n ones. For example, for n = 8:
$2^8 - 1 = 11111111_2$    (3.12)
Thus, we can express the right-hand side of Equation 3.11 as
You can see how easy the subtraction on the right-hand side of Equation 3.13 is if we
consider the previous example of computing -1 in binary in eight bits. Let x = 1, giving:
or in hexadecimal:
$\mathrm{ff}_{16} - 01_{16} = \mathrm{fe}_{16}$    (3.15)
Another (simpler) way to look at this is
The value of the right-hand side of Equation 3.16 is called the reduced radix comple-
ment of x. Since the radix is two, it is common to call this the one’s complement of x.
From Equation 3.11 we see that this computation — the reduced radix complement of x
— gives
This leads us to Algorithm 3.5 for negating any integer stored in the two’s complement,
n-bit code.
This process — computing the one’s complement, then adding one — is called computing
the two’s complement.
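In C the two steps map directly onto the ~ operator and an addition. The following sketch assumes 16-bit values; the function names are chosen for this illustration only.

```c
#include <stdint.h>

/* One's complement: flip every bit, which equals (2^16 - 1) - x. */
uint16_t ones_complement16(uint16_t x)
{
    return (uint16_t)~x;
}

/* Two's complement: one's complement, then add one (the sum wraps modulo 2^16). */
uint16_t twos_complement16(uint16_t x)
{
    return (uint16_t)(ones_complement16(x) + 1u);
}
```

For example, twos_complement16(0x0001) gives 0xffff, the 16-bit code for -1. Applying the function twice returns the original value.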
Be Careful!
• “In two’s complement” describes the storage code.
• “Taking the two’s complement” is an active computation. If the value the computation
is applied to is an integer stored in the two’s complement notation, this computation is
mathematically equivalent to negating the number.
Combining Algorithm 3.5 with observations about Table 3.2 above, we can easily
compute the decimal equivalent of any integer stored in the two’s complement notation
by applying Algorithm 3.6.
Example 3-d
The 16-bit integer $5678_{16}$ is stored in two’s complement notation. Convert it to a
signed, decimal integer.
Solution:
Since the high-order bit is zero, we simply compute the decimal equivalent:
Example 3-e
The 16-bit integer $8765_{16}$ is stored in two’s complement notation. Convert it to a
signed, decimal integer.
Solution:
Since the high-order bit is one, we first negate the number in the two’s complement
format.
Place a minus sign in front of the number (since we negated it in the two’s complement
domain).
$8765_{16} = -30875_{10}$
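The procedure used in Examples 3-d and 3-e can be sketched in C as follows. The function name to_signed16 is hypothetical, and int is assumed to be wider than 16 bits, as it is in our environment.

```c
#include <stdint.h>

/* Interpret a 16-bit pattern as an integer stored in two's complement. */
int to_signed16(uint16_t bits)
{
    if ((bits & 0x8000u) == 0) {
        return (int)bits;                         /* high-order bit 0: non-negative */
    }
    uint16_t magnitude = (uint16_t)(~bits + 1u);  /* negate in two's complement */
    return -(int)magnitude;                       /* place a minus sign in front */
}
```

With this sketch, to_signed16(0x5678) returns 22136 and to_signed16(0x8765) returns -30875.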
Algorithm 3.7 shows how to convert a signed decimal number to two’s complement
binary.
Example 3-f
Convert the signed, decimal integer +31693 to a 16-bit integer in two’s complement
notation. Give the answer in hexadecimal.
Solution:
Since this is a positive number, we simply convert it. The answer is to be given in
hexadecimal, so we will repetitively divide by 16 to get the answer.
So the answer is
$31693_{10} = \mathrm{7bcd}_{16}$
Example 3-g
Convert the signed, decimal integer -250 to a 16-bit integer in two’s complement
notation. Give the answer in hexadecimal.
Solution:
Since this is a negative number, we first negate it, giving +250. Then we convert this
value. The answer is to be given in hexadecimal, so we will repetitively divide by 16 to
get the answer.
This gives us
$250_{10} = \mathrm{00fa}_{16}$
Now we take the one’s complement: 00fa ⇒ ff05
and add one: ff05 ⇒ ff06
So the answer is
$-250_{10} = \mathrm{ff06}_{16}$
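Examples 3-f and 3-g can be combined into one small C sketch. The function names are invented for this illustration, and the input is assumed to lie in the range -32768 to +32767.

```c
#include <stdint.h>

/* Produce the four hexadecimal digits of bits by repetitively dividing by 16. */
static void hex_digits16(uint16_t bits, char text[5])
{
    static const char digit[] = "0123456789abcdef";
    for (int i = 3; i >= 0; i--) {
        text[i] = digit[bits % 16];   /* the remainder is the next digit, low order first */
        bits /= 16;                   /* the quotient goes on to the next division */
    }
    text[4] = '\0';
}

/* Store value in text as a 16-bit two's complement number written in hexadecimal. */
void to_twos_complement_hex16(int value, char text[5])
{
    uint16_t bits;
    if (value >= 0) {
        bits = (uint16_t)value;                    /* positive: simply convert */
    } else {
        uint16_t magnitude = (uint16_t)(-value);   /* negate first: -250 becomes +250 */
        bits = (uint16_t)(~magnitude + 1u);        /* one's complement, then add one */
    }
    hex_digits16(bits, text);
}
```

Passing +31693 fills text with "7bcd", and passing -250 fills it with "ff06".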
In Section 3.3 (page 37) you saw that carry is irrelevant when working with signed
integers. You also saw that adding two signed numbers can produce an incorrect result.
That is, the sum may exceed the range of values that can be represented in the allocated
number of bits.
The flags register, rflags, provides a bit, the Overflow Flag (OF), for detecting
whether the sum of two n-bit, signed numbers stored in the two’s complement code has
exceeded the range allocated for it. Each operation that affects the overflow flag sets
the bit equal to the exclusive or of the carry into the highest-order bit of the operands
and the ultimate carry. For example, when adding the two 8-bit numbers, $15_{16}$ and $\mathrm{6f}_{16}$,
we get:
carry −→ 0 1 ←− penultimate carry
           0001 0101 ←− x
         + 0110 1111 ←− y
           1000 0100 ←− sum
In this example, there is a carry of zero and a penultimate (next to last) carry of one.
The OF flag is equal to the exclusive or of carry and penultimate carry:
OF = 0 ^ 1 = 1    (3.20)
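The rule can be checked with a short C sketch. C cannot read the rflags register directly, so the hypothetical function add8 below recomputes both carries from the operands; it illustrates the rule and is not a description of how the hardware is built.

```c
#include <stdbool.h>
#include <stdint.h>

struct add8_flags {
    uint8_t sum;   /* low eight bits of the result */
    bool cf;       /* ultimate carry, out of the high-order bit */
    bool of;       /* carry XOR penultimate carry */
};

struct add8_flags add8(uint8_t x, uint8_t y)
{
    unsigned int wide = (unsigned int)x + y;                  /* up to nine bits */
    bool carry = ((wide >> 8) & 1u) != 0;                     /* carry out of bit 7 */
    bool penultimate = ((((x & 0x7fu) + (y & 0x7fu)) >> 7) & 1u) != 0;  /* carry into bit 7 */
    struct add8_flags r = { (uint8_t)wide, carry, carry != penultimate };
    return r;
}
```

For the operands above, add8(0x15, 0x6f) yields the sum 0x84 with CF = 0 and OF = 1: the unsigned result is correct, but the signed result is not.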
x = 1...
y = 0...
That is, the high-order bit of one number is 1 and the high-order bit of the other
is 0, regardless of what the other bits are. Now, if we add x and y, there are two
possible results with respect to carry:
We conclude that adding two integers of opposite sign always yields 0 for the
overflow flag.
Next, notice that since y is positive and x negative:
$0 \le y \le +(2^{(n-1)} - 1)$    (3.21)
$-2^{(n-1)} \le x < 0$    (3.22)
Thus, the sum of two integers of opposite sign remains within the range of signed
integers, and there is no overflow (OF = 0).
Case 2: Both numbers are positive. Since both are positive, we can express x and y
in binary as:
x = 0...
y = 0...
That is, the high-order bit is 0, regardless of what the other bits are. Now, if we
add x and y, there are two possible results with respect to carry:
1. If the penultimate carry is zero:
carry −→ 0 0 ←− penultimate carry
           0 . . . ←− x
         + 0 . . . ←− y
           0 . . . ←− sum
this addition would produce OF = 0 ^ 0 = 0. The high-order bit of the sum is
zero, so it is a positive number, and the sum is within range.
2. If the penultimate carry is one:
carry −→ 0 1 ←− penultimate carry
           0 . . . ←− x
         + 0 . . . ←− y
           1 . . . ←− sum
this addition would produce OF = 0 ^ 1 = 1. The high-order bit of the sum is
one, so it is a negative number. Adding two positive numbers cannot yield a
negative sum, so this sum has exceeded the allocated range.
Case 3: Both numbers are negative. Since both are negative, we can express x and
y in binary as:
x = 1...
y = 1...
That is, the high-order bit is 1, regardless of what the other bits are. Now, if we
add x and y, there are two possible results with respect to carry:
3.4. OVERFLOW AND SIGNED DECIMAL INTEGERS 45
Be Careful! Do not confuse positive signed numbers with unsigned numbers. The range
for unsigned 32-bit integers is 0 – 4294967295, and for signed 32-bit integers the range is
-2147483648 – +2147483647.
The codes used for both unsigned integers and signed integers are circular in nature.
That is, for a given number of bits, each code “wraps around.” This can be seen pictorially
in the “Decoder Ring” shown in Figure 3.1 for three-bit numbers.
Figure 3.1: “Decoder Ring” for three-bit signed and unsigned integers. Move clockwise
when adding numbers, counter-clockwise when subtracting. Crossing over
000 sets the CF to one, indicating an error for unsigned integers. Crossing
over 100 sets the OF to one, indicating an error for signed integers.
Example 3-h
Using the “Decoder Ring” (Figure 3.1), add the unsigned integers 3 + 4.
Solution:
Working only in the inner ring, start at the tic mark for 3, which corresponds to the bit
pattern 011. The bit pattern corresponding to 4 is 100, which is four tic marks CW from
zero. So move four tic marks CW from the 3 tic mark. This places us at the tic mark
labeled 111, which corresponds to 7. Since we did not pass the tic mark at the top of
the Decoder Ring, CF = 0. Thus, the result is correct.
Example 3-i
Using the “Decoder Ring” (Figure 3.1), add the unsigned integers 5 + 6.
Solution:
Working only in the inner ring, start at the tic mark for 5, which corresponds to the
bit pattern 101. The bit pattern corresponding to 6 is 110, which is six tic marks CW
from zero. So move six tic marks CW from the 5 tic mark. This places us at the tic mark
labeled 011, which corresponds to 3. Since we have crossed the tic mark at the top of
the Decoder Ring, the CF becomes 1. Thus, the result is incorrect.
Example 3-j
Using the “Decoder Ring” (Figure 3.1), add the signed integers (+1) + (+2).
Solution:
Working only in the outer ring, start at the tic mark for +1, which corresponds to the
bit pattern 001. The bit pattern corresponding to +2 is 010, which is two tic marks CW
from zero. So move two tic marks CW from the +1 tic mark. This places us at the tic
mark labeled 011, which corresponds to +3. Since we did not pass the tic mark at the
bottom of the Decoder Ring, OF = 0. Thus, the result is correct.
Example 3-k
Using the “Decoder Ring” (Figure 3.1), add the signed integers (+3) + (-4).
Solution:
Working only in the outer ring, start at the tic mark for +3, which corresponds to the
bit pattern 011. The bit pattern corresponding to -4 is 100, which is four tic marks CCW
from zero. So move four tic marks CCW from the +3 tic mark. This places us at the tic
mark labeled 111, which corresponds to -1. Since we did not pass the tic mark at the
bottom of the Decoder Ring, OF = 0. Thus, the result is correct.
3.5. C/C++ BASIC DATA TYPES 48
Example 3-l
Using the “Decoder Ring” (Figure 3.1), add the signed integers (+3) + (+1).
Solution:
Working only in the outer ring, start at the tic mark for +3, which corresponds to the
bit pattern 011. The bit pattern corresponding to +1 is 001, which is one tic mark CW
from zero. So move one tic mark CW from the +3 tic mark. This places us at the tic
mark labeled 100, which corresponds to -4. Since we did pass the tic mark at the bottom
of the Decoder Ring, OF = 1. Thus, the result is incorrect.
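The Decoder Ring examples can be reproduced with a small C sketch. Instead of stepping around the ring one tic mark at a time, the hypothetical function ring_add adds the three-bit patterns and derives CF and OF from the carries, which gives the same landing point and the same flags.

```c
#include <stdbool.h>

struct ring_result {
    unsigned int bits;  /* three-bit pattern where we land on the ring */
    bool cf;            /* crossed over 000 at the top of the ring */
    bool of;            /* crossed over 100 at the bottom of the ring */
};

/* Add two three-bit patterns and report both flags. */
struct ring_result ring_add(unsigned int a, unsigned int b)
{
    a &= 7u;
    b &= 7u;
    unsigned int sum = a + b;
    bool carry = sum > 7u;                          /* carry out of the high-order bit */
    bool penultimate = ((a & 3u) + (b & 3u)) > 3u;  /* carry into the high-order bit */
    struct ring_result r = { sum & 7u, carry, carry != penultimate };
    return r;
}

/* Read a three-bit pattern on the outer (signed) ring. */
int ring_signed(unsigned int bits)
{
    bits &= 7u;
    return (bits & 4u) ? (int)bits - 8 : (int)bits;
}
```

For instance, ring_add(5, 6) lands on 011 with CF = 1 (Example 3-i), and ring_add(3, 1) lands on 100, which ring_signed reads as -4, with OF = 1 (Example 3-l).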
Table 3.3: Sizes (in bits) of some C/C++ data types in 32-bit and 64-bit modes. The size
of a long depends on the mode. Pointers (addresses) are 32 bits in 32-bit
mode and can be 32 or 64 bits in 64-bit mode.
in this table are taken from the System V Application Binary Interface specifications,
reference [33] for 32-bit and reference [25] for 64-bit, and are used by the gcc compiler
for the x86-64 architecture. Language specifications tend to be more permissive in order
to accommodate other hardware architectures. For example, see reference [10] for the
specifications for C.
A given “real world” value can usually be represented in more than one data type. For
example, most people would think of “123” as representing “one hundred twenty-three.”
This value could be stored in a computer in int format or as a text string. An int in
our C/C++ environment is stored in 32 bits, and the bit pattern would be
0x0000007b
As a C-style text string, it would also require four bytes of memory, but their bit patterns
would be
0x31 0x32 0x33 0x00
The int format is easier to use in arithmetic and logical expressions, but the interface
with the outside world through the screen and the keyboard uses the char format. If a
user entered 123 from the keyboard, the operating system would read the individual
characters, each in char format. The text string must be converted to int format. After
the numbers are manipulated, the result must be converted from the int format to char
format for display on the screen.
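As a rough illustration of the first of these conversions, the sketch below turns a text string of decimal digits into an int. The function name text_to_int is hypothetical; the code assumes ASCII characters, only the digits ’0’ through ’9’, and a value that fits in an int.

```c
#include <stddef.h>

/* Convert a C-style text string of decimal digits to int format. */
int text_to_int(const char *text)
{
    int value = 0;
    for (size_t i = 0; text[i] != '\0'; i++) {
        /* 0x31 - 0x30 is 1, 0x32 - 0x30 is 2, and so on */
        value = value * 10 + (text[i] - '0');
    }
    return value;
}
```

Given the four bytes 0x31 0x32 0x33 0x00, text_to_int returns the int whose bit pattern is 0x0000007b.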
C programmers use functions in the stdio library and C++ programmers use func-
tions in the iostream library to do these conversions between the int and char formats.
For example, the C code sequence
scanf("%i", &x);
x += 100;
printf("%i", x);
• reads characters from the keyboard and converts the character sequence into the
corresponding int format.
• adds 100 to the int.
• converts the resulting int into a character sequence and displays it on the screen.
The C or C++ I/O library functions in the code segments above do the necessary
conversions between character sequences and the int storage format. However, once
the conversion is performed, they ultimately call the read system call function to read
bytes from the keyboard and the write system call function to write bytes to the screen.
As shown in Figure 3.2, an application program can call the read and write functions
directly to transfer bytes.
application
C I/O libraries
write read
OS
screen/keyboard
Figure 3.2: Relationship of I/O libraries to application and operating system. An applica-
tion can use functions in the I/O libraries to convert between keyboard/screen
chars and basic data types, or it can directly use the read /write system
calls to transfer raw bytes.
When using the read and write system call functions for I/O, it is the programmer’s
responsibility to do the conversions between the char type used for I/O and the storage
formats used within the program. We will soon be writing our own functions in assembly
language to convert between the character format used for screen display and keyboard
input, and the internal storage format of integers in the binary number system. The
purpose of writing our own functions is to gain a thorough understanding of how data is
represented internally in the computer.
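To give a feeling for this responsibility, here is a C sketch (not the assembly language functions developed later) that converts an unsigned int to characters and displays them with write. The function names are invented for the illustration, and unsigned int is assumed to be 32 bits, so ten digits are enough.

```c
#include <stddef.h>
#include <unistd.h>

/* Store the decimal digits of value in buffer (no terminator); return the count.
   buffer must have room for at least 10 chars. */
size_t uint_to_text(unsigned int value, char *buffer)
{
    char reversed[10];
    size_t n = 0;
    do {
        reversed[n++] = (char)('0' + value % 10);  /* low-order digit comes out first */
        value /= 10;
    } while (value != 0);
    for (size_t i = 0; i < n; i++) {
        buffer[i] = reversed[n - 1 - i];           /* store high-order digit first */
    }
    return n;
}

/* Display value on the screen by calling the write system call function directly. */
ssize_t write_uint(unsigned int value)
{
    char text[10];
    size_t n = uint_to_text(value, text);
    return write(STDOUT_FILENO, text, n);
}
```

Calling write_uint(223) sends the three bytes 0x32 0x32 0x33 to the screen; no I/O library function is involved.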
Aside: If the numerical data are used primarily for display, with few arithmetic operations,
it makes more sense to store numerical data in character format. Indeed, this is done in
many business data processing environments. But this makes arithmetic operations more
complicated.
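/*
 * mulDiv.c
 * Asks user to enter an integer. Then prompts user to enter
 * a power of two to multiply the integer, then another power
 * of two to divide. Assumes that user does not request more
 * than 30 as the power of 2.
 * Bob Plantz - 4 June 2009
 */

#include <stdio.h>

int main(void)
{
    int x;
    int leftShift, rightShift;

    /* The statements from here to the return are a sketch consistent
     * with the header comment; the original lines are not part of
     * this preview. */
    printf("Enter an integer: ");
    scanf("%i", &x);

    printf("Multiply by two raised to the power: ");
    scanf("%i", &leftShift);
    printf("%i x %i = %i\n", x, 1 << leftShift, x << leftShift);

    printf("Divide by two raised to the power: ");
    scanf("%i", &rightShift);
    printf("%i / %i = %i\n", x, 1 << rightShift, x >> rightShift);

    return 0;
}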
Listing 3.1: Shifting to multiply and divide by powers of two.
It is easy to see what each of these operators does by using truth tables. To illustrate
how truth tables work, consider the algorithm for binary addition. In Section 3.1 (page
30) we saw that the ith bit in the result is the sum of the ith bit of one number plus the
ith bit of the other number plus the carry produced from adding the (i-1)th bits. This
sum will produce a carry of zero or one. In other words, a bit adder has three inputs —
the two corresponding bits from the two numbers being added and the carry from the
previous bit addition — and two outputs — the result and the carry. In a truth table we
have a column for each input and each output. Then we write down all possible input bit
combinations and then show the output(s) in the corresponding row. A truth table for
the bit addition operation is shown in Figure 3.3. We use the notation x[i] to represent
the ith bit in the variable x; x[i-j] would specify bits i through j.
Figure 3.3: Truth table for adding two bits with carry from a previous bit addition. x[i]
is the ith bit of x; carry[(i-1)] is the carry from adding the (i-1)th bits.
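The three-input, two-output behavior of the bit adder can also be written as a C sketch. The structure and function names here are assumptions made for the illustration.

```c
/* One-bit adder: three input bits, two output bits. */
struct bit_sum {
    unsigned int result;  /* the ith bit of the sum */
    unsigned int carry;   /* carry passed on to the (i+1)th bit addition */
};

struct bit_sum add_bits(unsigned int x_i, unsigned int y_i, unsigned int carry_in)
{
    unsigned int total = (x_i & 1u) + (y_i & 1u) + (carry_in & 1u);  /* 0 through 3 */
    struct bit_sum out = { total & 1u, total >> 1 };
    return out;
}
```

Calling add_bits with each of the eight input combinations produces the eight rows of the truth table.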
The bitwise logical operators act on the corresponding bits of two operands as shown
in Figure 3.4.
complement
x[i]  ∼x[i]
0     1
1     0
Figure 3.4: Truth tables showing bitwise C/C++ operations. x[i] is the ith bit in the
variable x.
Example 3-m
Let int x = 0x1234abcd. Compute the and, or, and xor with 0xdcba4321.
Solution:
Make sure that you distinguish these bitwise logical operators from the C/C++ logical
operators, &&, ||, and !. The logical operators work on groups of bits organized into
integral data types rather than individual bits. For comparison, the truth tables for the
C/C++ logical operators are shown in Figure 3.5.
and
x         y         x && y
0         0         0
0         non-zero  0
non-zero  0         0
non-zero  non-zero  1

or
x         y         x || y
0         0         0
0         non-zero  1
non-zero  0         1
non-zero  non-zero  1

complement
x         !x
0         1
non-zero  0
Figure 3.5: Truth tables showing C/C++ logical operations. x and y are variables of
integral data type.
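The difference between the two families of operators shows up clearly when both are applied to the same operands. This sketch uses the values from Example 3-m; the function names are chosen for the illustration, and uint32_t is used because 0xdcba4321 does not fit in a signed 32-bit int.

```c
#include <stdint.h>

/* Bitwise operators combine the operands one bit position at a time. */
uint32_t bitwise_and(uint32_t x, uint32_t y) { return x & y; }
uint32_t bitwise_or(uint32_t x, uint32_t y)  { return x | y; }
uint32_t bitwise_xor(uint32_t x, uint32_t y) { return x ^ y; }

/* Logical operators treat each whole operand as either zero or non-zero. */
int logical_and(uint32_t x, uint32_t y) { return x && y; }
int logical_or(uint32_t x, uint32_t y)  { return x || y; }
int logical_not(uint32_t x)             { return !x; }
```

With x = 0x1234abcd and y = 0xdcba4321, the bitwise results are 0x10300301 (and), 0xdebeebed (or), and 0xce8ee8ec (xor), while x && y and x || y are both simply 1 and !x is 0.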
Table 3.4: Hexadecimal characters and corresponding int. Note the change in pattern
from ‘9’ to ‘a’.
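/*
 * convertHex.c
 * Asks user to enter a number in hexadecimal
 * then echoes it in hexadecimal and in decimal.
 * Assumes that user does not make mistakes.
 * Bob Plantz - 4 June 2009
 */

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int x;
    unsigned char aChar;

    /* The prompt below is a sketch; the original lines are not part of this preview. */
    printf("Enter an integer in hexadecimal: ");
    fflush(stdout);

    x = 0;                             // initialize result
    read(STDIN_FILENO, &aChar, 1);     // get first character
    while (aChar != '\n')              // look for return key
    {
        x = x << 4;                    // make room for next four bits
        if (aChar <= '9')
        {
            x = x + (int)(aChar & 0x0f);
        }
        else
        {
            /* From here to the return is a sketch consistent with the
             * header comment; the original lines are not part of this
             * preview. Letters: low four bits are 1-6, so add 9. */
            x = x + (int)(aChar & 0x0f) + 9;
        }
        read(STDIN_FILENO, &aChar, 1); // get next character
    }

    printf("You entered %#010x = %i (decimal)\n", x, x);

    return 0;
}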
Listing 3.2: Reading hexadecimal values from keyboard.
From Table 3.5 we can see that six bit patterns are “wasted.” The effect of this
inefficiency is that a 16-bit storage location has a range of 0 – 9999 if we use BCD, but
the range is 0 – 65535 if we use binary.
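The limited range can be seen in a short C sketch that stores one decimal digit in each group of four bits. The function name to_bcd16 is hypothetical, and the argument is assumed to be in the range 0 through 9999.

```c
#include <stdint.h>

/* Encode a value in the range 0..9999 as four BCD digits in 16 bits. */
uint16_t to_bcd16(unsigned int value)
{
    uint16_t bcd = 0;
    for (int shift = 0; shift < 16; shift += 4) {
        bcd |= (uint16_t)((value % 10) << shift);  /* one decimal digit per four bits */
        value /= 10;
    }
    return bcd;
}
```

The result reads the same in hexadecimal as the original value does in decimal: to_bcd16(1234) is 0x1234, and the largest value that fits, 9999, is stored as 0x9999.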
BCD is important in specialized systems that deal primarily with numerical data.
There are I/O devices that deal directly with numbers in BCD without converting to/from
a character code, for example, ASCII. The COBOL programming language supports a
packed BCD format where two BCD characters are stored in each 8-bit byte. The last
(4-bit) digit is used to store the sign of the number as shown in Table 3.6. The specific
codes used depend upon the particular implementation.
then add a zero to the beginning of each of the original bit patterns and a 1 to each of
the reflected ones:
decimal Gray code
0 00
1 01
2 11
3 10
Let us repeat these two steps to add another bit. Reflect the pattern:
Gray code
00
01
11
10
10
11
01
00
then add a zero to the beginning of each of the original bit patterns and a 1 to each of
the reflected ones:
decimal Gray code
0 000
1 001
2 011
3 010
4 110
5 111
6 101
7 100
The Gray code for four bits is shown in Table 3.7. Notice that the pattern of only
changing one bit between adjacent values also holds when the bit pattern “wraps around.”
That is, only one bit is changed when going from the highest value (15 for four bits) to
the lowest (0).
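The reflect-and-prefix construction can be sketched in C. The function name build_gray_code is an assumption for this illustration, and the construction is assumed to start from the one-bit pattern 0, 1.

```c
#include <stddef.h>

/* Fill codes[0 .. 2^bits - 1] with the Gray code built by repeated reflection.
   codes must have room for 2^bits entries; bits is assumed to be 1 through 16. */
void build_gray_code(unsigned int bits, unsigned int *codes)
{
    codes[0] = 0;                      /* one-bit starting pattern: 0, 1 */
    codes[1] = 1;
    for (unsigned int k = 1; k < bits; k++) {
        size_t half = (size_t)1 << k;  /* number of patterns built so far */
        for (size_t i = 0; i < half; i++) {
            /* reflected copy, with a 1 added to the beginning */
            codes[2 * half - 1 - i] = codes[i] | (1u << k);
        }
        /* the original patterns get a 0 added to the beginning, so they are unchanged */
    }
}
```

Calling build_gray_code(3, codes) fills the array with 000, 001, 011, 010, 110, 111, 101, 100, matching the three-bit table above.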
3.7 Exercises
3-1 (§3.1) How many bits are required to store a single decimal digit?
3-2 (§3.1) Using the answer from Exercise 3-1, invent a code for storing eight decimal
digits in a thirty-two bit register. Using your new code, does binary addition produce
the correct results?
3-3 (§3.3) Select several pairs of signed integers from Table 3.2, convert each to binary
using the table, perform the binary addition, and check the results. Does this code
always work?
3-4 (§3.3) If you did not select them in Exercise 3-3, add +4 and +5 using the four-bit,
two’s complement code (from Table 3.2). What answer do you get?
3-5 (§3.3) If you did not select them in Exercise 3-3, add -4 and -5 using the four-bit, two’s
complement code (from Table 3.2). What answer do you get?
3-6 (§3.3) Select any positive integer from Table 3.2. Add the binary representation
for the positive value to the binary representation for the negative value. What is
the four-bit result? What is the value of the CF? The OF? If you do the addition “on
paper” (that is, you can use as many digits as you wish), how could you express, in
English, the result of adding the positive representation of an integer to its negative
representation in the two’s complement notation? The negative representation to
the positive representation? Which two integers do not have a representation of
the opposite sign?
3-7 (§3.3) The following 8-bit hexadecimal values are stored in two’s complement format.
What are the equivalent signed decimal numbers?
a) 55 e) 80
b) aa f) 63
c) f0 g) 7b
d) 0f
3-8 (§3.3) The following 16-bit hexadecimal values are stored in two’s complement
format. What are the equivalent signed decimal numbers?
a) 1234 e) 8000
b) edcc f) 0400
c) fedc g) ffff
d) 07d0 h) 782f
3-9 (§3.3) Show how each of the following signed, decimal integers would be stored in
8-bit two’s complement format. Give your answer in hexadecimal.
a) 100 e) 127
b) -1 f) -16
c) -10 g) -32
d) 88 h) -128
3-10 (§3.3) Show how each of the following signed, decimal integers would be stored in
16-bit two’s complement format. Give your answer in hexadecimal.
a) 1024 e) -256
b) -1024 f) -32768
c) -1 g) -32767
d) 32767 h) -128
3-11 (§3.4) Perform binary addition of the following pairs of 8-bit numbers (shown
in hexadecimal) and indicate whether your result is “right” or “wrong.” First
treat them as unsigned values, then as signed values (stored in two’s complement
format). Thus, you will have two “right/wrong” answers for each sum. Note that
the computer performs only one addition, setting both the CF and OF according
to the results of the addition. It is up to the program to test the appropriate flag
depending on whether the numbers are being considered as unsigned or signed in
the program.
a) 55 + aa d) 63 + 7b
b) 55 + f0 e) 0f + ff
c) 80 + 7b f) 80 + 80
3-12 (§3.4, 3.5) Perform binary addition of the following pairs of 16-bit numbers (shown
in hexadecimal) and indicate whether your result is “right” or “wrong.” First
treat them as unsigned values, then as signed values (stored in two’s complement
format). Thus, you will have two “right/wrong” answers for each sum. Note that
the computer performs only one addition, setting both the CF and OF according
to the results of the addition. It is up to the program to test the appropriate flag
depending on whether the numbers are being considered as unsigned or signed in
the program.
3-13 (§3.5) Enter the program in Listing 3.1 and get it to work. Use the program to
compute 1 (one) multiplied by 2 raised to the 31st power. What result do you get
for 1 (one) multiplied by 2 raised to the 32nd power? Explain the results.
3-14 (§3.5) Write a C program that prompts the user to enter a hexadecimal value,
multiplies it by ten, then displays the result in hexadecimal. Your main function
should
Use the readLn and writeStr functions from Exercise 2-32 to read from the
keyboard and display on the screen. Place the functions to perform the conversions
in separate files. Hint: review Figure 3.2.
3-15 (§3.5) Write a C program that prompts the user to enter a binary value, multiplies
it by ten, then displays the result in binary. (“Binary” here means that the user
communicates with the program in ones and zeros.) Your main function should
a) declare a char array,
b) call the readLn function to read from the keyboard,
c) call a function to convert the input text string to an int,
d) multiply the int by ten,
e) call a function to convert the int to its corresponding binary text string,
f) call writeStr to display the resulting binary text string.
Use the readLn and writeStr functions from Exercise 2-32 to read from the
keyboard and display on the screen. Your functions to convert from a binary text
string to an int and back should be placed in separate files.
3-16 (§3.5) Write a C program that prompts the user to enter an unsigned decimal integer,
multiplies it by ten, then displays the result in decimal. Your main function
should
a) declare a char array,
b) call the readLn function to read from the keyboard,
c) call a function to convert the input text string to an int,
d) multiply the int by ten,
e) call a function to convert the int to its corresponding decimal text string,
f) call writeStr to display the resulting decimal text string.
Use the readLn and writeStr functions from Exercise 2-32 to read from the
keyboard and display on the screen. Your function to convert from a decimal text
string to an int should be placed in a separate file. Hint: this problem cannot
be solved by simply shifting bit patterns. Think carefully about the mathematical
equivalence of shifting bit patterns left or right.
3-17 (§3.5) Modify the program in Exercise 3-16 so that it works with signed decimal
integers.
Chapter 4
Logic Gates
This chapter provides an overview of the hardware components that are used to build a
computer. We will limit the discussion to electronic computers, which use transistors
to switch between two different voltages. One voltage represents 0, the other 1. The
hardware devices that implement the logical operations are called logic gates.
[AND gate symbol: inputs x and y, output x · y]
x y x·y
0 0 0
0 1 0
1 0 0
1 1 1
4.1. BOOLEAN ALGEBRA 62
We can see from the truth table that the AND operator follows similar rules as
multiplication in elementary algebra.
• OR — a binary operator; the result is 1 if at least one of the two operands is 1;
otherwise the result is 0. We will use ’+’ to designate the OR operation. It is also
common to use the ’∨’ symbol or simply “OR”. The hardware symbol for the OR
gate is shown in Figure 4.2. The inputs are x and y. The resulting output, x + y, is
shown in the truth table in this figure. From the truth table we can see that the OR
operator follows the same rules as addition in elementary algebra except that
1 + 1 = 1
[OR gate symbol: inputs x and y, output x + y]
x y x+y
0 0 0
0 1 1
1 0 1
1 1 1
[NOT gate symbol: input x, output x′]
x x′
0 1
1 0
The NOT operation has no analog in elementary algebra. Be careful to notice that
inversion of a value in elementary algebra is a division operation, which does not
exist in Boolean algebra.
Two-state variables can be combined into expressions with these three operators in
the same way that you would use the C/C++ operators &&, ||, and ! to create logical
expressions commonly used to control if and while statements. We now examine some
Boolean algebra properties for manipulating such expressions. As you read through this
material, keep in mind that the same techniques can be applied to logical expressions in
programming languages.
These properties are commonly presented as theorems. They are easily proved from
application of truth tables.
There is a duality between the AND and OR operators. In any equality you can
interchange AND and OR along with the constants 0 and 1, and the equality still holds.
Thus the properties will be presented in pairs that illustrate their duality. We first
consider properties that are the same as in elementary algebra.
x · (y · z) = (x · y) · z (4.1)
x + (y + z) = (x + y) + z (4.2)
It is straightforward to prove these equations with truth tables. For example, for
Equation 4.1:
x  y  z  (y · z)  (x · y)  x · (y · z)  =  (x · y) · z
0  0  0     0        0          0               0
0  0  1     0        0          0               0
0  1  0     0        0          0               0
0  1  1     1        0          0               0
1  0  0     0        0          0               0
1  0  1     0        0          0               0
1  1  0     0        1          0               0
1  1  1     1        1          1               1

x  y  z  (y + z)  (x + y)  x + (y + z)  =  (x + y) + z
0  0  0     0        0          0               0
0  0  1     1        0          1               1
0  1  0     1        1          1               1
0  1  1     1        1          1               1
1  0  0     0        1          1               1
1  0  1     1        1          1               1
1  1  0     1        1          1               1
1  1  1     1        1          1               1
x·1=x (4.3)
x+0=x (4.4)
Now we consider properties where Boolean algebra differs from elementary algebra.
• AND and OR are commutative:
This is easily proved by looking at the second and third lines of the respective truth
tables. In elementary algebra, only the addition and multiplication operators are
commutative.
• AND and OR have a null value:
x·0=0 (4.7)
x+1=1 (4.8)
The null value for the AND is the same as multiplication in elementary algebra. But
addition in elementary algebra does not have a null constant, while OR in Boolean
algebra does.
x · x′ = 0 (4.9)
x + x′ = 1 (4.10)
x · x = x (4.11)
x + x = x (4.12)
That is, repeated application of either operator to the same value does not change
it. This differs considerably from elementary algebra — repeated application of
addition is equivalent to multiplication and repeated application of multiplication
is the power operation.
• AND and OR are distributive:
x · (y + z) = x · y + x · z (4.13)
x + y · z = (x + y) · (x + z) (4.14)
Going from right to left in Equation 4.13 is the very familiar factoring from addition
and multiplication in elementary algebra. On the other hand, the operation in
Equation 4.14 has no analog in elementary algebra. It follows from the idempotency
property.
The NOT operator has an obvious property:
• NOT shows involution:
(x′)′ = x (4.15)
Again, since there is no complement in elementary algebra, there is no equivalent
property.
• DeMorgan’s Law is an important expression of the duality between the AND and
OR operations.
(x · y)′ = x′ + y′ (4.16)
(x + y)′ = x′ · y′ (4.17)
The validity of DeMorgan’s Law can be seen in the following truth tables. For
Equation 4.16:
x  y  (x · y)  (x · y)′  x′  y′  x′ + y′
0  0     0        1      1   1      1
0  1     0        1      1   0      1
1  0     0        1      0   1      1
1  1     1        0      0   0      0

x  y  (x + y)  (x + y)′  x′  y′  x′ · y′
0  0     0        1      1   1      1
0  1     1        0      1   0      0
1  0     1        0      0   1      0
1  1     1        0      0   0      0
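Because each variable has only two states, a proof by truth table is easy to automate. The following C sketch, with a function name invented for the illustration, tries every combination of x, y, and z for Equations 4.13, 4.14, 4.16, and 4.17, using the C operators &, |, and ! on the values 0 and 1.

```c
#include <stdbool.h>

/* Return true if Equations 4.13, 4.14, 4.16 and 4.17 hold for all inputs. */
bool identities_hold(void)
{
    for (int x = 0; x <= 1; x++) {
        for (int y = 0; y <= 1; y++) {
            for (int z = 0; z <= 1; z++) {
                bool distributive_and = (x & (y | z)) == ((x & y) | (x & z));  /* 4.13 */
                bool distributive_or  = (x | (y & z)) == ((x | y) & (x | z));  /* 4.14 */
                bool demorgan_and     = (!(x & y)) == ((!x) | (!y));           /* 4.16 */
                bool demorgan_or      = (!(x | y)) == ((!x) & (!y));           /* 4.17 */
                if (!(distributive_and && distributive_or &&
                      demorgan_and && demorgan_or)) {
                    return false;   /* some row of a truth table disagrees */
                }
            }
        }
    }
    return true;
}
```

The same exhaustive approach works for any of the properties in this section, since n variables give only 2^n rows to check.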
4.2. CANONICAL (STANDARD) FORMS 65
It is common to index the minterms according to the values of the variables that
would cause that minterm to evaluate to 1. For example, x′ · y′ · z′ = 1 when x = 0, y = 0,
and z = 0, so this would be m0. The minterm x′ · y · z′ evaluates to 1 when x = 0, y = 1,
and z = 0, so is m2. Table 4.1 lists all the minterms for a three-variable expression.
minterm              x y z
m0 = x′ · y′ · z′    0 0 0
m1 = x′ · y′ · z     0 0 1
m2 = x′ · y · z′     0 1 0
m3 = x′ · y · z      0 1 1
m4 = x · y′ · z′     1 0 0
m5 = x · y′ · z      1 0 1
m6 = x · y · z′      1 1 0
m7 = x · y · z       1 1 1
Table 4.1: Minterms for three variables. mi is the ith minterm. The x, y, and z values
cause the corresponding minterm to evaluate to 1.
A convenient notation for expressing a sum of minterms is to use the Σ symbol with
a numerical list of the minterm indexes. For example,
F(x, y, z) = x′ · y′ · z′ + x′ · y′ · z + x · y′ · z + x · y · z′
           = m0 + m1 + m5 + m6
           = Σ(0, 1, 5, 6) (4.18)
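Equation 4.18 translates almost directly into C. In the sketch below, the function F writes out the four minterms, and F_from_indexes evaluates the same function from the index list alone. Storing the list as a bit mask is an assumption made for this illustration, not a notation used in the text.

```c
#include <stdbool.h>

/* F(x, y, z) written out as the sum of minterms m0 + m1 + m5 + m6. */
bool F(bool x, bool y, bool z)
{
    return (!x && !y && !z) ||   /* m0 */
           (!x && !y &&  z) ||   /* m1 */
           ( x && !y &&  z) ||   /* m5 */
           ( x &&  y && !z);     /* m6 */
}

/* The same function from the index list: bit i of the mask is 1 when mi is present. */
bool F_from_indexes(bool x, bool y, bool z)
{
    const unsigned int minterm_mask = (1u << 0) | (1u << 1) | (1u << 5) | (1u << 6);
    unsigned int index = ((unsigned int)x << 2) | ((unsigned int)y << 1) | (unsigned int)z;
    return ((minterm_mask >> index) & 1u) != 0;
}
```

The index is simply the value of xyz read as a three-bit binary number, which is why the minterm that is 1 for x = 1, y = 0, z = 1 is m5.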
As you might expect, each of the terms defined above has a dual definition.
sum term: A term in which the literals are connected with the OR operator. OR is
additive, hence the use of “sum.”
maxterm or standard sum: A sum term that contains each of the variables in the
problem, either in its complemented or uncomplemented form. For example, if an
expression involves three variables, x, y, and z, (x + y + z), (x′ + y + z′), and (x′ + y′ + z′)
are all maxterms, but (x + y) is not.
product of sums (PoS): One or more sum terms connected with AND operators. AND
is multiplicative, hence the use of “product.”
product of maxterms (PoM) or canonical product: A PoS in which each sum term
is a maxterm. Since all the variables are present in each maxterm, the canonical
product is unique for a given problem.
It also follows that any Boolean function can be uniquely expressed as a product of
maxterms (PoM) that evaluate to 1. Starting with the product of maxterms ensures that
the full effect of each variable has been taken into account. Again, this often does not
lead to the best implementation, and in Section 4.3 we will see some tools to simplify
PoMs.
It is common to index the maxterms according to the values of the variables that
would cause that maxterm to evaluate to 0. For example, x + y + z = 0 when x = 0, y = 0,
and z = 0, so this would be M0. The maxterm x' + y + z' evaluates to 0 when x = 1, y = 0,
and z = 1, so is M5. Table 4.2 lists all the maxterms for a three-variable expression.
4.3. BOOLEAN FUNCTION MINIMIZATION 67
maxterm               x  y  z
M0 = x + y + z        0  0  0
M1 = x + y + z'       0  0  1
M2 = x + y' + z       0  1  0
M3 = x + y' + z'      0  1  1
M4 = x' + y + z       1  0  0
M5 = x' + y + z'      1  0  1
M6 = x' + y' + z      1  1  0
M7 = x' + y' + z'     1  1  1
Table 4.2: Maxterms for three variables. Mi is the ith maxterm. The x, y, and z values
cause the corresponding maxterm to evaluate to 0.
The similar notation for expressing a product of maxterms is to use the symbol Π
with a numerical list of the maxterm indexes. For example (and see Exercise 4-8),
The names “minterm” and “maxterm” may seem somewhat arbitrary. But consider
the two functions,
F1 (x, y, z) = x · y · z (4.20)
F2 (x, y, z) = x + y + z (4.21)
There are eight (2^3) permutations of the three variables, x, y, and z. F1 has one minterm
and evaluates to 1 for only one of the permutations, x = y = z = 1. F2 has one maxterm
and evaluates to 1 for all permutations except when x = y = z = 0. This is shown in the
following truth table:
             minterm             maxterm
x  y  z   F1 = (x · y · z)   F2 = (x + y + z)
0  0  0         0                   0
0  0  1         0                   1
0  1  0         0                   1
0  1  1         0                   1
1  0  0         0                   1
1  0  1         0                   1
1  1  0         0                   1
1  1  1         1                   1
ORing more minterms to an SoP expression expands the number of cases where it
evaluates to 1, and ANDing more maxterms to a PoS expression reduces the number of
cases where it evaluates to 1.
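The duality between the two canonical forms can be illustrated with a short sketch. It assumes the function of Equation 4.18, Σ(0, 1, 5, 6), and builds the product of the remaining maxterms; the helper names are hypothetical. It also shows that ANDing one more maxterm onto the product reduces the number of cases that evaluate to 1.

```python
from itertools import product

def index_of(x, y, z):
    """Binary value of the inputs, with x as the high-order bit."""
    return (x << 2) | (y << 1) | z

def sum_of_minterms(indexes):
    chosen = set(indexes)
    return lambda x, y, z: int(index_of(x, y, z) in chosen)

def product_of_maxterms(indexes):
    def f(x, y, z):
        result = 1
        for i in indexes:
            # Maxterm M_i is 0 only when the inputs spell i in binary.
            result &= int(index_of(x, y, z) != i)
        return result
    return f

minterms = [0, 1, 5, 6]
maxterms = [i for i in range(8) if i not in minterms]

sop = sum_of_minterms(minterms)
pom = product_of_maxterms(maxterms)
inputs = list(product((0, 1), repeat=3))

same = all(sop(*b) == pom(*b) for b in inputs)
ones_before = sum(pom(*b) for b in inputs)
ones_after = sum(product_of_maxterms(maxterms + [0])(*b) for b in inputs)
print(maxterms, same, ones_before, ones_after)
```

The maxterm indexes are exactly the indexes missing from the minterm list, which is why the two forms agree on every input.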
These definitions imply that there can be more than one minimal solution to a problem.
Good hardware design practice involves finding all the minimal solutions, then assessing
each one within the context of the available hardware. For example, judiciously placed
NOT gates can actually reduce hardware complexity (Section 4.4.3, page 83).
[Figure 4.4: circuit with inputs x and y and output (x · y') + (x' · y) + (x · y).]
Now let us simplify the expression in Equation 4.22 to see if we can reduce the
hardware requirements. This process will probably seem odd to a person who is not
used to manipulating Boolean expressions, because there is not a single correct path to
a solution. We present one way here. First we use the idempotency property (Equation
4.12) to duplicate the third term, and then rearrange a bit:
F1(x, y) = x · y' + x · y + x' · y + x · y    (4.23)
Next we use the distributive property (Equation 4.13) to factor the expression into
x · (y' + y) + y · (x' + x). Each parenthesized sum is 1 by the complement property, so:
F1(x, y) = x · 1 + y · 1
         = x + y    (4.25)
which you recognize as the simple OR operation. It is easy to see that this is a minimal
sum of products for this function. We can implement Equation 4.22 with a single OR
gate — see Figure 4.2 on page 62. This is clearly a less expensive, faster circuit than
the one shown in Figure 4.4.
To illustrate how a product of sums expression can be minimized, consider the
function:
F2(x, y) = (x + y') · (x' + y) · (x' + y')    (4.26)
The expression on the right-hand side is a PoM. The circuit for this function is shown in
Figure 4.5. It requires three OR gates, one AND gate, and two NOT gates.
[Figure 4.5: circuit with inputs x and y and output (x + y') · (x' + y) · (x' + y').]
We will use the distributive property (Equation 4.14) on the right two factors and
recognize the complement (Equation 4.9):
F2(x, y) = (x + y') · (x' + y · y')
         = (x + y') · x'    (4.27)
Now, use the distributive (Equation 4.13) and complement (Equation 4.9) properties to
obtain:
F2(x, y) = x · x' + x' · y'
         = x' · y'    (4.28)
Thus, the function can be implemented with two NOT gates and a single AND gate,
which is clearly a minimal product of sums. Again, with a little algebraic manipulation
we have arrived at a much simpler solution.
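Because a Boolean function of n variables has only 2^n input combinations, an algebraic simplification can always be checked by comparing truth tables. The sketch below does this for both simplifications above; the helper `equivalent` is a hypothetical name, not something defined in the text.

```python
from itertools import product

def NOT(a):
    return 1 - a

def equivalent(f, g, nvars):
    """True when f and g agree on every combination of nvars inputs."""
    return all(f(*bits) == g(*bits) for bits in product((0, 1), repeat=nvars))

# Equation 4.22 and its simplification, Equation 4.25.
def f1_original(x, y):
    return (x & NOT(y)) | (NOT(x) & y) | (x & y)

def f1_minimal(x, y):
    return x | y

# Equation 4.26 and its simplification, Equation 4.28.
def f2_original(x, y):
    return (x | NOT(y)) & (NOT(x) | y) & (NOT(x) | NOT(y))

def f2_minimal(x, y):
    return NOT(x) & NOT(y)

print(equivalent(f1_original, f1_minimal, 2))
print(equivalent(f2_original, f2_minimal, 2))
```

A check like this confirms that two expressions are equal; it does not by itself show that the shorter one is minimal.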
Example 4-a
Design a function that will detect the even 4-bit integers.
Solution:
The even 4-bit integers are given by the function:
F(w, x, y, z) = w' · x' · y' · z' + w' · x' · y · z' + w' · x · y' · z' + w' · x · y · z'
              + w · x' · y' · z' + w · x' · y · z' + w · x · y' · z' + w · x · y · z'
Using the distributive property repeatedly,
F(w, x, y, z) = z' · (w' · x' · y' + w' · x' · y + w' · x · y' + w' · x · y
              + w · x' · y' + w · x' · y + w · x · y' + w · x · y)
              = z' · (w' · (x' · y' + x' · y + x · y' + x · y) + w · (x' · y' + x' · y + x · y' + x · y))
              = z' · (w' + w) · (x' · y' + x' · y + x · y' + x · y)
              = z' · (w' + w) · (x' · (y' + y) + x · (y' + y))
              = z' · (w' + w) · (x' + x) · (y' + y)
Each parenthesized sum is 1 by the complement property, so
F(w, x, y, z) = z'
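The result of Example 4-a can be sketched in code. The following assumes that z is the low-order bit of the 4-bit integer, as in the example, and compares the simplified function with the sum of the eight original minterms. The function names are hypothetical.

```python
def detect_even(n):
    """Return 1 when the 4-bit integer n is even, using F(w, x, y, z) = z'."""
    z = n & 1        # z is the low-order bit
    return 1 - z     # NOT z

def detect_even_unsimplified(n):
    """OR together the eight minterms from the example (all have z = 0)."""
    w, x, y, z = (n >> 3) & 1, (n >> 2) & 1, (n >> 1) & 1, n & 1
    total = 0
    for wv in (0, 1):
        for xv in (0, 1):
            for yv in (0, 1):
                # This minterm is 1 only when (w, x, y, z) == (wv, xv, yv, 0).
                total |= int((w, x, y, z) == (wv, xv, yv, 0))
    return total

matches = all(detect_even(n) == detect_even_unsimplified(n) for n in range(16))
evens = [n for n in range(16) if detect_even(n)]
print(matches, evens)
```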
F(x, y)         y
             0     1
x    0      m0    m1
     1      m2    m3
The value of x for each row is shown by the number (0 or 1) immediately to the left of the
row, and the value of y for each column appears at the top of the column. Although it
occurs automatically in a two-variable Karnaugh map, the cells must be arranged such
that only one variable changes between two cells that share an edge. This is called the
adjacency property.
The procedure for simplifying an SoP expression using a Karnaugh map is:
1. Place a 1 in each cell that corresponds to a minterm that evaluates to 1 in the
expression.
2. Combine cells with 1s in them and that share edges into the largest possible groups.
Larger groups result in simpler expressions. The number of cells in a group must
be a power of 2. The edges of the Karnaugh map are considered to wrap around to
the other side, both vertically and horizontally.
3. Groups may overlap. In fact, this is common. However, no group should be fully
enclosed by another group.
4. The result is the sum of the product terms that represent each group.
The Karnaugh map provides a graphical means to find the same simplifications as
algebraic manipulations, but some people find it easier to spot simplification patterns
on a Karnaugh map. Probably the easiest way to understand how Karnaugh maps are
used is through an example. We start with Equation 4.22 (repeated here):
F1(x, y) = x · y' + x' · y + x · y
and use a Karnaugh map to graphically find the same minimal sum of products that the
algebraic steps in Equations 4.23 through 4.25 gave us.
We start by placing a 1 in each cell corresponding to a minterm that appears in the
equation as shown in Figure 4.7. The two cells on the right side correspond to the
[Figure 4.7: Karnaugh map for F1(x, y)]
F1(x, y)        y
             0     1
x    0             1
     1       1     1
[Figure 4.8: the same map with the two 1s in the y = 1 column grouped.]
Since groups can overlap, we create a second grouping as shown in Figure 4.9. This
grouping shows the simplification,
x · y' + x · y = x · (y' + y)
               = x    (4.30)
The group in the bottom row represents the product term x, and the one in the
right-hand column represents y. So the simplification is:
F1 (x, y) = x + y (4.31)
[Figure 4.9: the Karnaugh map for F1(x, y) with two overlapping groups, the x = 1 row and
the y = 1 column.]
Note that the overlapping cell, x · y, is the term that we used the idempotent property
to duplicate in Equation 4.23. Grouping overlapping cells is a graphical application of
the idempotent property (Equation 4.12).
Next we consider a three-variable Karnaugh map. Table 4.1 (page 66) lists all the
minterms for three variables, x, y, and z, numbered from 0 – 7. A total of eight cells
are needed, so we will draw it four cells wide and two high. Our Karnaugh map will be
drawn with y and z on the horizontal axis, and x on the vertical. Figure 4.10 shows how
the three-variable minterms map onto a Karnaugh map.
F(x, y, z)            yz
             00    01    11    10
x    0       m0    m1    m3    m2
     1       m4    m5    m7    m6
Notice the order of the bit patterns along the top of the three-variable Karnaugh map,
which is chosen such that only one variable changes value between any two adjacent
cells (the adjacency property). It is the same as a two-variable Gray code (see Table 3.7,
page 57). That is, the order of the columns is such that the yz values follow the Gray
code.
A four-variable Karnaugh map is shown in Figure 4.11. The y and z variables are on
the horizontal axis, w and x on the vertical. From this four-variable Karnaugh map we
see that the order of the rows is such that the wx values also follow the Gray code, again
to implement the adjacency property.
F(w, x, y, z)           yz
               00    01    11    10
wx   00        m0    m1    m3    m2
     01        m4    m5    m7    m6
     11        m12   m13   m15   m14
     10        m8    m9    m11   m10
Other axis labeling schemes also work. The only requirement is that entries in
adjacent cells differ by only one bit (which is a property of the Gray code). See Exercises
4-9 and 4-10.
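The cell layout of these maps follows directly from the Gray code ordering of the axes. The sketch below generates the minterm index for each cell and tests the adjacency property, including the wrap-around edges. It assumes the row variables are the high-order bits of the index, as in the two maps above; the function names are hypothetical.

```python
def gray_code(bits):
    """Reflected Gray code sequence for the given number of bits."""
    return [i ^ (i >> 1) for i in range(1 << bits)]

def kmap_indexes(row_bits, col_bits):
    """Minterm index of each cell; row variables are the high-order bits."""
    return [[(r << col_bits) | c for c in gray_code(col_bits)]
            for r in gray_code(row_bits)]

def has_adjacency_property(grid):
    """True when every pair of edge-sharing cells differs in exactly one bit."""
    rows, cols = len(grid), len(grid[0])
    for r in range(rows):
        for c in range(cols):
            right = grid[r][(c + 1) % cols]    # wraps to the first column
            below = grid[(r + 1) % rows][c]    # wraps to the first row
            for neighbor in (right, below):
                if bin(grid[r][c] ^ neighbor).count("1") != 1:
                    return False
    return True

three_var = kmap_indexes(1, 2)   # x on the rows, yz on the columns
four_var = kmap_indexes(2, 2)    # wx on the rows, yz on the columns
binary_order = [[0, 1, 2, 3], [4, 5, 6, 7]]   # plain counting order, for contrast

print(three_var)
print(has_adjacency_property(four_var), has_adjacency_property(binary_order))
```

Plain binary counting order fails the test because columns 01 and 10 differ in two bits.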
Example 4-b
Find a minimal sum of products expression for the function
F(x, y, z) = x' · y' · z' + x' · y' · z + x' · y · z'
           + x · y' · z' + x · y · z' + x · y · z
Solution:
First we draw the Karnaugh map:
F(x, y, z)            yz
             00    01    11    10
x    0        1     1           1
     1        1           1     1
Several groupings are possible. Keep in mind that groupings can wrap around. We will
work with
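F(x, y, z)            yz
             00    01    11    10
x    0        1     1           1
     1        1           1     1
[Groups: the four cells in columns yz = 00 and yz = 10, which wrap around; the two cells
m0 and m1; and the two cells m7 and m6.]
F(x, y, z) = z' + x' · y' + x · y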
Example 4-c
Find a minimal product of sums for the function (repeat of Example 4-b).
F(x, y, z) = x' · y' · z' + x' · y' · z + x' · y · z'
           + x · y' · z' + x · y · z' + x · y · z
Solution:
Using the Karnaugh map zeros,
F(x, y, z)            yz
             00    01    11    10
x    0                    0
     1              0
F'(x, y, z) = x' · y · z + x · y' · z
F(x, y, z) = (x + y' + z') · (x' + y + z')
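Examples 4-b and 4-c give two different minimal forms of the same function. A brief sketch, assuming the six minterms listed in the examples (indexes 0, 1, 2, 4, 6, and 7), confirms that both forms reproduce the original truth table.

```python
from itertools import product

def NOT(a):
    return 1 - a

MINTERMS = {0, 1, 2, 4, 6, 7}   # the six product terms of Example 4-b

def f_listed(x, y, z):
    return int(((x << 2) | (y << 1) | z) in MINTERMS)

def f_sop(x, y, z):
    """Minimal sum of products from Example 4-b."""
    return NOT(z) | (NOT(x) & NOT(y)) | (x & y)

def f_pos(x, y, z):
    """Minimal product of sums from Example 4-c."""
    return (x | NOT(y) | NOT(z)) & (NOT(x) | y | NOT(z))

inputs = list(product((0, 1), repeat=3))
sop_ok = all(f_sop(*b) == f_listed(*b) for b in inputs)
pos_ok = all(f_pos(*b) == f_listed(*b) for b in inputs)
print(sop_ok, pos_ok)
```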
We now work an example with four variables.
Example 4-d
F(w, x, y, z) = w' · x' · y' · z' + w' · x' · y · z' + w' · x · y' · z
              + w' · x · y · z + w · x · y' · z + w · x · y · z
              + w · x' · y' · z' + w · x' · y · z'
Solution:
Using the groupings on the Karnaugh map,
F(w, x, y, z)           yz
               00    01    11    10
wx   00         1                 1
     01               1     1
     11               1     1
     10         1                 1
[Groups: the four corner cells, which wrap around both horizontally and vertically, and
the four center cells.]
F(w, x, y, z) = x' · z' + x · z
Not only have we greatly reduced the number of AND and OR gates, we see that the two
variables w and y are not needed. By the way, you have probably encountered a circuit
that implements this function. A light controlled by two switches typically does this.
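That w and y drop out of Example 4-d can be confirmed by exhaustive comparison. The sketch below assumes the eight minterms of the example (indexes 0, 2, 5, 7, 8, 10, 13, and 15) and shows that the output is 1 exactly when x and z are equal, which is the behavior of a light controlled by two switches.

```python
from itertools import product

LISTED = {0, 2, 5, 7, 8, 10, 13, 15}   # minterm indexes from Example 4-d

def f_listed(w, x, y, z):
    return int(((w << 3) | (x << 2) | (y << 1) | z) in LISTED)

def f_minimal(w, x, y, z):
    """x' · z' + x · z; w and y are accepted but never used."""
    return ((1 - x) & (1 - z)) | (x & z)

inputs = list(product((0, 1), repeat=4))
same = all(f_listed(*b) == f_minimal(*b) for b in inputs)
equal_detector = all(f_minimal(w, x, y, z) == int(x == z) for w, x, y, z in inputs)
print(same, equal_detector)
```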
As you probably expect by now, a Karnaugh map also works when a function is
specified as a product of sums. The differences are:
1. maxterms are numbered 0 for uncomplemented variables and 1 for complemented,
and
F1 is a sum of products with only one minterm, and F2 is a product of sums with only
one maxterm. Figure 4.12(a) shows how the minterm appears on a Karnaugh map, and
Figure 4.12(b) shows the maxterm.
(a) F1(x, y, z)           yz
                 00    01    11    10
    x    0
         1                    1
(b) F2(x, y, z)           yz
                 00    01    11    10
    x    0        0
         1
Figure 4.12: Comparison of one minterm (a) versus one maxterm (b) on a Karnaugh
map.
Figure 4.13 shows how three-variable maxterms map onto a Karnaugh map. As with
minterms, x is on the vertical axis, y and z on the horizontal. To use the Karnaugh map
for maxterms, place a 0 in each cell corresponding to a maxterm.
F(x, y, z)            yz
             00    01    11    10
x    0       M0    M1    M3    M2
     1       M4    M5    M7    M6
F(w, x, y, z)           yz
               00    01    11    10
wx   00        M0    M1    M3    M2
     01        M4    M5    M7    M6
     11        M12   M13   M15   M14
     10        M8    M9    M11   M10
Example 4-e
F(x, y, z) = (x + y + z) · (x + y + z') · (x + y' + z')
           · (x' + y + z) · (x' + y' + z')
Solution:
This expression includes maxterms 0, 1, 3, 4, and 7. These appear in a Karnaugh map:
F(x, y, z)            yz
             00    01    11    10
x    0        0     0     0
     1        0           0
Next we encircle the largest adjacent blocks, where the number of cells in each block is
a power of two. Notice that maxterm M0 appears in two groups.
[Groups: M0 and M1; M0 and M4; M3 and M7.]
From this Karnaugh map it is very easy to write the function as a minimal product of
sums:
F(x, y, z) = (x + y) · (y + z) · (y' + z')
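The maxterm numbering rule (0 for an uncomplemented variable, 1 for a complemented one) can be sketched as follows. Each sum term of Example 4-e is written as a tuple of flags for (x, y, z); this encoding and the function names are assumptions made for illustration.

```python
from itertools import product

def maxterm_index(complemented):
    """complemented holds one flag per variable (x, y, z); 1 means complemented."""
    index = 0
    for flag in complemented:
        index = (index << 1) | flag
    return index

# The five sum terms of Example 4-e, in the order they are written.
terms = [(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 1)]
indexes = [maxterm_index(t) for t in terms]

def f_listed(x, y, z):
    """Product of the listed maxterms: 0 exactly on their input combinations."""
    return int(((x << 2) | (y << 1) | z) not in indexes)

def f_minimal(x, y, z):
    """(x + y) · (y + z) · (y' + z') from the Karnaugh map groupings."""
    return (x | y) & (y | z) & ((1 - y) | (1 - z))

same = all(f_listed(*b) == f_minimal(*b) for b in product((0, 1), repeat=3))
print(indexes, same)
```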
There are situations where some minterms (or maxterms) are irrelevant in a function.
This might occur, say, if certain input conditions are impossible in the design. As an
example, assume that you have an application where the exclusive or (XOR) operation
is required. The symbol for the operation and its truth table are shown in Figure 4.15.
The minterms required to implement this operation are:
x ⊕ y = x · y' + x' · y
This is the simplest form of the XOR operation. It requires two AND gates, two NOT
gates, and an OR gate for realization.
4.4. CRASH COURSE IN ELECTRONICS 77
[Figure 4.15: XOR gate symbol with inputs x and y and output x ⊕ y, and its truth table:]
x  y   x ⊕ y
0  0     0
0  1     1
1  0     1
1  1     0
But let us say that we have the additional information that the two inputs, x and y,
can never be 1 at the same time. Then we can draw a Karnaugh map with an “×” for the
minterm that cannot exist as shown in Figure 4.16. The “×” represents a “don’t care”
cell: we don’t care whether this cell is grouped with other cells or not.
F(x, y)         y
             0     1
x    0             1
     1       1     ×
Figure 4.16: A “don’t care” cell on a Karnaugh map. Since x and y cannot both be 1 at
the same time, we don’t care if the cell xy = 11 is included in our groupings
or not.
Since the cell that represents the minterm x · y is a “don’t care”, we can include it
in our minimization groupings, leading to the two groupings shown in Figure 4.17. We
easily recognize this Karnaugh map as being realizable with a single OR gate, which
saves two AND gates and two NOT gates.
F(x, y)         y
             0     1
x    0             1
     1       1     ×
[Groups: the x = 1 row and the y = 1 column, each including the “×” cell.]
Figure 4.17: Karnaugh map for XOR function if we know x = y = 1 cannot occur.
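The effect of the “don’t care” cell can be seen by comparing XOR and OR over only the input combinations that are allowed to occur. This is a minimal sketch; the variable names are our own.

```python
# Input combinations that can occur when x and y are never both 1.
allowed = [(x, y) for x in (0, 1) for y in (0, 1) if not (x == 1 and y == 1)]

xor_values = [x ^ y for x, y in allowed]
or_values = [x | y for x, y in allowed]

# The two operations differ only on the excluded combination.
differ_on_excluded = (1 ^ 1) != (1 | 1)
print(allowed, xor_values == or_values, differ_on_excluded)
```

The substitution is valid only as long as the excluded input truly never occurs in the design.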
[Plots of voltage versus time: a sinusoidal AC voltage at the input of the power supply
and a constant DC voltage at its output.]
Figure 4.18: AC/DC power supply.
Computer circuits use DC power. They distinguish between two different voltage
levels to provide logical 0 and 1. For example, logical 0 may be represented by 0.0 volts
and logical 1 by +2.5 volts. Or the reverse may be used — +2.5 volts as logical 0 and
0.0 volts as logical 1. The only requirement is that the hardware design be consistent.
Fortunately, programmers do not need to be concerned about the actual voltages used.
Electrical engineers typically think of the AC characteristics of a circuit in terms
of an ongoing sinusoidal voltage. Although DC power is used, computer circuits are
constantly switching between the two voltage levels. Computer hardware engineers
need to consider circuit element time characteristics when the voltage is suddenly
switched from one level to another. It is this transient behavior that will be described in
the following sections.
the effects of each of these properties, we will consider the electronic devices that are
used to add one of these properties to a specific location in a circuit; namely, resistors,
capacitors, and inductors. Each of these circuit devices has a different relationship
between the voltage difference across the device and the current flowing through it.
A resistor irreversibly transforms electrical energy into heat. It does not store energy.
The relationship between voltage and current for a resistor is given by the equation
v=iR (4.34)
where v is the voltage difference across the resistor at time t, i is the current flowing
through it at time t, and R is the value of the resistor. Resistor values are specified in
ohms. The circuit shown in Figure 4.19 shows two resistors connected in series through
a switch to a battery. The battery supplies 2.5 volts. The Greek letter Ω is used to
indicate ohms, and kΩ indicates 10^3 ohms. Since current can only flow in a closed path,
none flows until the switch is closed.
[Figure 4.19: a 2.5 v battery connected through a switch to a 1.0 kΩ resistor (between
points A and B) in series with a 1.5 kΩ resistor (between points B and C); i is the current.]
Both resistors are in the same path, so when the switch is closed the same current
flows through each of them. The resistors are said to be connected in series. The total
resistance in the path is their sum:
\[ R = 1.0\ \text{k}\Omega + 1.5\ \text{k}\Omega = 2.5 \times 10^{3}\ \text{ohms} \tag{4.35} \]
The amount of current can be determined from the application of Equation 4.34. Solving
for i,
\[ i = \frac{v}{R} = \frac{2.5\ \text{volts}}{2.5 \times 10^{3}\ \text{ohms}} = 1.0 \times 10^{-3}\ \text{amps} = 1.0\ \text{ma} \tag{4.36} \]
where “ma” means “milliamps.”
We can now use Equation 4.34 to determine the voltage difference between points A
and B.
\[ v_{AB} = i R = 1.0 \times 10^{-3}\ \text{amps} \times 1.0 \times 10^{3}\ \text{ohms} = 1.0\ \text{volts} \tag{4.37} \]
Similarly, the voltage difference between points B and C is
\[ v_{BC} = i R = 1.0 \times 10^{-3}\ \text{amps} \times 1.5 \times 10^{3}\ \text{ohms} = 1.5\ \text{volts} \tag{4.38} \]
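[Figure 4.20: a 2.5 v battery connected through a switch to a 1.0 kΩ resistor and a 1.5 kΩ
resistor in parallel; the total current it divides at point A into i1 and i2.]
Figure 4.20 shows the same two resistors connected in parallel. In this case, the
voltage across the two resistors is the same: 2.5 volts when the switch is closed. The
current in each one depends upon its resistance. Thus,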
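\[ i_1 = \frac{v}{R_1} = \frac{2.5\ \text{volts}}{1.0 \times 10^{3}\ \text{ohms}} = 2.5 \times 10^{-3}\ \text{amps} = 2.5\ \text{ma} \tag{4.39} \]
and
\[ i_2 = \frac{v}{R_2} = \frac{2.5\ \text{volts}}{1.5 \times 10^{3}\ \text{ohms}} = 1.67 \times 10^{-3}\ \text{amps} = 1.67\ \text{ma} \tag{4.40} \]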
The total current, it , supplied by the battery when the switch is closed is divided at point
A to supply both the resistors. It must equal the sum of the two currents through the
resistors,
it = i1 + i2
= 2.5 ma + 1.67 ma
= 4.17 ma (4.41)
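The series and parallel calculations above follow directly from Equation 4.34. The sketch below reproduces them with the component values from Figures 4.19 and 4.20; the function names are hypothetical.

```python
def series_current(voltage, resistances):
    """Current through resistors in series: the resistances add."""
    return voltage / sum(resistances)

def parallel_currents(voltage, resistances):
    """Current through each parallel resistor: each sees the full voltage."""
    return [voltage / r for r in resistances]

V = 2.5       # battery voltage, volts
R1 = 1.0e3    # ohms
R2 = 1.5e3    # ohms

# Series circuit (Equations 4.35 through 4.38).
i = series_current(V, [R1, R2])
v_ab = i * R1
v_bc = i * R2

# Parallel circuit (Equations 4.39 through 4.41).
i1, i2 = parallel_currents(V, [R1, R2])
i_total = i1 + i2

print(f"series: i = {i * 1e3:.2f} ma, vAB = {v_ab:.2f} v, vBC = {v_bc:.2f} v")
print(f"parallel: i1 = {i1 * 1e3:.2f} ma, i2 = {i2 * 1e3:.2f} ma, "
      f"it = {i_total * 1e3:.2f} ma")
```

In the series case the two voltage differences add up to the battery voltage; in the parallel case the two currents add up to the total current.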
A capacitor stores energy in the form of an electric field. It reacts slowly to voltage
changes, requiring time for the electric field to build. The voltage across a capacitor
changes with time according to the equation
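\[ v = \frac{1}{C} \int_{0}^{t} i \, dt \tag{4.42} \]
[Circuit: a 2.5 v battery connected through a switch to a 1.0 kΩ resistor (between points
A and B) in series with a 1.0 µf capacitor; i is the current.]
Figure 4.21: Capacitor in series with a resistor; vAB is the voltage across the resistor
and vBC is the voltage across the capacitor.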
Assuming the voltage across the capacitor, vBC , is 0.0 volts when the switch is first
closed, current flows through the resistor and capacitor. The voltage across the resistor
plus the voltage across the capacitor must be equal to the voltage available from the
battery. That is,
2.5 = i R + vBC (4.43)
If we assume that the voltage across the capacitor, vBC , is 0.0 volts when the switch
is first closed, the full voltage of the battery, 2.5 volts, will appear across the resistor.
Thus, the initial current flow in the circuit will be
\[ i_{initial} = \frac{2.5\ \text{volts}}{1.0\ \text{k}\Omega} = 2.5\ \text{ma} \tag{4.44} \]
As the voltage across the capacitor increases, according to Equation 4.42, the voltage
across the resistor, vAB , decreases. This results in an exponentially decreasing build
up of voltage across the capacitor. When it finally equals the voltage of the battery, the
voltage across the resistor is 0.0 volts and current flow in the circuit becomes zero. The
rate of the exponential decrease is given by the product RC, called the time constant.
Using the values of R and C in Figure 4.21 we get
\[ RC = 1.0 \times 10^{3}\ \text{ohms} \times 1.0 \times 10^{-6}\ \text{farads} = 1.0 \times 10^{-3}\ \text{seconds} \tag{4.45} \]
Thus, assuming the capacitor in Figure 4.21 has 0.0 volts across it when the switch is
closed, the voltage that develops over time is given by
\[ v_{BC} = 2.5\,(1 - e^{-t/10^{-3}}) \tag{4.46} \]
This is shown in Figure 4.22. At the time t = 1.0 millisecond (one time constant), the
voltage across the capacitor is
\[ v_{BC} = 2.5\,(1 - e^{-10^{-3}/10^{-3}}) = 2.5\,(1 - e^{-1}) = 2.5 \times 0.63 = 1.58\ \text{volts} \tag{4.47} \]
[Plot: vBC rising from 0 toward 2.5 volts (left-hand axis) and vAB falling from 2.5 toward
0 volts (right-hand axis) over 0 to 10 msec.]
Figure 4.22: Capacitor charging over time in the circuit in Figure 4.21. The left-hand
y-axis shows voltage across the capacitor, the right-hand voltage across
the resistor.
After six time constants have passed, the voltage across the capacitor has
reached
\[ v_{BC} = 2.5\,(1 - e^{-6 \times 10^{-3}/10^{-3}}) = 2.5\,(1 - e^{-6}) = 2.5 \times 0.9975 = 2.49\ \text{volts} \tag{4.48} \]
At this time the voltage across the resistor is essentially 0.0 volts and current flow is
very low.
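The charging curve can be evaluated numerically. This sketch uses the component values from Figure 4.21 as default arguments and assumes the capacitor starts with 0.0 volts across it; the function name is hypothetical.

```python
import math

def capacitor_voltage(t, v_battery=2.5, resistance=1.0e3, capacitance=1.0e-6):
    """Voltage across the capacitor t seconds after the switch is closed."""
    tau = resistance * capacitance      # time constant RC, in seconds
    return v_battery * (1.0 - math.exp(-t / tau))

tau = 1.0e3 * 1.0e-6                    # one time constant, 1.0 millisecond

v_one = capacitor_voltage(tau)          # after one time constant
v_six = capacitor_voltage(6 * tau)      # after six time constants
v_resistor = 2.5 - v_six                # the remainder appears across the resistor

print(f"{v_one:.2f} volts, {v_six:.2f} volts, {v_resistor:.4f} volts")
```

After six time constants less than 0.01 volts remains across the resistor, which is why the current flow is very low.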
Inductors are not used in logic circuits. In the typical PC, they are found as part of the
CPU power supply circuitry. If you have access to the inside of a PC, you can probably
see a small (∼1 cm. in diameter) donut-shaped device with wire wrapped around it on
the motherboard near the CPU. This is an inductor used to smooth the power supplied
to the CPU.
An inductor stores energy in the form of a magnetic field. It reacts slowly to current
changes, requiring time for the magnetic field to build. The relationship between voltage
at time t across an inductor and current flow through it is given by the equation
\[ v = L \frac{di}{dt} \tag{4.49} \]
where L is the value of the inductor in henrys.
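[Figure 4.23: a 2.5 v battery connected through a switch to a 1.0 µh inductor (between
points A and B) in series with a 1.0 kΩ resistor; i is the current.]
Figure 4.23 shows an inductor connected in series with a resistor. When the switch
is open no current flows through this circuit. Upon closing the switch, the inductor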
initially impedes the flow of current, taking time for a magnetic field to be built up in
the inductor.
At this initial point no current is flowing through the resistor, so the voltage across it,
vBC , is 0.0 volts. The full voltage of the battery, 2.5 volts, appears across the inductor,
vAB . As current begins to flow through the inductor the voltage across the resistor, vBC ,
grows. This results in an exponentially decreasing voltage across the inductor. When it
finally reaches 0.0 volts, the voltage across the resistor is 2.5 volts and current flow in
the circuit is 2.5 ma.
The rate of the exponential voltage decrease is given by the time constant L/R. Using
the values of R and L in Figure 4.23 we get
\[ \frac{L}{R} = \frac{1.0 \times 10^{-6}\ \text{henrys}}{1.0 \times 10^{3}\ \text{ohms}} = 1.0 \times 10^{-9}\ \text{seconds} \tag{4.50} \]
When the switch is closed, the voltage that develops across the inductor over time is
given by
\[ v_{AB} = 2.5 \times e^{-t/10^{-9}} \tag{4.51} \]
This is shown in Figure 4.24. Note that after about 6 nanoseconds (6 time constants) the
voltage across the inductor is essentially equal to 0.0 volts. At this time the full voltage
of the battery is across the resistor and a steady current of 2.5 ma flows.
[Plot: vBC rising from 0 toward 2.5 volts (left-hand axis) and vAB falling from 2.5 toward
0 volts (right-hand axis) over 0 to 10 nanosec.]
Figure 4.24: Inductor building a magnetic field over time in the circuit in Figure 4.23.
The left-hand y-axis shows voltage across the resistor, the right-hand voltage
across the inductor.
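The inductor voltage can be evaluated the same way as the capacitor voltage. This sketch uses the component values from Figure 4.23 as default arguments; the function name is hypothetical.

```python
import math

def inductor_voltage(t, v_battery=2.5, resistance=1.0e3, inductance=1.0e-6):
    """Voltage across the inductor t seconds after the switch is closed."""
    tau = inductance / resistance       # time constant L/R, in seconds
    return v_battery * math.exp(-t / tau)

v_start = inductor_voltage(0.0)         # the full battery voltage
v_six = inductor_voltage(6.0e-9)        # after about six time constants
v_resistor = 2.5 - v_six                # the remainder appears across the resistor
current = v_resistor / 1.0e3            # Equation 4.34 applied to the resistor

print(f"{v_start:.2f} volts, {v_six:.4f} volts, {current * 1e3:.2f} ma")
```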
The circuit in Figure 4.23 illustrates how inductors are used in a CPU power supply.
The battery in this circuit represents the computer power supply, and the resistor
represents the load provided by the CPU. The voltage produced by a power supply
includes noise, which consists of small, high-frequency fluctuations added to the DC
level. As can be seen in Figure 4.24, the voltage supplied to the CPU, vBC , changes little
over short periods of time.
to represent 0. Logic circuits are constructed from components that can switch between
the high and low voltages.
The basic switching device in today’s computer logic circuits is the
metal-oxide-semiconductor field-effect transistor (MOSFET). Figure 4.25 shows a NOT gate
implemented with a single MOSFET. The MOSFET in this circuit is an n-type. You can think
of it as a three-terminal device. The input terminal is called the gate. The terminal
connected to the output is the drain, and the terminal connected to VSS is the source. In
this circuit the drain is connected to the positive (high) voltage of a DC power supply, VDD,
through a resistor, R. The source is connected to the zero voltage, VSS.
[Figure 4.25: NOT gate built from one n-type MOSFET; the input drives the gate, the drain
is the output and connects through resistor R to VDD, and the source connects to VSS.]
When the input voltage to the transistor is high, the gate acquires an electrical
charge, thus turning the transistor on. The path between the drain and the source of
the transistor essentially becomes a closed switch. This causes the output to be at the
low voltage. The transistor acts as a pull down device.
The resulting circuit is equivalent to Figure 4.26(a). In this circuit current flows from
VDD to VSS through the resistor R. The output is connected to VSS, that is, 0.0 volts. The
current flow through the resistor and transistor is
\[ i = \frac{V_{DD} - V_{SS}}{R} \tag{4.52} \]
[Circuit diagrams: (a) input = high, the transistor acts as a closed switch from the output
to VSS; (b) input = low, the transistor acts as an open switch.]
Figure 4.26: Single transistor switch equivalent circuit; (a) switch closed; (b) switch
open.
The problem with this current flow is that it uses power just to keep the output low.
If the input is switched to the low voltage, the transistor turns off, resulting in the
equivalent circuit shown in Figure 4.26(b). The output is typically connected to another
transistor’s input (its gate), which draws essentially no current, except during the time
it is switching from one state to the other. In the steady state condition the output
connection does not draw current. Since no current flows through the resistor, R, there
is no voltage change across it. So the output connection will be at VDD volts, the high
voltage. The resistor is acting as the pull up device.
These two states can be expressed in the truth table
input output
low high
high low
[Figure: CMOS inverter circuit connected between VDD and VSS, with truth table:]
input   output
  0       1
  1       0
Figure 4.28(a) shows the equivalent circuit with a high voltage input. The pull up
transistor (a p-type) is off, and the pull down transistor (an n-type) is on. This results in
the output being pulled down to the low voltage. In Figure 4.28(b) a low voltage input
turns the pull up transistor on and the pull down transistor off. The result is the output
is pulled up to the high voltage.
4.5. NAND AND NOR GATES 86
Figure 4.28: CMOS inverter equivalent circuit; (a) pull up open and pull down closed;
(b) pull up closed and pull down open.
Figure 4.29 shows an AND gate implemented with CMOS transistors. (See Exercise
4-12.) Notice that the signal at point A is NOT(x AND y). The circuit from point A to the
output is a NOT gate. The NAND operation at point A therefore requires two fewer
transistors than the AND operation. We will examine the implications of this result in
Section 4.5.
[Figure 4.29: AND gate built from CMOS transistors between VDD and VSS, with inputs x
and y, intermediate point A, and the output, with truth table:]
x  y   A   output
0  0   1     0
0  1   1     0
1  0   1     0
1  1   0     1
• NAND: a binary operator; the result is 0 if and only if both operands are 1;
otherwise the result is 1. We will use (x · y)' to designate the NAND operation. It is
also common to use the ’↑’ symbol or simply “NAND”. The hardware symbol for the
NAND gate is shown in Figure 4.30. The inputs are x and y. The resulting output,
(x · y)', is shown in the truth table in this figure.
x  y   (x · y)'
0  0      1
0  1      1
1  0      1
1  1      0
• NOR: a binary operator; the result is 0 if at least one of the two operands is 1;
otherwise the result is 1. We will use (x + y)' to designate the NOR operation. It is
also common to use the ’↓’ symbol or simply “NOR”. The hardware symbol for the
NOR gate is shown in Figure 4.31. The inputs are x and y. The resulting output,
(x + y)', is shown in the truth table in this figure.
x  y   (x + y)'
0  0      1
0  1      0
1  0      0
1  1      0
The small circle at the output of the NAND and NOR gates signifies “NOT”, just as
with the NOT gate (see Figure 4.3). Although we have explicitly shown NOT gates when
inputs to gates are complemented, it is common to simply use these small circles at the
input. For example, Figure 4.32 shows an OR gate with both inputs complemented. As
the truth table in this figure shows, this is an alternate way to draw a NAND gate. See
Exercise 4-14 for an alternate way to draw a NOR gate.
x  y   (x' + y')
0  0      1
0  1      1
1  0      1
1  1      0
One of the interesting properties about NAND gates is that it is possible to build
AND, OR, and NOT gates from them. That is, the NAND gate is sufficient to implement
any Boolean function. In this sense, it can be thought of as a universal gate.
First, we construct a NOT gate. To do this, simply connect the signal to both inputs
of a NAND gate, as shown in Figure 4.33.
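[Figure 4.33: a NAND gate with both inputs connected to x; the output is (x · x)' = x'.]
(x · y)' = x' + y'
(x' + y')' = (x')' · (y')'
           = x · y    (4.53)
[Figure: two NAND gates in series; the first produces (x · y)' from inputs x and y, and
the second produces x · y.]
(x' · y')' = (x')' + (y')'
           = x + y    (4.54)
we use three NAND gates connected as shown in Figure 4.35 to create an OR gate.
[Figure 4.35: three NAND gates with inputs x and y and output x + y.]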
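The three constructions can be sketched in software by treating a NAND gate as a function and building the other gates only from calls to it. The function names are our own, and the gate counts in the comments follow Equations 4.53 and 4.54.

```python
def nand(a, b):
    """NAND: the result is 0 if and only if both operands are 1."""
    return 1 - (a & b)

def not_gate(a):
    """One NAND gate with the signal connected to both inputs."""
    return nand(a, a)

def and_gate(a, b):
    """Two NAND gates: a NAND followed by a NAND used as an inverter."""
    return not_gate(nand(a, b))

def or_gate(a, b):
    """Three NAND gates: invert each input, then NAND the results."""
    return nand(not_gate(a), not_gate(b))

for x in (0, 1):
    for y in (0, 1):
        print(x, y, not_gate(x), and_gate(x, y), or_gate(x, y))
```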
It may seem like we are creating more complexity in order to build circuits from
NAND gates. But consider the function
F (w, x, y, z) = (w · x) + (y · z) (4.55)
Without knowing how logic gates are constructed, it would be reasonable to implement
this function with the circuit shown in Figure 4.36. Using the involution property
(Equation 4.15) it is clear that the circuit in Figure 4.37 is equivalent to the one in Figure
4.36.
[Circuit diagram: inputs w, x, y, and z; output (w · x) + (y · z).]
Figure 4.36: The function in Equation 4.55 using two AND gates and one OR gate.
4.6. EXERCISES 89
[Circuit diagram: inputs w, x, y, and z; output (w · x) + (y · z).]
Figure 4.37: The function in Equation 4.55 using two AND gates, one OR gate and four
NOT gates.
Next, comparing the AND-gate/NOT-gate combination with Figure 4.30, we see that
each is simply a NAND gate. Similarly, comparing the NOT-gates/OR-gate combination
with Figure 4.32, it is also a NAND gate. Thus we can also implement the function in
Equation 4.55 with three NAND gates as shown in Figure 4.38.
[Circuit diagram: inputs w, x, y, and z; output (w · x) + (y · z).]
Figure 4.38: The function in Equation 4.55 using only three NAND gates.
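The equivalence of the AND/OR circuit and the three-NAND circuit can be confirmed by comparing all sixteen input combinations. This is a minimal sketch with function names of our own choosing.

```python
from itertools import product

def nand(a, b):
    return 1 - (a & b)

def f_and_or(w, x, y, z):
    """Equation 4.55 with two AND gates and one OR gate."""
    return (w & x) | (y & z)

def f_nand_only(w, x, y, z):
    """The same function with three NAND gates."""
    return nand(nand(w, x), nand(y, z))

same = all(f_and_or(*bits) == f_nand_only(*bits)
           for bits in product((0, 1), repeat=4))
print(same)
```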
From simply viewing the circuit diagrams, it may seem that we have not gained
anything in this circuit transformation. But we saw in Section 4.4.3 that a NAND gate
requires fewer transistors than an AND gate or OR gate due to the signal inversion
properties of transistors. Thus, the NAND gate implementation is a less expensive and
faster implementation.
The conversion from an AND/OR/NOT gate design to one that uses only NAND gates
is straightforward:
1. Express the function as a minimal SoP.
2. Convert the products (AND terms) and the final sum (OR) to NANDs.
3. Add a NAND gate for any product with only a single literal.
As with software, hardware design is an iterative process. Since there usually is not a
unique solution, you often need to develop several designs and analyze each one within
the context of the available hardware. The example above shows that two solutions that
look the same on paper may be dissimilar in hardware.
In Chapter 6 we will see how these concepts can be used to construct the heart of a
computer — the CPU.
4.6 Exercises
4-1 (§4.1) Prove the identity property expressed by Equations 4.3 and 4.4.
4-2 (§4.1) Prove the commutative property expressed by Equations 4.5 and 4.6.
4-3 (§4.1) Prove the null property expressed by Equations 4.7 and 4.8.
4-4 (§4.1) Prove the complement property expressed by Equations 4.9 and 4.10.
4-5 (§4.1) Prove the idempotent property expressed by Equations 4.11 and 4.12.
4-6 (§4.1) Prove the distributive property expressed by Equations 4.13 and 4.14.
4-9 (§4.3.2) Show where each minterm is located with this Karnaugh map axis labeling
using the notation of Figure 4.10.
F(x, y, z)            xy
             00    01    11    10
z    0
     1
4-10 (§4.3.2) Show where each minterm is located with this Karnaugh map axis labeling
using the notation of Figure 4.10.
F(x, y, z)            xz
             00    01    11    10
y    0
     1
4-11 (§4.3.2) Design a logic function that detects the prime single-digit numbers.
Assume that the numbers are coded in 4-bit BCD (see Section 3.6.1, page 55). The
function is 1 for each prime number.
4-12 (§4.4.3) Using drawings similar to those in Figure 4.28, verify that the logic circuit
in Figure 4.29 is an AND gate.
4-13 (§4.5) Show that the gate in Figure 4.32 is a NAND gate.
4-14 (§4.5) Give an alternate way to draw a NOR gate, similar to the alternate NAND
gate in Figure 4.32.
4-15 (§4.5) Design a circuit using NAND gates that detects the “below” condition for
two 2-bit values. That is, given two 2-bit variables x and y, F (x, y) = 1 when the
unsigned integer value of x is less than the unsigned integer value of y.
Logic Circuits
In this chapter we examine how the concepts in Chapter 4 can be used to build some of
the logic circuits that make up a CPU, Memory, and other devices. We will not describe
an entire unit, only a few small parts. The goal is to provide an introductory overview
of the concepts. There are many excellent books that cover the details. For example,
see [20], [23], or [24] for circuit design details and [28], [31], [34] for CPU architecture
design concepts.
Logic circuits can be classified as either
• Combinational Logic Circuits — the output(s) depend only on the input(s) at any
specific time and not on any previous input(s).
• Sequential Logic Circuits — the output(s) depend both on previous and current
input(s).
An example of the two concepts is a television remote control. You can enter a number
and the output (a particular television channel) depends only on the number entered. It
does not matter what channels have been viewed previously. So the relationship between the
input (a number) and the output is combinational.
The remote control also has inputs for stepping either up or down one channel.
When using this input method, the channel selected depends on what channel has been
previously selected and the sequence of up/down button pushes. The channel up/down
buttons illustrate a sequential input/output relationship.
Although a more formal definition will be given in Section 5.3, this television example
also illustrates the concept of state. My television remote control has a button I can push
that will show the current channel setting. If I make a note of the beginning channel
setting, and keep track of the sequence of channel up and down button pushes, I will
know the ending channel setting. It does not matter how I originally got to the beginning
channel setting. The channel setting is the state of the channel selection mechanism
because it tells me everything I need to know in order to select a new channel by using
a sequence of channel up and down button pushes.
5.1. COMBINATIONAL LOGIC CIRCUITS 92
  x_i   y_i  |  Carry_{i+1}   Sum_i
  -----------+---------------------
   0     0   |      0           0
   0     1   |      0           1
   1     0   |      0           1
   1     1   |      1           0
where $x_i$ is the ith bit of the multiple-bit value, x; $y_i$ is the ith bit of the multiple-bit
value, y; $Sum_i$ is the ith bit of the multiple-bit value, Sum; $Carry_{i+1}$ is the carry from
adding the next-lower significant bits, $x_i$ and $y_i$.
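full adder: A combinational logic device that has three 1-bit inputs, $Carry_i$, $x_i$, and $y_i$,
and two outputs that are related by the truth table:
  Carry_i   x_i   y_i  |  Carry_{i+1}   Sum_i
  ---------------------+---------------------
     0       0     0   |      0           0
     0       0     1   |      0           1
     0       1     0   |      0           1
     0       1     1   |      1           0
     1       0     0   |      0           1
     1       0     1   |      1           0
     1       1     0   |      1           0
     1       1     1   |      1           1
where $x_i$ is the ith bit of the multiple-bit value, x; $y_i$ is the ith bit of the multiple-bit
value, y; $Sum_i$ is the ith bit of the multiple-bit value, Sum; $Carry_{i+1}$ is the carry from
adding the next-lower significant bits, $x_i$, $y_i$, and $Carry_i$.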
First, let us look at the Karnaugh map for the sum:
  Sum_i                 x_i y_i
                   00   01   11   10
  Carry_i    0      0    1    0    1
             1      1    0    1    0
There are no obvious groupings. We can write the function as a sum of product terms
from the Karnaugh map.
$$Sum_i(Carry_i, x_i, y_i) = Carry_i' \cdot x_i' \cdot y_i + Carry_i' \cdot x_i \cdot y_i' + Carry_i \cdot x_i' \cdot y_i' + Carry_i \cdot x_i \cdot y_i \tag{5.1}$$
In the Karnaugh map for carry:
  Carry_{i+1}           x_i y_i
                   00   01   11   10
  Carry_i    0      0    0    1    0
             1      0    1    1    1
the 1s can be covered by three overlapping pairs, which gives
$$Carry_{i+1}(Carry_i, x_i, y_i) = x_i \cdot y_i + Carry_i \cdot x_i + Carry_i \cdot y_i \tag{5.2}$$
Equations 5.1 and 5.2 lead directly to the circuit for an adder in Figure 5.1.
[Figure 5.1: Full adder circuit. Inputs: x_i, y_i, Carry_i. Outputs: Sum_i, Carry_{i+1}.]
For a different approach, we look at the definition of a half adder. The sum is simply
the XOR of the two inputs, and the carry is the AND of the two inputs. This leads to the
circuit in Figure 5.2.
[Figure 5.2: Half adder circuit. Inputs: x_i, y_i. Outputs: Sum_i, Carry_{i+1}.]
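You should be able to see two other possible groupings on the Karnaugh map for carry and may
wonder why they are not used here. If only the two minterms in the $x_i y_i = 11$ column are
grouped, the two ungrouped minterms, $Carry_i \cdot x_i' \cdot y_i$ and $Carry_i \cdot x_i \cdot y_i'$,
form a pattern that suggests an exclusive or operation. Thus,
$$Carry_{i+1} = x_i \cdot y_i + Carry_i \cdot x_i' \cdot y_i + Carry_i \cdot x_i \cdot y_i' = x_i \cdot y_i + Carry_i \cdot (x_i \oplus y_i) \tag{5.5}$$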
Notice that the first product term in Equation 5.5, xi · yi , is generated by the Carry
portion of a half-adder, and that the exclusive or portion, xi ⊕ yi , of the second product
term is generated by the Sum portion. A logic gate implementation of a full adder is
shown in Figure 5.3. You can see that it is implemented using two half adders and an
OR gate. And now you understand the terminology “half adder” and “full adder.”
[Figure 5.3: Full adder built from two half adders and an OR gate. Inputs: x_i, y_i, Carry_i. Outputs: Sum_i, Carry_{i+1}.]
We cannot say which of the two adder circuits, Figure 5.1 or Figure 5.3, is better from
just looking at the logic circuits. Good engineering design depends on many factors, such
as how each logic gate is implemented, the cost of the logic gates and their availability,
etc. The two designs are given here to show that different approaches can lead to
different, but functionally equivalent, designs.
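The equivalence of the two designs can be checked by brute force. The following is a minimal sketch, not the author's code: the function names are made up for illustration, and the carry in the sum-of-products version uses the three-term form read off the carry Karnaugh map, which is an assumption about what the direct circuit computes.

```python
from itertools import product


def half_adder(x, y):
    """Return (carry, sum) for two 1-bit inputs."""
    return x & y, x ^ y


def full_adder_sop(carry_in, x, y):
    """Full adder written directly as sums of products (hypothetical name)."""
    nc, nx, ny = 1 - carry_in, 1 - x, 1 - y
    # Sum: the four minterms of Equation 5.1.
    total = (nc & nx & y) | (nc & x & ny) | (carry_in & nx & ny) | (carry_in & x & y)
    # Carry: assumed three-term form from the carry Karnaugh map.
    carry_out = (x & y) | (carry_in & x) | (carry_in & y)
    return carry_out, total


def full_adder_from_halves(carry_in, x, y):
    """Full adder built from two half adders and an OR gate."""
    carry_a, partial = half_adder(x, y)
    carry_b, total = half_adder(carry_in, partial)
    return carry_a | carry_b, total


def designs_agree():
    """Compare the two designs on all eight input combinations."""
    return all(
        full_adder_sop(c, x, y) == full_adder_from_halves(c, x, y)
        for c, x, y in product((0, 1), repeat=3)
    )
```

Calling `designs_agree()` walks the whole truth table, which is practical here because a full adder has only eight input combinations.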
[Figure 5.4: 4-bit adder built from four full adders. Inputs: x3..x0 and y3..y0. Carries: c1, c2, c3, c4. Outputs: s3..s0, where s = x + y, CF = c4, and OF = c3 ⊕ c4.]
referred to as a “slice”) of the total width of the values being added, and the carry
“ripples” from the lowest-order place to the highest-order.
The final carry from the highest-order full adder, c4 in the 4-bit adder of Figure 5.4,
is stored in the CF bit of the Flags register (see Section 6.2). And the exclusive or of the
final carry and penultimate carry, c4 ⊕ c3 in the 4-bit adder of Figure 5.4, is stored in the
OF bit.
Recall that in the 2’s complement code for storing integers a number is negated by
taking its 2’s complement. So we can subtract y from x by doing:
x − y = x + (2’s complement of y)
= x + [(y’s bits flipped) + 1] (5.6)
Thus, subtraction can be performed with our adder in Figure 5.4 if we complement
each $y_i$ and set the initial carry in to 1 instead of 0. Each $y_i$ can be complemented by
XOR-ing it with 1. This leads to the 4-bit circuit in Figure 5.5 that will add two 4-bit
numbers when func = 0 and subtract them when func = 1.
There is, of course, a time delay as the sum is computed from right to left. The
computation time can be significantly reduced through more complex circuit designs
that pre-compute the carry.
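The behavior of the ripple-carry adder/subtractor can be sketched in software. This is only an illustration under stated assumptions: the names `add_sub` and `width` are hypothetical, the full adder uses the two-half-adder form, and CF is reported as the raw final carry exactly as the figure labels it.

```python
def full_adder(carry_in, x, y):
    """One bit slice: return (carry_out, sum)."""
    total = carry_in ^ x ^ y
    carry_out = (x & y) | (carry_in & (x ^ y))
    return carry_out, total


def add_sub(x, y, func, width=4):
    """Ripple-carry add (func = 0) or subtract (func = 1).

    Returns (s, CF, OF) where CF is the final carry and OF is the
    XOR of the final and penultimate carries. Assumes width >= 2.
    """
    carry = func                      # initial carry in: 0 to add, 1 to subtract
    carries = []
    s = 0
    for i in range(width):            # lowest-order slice first, so the carry ripples upward
        xi = (x >> i) & 1
        yi = ((y >> i) & 1) ^ func    # XOR with func complements y when subtracting
        carry, bit = full_adder(carry, xi, yi)
        carries.append(carry)
        s |= bit << i
    cf = carries[-1]
    of = carries[-1] ^ carries[-2]
    return s, cf, of
```

For example, `add_sub(5, 3, 0)` produces the bit pattern 1000 with OF = 1, because 5 + 3 does not fit in a 4-bit signed value.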
5.1.3 Decoders
Each instruction must be decoded by the CPU before the instruction can be carried
out. In the x86-64 architecture the instruction for copying the 64 bits of one register to
another register is
    0100 1s0d 1000 1001 11ss sddd
where “ssss” specifies the source register and “dddd” specifies the destination register.
(Yes, the bits that specify the registers are distributed through the instruction in this
manner. You will learn more about this seemingly odd coding pattern in Chapter 9.) For
example,
    0100 1001 1000 1001 1100 0101
causes the CPU to copy the 64-bit value in register 0000 to register 1101. You will see in
Chapter 9 that this instruction is written in assembly language as:
    movq    %rax, %r13
[Figure 5.5: 4-bit adder/subtractor. Inputs: x3..x0, y3..y0, and func. Carries: c1, c2, c3, c4. Outputs: s3..s0. If func = 0, s = x + y; if func = 1, s = x − y. CF = c4 and OF = c3 ⊕ c4.]
decoder: A device with n binary inputs and $2^n$ binary outputs. Each bit pattern at the
input causes exactly one of the $2^n$ outputs to equal 1.
A decoder can be thought of as converting an n-bit input to a $2^n$-bit output. But while the
input can be an arbitrary bit pattern, each corresponding output value has only one bit
set to 1.
In some applications not all the $2^n$ outputs are used. For example, Table 5.1 is a truth
table that shows how a decoder can be used to convert a BCD value to its corresponding
decimal numeral display. A 1 in a “display” column means that is the numeral that is
selected by the corresponding 4-bit input value. There are six other possible outputs
corresponding to the input values 1010 – 1111. But these input values are illegal in BCD,
so these outputs are simply ignored.
       input      |            display
  x3  x2  x1  x0  |  9  8  7  6  5  4  3  2  1  0
  ----------------+------------------------------
   0   0   0   0  |  0  0  0  0  0  0  0  0  0  1
   0   0   0   1  |  0  0  0  0  0  0  0  0  1  0
   0   0   1   0  |  0  0  0  0  0  0  0  1  0  0
   0   0   1   1  |  0  0  0  0  0  0  1  0  0  0
   0   1   0   0  |  0  0  0  0  0  1  0  0  0  0
   0   1   0   1  |  0  0  0  0  1  0  0  0  0  0
   0   1   1   0  |  0  0  0  1  0  0  0  0  0  0
   0   1   1   1  |  0  0  1  0  0  0  0  0  0  0
   1   0   0   0  |  0  1  0  0  0  0  0  0  0  0
   1   0   0   1  |  1  0  0  0  0  0  0  0  0  0
Table 5.1: BCD decoder. The 4-bit input causes the numeral with a 1 in its column to be
displayed.
It is common for decoders to have an additional input that is used to enable the
output. The truth table in Table 5.2 shows a decoder with a 3-bit input, an enable line,
and an 8-bit ($2^3$) output. The output is 0 whenever enable = 0. When enable = 1, the ith
output bit is 1 if and only if the binary value of the input is equal to i. For example, when
enable = 1 and $x = 011_2$, $y = 00001000_2$. That is,
$$y_3 = x_2' \cdot x_1 \cdot x_0 = m_3 \tag{5.7}$$
  enable  x2  x1  x0  |  y7  y6  y5  y4  y3  y2  y1  y0
  --------------------+--------------------------------
     0    –   –   –   |   0   0   0   0   0   0   0   0
     1    0   0   0   |   0   0   0   0   0   0   0   1
     1    0   0   1   |   0   0   0   0   0   0   1   0
     1    0   1   0   |   0   0   0   0   0   1   0   0
     1    0   1   1   |   0   0   0   0   1   0   0   0
     1    1   0   0   |   0   0   0   1   0   0   0   0
     1    1   0   1   |   0   0   1   0   0   0   0   0
     1    1   1   0   |   0   1   0   0   0   0   0   0
     1    1   1   1   |   1   0   0   0   0   0   0   0
Table 5.2: Truth table for a 3 × 8 decoder with enable. “–” indicates that the value does not
matter. If enable = 0, y = 0. If enable = 1, x = i ⇒ $y_i = 1$ and $y_j = 0$ for all j ≠ i.
This clearly generalizes such that we can give the following description of a decoder:
1. For n input bits (excluding an enable bit) there are $2^n$ output bits.
2. The ith output bit is equal to the ith minterm for the n input bits.
The 3 × 8 decoder specified in Table 5.2 can be implemented with 4-input AND gates as
shown in Figure 5.6.
Decoders are more versatile than it might seem at first glance. Each possible input
can be seen as a minterm. Since each output is one only when a particular minterm
evaluates to one, a decoder can be viewed as a “minterm generator.” We know that any
logical expression can be represented as the OR of minterms, so it follows that we can
implement any logical expression by ORing the output(s) of a decoder.
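The “minterm generator” view can be sketched as follows. This is an illustrative model only: `decoder` and `or_of_minterms` are hypothetical names, and the outputs are returned as a list in which element i is output $y_i$ (so the list reads from $y_0$ upward, the reverse of the column order in the truth tables).

```python
def decoder(x, n, enable=1):
    """Outputs of an n-input decoder as a list; element i is output y_i."""
    if not 0 <= x < 2 ** n:
        raise ValueError("input does not fit in n bits")
    # Exactly one output is 1 when enabled; every output is 0 when disabled.
    return [1 if (enable and i == x) else 0 for i in range(2 ** n)]


def or_of_minterms(outputs, minterms):
    """OR together the decoder outputs selected by the given minterm numbers."""
    return int(any(outputs[m] for m in minterms))
```

As a usage example, the exclusive or of two bits is $m_1 + m_2$, so `or_of_minterms(decoder(x, 2), [1, 2])` is 1 exactly when the 2-bit input `x` is 01 or 10.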
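[Figure 5.6: Circuit for a 3 × 8 decoder with enable. Inputs: enable, x2, x1, x0 and their complements, feeding eight 4-input AND gates. Outputs: y0 through y7.]
For example, let us rewrite Equation 5.1 for the Sum expression of a full adder using
minterm notation (see Section 4.3.2), and do the same for the carry:
$$Sum_i(Carry_i, x_i, y_i) = m_1 + m_2 + m_4 + m_7 \tag{5.8}$$
$$Carry_{i+1}(Carry_i, x_i, y_i) = m_3 + m_5 + m_6 + m_7 \tag{5.9}$$
where the subscripts on x, y, and Carry refer to the bit slice and the subscripts on m are
part of the minterm notation. We can implement a full adder with a 3 × 8 decoder and
two 4-input OR gates, as shown in Figure 5.7.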
Figure 5.7: Full adder implemented with 3 × 8 decoder. This is for one bit slice. An n-bit
adder would require n of these circuits.
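A decoder-based full adder can be modeled in a few lines. This sketch assumes the decoder inputs are ordered (Carry_i, x_i, y_i), most significant first, which fixes the minterm numbering used below; the function names are hypothetical.

```python
def decoder_3x8(a, b, c, enable=1):
    """3 x 8 decoder: element i of the result is minterm m_i of (a, b, c)."""
    index = (a << 2) | (b << 1) | c
    return [int(bool(enable) and i == index) for i in range(8)]


def full_adder(carry_in, x, y):
    """One bit slice built from a decoder and two 4-input OR gates."""
    m = decoder_3x8(carry_in, x, y)
    total = m[1] | m[2] | m[4] | m[7]        # inputs with an odd number of 1s
    carry_out = m[3] | m[5] | m[6] | m[7]    # inputs with two or more 1s
    return carry_out, total
```

Since the two outputs together encode the count of 1s among the three inputs, `2 * carry_out + total` should equal `carry_in + x + y` for every input combination.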
5.1.4 Multiplexers
There are many places in the CPU where one of several signals must be selected to
pass onward. For example, as you will see in Chapter 9, a value to be added by the ALU
may come from a CPU register, come from memory, or actually be stored as part of the
instruction itself. The device that allows this selection is essentially a switch.
multiplexer: A device that selects one of multiple inputs to be passed on as the output
based on one or more selection lines. Up to $2^n$ inputs can be selected by n selection
lines. Also called a mux.
Figure 5.8 shows a multiplexer that can switch between two different inputs, x and y.
The select input, s, determines which of the sources, either x or y, is passed on to the
output. The action of this 2-way multiplexer is most easily seen in a truth table:
s Output
1 x
0 y
[Figure 5.8: 2-way multiplexer. Inputs: x and y, with select input s. Output: Output.]
Here is a truth table for a multiplexer that can switch between four inputs, w, x, y,
and z:
s1 s0 Output
0 0 w
0 1 x
1 0 y
1 1 z
That is,
$$Output = s_1' \cdot s_0' \cdot w + s_1' \cdot s_0 \cdot x + s_1 \cdot s_0' \cdot y + s_1 \cdot s_0 \cdot z \tag{5.10}$$
which is implemented as shown in Figure 5.9. The symbol for this multiplexer is shown in
Figure 5.10. Notice that the selection input, s, must be 2 bits in order to select between
four inputs. In general, a $2^n$-way multiplexer requires an n-bit selection input.
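The 4-way multiplexer can be sketched directly from its truth table. The names `mux4` and `mux` are hypothetical, and the AND-OR form below is one possible realization that follows the table rows (s1 s0 = 00 selects w, 01 selects x, 10 selects y, 11 selects z).

```python
def mux4(w, x, y, z, s1, s0):
    """4-way multiplexer for 1-bit inputs, written as an AND-OR expression."""
    ns1, ns0 = 1 - s1, 1 - s0
    return (ns1 & ns0 & w) | (ns1 & s0 & x) | (s1 & ns0 & y) | (s1 & s0 & z)


def mux(inputs, select):
    """Behavioral 2**n-way multiplexer: 'select' is the n-bit selection value."""
    return inputs[select]
```

The behavioral `mux` makes the general rule visible: a list of $2^n$ inputs needs a selection value in the range 0 to $2^n - 1$, which takes n bits.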
5.2. PROGRAMMABLE LOGIC DEVICES 100
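[Figure 5.9: Circuit for a 4-way multiplexer. Inputs: w, x, y, z, with selection inputs s0 and s1. Output: Output.]
[Figure 5.10: Symbol for a 4-way multiplexer. Inputs w, x, y, z are connected to positions 0, 1, 2, 3; the Sel input carries S0, S1.]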
Figure 5.11: Simplified circuit for a programmable logic array with inputs x and y and
outputs F1(x, y) and F2(x, y). The “S” shaped lines at the inputs to each gate
represent fuses. The fuses are “blown” to remove that input.
Programmable Logic Array (PLA): Both the AND gate plane and the OR gate plane
are programmable.
Read Only Memory (ROM): Only the OR gate plane is programmable.
Programmable Array Logic (PAL): Only the AND gate plane is programmable.
Figure 5.12: Programmable logic array schematic with inputs w, x, y, z and outputs F1, F2, F3.
The horizontal lines to the AND gate inputs represent multiple wires, one for each
input variable and its complement. The vertical lines to the OR gate inputs also
represent multiple wires, one for each AND gate output. The dots represent connections.
Referring again to Figure 5.11, we see that the output from each AND gate is
connected to each of the OR gates. Therefore, the OR gates also have multiple inputs —
one for each AND gate — and the vertical lines leading to the OR gate inputs represent
multiple wires. The PLA in Figure 5.12 has been programmed to provide the three
functions:
$$F_1(w, x, y, z) = w' \cdot y \cdot z + w \cdot x \cdot z' \tag{5.11}$$
$$F_2(w, x, y, z) = w' \cdot x' \cdot y' \cdot z' \tag{5.12}$$
0
F3 (w, x, y, z) = w · y · z + w · x · z 0
(5.13)
And the OR gate plane has been programmed to store the four characters (in ASCII
code):
5.3. SEQUENTIAL LOGIC CIRCUITS 103
Figure 5.13: Four-byte Read Only Memory (ROM) with address inputs a1, a0 and data
outputs d7 through d0. The “×” connections represent permanent connections.
Each AND gate can be thought of as producing an address. The eight OR gates
produce one byte. The connections (dots) in the OR plane represent the bit
pattern stored at the address.
Figure 5.14: Two-function Programmable Array Logic (PAL) with inputs w, x, y, z and
outputs F1, F2. The “×” connections represent permanent connections. Each AND
gate can be thought of as producing an address. The eight OR gates produce one
byte. The connections (dots) in the OR plane represent the bit pattern stored at
the address.
output is observed. Sequential logic circuits, on the other hand, have a time history.
That history is summarized by the current state of the circuit.
state: The state of a system is the description of the system such that knowing
(a) the state at time t0 , and
(b) the input(s) from time t0 through time t1 ,
uniquely determines
(c) the state at time t1 , and
(d) the output(s) from time t0 through time t1 .
This definition means that knowing the state of a system at a given time tells you
everything you need to know in order to specify its behavior from that time on. How it
got into this state is irrelevant.
This definition implies that the system has memory in which the state is stored. Since
there is a finite number of states, the term finite state machine (FSM) is commonly
used. Inputs to the system can cause the state to change.
If the output(s) depend only on the state of the FSM, it is called a Moore machine.
And if the output(s) depend on both the state and the current input(s), it is called a
Mealy machine.
The most commonly used sequential circuits are synchronous — their action is
controlled by a sequence of clock pulses. The clock pulses are created by a clock
generator circuit. The clock pulses are applied to all the sequential elements, thus
causing them to operate in synchrony.
Asynchronous sequential circuits are not based on a clock. They depend upon a
timing delay built into the individual elements. Their behavior depends upon the order
in which inputs are applied. Hence, they are difficult to analyze and will not be discussed
in this book.
Figure 5.15: Clock signals plotted against time. (a) For level-triggered circuits. (b) For
positive-edge triggering. (c) For negative-edge triggering.
In Figure 5.15(a), the circuit operations take place during the entire time the clock is
at the 1 level. As will be explained below, this can lead to unreliable circuit behavior. In
order to achieve more reliable behavior, most circuits are designed such that a transition
of the clock signal triggers the circuit elements to start their respective operations.
Either a positive-going (Figure 5.15(b)) or negative-going (Figure 5.15(c)) transition may
be used. The clock frequency must be slow enough such that all the circuit elements
have time to complete their operations before the next clock transition (in the same
direction) occurs.
5.3.2 Latches
A latch is a storage device that can be in one of two states. That is, it stores one bit. It
can be constructed from two or more gates connected such that feedback maintains the
state as long as power is applied. The most fundamental latch is the SR (Set-Reset).
A simple implementation using NOR gates is shown in Figure 5.16. When Q = 1
[Figure 5.16: SR latch implemented with two NOR gates. Inputs: S, R. Outputs: Q, Q'.]
            Current   Next
   S    R    State    State
   0    0      0        0
   0    0      1        1
   0    1      0        0
   0    1      1        0
   1    0      0        1
   1    0      1        1
   1    1      0        X
   1    1      1        X
Table 5.3: SR latch state table. “X” indicates an indeterminate state. A circuit using this
latch must be designed to prevent this input combination.
Notice that placing 1 on both the Set and Reset inputs at the same time causes
a problem. Then the outputs of both NOR gates would become 0. In other words,
Q = Q' = 0, which is logically inconsistent because Q' is supposed to be the complement
of Q. The circuit design must be such as to prevent this input combination.
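The feedback in the NOR-based latch can be explored with a small simulation. This is a sketch under assumptions: it assumes R feeds the gate that produces Q and S feeds the gate that produces Q', evaluates the gates one after the other until nothing changes, and uses the hypothetical name `sr_latch_nor`.

```python
def nor(a, b):
    return 1 - (a | b)


def sr_latch_nor(s, r, q):
    """Apply S and R to a latch currently storing q; return settled (Q, Q')."""
    q_bar = 1 - q
    for _ in range(4):                     # a few passes are enough for two gates
        q_next = nor(r, q_bar)             # gate producing Q
        q_bar_next = nor(s, q_next)        # gate producing Q'
        if (q_next, q_bar_next) == (q, q_bar):
            break                          # feedback has settled
        q, q_bar = q_next, q_bar_next
    return q, q_bar
```

With S = R = 1 the function returns `(0, 0)` from either starting state, which is the inconsistent output pair described above.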
The behavior of an SR latch can also be shown by the state diagram in Figure 5.17.
A state diagram is a directed graph. The circles show the possible states. Lines with
arrows show the possible transitions between the states and are labeled with the input
that causes the transition.
Figure 5.17: State diagram for an SR latch. There are two possible inputs, 00 or 01, that
cause the latch to remain in state 0. Similarly, 00 or 10 cause it to remain
in state 1. Since the output is simply the state, it is not shown in this state
diagram. Notice that the input 11 is not allowed, so it is not shown on the
diagram.
The two circles in Figure 5.17 show the two possible states of the SR latch — 0 or
1. The labels on the lines show the two-bit inputs, SR, that cause each state transition.
Notice that when the latch is in state 0 there are two possible inputs, SR = 00 and
SR = 01, that cause it to remain in that state. Similarly, when it is in state 1 either of the
two inputs, SR = 00 or SR = 10, causes it to remain in that state.
The output of the SR latch is simply the state so is not shown separately on this state
diagram. In general, if the output of a circuit is dependent on the input, it is often shown
on the directed lines of the state diagram in the format “input/output.” If the output is
dependent on the state, it is more common to show it in the corresponding state circle
in “state/output” format.
NAND gates are more commonly used than NOR gates, and it is possible to build
an SR latch from NAND gates. Recalling that NAND and NOR have complementary
properties, we will think ahead and use S' and R' as the inputs, as shown in Figure 5.18.
Consider the four possible input combinations.
[Figure 5.18: SR latch implemented with two NAND gates. Inputs: S', R'. Outputs: Q, Q'.]
If Q = 1 and Q' = 0, the output of the upper NAND gate is (1 · 0)' = 1, and the output
of the lower NAND gate is (1 · 1)' = 0.
Thus, the cross feedback between the two NAND gates maintains the state (Set
or Reset) of the latch.
S' = 0, R' = 1: Set. If Q = 1 and Q' = 0, the output of the upper NAND gate is (0 · 0)' = 1,
and the output of the lower NAND gate is (1 · 1)' = 0. The latch remains in the Set
state.
If Q = 0 and Q' = 1, the output of the upper NAND gate is (0 · 1)' = 1. This causes
the output of the lower NAND gate to become (1 · 1)' = 0. The feedback from the
output of the lower NAND gate to the input of the upper keeps the output of the
upper NAND gate at (0 · 0)' = 1. The latch has moved into the Set state.
S' = 1, R' = 0: Reset. If Q = 0 and Q' = 1, the output of the lower NAND gate is
(0 · 0)' = 1, and the output of the upper NAND gate is (1 · 1)' = 0. The latch remains
in the Reset state.
If Q = 1 and Q' = 0, the output of the lower NAND gate is (1 · 0)' = 1. This is fed
back to the input of the upper NAND gate to give (1 · 1)' = 0. The feedback from the
output of the upper NAND gate to the input of the lower keeps the output of the
lower NAND gate at (0 · 0)' = 1. The latch has moved into the Reset state.
S' = 0, R' = 0: Not allowed. If Q = 0 and Q' = 1, the output of the upper NAND gate
is (0 · 1)' = 1. This is fed back to the input of the lower NAND gate to give (1 · 0)' = 1
as its output. The feedback from the output of the lower NAND gate to the input of
the upper maintains its output as (0 · 1)' = 1. Thus, Q = Q' = 1, which is not allowed.
If Q = 1 and Q' = 0, the output of the lower NAND gate is (1 · 0)' = 1. This is fed back
to the input of the upper NAND gate to give (0 · 1)' = 1 as its output. The feedback
from the output of the upper NAND gate to the input of the lower maintains its
output as (1 · 0)' = 1. Thus, Q = Q' = 1, which is not allowed.
Figure 5.19 shows the behavior of a NAND-based S'R' latch. The inputs to a NAND-based
S'R' latch are normally held at 1, which maintains the current state, Q. Its current
state is available at the output. Momentarily changing S' or R' to 0 causes the state to
change to Set or Reset, respectively, as shown in the “Next State” column.
Notice that placing 0 on both the S' and R' inputs at the same time causes
a problem. Then the outputs of both NAND gates would become 1. In other words,
Q = Q' = 1, which is logically inconsistent. The circuit design must be such as to prevent
this input combination.
So the S'R' latch implemented with two NAND gates can be thought of as the complement
of the NOR gate SR latch. The state is maintained by holding both S' and R' at 1.
S' = 0 causes the state to be 1 (Set), and R' = 0 causes the state to be 0 (Reset). Signals
such as S' and R', which activate the latch when they are 0, are usually called active-low signals.
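The case analysis for the NAND-based latch can be reproduced with a short simulation. As a sketch, it assumes S' feeds the upper gate (which produces Q) and R' feeds the lower gate (which produces Q'), and it evaluates the two gates in turn until the outputs stop changing; `sr_latch_nand` is a hypothetical name.

```python
def nand(a, b):
    return 1 - (a & b)


def sr_latch_nand(s_bar, r_bar, q):
    """Apply active-low S' and R' to a latch storing q; return settled (Q, Q')."""
    q_bar = 1 - q
    for _ in range(4):                         # enough passes for two gates to settle
        q_next = nand(s_bar, q_bar)            # upper gate produces Q
        q_bar_next = nand(r_bar, q_next)       # lower gate produces Q'
        if (q_next, q_bar_next) == (q, q_bar):
            break
        q, q_bar = q_next, q_bar_next
    return q, q_bar
```

Holding both inputs at 1 returns the starting state unchanged, and driving both to 0 returns `(1, 1)`, the disallowed combination.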
            Current   Next
   S'   R'   State    State
   1    1      0        0
   1    1      1        1
   1    0      0        0
   1    0      1        0
   0    1      0        1
   0    1      1        1
   0    0      0        X
   0    0      1        X
Figure 5.19: State table and state diagram for an S'R' latch. There are two possible
inputs, 11 or 10, that cause the latch to remain in state 0. Similarly, 11 or 01
cause it to remain in state 1. Since the output is simply the state, it is not
shown in this state diagram. Notice that the input 00 is not allowed, so it is
not shown on the diagram.
You have already seen that ones and zeros are represented by either a high or low
voltage in electronic logic circuits. A given logic device may be activated by combinations
of the two voltages. To show which is used to cause activation at any given input, the
following definitions are used:
active-high signal: The higher voltage represents 1.
active-low signal: The lower voltage represents 1.
Warning! The definitions of active-high versus active-low signals vary in the literature. Make
sure that you and the people you are working with have a clear agreement on the definitions
you are using.
[Figure 5.20: SR latch with Control input. Inputs: S, Control, R. Outputs: Q, Q'.]
NAND gates remain at 1 as long as Control = 0. Table 5.4 shows the state behavior of
the SR latch with control.
                      Current   Next
   Control   S    R    State    State
      0      –    –      0        0
      0      –    –      1        1
      1      0    0      0        0
      1      0    0      1        1
      1      0    1      0        0
      1      0    1      1        0
      1      1    0      0        1
      1      1    0      1        1
      1      1    1      0        X
      1      1    1      1        X
Table 5.4: SR latch with Control state table. “–” indicates that the value does not
matter. “X” indicates an indeterminate state. A circuit using this latch must
be designed to prevent this input combination.
It would clearly be better if we could find a design that eliminates the possibility of the
“not allowed” inputs. Table 5.5 is a state table for a D latch. It has two inputs, one for
control, the other for data, D. D = 1 sets the latch to 1, and D = 0 resets it to 0.
                 Current   Next
   Control   D    State    State
      0      –      0        0
      0      –      1        1
      1      0      0        0
      1      0      1        0
      1      1      0        1
      1      1      1        1
Table 5.5: D latch with Control state table. “–” indicates that the value does not matter.
The D latch can be implemented as shown in Figure 5.21. The one data input, D, is
fed to the “S” side of the SR latch; the complement of the data value is fed to the “R”
side.
[Figure 5.21: D latch. Inputs: D, Control. Outputs: Q, Q'.]
Now we have a circuit that can store one bit of data, using the D input, and can
be synchronized with a clock signal, using the Control input. Although this circuit is
reliable by itself, the issue is whether it is reliable when connected with other circuit
elements. The D signal almost certainly comes from an interconnection of combinational
and sequential logic circuits. If it changes while the Control is still 1, the state of the
latch will be changed.
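The level-sensitive behavior of the D latch can be seen in a gate-level sketch. The model below assumes two input NAND gates that combine Control with D and with the complement of D, feeding a NAND-based S'R' latch; the function name `d_latch` and the settling loop are illustrative choices, not taken from the figure.

```python
def nand(a, b):
    return 1 - (a & b)


def d_latch(control, d, q):
    """Return the settled state of a gated D latch currently storing q."""
    s_bar = nand(control, d)          # active-low Set: 0 only when Control = 1 and D = 1
    r_bar = nand(control, 1 - d)      # active-low Reset: 0 only when Control = 1 and D = 0
    q_bar = 1 - q
    for _ in range(4):                # let the cross-coupled NAND gates settle
        q_next = nand(s_bar, q_bar)
        q_bar_next = nand(r_bar, q_next)
        if (q_next, q_bar_next) == (q, q_bar):
            break
        q, q_bar = q_next, q_bar_next
    return q
```

Because D and its complement can never both be 1, the two input gates never drive S' and R' to 0 at the same time. The sketch also shows the weakness discussed above: calling `d_latch(1, d, q)` repeatedly with different values of `d` changes the state each time, since the latch follows D for as long as Control is 1.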
Each electronic element in a circuit takes time to activate. It is a very short period of
time, but it can vary slightly depending upon precisely how the other logic elements are
interconnected and the state of each of them when they are activated. The problem here
is that the Control input is being used to control the circuit based on the clock signal
level. The clock level must be maintained for a time long enough to allow all the circuit
elements to complete their activity, which can vary depending on what actions are being
performed. In essence, the circuit timing is determined by the circuit elements and their
actions instead of the clock. This makes it very difficult to achieve a reliable design.
It is much easier to design reliable circuits if the time when an activity can be
triggered is made very short. The solution is to use edge-triggered logic elements. The
inputs are applied and enough time is allowed for the electronics to settle. Then the
next clock transition activates the circuit element. This scheme provides precise timing
under control of the clock instead of timing determined more or less by the particular
circuit design.
5.3.3 Flip-Flops
Although the terminology varies somewhat in the literature, it is generally agreed that
(see Figure 5.15):
• A latch uses a level-based clock signal.
• A flip-flop is triggered by a clock signal edge.
At each “tick” of the clock, there are four possible actions that might be taken on a
single bit — store 0, store 1, complement the bit (also called toggle), or leave it as is.
A D flip-flop is a common device for storing a single bit. We can turn the D latch into
a D flip-flop by using two D latches connected in a master/slave configuration as shown
in Figure 5.22. Let us walk through the operation of this circuit.
[Figure 5.22: D flip-flop built from a Master D latch and a Slave D latch. Inputs: D, CK. Outputs: Q, Q'.]
The bit to be stored, 0 or 1, is applied to the D input of the Master D latch. The
clock signal is applied to the CK input. It is normally 0. When the clock signal makes a
transition from 0 to 1, the Master D latch will either Reset or Set, following the D input
of 0 or 1, respectively.
While the CK input is at the 1 level, the control signal to the Slave D latch is 1, which
deactivates this latch. Meanwhile, the output of this flip-flop, the output of the Slave D
latch, is probably connected to the input of another circuit, which is activated by the
same CK. Since the state of the Slave does not change during this clock half-cycle, the
second circuit has enough time to read the current state of the flip-flop connected to its
input. Also during this clock half-cycle, the state of the Master D latch has ample time
to settle.
When the CK input transitions back to the 0 level, the control signal to the Master
D latch becomes 1, deactivating it. At the same time, the control input to the Slave D
latch goes to 0, thus activating the Slave D latch to store the appropriate value, 0 or 1.
The new input will be applied to the Slave D latch during the second clock half-cycle,
after the circuit connected to its output has had sufficient time to read its previous state.
Thus, signals travel along a path of logic circuits in lock step with a clock signal.
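The two-step behavior of the master/slave arrangement can be sketched behaviorally. This model is an assumption-laden simplification: it treats each latch as an ideal storage cell, lets the master follow D while CK is 1, and lets the slave copy the master while CK is 0. The class name `MasterSlaveDFlipFlop` and its `apply` method are hypothetical.

```python
class MasterSlaveDFlipFlop:
    """Behavioral model of a D flip-flop made from two D latches."""

    def __init__(self, q=0):
        self.master = q
        self.slave = q

    def apply(self, ck, d):
        """Present clock level ck and data d; return the flip-flop output Q."""
        if ck == 1:
            self.master = d            # first half-cycle: master follows D, slave holds
        else:
            self.slave = self.master   # second half-cycle: slave takes the master's value
        return self.slave
```

A short trace illustrates the point of the design: after `apply(1, 1)` the output is still the old value, and it changes only when `apply(0, ...)` is called, no matter what D is by then.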
There are applications where a flip-flop must be set to a known value before the
clocking begins. Figure 5.23 shows a D flip-flop with an asynchronous preset input
added to it. When a 1 is applied to the PR input, Q becomes 1 and Q' becomes 0, regardless of
what the other inputs are, even CK. It is also common to have an asynchronous clear
input that sets the state (and output) to 0.
[Figure 5.23: D flip-flop with asynchronous preset. Inputs: PR, D, CK. Outputs: Q, Q'.]
There are more efficient circuits for implementing edge-triggered D flip-flops, but
this discussion serves to show that they can be constructed from ordinary logic gates.
They are economical and efficient, so are widely used in very large scale integration
circuits. Rather than draw the details for each D flip-flop, circuit designers use the
symbols shown in Figure 5.24. The various inputs and outputs are labeled in this figure.
Figure 5.24: Symbols for D flip-flops, each with inputs D and CK, outputs Q and Q', and
asynchronous clear (CLR) and preset (PR). (a) Positive-edge triggering;
(b) Negative-edge triggering.
5.25.
        Current   Next
   T     State    State
   0       0        0
   0       1        1
   1       0        1
   1       1        0
Figure 5.25: T flip-flop state table and state diagram. Each clock tick causes a state
transition, with the next state depending on the current state and the value
of the input, T.
To determine the value that must be presented to the D flip-flop in order to implement
a T flip-flop, we add a column for D to the state table as shown in Table 5.6. By simply
looking in the “Next State” column we can see what the input to the D flip-flop must be
in order to obtain the correct state. These values are entered in the D column. (We will
generalize this design procedure in Section 5.4.)
        Current   Next
   T     State    State    D
   0       0        0      0
   0       1        1      1
   1       0        1      1
   1       1        0      0
Table 5.6: T flip-flop state table showing the D flip-flop input required to place the T
flip-flop in the next state.
From Table 5.6 it is easy to write the equation for D:
$$D = T' \cdot Q + T \cdot Q' = T \oplus Q \tag{5.16}$$
Figure 5.26: T flip-flop. (a) Circuit using a D flip-flop. (b) Symbol for a T flip-flop.
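The relation between the T input and the D input can be checked against the state table. This is a minimal sketch with hypothetical function names; it models only the value stored at a clock tick, not the clock itself.

```python
def d_input_for_t(t, q):
    """Sum-of-products form of the D input: T'·Q + T·Q'."""
    return ((1 - t) & q) | (t & (1 - q))


def t_flip_flop_next(t, q):
    """Next state of a T flip-flop built from a D flip-flop with D = T xor Q."""
    d = t ^ q      # value presented to the D input
    return d       # a D flip-flop stores D at the clock tick
```

With T = 1 the state toggles on every tick, so starting from 0 the sequence of states is 1, 0, 1, 0, and so on. With T = 0 the state is unchanged.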
Implementing all four possible actions — set, reset, keep, toggle — requires two
inputs, J and K, which leads us to the JK flip-flop. The state table and state diagram for
a JK flip-flop are shown in Figure 5.27.
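            Current   Next
   J    K    State    State
   0    0      0        0
   0    0      1        1
   0    1      0        0
   0    1      1        0
   1    0      0        1
   1    0      1        1
   1    1      0        1
   1    1      1        0
[Figure 5.27: JK flip-flop state table and state diagram. With inputs labeled JK, state 0 remains 0 on 00 or 01 and moves to state 1 on 10 or 11; state 1 remains 1 on 00 or 10 and moves to state 0 on 01 or 11.]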
In order to determine the value that must be presented to the D flip-flop we add a
column for D to the state table as shown in Table 5.7, which shows what values must be input
to the D flip-flop.
            Current   Next
   J    K    State    State    D
   0    0      0        0      0
   0    0      1        1      1
   0    1      0        0      0
   0    1      1        0      0
   1    0      0        1      1
   1    0      1        1      1
   1    1      0        1      1
   1    1      1        0      0
Table 5.7: JK flip-flop state table showing the D flip-flop input required to place the JK
flip-flop in the next state.
$$D = J' \cdot K' \cdot Q + J \cdot K' \cdot Q' + J \cdot K' \cdot Q + J \cdot K \cdot Q'$$
$$= J \cdot Q' \cdot (K' + K) + K' \cdot Q \cdot (J + J')$$
$$= J \cdot Q' + K' \cdot Q \tag{5.17}$$
Figure 5.28: JK flip-flop. (a) Circuit using a D flip-flop. (b) Symbol for a JK flip-flop with
asynchronous CLR and PR inputs.
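The simplification in Equation 5.17 can be verified exhaustively. The sketch below uses hypothetical names and compares the four-minterm form, the simplified form, and the D column of Table 5.7 for every combination of J, K, and the current state Q.

```python
from itertools import product

# D column of Table 5.7, keyed by (J, K, current state).
TABLE_5_7 = {
    (0, 0, 0): 0, (0, 0, 1): 1,
    (0, 1, 0): 0, (0, 1, 1): 0,
    (1, 0, 0): 1, (1, 0, 1): 1,
    (1, 1, 0): 1, (1, 1, 1): 0,
}


def d_four_terms(j, k, q):
    """D as the OR of the four minterms read from the table."""
    nj, nk, nq = 1 - j, 1 - k, 1 - q
    return (nj & nk & q) | (j & nk & nq) | (j & nk & q) | (j & k & nq)


def d_simplified(j, k, q):
    """D after simplification: J·Q' + K'·Q."""
    return (j & (1 - q)) | ((1 - k) & q)


def forms_match_table():
    return all(
        d_four_terms(j, k, q) == d_simplified(j, k, q) == TABLE_5_7[(j, k, q)]
        for j, k, q in product((0, 1), repeat=3)
    )
```

The four rows of behavior are visible in `d_simplified`: JK = 00 keeps Q, 01 resets, 10 sets, and 11 toggles.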
Example 5-a
Design a counter that has an Enable input. When Enable = 1 it increments through
the sequence 0, 1, 2, 3, 0, 1,. . . with each clock tick. Enable = 0 causes the counter to
remain in its current state.
Solution:
[State diagram: four states, 0, 1, 2, and 3. An input of 1 (Enable = 1) moves the counter from each state to the next, 0 → 1 → 2 → 3 → 0. An input of 0 (Enable = 0) leaves the counter in its current state.]

              Enable = 0   Enable = 1
   Current       Next         Next
      n            n            n
      0            0            1
      1            1            2
      2            2            3
      3            3            0

              Enable = 0   Enable = 1
   Current       Next         Next
   n1   n0     n1   n0      n1   n0
    0    0      0    0       0    1
    0    1      0    1       1    0
    1    0      1    0       1    1
    1    1      1    1       0    0

                     Enable = 0                      Enable = 1
   Current             Next                            Next
   n1   n0    n1  n0   J1  K1  J0  K0       n1  n0   J1  K1  J0  K0
    0    0     0   0    0   X   0   X        0   1    0   X   1   X
    0    1     0   1    0   X   X   0        1   0    1   X   X   1
    1    0     1   0    X   0   0   X        1   1    X   0   1   X
    1    1     1   1    X   0   X   0        0   0    X   1   X   1
Notice the “don’t care” entries in the state table. Since the JK flip-flop is so versatile,
including the “don’t cares” helps find simpler circuit realizations. (See Exercise
5-3.)
5. We use Karnaugh maps, using E for Enable.
5.4. DESIGNING SEQUENTIAL CIRCUITS 117
   J0(E, n1, n0)      n1 n0              K0(E, n1, n0)      n1 n0
                00  01  11  10                        00  01  11  10
        E   0    0   X   X   0                E   0    X   0   0   X
            1    1   X   X   1                    1    X   1   1   X

   J1(E, n1, n0)      n1 n0              K1(E, n1, n0)      n1 n0
                00  01  11  10                        00  01  11  10
        E   0    0   0   X   X                E   0    X   X   0   0
            1    0   1   X   X                    1    X   X   1   0

$$J_0(E, n_1, n_0) = E$$
$$K_0(E, n_1, n_0) = E$$
$$J_1(E, n_1, n_0) = E \cdot n_0$$
$$K_1(E, n_1, n_0) = E \cdot n_0$$
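[Circuit diagram: two JK flip-flops, Q0 and Q1, both clocked by CLK. Enable drives the inputs of Q0, whose output is n0; the output of Q1 is n1.]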
The timing of the binary counter is shown here when counting through the sequence 3,
0, 1, 2, 3 (11, 00, 01, 10, 11).
[Timing diagram: traces of n1, Q1.JK, n0, Q0.JK, and CLK against time, each switching between 0 and 1, with the state Q1 Q0 stepping through 11, 00, 01, 10, 11.]
$Q_i$.JK is the input to the ith JK flip-flop, and $n_i$ is its output. (Recall that J = K in this
design.) When the ith input, $Q_i$.JK, is applied to its JK flip-flop, remember that the state
of the flip-flop does not change until the second half of the clock cycle. This can be seen
when comparing the trace for the corresponding output, $n_i$, in the figure.
Note the short delay after a clock transition before the value of each $n_i$ actually
changes. This represents the time required for the electronics to completely settle to
the new values.
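The finished counter design can be exercised with a small simulation. This is a sketch only: it models each JK flip-flop by its next-state rule, applies the input equations J0 = K0 = E and J1 = K1 = E · n0 once per clock tick, and ignores gate delays and the half-cycle timing discussed above. The function names are hypothetical.

```python
def jk_next(j, k, q):
    """Next state of a JK flip-flop: J·Q' + K'·Q."""
    return (j & (1 - q)) | ((1 - k) & q)


def counter_step(enable, n1, n0):
    """One clock tick of the 2-bit counter; returns the new (n1, n0)."""
    j0 = k0 = enable            # low-order bit toggles whenever the counter is enabled
    j1 = k1 = enable & n0       # high-order bit toggles only when n0 is 1
    return jk_next(j1, k1, n1), jk_next(j0, k0, n0)


def run(enables, start=(0, 0)):
    """Return the sequence of counter values for a list of Enable inputs."""
    n1, n0 = start
    values = [2 * n1 + n0]
    for enable in enables:
        n1, n0 = counter_step(enable, n1, n0)
        values.append(2 * n1 + n0)
    return values
```

For instance, `run([1, 1, 1, 1, 1])` returns `[0, 1, 2, 3, 0, 1]`, and inserting zeros into the Enable list holds the count for those ticks.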
Except for very inexpensive microcontrollers, most modern CPUs execute instructions
in stages. An instruction passes through each stage in an assembly-line fashion, called a
pipeline. The action of the first stage is to fetch the instruction from memory, as will be
explained in Chapter 6.
After an instruction is fetched from memory, it passes onto the next stage. Simulta-
neously, the first stage of the CPU fetches the next instruction from memory. The result
is that the CPU is working on several instructions at the same time. This provides some
parallelism, thus improving execution speed.
Almost all programs contain conditional branch points — places where the next
instruction to be fetched can be in one of two different memory locations. Unfortunately,
the decision of which of the two instructions to fetch is not known until the decision-
making instruction has moved several stages into the pipeline. In order to maintain
execution speed, as soon as a conditional branch instruction has passed on from the
fetch stage, the CPU needs to predict where to fetch the next instruction from.
In this next example we will design a circuit that makes this prediction.
Example 5-b
Design a circuit that predicts whether a conditional branch is taken or not. The
predictor continues to predict the same outcome, take the branch or do not take the
branch, until it makes two mistakes in a row.
Solution:
1. We use “Yes” to indicate when the branch is taken and “No” to indicate when it is
not. The state diagram shows four states:
[State diagram: four states, No, fromNo, fromYes, and Yes, connected by arcs labeled N (branch not taken) and Y (branch taken).]
Let us begin in the “No” state. The prediction is that the next branch will also not
be taken. Each state bubble is labeled with both the state and its output; here the
output in this state is also “No.”
The input to the circuit is whether or not the branch was actually taken. The arc
labeled “N” shows the transition when the branch was not taken. It loops back to
the “No” state, with the prediction (the output) that the branch will not be taken
the next time. If the branch is taken, the “Y” arc shows that the circuit moves into
the “fromNo” state, but still predicting no branch the next time.
From the “fromNo” state, if the branch is not taken (the prediction is correct), the
circuit returns to the “No” state. However, if the branch is taken, the “Y” shows
that the circuit moves into the “Yes” state. This means that the circuit predicted
incorrectly twice in a row, so the prediction is changed to “Yes.”
You should be able to follow this state diagram for the other cases and convince
yourself that both the “fromNo” and “fromYes” states are required.
2. Next we look at the state table:
                          Taken = No                Taken = Yes
Current                   Next                      Next
State      Prediction     State      Prediction     State      Prediction
No         No             No         No             fromNo     No
fromNo     No             No         No             Yes        Yes
fromYes    Yes            No         No             Yes        Yes
Yes        Yes            fromYes    Yes            Yes        Yes
Since there are four states, we need two bits. We will let 0 represent “No” and 1
represent “Yes.” The input is whether the branch is actually taken (1) or not (0).
And the output is the prediction of whether it will be taken (1) or not (0).
We choose a binary code for the state, s1 s0 , such that the high-order bit represents
the prediction, and the low-order bit what the last input was. That is:
State      Prediction   s1 s0
No         No           0  0
fromNo     No           0  1
fromYes    Yes          1  0
Yes        Yes          1  1

            Input = 0   Input = 1
Current     Next        Next
s1 s0       s1 s0       s1 s0
0  0        0  0        0  1
0  1        0  0        1  1
1  0        0  0        1  1
1  1        1  0        1  1

            Input = 0                   Input = 1
Current     Next                        Next
s1 s0       s1 s0  J1 K1  J0 K0         s1 s0  J1 K1  J0 K0
0  0        0  0   0  X   0  X          0  1   0  X   1  X
0  1        0  0   0  X   X  1          1  1   1  X   X  0
1  0        0  0   X  1   0  X          1  1   X  0   1  X
1  1        1  0   X  0   X  1          1  1   X  0   X  0
J0(In, s1, s0)
          s1 s0:  00  01  11  10
  In = 0           0   X   X   0
  In = 1           1   X   X   1
K0(In, s1, s0)
          s1 s0:  00  01  11  10
  In = 0           X   1   1   X
  In = 1           X   0   0   X
J1(In, s1, s0)
          s1 s0:  00  01  11  10
  In = 0           0   0   X   X
  In = 1           0   1   X   X
K1(In, s1, s0)
          s1 s0:  00  01  11  10
  In = 0           X   X   0   1
  In = 1           X   X   0   0
J0(In, s1, s0) = In
K0(In, s1, s0) = In′
J1(In, s1, s0) = In · s0
K1(In, s1, s0) = In′ · s0′
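[Circuit diagram: two JK flip-flops, Q0 and Q1, clocked by CLK. The input Actual feeds flip-flop Q0, whose output is s0; the output of Q1 is s1 = Prediction.]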
5.5.1 Registers
Registers are used in places where small amounts of very fast memory are required. Many
are found in the CPU where they are used for numerical computations, temporary data
storage, etc. They are also used in the hardware that serves to interface between the
CPU and other devices in the computer system.
We begin with a simple 4-bit register, which allows us to store four bits. Figure 5.29
shows a design for implementing a 4-bit register using D flip-flops. As described above,
each time the clock cycles the state of each of the D flip-flops is set according to the
value of d = d3 d2 d1 d0 . The problem with this circuit is that a change in any of the di
values will change the state of the corresponding bit in the next clock cycle, so the contents of
the register are essentially valid for only one clock cycle.
One-cycle buffering of a bit pattern is sufficient for some applications, but there is
also a need for registers that will store a value until it is explicitly changed, perhaps
billions of clock cycles later. The circuit in Figure 5.30 adds a load signal and
feedback from the output of each bit. When load = 1 each bit is set according to its
corresponding input, di . When load = 0 the output of each bit, ri , is used as the input,
giving no change. So this register can be used to store a value for as many clock cycles
as desired. The value will not be changed until load is set to 1.
Most computers need many general purpose registers. When two or more registers
are grouped together, the unit is called a register file. A mechanism must be provided
for addressing one of the registers in the register file.
Consider a register file composed of eight 4-bit registers, r0 – r7. We could build eight
copies of the circuit shown in Figure 5.30. Let the 4-bit data input, d, be connected in
parallel to all of the corresponding data pins, d3 d2 d1 d0 , of each of the eight registers.
5.5. MEMORY ORGANIZATION 122
Figure 5.29: A 4-bit register. A D flip-flop is used to hold each bit. The state of the ith
bit is set by the value of di at each clock tick. The 4-bit value stored in the
register is r = r3 r2 r1 r0 .
Three bits are required to address one of the registers (2^3 = 8). If the eight outputs of
a 3 × 8 decoder are connected to the load inputs of the eight registers, one output per register, d will be
loaded into one, and only one, of the registers during the next clock cycle. All the other
registers will have load = 0, and they will simply maintain their current state. Selecting
the output from one of the eight registers can be done with four 8-input multiplexers.
One such multiplexer is shown in Figure 5.31. The inputs r0i – r7i are the ith bits from
each of eight registers, r0 – r7. One of the eight registers is selected for the 1-bit output,
Reg_Outi , by the 3-bit input Reg_Sel. Keep in mind that four of these output circuits
would be required for 4-bit registers. The same Reg_Sel would be applied to all four
multiplexers simultaneously in order to output all four bits of the same register. Larger
registers would, of course, require correspondingly more multiplexers.
There is another important feature of this design that follows from the master/slave
property of the D flip-flops. The state of the slave portion does not change until the
second half of the clock cycle. So the circuit connected to the output of this register can
read the current state during the first half of the clock cycle, while the master portion is
preparing to change the state to the new contents.
Figure 5.30: A 4-bit register with load. The storage portion is the same as in Figure
5.29. When load = 1 each bit is set according to its corresponding input,
di . When load = 0 the output of each bit, ri , is used as the input, giving no
change.
Figure 5.31: 8-way mux to select output of register file. This only shows the output of
the ith bit. n are required for n-bit registers. Reg_Sel is a 3-bit signal that
selects one of the eight inputs.
Figure 5.32: Four-bit serial-to-parallel shift register. A D flip-flop is used to hold each
bit. Bits arrive at the input, si , one at a time. The last four input bits are
available in parallel at r3 – r0 .
of bits is input at si . At each clock tick, the output of Q0 is applied to the input of Q1 , thus
copying the previous value of r0 to the new r1 . The state of Q0 changes to the value of the
new si , thus copying this to be the new value of r0 . The serial stream of bits continues to
ripple through the four bits of the shift register. At any time, the last four bits in the
serial stream are available in parallel at the four outputs, r3 ,. . . ,r0 .
The same circuit could be used to provide a time delay of four clock ticks in a serial
bit stream. Simply use r3 as the serial output.
Enable In Out
0 0 highZ
0 1 highZ
1 0 0
1 1 1
and its circuit symbol is shown in Figure 5.33. When Enable = 1 the output, which is
equal to the input, is connected to whatever circuit element follows the tri-state buffer.
But when Enable = 0, the output is essentially disconnected. Be careful to realize that
this is different from 0; being disconnected means it has no effect on the circuit element
to which it is connected.
A 4-way multiplexer using a 2 × 4 decoder and four tri-state buffers is illustrated in
Figure 5.34. Compare this design with the 4-way multiplexer shown in Figure 5.9, page
100. The tri-state buffer design may not be an advantage for small multiplexers. But an
n-way multiplexer without tri-state buffers requires an n-input OR gate, which presents
some technical electronic problems.
Figure 5.34: Four-way multiplexer built from tri-state buffers. Output = w, x, y, or z,
depending on which one is selected by s1 s0 fed into the decoder. Compare
with Figure 5.9, page 100.
Figure 5.35 shows how tri-state buffers can be used to implement a single memory
cell. This circuit shows only one 4-bit memory cell so you can compare it with the register
design in Figure 5.29, but it scales to much larger memories. Write is asserted to store
data in the D flip-flops. Read enables the output tri-state buffer in order to connect
the single output line to Mem_data_out. The address decoder is also used to enable the
tri-state buffers to connect a memory cell to the output, r3 r2 r1 r0 .
This type of memory is called Static Random Access Memory (SRAM). “Static” because
the memory retains its stored values as long as power to the circuit is maintained.
“Random access” because it takes the same length of time to access the memory at any
address.
Figure 5.35: 4-bit memory cell. Each bit is output through a tri-state buffer. addrj is one
output from a decoder corresponding to an address.
A 1 MB memory requires a 20-bit address. This requires a 20 × 2^20 address decoder
as shown in Figure 5.36. Recall from Section 5.1.3 (page 95) that an n × 2^n decoder
requires 2^n AND gates. We can simplify the circuitry by organizing memory into a grid
of rows and columns as shown in Figure 5.37. Although two decoders are required, each
requires 2^(n/2) AND gates, for a total of 2 × 2^(n/2) = 2^((n/2)+1) AND gates for the decoders. Of
course, memory cell access is slightly more complex, and some complexity is added in
order to split the 20-bit address into two 10-bit portions.
Figure 5.36: Addressing 1 MB of memory with one 20 × 2^20 address decoder. The short
line through the connector lines indicates the number of bits traveling in
parallel in that connection.
[Figure 5.37: the 20-bit address is split into two 10-bit portions, each feeding a 10 × 2^10 decoder; the decoder outputs, together with Write and Read, connect to the 1 MB memory.]
These operations take more time than simply switching flip-flops, so DRAM is appreciably
slower than SRAM. In addition, capacitors lose their charge over time. So
each row of capacitors must be read and refreshed on the order of every 60 msec. This
requires additional circuitry and further slows memory access. But the much lower cost
of DRAM compared to SRAM warrants the slower access time.
This has been only an introduction to how switching transistors can be connected
into circuits to create a CPU. We leave the details to more advanced books, e.g., [20],
[23], [24], [28], [31], [34].
5.6 Exercises
The greatest benefit will be derived from these exercises if you either build the circuits
in hardware or use a simulation program. Several free circuit simulation applications
are available that run under GNU/Linux.
Table 6.1: X86-64 operating modes. Intel manuals use the term “IA-32e” and AMD
manuals use “Long” when running a 64-bit operating system. Both manuals
use the same terminology for the two sub-modes. Adapted from Table 1-1 in
[2].
In this book we describe the view of the CPU when running a 64-bit operating system.
Intel manuals call this the IA-32e mode and the AMD manuals call it the long mode. The
CPU can run in one of two sub-modes under a 64-bit operating system. Both manuals
use the same terminology for the two sub-modes.
• Compatibility mode – Most programs compiled for a 32-bit or 16-bit environment
can be run without re-compiling.
6.1. CPU OVERVIEW 130
Figure 6.1: CPU block diagram. The CPU communicates with the Memory and I/O
subsystems via the Address, Data, and Control buses. See Figure 1.1 (page
3).
connected together through internal buses. Keep in mind that this is a highly simplified
diagram. Actual CPUs are much more complicated, but the general concepts discussed
in this chapter apply to all of them.
We will now describe briefly each of the subsystems in Figure 6.1. The descriptions
provided here are generic and apply to most CPUs. Components that are of particular
interest to a programmer are described within the context of the x86 ISA later in this
chapter.
Bus Interface: This is the means for the CPU to communicate with the rest of the
computer system — Memory and I/O Devices. It contains circuitry to place addresses
on the address bus, read and write data on the data bus, and read and write signals
on the control bus. The bus interface on many CPUs interfaces with external bus
control units that in turn interface with memory and with different types of I/O
buses, e.g., SATA, PCI-E, etc. The external control units are transparent to the
programmer.
L1 Cache Memory: Although it could be argued that this is not a part of the CPU, most
modern CPUs include very fast cache memory on the CPU chip. As you will see in
Section 6.4, each instruction must be fetched from memory. The CPU can execute
instructions much faster than they can be fetched. The interface with memory
makes it more efficient to fetch several instructions at one time, storing them in L1
cache where the CPU has very fast access to them. Many modern CPUs use two
L1 cache memories organized in a Harvard architecture — one for instructions,
the other for data. (See Section 1.2, page 4.) Its use is generally transparent to an
applications programmer.
6.2. CPU REGISTERS 131
Instruction Pointer: This is a 64-bit register that always contains the address of the
next instruction to be executed. See Section 6.2 for more details.
Instruction Register: This register contains the instruction that is currently being
executed. Its bit pattern determines what the Control Unit is causing the CPU to
do. Once that action has been completed, the bit pattern in the instruction register
can be changed, and the CPU will perform the operation specified by this next bit
pattern.
Most modern CPUs use an instruction queue that is built into the chip. Several instructions
are waiting in the queue, ready to be executed. Separate electronic circuitry keeps the
instruction queue full while the regular control unit is executing the instructions. But this
is simply an implementation detail that allows the control unit to run faster. The essence of
how the control unit executes a program is represented by the single instruction register
model.
Control Unit: The bits in the Instruction Register are decoded in the Control Unit. It
generates the signals that control the other subsystems in the CPU to carry out the
action(s) specified by the instruction. It is typically implemented as a finite-state
machine and contains Decoders (Section 5.1.3), Multiplexers (Section 5.1.4), and
other logic components.
Arithmetic Logic Unit (ALU): A device that performs arithmetic and logic operations
on groups of bits. The logic circuitry to perform addition is discussed in Section
5.1.1.
Flags Register: Each operation performed by the ALU results in various conditions
that must be recorded. For example, addition can produce a carry. One bit in the
Flags Register will be set to either zero (no carry) or one (carry) after the ALU has
completed any operation that may produce a carry.
We will now look at how the logic circuits discussed in Chapter 4 can be used to implement
some of these subsystems.
• CPU registers are accessed by using the names that are predefined in the assembler.
• Memory is accessed by the programmer providing a name for the memory location
and using that name in the user program.
Table 6.2: The x86-64 registers. Not all the registers shown here are discussed in this
chapter. Some are discussed in subsequent chapters that deal with the related
topic.
The x86-64 architecture registers are shown in Table 6.2. Each bit in each register is
numbered from right to left, beginning with zero. So the right-most bit is number 0, the
next one to the left number 1, etc. Since there are 64 bits in each register, the left-most
bit is number 63.
The general purpose registers can be accessed in the following ways:
• Quadword — all 64 bits [63 – 0].
• Doubleword — the low-order 32 bits [31 – 0].
• Word — the low-order 16 bits [15 – 0].
• Byte — the low-order 8 bits [7 – 0] (and in four registers bits [15 – 8]).
The assembler uses a different name for each group of bits in a register. The assembler
names for the groups of the bits are given in Table 6.3. In 64-bit mode, writing to an
8-bit or 16-bit portion of a register does not affect the other 56 or 48 bits in the register.
However, when writing to the low-order 32 bits, the high-order 32 bits are set to zero.
A pictorial representation of the naming of each portion of the general-purpose
registers is shown in Figure 6.2.
bits 63-0   bits 31-0   bits 15-0   bits 15-8   bits 7-0
rax         eax         ax          ah          al
rbx         ebx         bx          bh          bl
rcx         ecx         cx          ch          cl
rdx         edx         dx          dh          dl
rsi         esi         si          -           sil
rdi         edi         di          -           dil
rbp         ebp         bp          -           bpl
rsp         esp         sp          -           spl
---------------------------------------------------------
r8          r8d         r8w         -           r8b
r9          r9d         r9w         -           r9b
r10         r10d        r10w        -           r10b
r11         r11d        r11w        -           r11b
r12         r12d        r12w        -           r12b
r13         r13d        r13w        -           r13b
r14         r14d        r14w        -           r14b
r15         r15d        r15w        -           r15b
Table 6.3: Assembly language names for portions of the general-purpose CPU registers.
Programs running in 32-bit mode can only use the registers above the line in
this table. 64-bit mode allows the use of all the registers. The ah, bh, ch, and
dh registers cannot be used with any of the (8-bit) registers below the line.
Figure 6.2: Graphical representation of general purpose registers. The three shown
here are representative of the pattern of all the general purpose registers.
The 8-bit register portions ah, bh, ch, and dh are a holdover from the Intel® 8086/8088
architecture. It had four 16-bit registers, ax, bx, cx, and dx. The low-order bytes were
named al, bl, cl, and dl and the high-order bytes named ah, bh, ch, and dh. Access
to these registers has been maintained in 32-bit mode for backward compatibility but
is limited in 64-bit mode. Access to the 8-bit low-order portions of the rsi, rdi, rsp,
and rbp registers was added along with the move to 64 bits in the x86-64 architecture
but cannot be used in the same instruction with the 8-bit register portions of the xh
registers.
When using less than the entire 64 bits in a register, it is generally bad practice to write code that
assumes the remaining portion is in any particular state. Such code is difficult to read and
leads to errors during its maintenance phase.
Although these are called “general purpose,” the descriptions in Table 6.4 show that
some of them have some special significance, depending upon how they are used. (Some
of the descriptions may not make sense to you at this point.) In this book, we will use
the rax, rdx, rdi, rsi, and r8 – r15 registers for general-purpose storage. They will
be used just like variables in a high-level language. Usage of the rsp and rbp registers
follows a very strict discipline. You should not use either of them for your assembly
language programs until you understand how to use them.
The instruction pointer register, rip, always points to the next instruction to be
executed. (In many other environments, the equivalent register is called the program
counter.) As explained in Section 6.4 (page 136), every time an instruction is fetched,
the rip register is automatically incremented by the control unit to contain the address of
the next instruction. Thus, the rip register is never directly accessed by the programmer.
On the other hand, every instruction that is executed affects the contents of the rip
register. Thus, the rip register is not a general-purpose register, but it guides the flow
of the entire program.
Most arithmetic and logical operations affect the condition codes in the rflags
register. The bits that are affected are shown in Figure 6.3.
bit    11  10   9   8   7   6   5   4   3   2   1   0
flag   OF   -   -   -  SF  ZF   -  AF   -  PF   -  CF
Figure 6.3: Condition codes portion of the rflags register. The high-order 32 bits (32 –
63) are reserved for other use and are not shown here. Neither are bits 12 –
31, which are for system flags (see [3]).
The OF, SF, ZF, and CF are described at appropriate places in this book. See [3] and [14]
for descriptions of the other flags.
Two other registers are very important in a program. The rsp register is used as a
stack pointer, as will be discussed in Section 8.2 (page 176). The rbp register is typically
used as a base pointer; it will be discussed in Section 8.4 (page 188).
The “e” prefix on the 32-bit portion of each register name comes from the history of the x86
architecture. The introduction of the 80386 in 1985 brought an increase of register size
from 16 bits to 32 bits. There were no new registers. The old ones were simply “extended.”
• address bus
• data bus
• control bus
Figure 6.4: Subsystems of a computer. The CPU, Memory, and I/O subsystems communi-
cate with one another via the three busses. (Repeat of Figure 1.1.)
As an example of how data can be stored in memory, let us imagine that we have
some data in one of the CPU registers. Storing this data in memory is effected by setting
the states of a group of bits in memory to match those in the CPU register. The control
unit can be programmed to do this by
For example, if the eight bits in memory at address 0x7fffd9a43cef are in the state:
0x7fffd9a43cef: b7
the al register in the CPU is in the state:
%al: e2
and the control unit is programmed to store this value at location 0x7fffd9a43cef, the
control unit then
1. places 0x7fffd9a43cef on the address bus,
2. places the bit pattern e2 on the data bus, and
Important. When the state of any bit in memory or in a register is changed any
previous states are lost forever. There is no way to “undo” this state change or to
determine how the bit got in its current state.
action has been completed, the bit pattern in the instruction register can be changed,
and the CPU will perform the operation specified by this next bit pattern.
Most modern CPUs use an instruction queue. Several instructions are waiting in the queue,
ready to be executed. Separate electronic circuitry keeps the instruction queue full while
the regular control unit is executing the instructions. But this is simply an implementation
detail that allows the control unit to run faster. The essence of how the control unit executes
a program is represented by the single instruction register model.
Since instructions are simply bit patterns, they can be stored in memory. The instruc-
tion pointer register always has the memory address of (points to) the next instruction
to be executed. In order for the control unit to execute this instruction, it is copied into
the instruction register.
The situation is as follows:
Steps 3, 4, and 5 are called an instruction fetch. Notice that steps 3 – 8 constitute a
cycle, the instruction execution cycle. It is shown graphically in Figure 6.5.
6.4. PROGRAM EXECUTION IN THE CPU 138
[Figure 6.5: flowchart of the instruction execution cycle. Fetch the instruction pointed to by the Instruction Pointer; add the number of bytes in the instruction to the Instruction Pointer; execute the instruction; if it is the halt instruction, stop the CPU; otherwise repeat.]
the compiler use a CPU register for the int* ptr variable. The register modifier is
“advisory” only. See Exercise 6-3 for an example when the compiler may not be able to
honor our request.
1 /*
2 * gdbExample1.c
3 * Subtracts one from user integer.
4 * Demonstrate use of gdb to examine registers, etc.
5 * Bob Plantz - 5 June 2009
6 */
7
8 #include <stdio.h>
9
10 int main(void)
11 {
12 register int wye;
13 int *ptr;
14 int ex;
15
16 ptr = &ex;
17 ex = 305441741;
18 wye = -1;
19 printf("Enter an integer: ");
20 scanf("%i", ptr);
21 wye += *ptr;
22 printf("The result is %i\n", wye);
23
24 return 0;
25 }
Listing 6.1: Simple program to illustrate the use of gdb to view CPU registers.
We introduced some gdb commands in Chapter 2. Here are some additional ones
that will be used in this section:
• n — execute current source code statement of a program that has been running; if
it’s a call to a function, the entire function is executed.
• s — execute current source code statement of a program that has been running; if
it’s a call to a function, step into the function.
• si — execute current (machine) instruction of a program that has been running; if
it’s a call to a function, step into the function.
• i r — info registers — displays the contents of the registers, except floating point
and vector.
Here is a screen shot of how I compiled the program then used gdb to control the
execution of the program and observe the register contents. My typing is boldface and
the session is annotated in italics. Note that you will probably see different addresses if
you replicate this example on your own (Exercise 6-1).
The “-g” option is required. It tells the compiler to include debugger informa-
tion in the executable program.
(gdb) print ex
$1 = 305441741
(gdb) print &ex
$2 = (int *) 0x7fffffffe044
I use the print command to view the value assigned to the ex variable and
learn its memory address.
(gdb) help x
Examine memory: x/FMT ADDRESS.
ADDRESS is an expression for the memory address to examine.
FMT is a repeat count followed by a format letter and a size letter.
Format letters are o(octal), x(hex), d(decimal), u(unsigned decimal),
t(binary), f(float), a(address), i(instruction), c(char) and s(string).
Size letters are b(byte), h(halfword), w(word), g(giant, 8 bytes).
The specified number of objects of the specified size are printed
according to the format.
Defaults for format and size letters are those previously used.
Default count is 1. Default address is following last thing printed
with this command or "print".
The help command will provide very brief instructions on using a command.
We want to display values stored in specific memory locations in various for-
mats, and the help command provides a reminder of how to use the command.
I verify that the value assigned to the ex variable is stored at location 0x7fffffffe044.
Next, I examine all four bytes of the word, one byte at a time. In this display,
• 0xcd is stored in the byte at address 0x7fffffffe044,
• 0xab is stored in the byte at address 0x7fffffffe045,
• 0x34 is stored in the byte at address 0x7fffffffe046, and
• 0x12 is stored in the byte at address 0x7fffffffe047.
In other words, the byte-wise display appears to be backwards. This is due to
the values being stored in the little endian storage scheme as explained on
page 20 in Chapter 2.
I also examine all four bytes of the word, two bytes at a time. In this display,
6.5. USING GDB TO VIEW THE CPU REGISTERS 143
This shows how gdb displays these four bytes as though they represent two
16-bit ints stored in little endian format. (You can now see why I entered such
a strange integer in this demonstration run.)
The compiler has honored our request and allocated a register for the wye
variable. Registers are located in the CPU and do not have memory addresses,
so gdb cannot print the address. We will need to use the i r command to view
the register contents.
(gdb) i r
rax 0x7fffffffe044 140737488347204
rbx 0xffffffff 4294967295
rcx 0x4005e0 4195808
rdx 0x7fffffffe158 140737488347480
rsi 0x7fffffffe148 140737488347464
rdi 0x1 1
rbp 0x7fffffffe060 0x7fffffffe060
rsp 0x7fffffffe040 0x7fffffffe040
r8 0x400670 4195952
r9 0x7ffff7de9740 140737351948096
r10 0x7fffffffdec0 140737488346816
r11 0x7ffff7a3e680 140737348101760
r12 0x400480 4195456
r13 0x7fffffffe140 140737488347456
r14 0x0 0
r15 0x0 0
rip 0x4005a9 0x4005a9 <main+29>
eflags 0x202 [ IF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
The i r command displays the current contents of the CPU registers. The first
column is the name of the register. The second shows the current bit pattern
in the register, in hexadecimal. Notice that leading zeros are not displayed.
The third column shows the register contents in 64-bit signed decimal.
The registers that always hold addresses are also shown in hexadecimal in
the third column. The columns are often not aligned due to the tabbing of the
display.
We see that the value in the ebx general purpose register is the same as that
stored in the wye variable, 0xffffffff (see footnote 3). (Recall that ints are 32 bits, even in
64-bit mode.) We conclude that the compiler chose to allocate ebx as the wye
variable.
Notice the value in the rip register, 0x4005a9. Refer back to where I set the
breakpoint on source line 19. This shows that the program stopped at the
correct memory location.
It is only coincidental that the address of the ex variable is currently stored in
the rax register. If a general purpose register is not allocated as a variable
within a function, it is often used to store results of intermediate computations.
You will learn how to use registers this way in subsequent chapters of this
book.
(gdb) br 21
Breakpoint 2 at 0x4005ce: file gdbExample1.c, line 21.
(gdb) br 22
Breakpoint 3 at 0x4005d6: file gdbExample1.c, line 22.
These two breakpoints will allow us to examine the value stored in the wye
variable just before and after it is modified.
(gdb) cont
Continuing.
Enter an integer: 123
This verifies that the user’s input value is stored correctly and that the wye
variable has not yet been changed.
(gdb) cont
Continuing.
3 If this is not clear, you need to review Section 3.3.
And this verifies that our (rather simple) algorithm works correctly.
We can specify which registers to display with the i r command. This verifies
that the rbx register is being used as the wye variable.
And we see that the rip has incremented from 0x4005a9 to 0x4005d6. Don’t
forget that the rip register always points to the next instruction to be executed.
(gdb) cont
Continuing.
The result is 122
[Inferior 1 (process 4463) exited normally]
(gdb) q
bob$
Finally, I continue to the end of the program. Notice that gdb is still running
and I have to quit the gdb program.
6.6 Exercises
6-1 (§6.2, §6.5) Enter the program in Listing 6.1 and trace through the program one
line at a time using gdb. Use the n command, not s or si. Keep a written record
of the rip register at the beginning of each line. Hint: use the i r command. How
many bytes of machine code are in each of the C statements in this program? Note
that the addresses you see in the rip register may differ from the example given in
this chapter.
6-2 (§6.2, §6.4) As you trace through the program in Exercise 6-1 stop on line 21:
wye += *ptr;
We determined in the example above that the %rbx register is used for the variable
wye. Inspect the registers.
a) What is the address of the first instruction that will be executed when you
enter the n command?
b) How will %rbx change when this statement is executed?
6-3 (§6.5) Modify the program in Listing 6.1 so that a register is also requested for the
ex variable. Were you able to convince the compiler to do this for you? Did the
compiler produce any error or warning messages? Why do you think the compiler
would not use a register for this variable?
6-4 (§6.2, §6.5) Use the gdb debugger to observe the contents of memory in the program
from Exercise 2-31. Verify that your algorithm creates a null-terminated string
without the newline character.
6-5 (§6.2, §6.5) Write a program in C that allows you to determine the endianness of
your computer. Hint: use unsigned char* ptr.
6-6 (§6.2, §6.5) Modify the program in Exercise 6-5 so that you can demonstrate, using
gdb, that endianness is a property of the CPU. That is, even though a 32-bit int is
stored little endian in memory, it will be read into a register in the “proper” order.
Hint: declare a second int that is a register variable; examine memory one byte at
a time.
Chapter 10
The assembly language we have studied thus far is executed in sequence. In this chapter
we will learn how to organize assembly language instructions to implement the other
two required program flow constructs — repetition and binary decision.
Text string manipulations provide many examples of using program flow constructs,
so we will use them to illustrate many of the concepts. Almost any program displays
many text string messages on the screen, which are simply arrays of characters.
10.1 Repetition
The algorithms we choose when programming interact closely with the data storage
structure. As you probably know, a string of characters is stored in an array. Each
element of the array is of type char, and in C the end of the data is signified with a
sentinel value, the NUL character (see Table 2.3 on page 22).
The other technique for specifying the length of the string is to store the number of characters
in the string together with the string. This is implemented in Pascal by storing the number
of characters in the first byte of the array, and the actual characters are stored immediately
following.
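The two conventions can be compared with a short C sketch. This is only an illustration, not code from the listings in this book: the function names are hypothetical, and the length-prefixed layout assumes a single count byte followed immediately by the characters.

```c
#include <stddef.h>

/* Sentinel convention (C): scan until the NUL character is found. */
size_t sentinel_length(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0') {
        n++;
    }
    return n;
}

/* Length-prefix convention (Pascal style): the first byte holds the
 * count, and the characters follow immediately. */
size_t prefixed_length(const unsigned char *s)
{
    return s[0];
}
```

Notice that the sentinel version needs a loop to find the length, while the length-prefixed version needs only one memory read but limits the string to 255 characters.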
9 int main(void)
10 {
11 char *aString = "Hello World.\n";
12
16 aString++;
17 }
18
19 return 0;
20 }
Listing 10.1: Displaying a string one character at a time (C).
3. If the boolean expression evaluates to true, program flow enters the {. . . } block
and executes the statements there in sequence.
4. At the end of the {. . . } block program flow jumps back up to the evaluation of the
boolean expression.
[Flow chart: Initialize loop control variable → Evaluate boolean expression → (true) Execute body of while loop, then return to the evaluation; (false) Next instruction after while loop construct]
Figure 10.1: Flow chart of a while loop. The large diamond represents a binary decision
that leads to two possible paths, “true” or “false.” Notice the path that
leads back to the top of the while loop after the body has been executed.
Intel® Syntax: cmp destination, source
The cmp operation consists of subtracting the source operand from the destination
operand and setting the condition code bits in the rflags register accordingly. Neither
of the operand values is changed. The subtraction is done internally simply to get the
result and set the OF, SF, ZF, AF, PF, CF condition codes according to the result.
The other instruction is test. The syntax is
Intel® Syntax: test destination, source
The test operation consists of performing a bit-wise and between the two operands
and setting the condition codes in the rflags register accordingly. Neither of the
operand values is changed. The and operation is done internally simply to get the result
and set the SF, ZF, and PF condition codes according to the result. The OF and CF are
set to 0, and the AF value is undefined.
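To make the flag settings concrete, here is a minimal C model of 8-bit cmp and test operations. It is a sketch under stated assumptions: the struct flags type and the function names are invented for this illustration, and only four of the condition codes are modeled (PF and AF are omitted).

```c
#include <stdint.h>

/* Hypothetical container for four of the condition codes. */
struct flags {
    int zf;  /* zero flag */
    int sf;  /* sign flag */
    int cf;  /* carry flag */
    int of;  /* overflow flag */
};

/* Model of an 8-bit cmp: compute dest - src, keep only the flags. */
struct flags cmp8(uint8_t dest, uint8_t src)
{
    uint8_t result = (uint8_t)(dest - src);
    struct flags f;
    f.zf = (result == 0);
    f.sf = (result >> 7) & 1;
    f.cf = (dest < src);                 /* a borrow was needed */
    /* signed overflow: operands differ in sign and the result's sign
     * differs from the destination's sign */
    f.of = (((dest ^ src) & (dest ^ result)) >> 7) & 1;
    return f;
}

/* Model of an 8-bit test: compute dest & src, keep only the flags. */
struct flags test8(uint8_t dest, uint8_t src)
{
    uint8_t result = dest & src;
    struct flags f;
    f.zf = (result == 0);
    f.sf = (result >> 7) & 1;
    f.cf = 0;                            /* always cleared by test */
    f.of = 0;                            /* always cleared by test */
    return f;
}
```

Testing a value against itself, as in test8(x, x), sets ZF exactly when x is zero, which is a common way to check a byte for the NUL character.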
jcc label
Table 10.2 lists four conditional jumps that are commonly used when processing
unsigned values, and Table 10.3 lists four commonly used with signed values.
Since most instructions affect the settings of the condition codes in the rflags register,
each conditional jump must be placed immediately after the instruction that sets the
condition codes the programmer intends to test.
HINT: It is easy to forget how the order of the source and destination controls the conditional
jump in this construct. Here is a place where the debugger can save you time. Simply put a
breakpoint at the conditional jump instruction. When the program stops there, look at the
values in the source and destination. Then use the si debugger command to execute one
instruction and see where it goes.
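The difference between the unsigned and signed conditional jumps can be sketched in C. The two functions below model the conditions tested by ja (CF = 0 and ZF = 0) and jg (ZF = 0 and SF = OF) after an 8-bit compare that subtracts src from dest. The function names are hypothetical and the model is limited to 8-bit operands.

```c
#include <stdint.h>

/* Unsigned "above" (ja): the jump is taken when CF = 0 and ZF = 0. */
int is_above(uint8_t dest, uint8_t src)
{
    uint8_t result = (uint8_t)(dest - src);
    int cf = (dest < src);
    int zf = (result == 0);
    return !cf && !zf;
}

/* Signed "greater" (jg): the jump is taken when ZF = 0 and SF = OF. */
int is_greater(uint8_t dest, uint8_t src)
{
    uint8_t result = (uint8_t)(dest - src);
    int zf = (result == 0);
    int sf = (result >> 7) & 1;
    int of = (((dest ^ src) & (dest ^ result)) >> 7) & 1;
    return !zf && (sf == of);
}
```

The same bit patterns give different answers: is_above(0xff, 0x01) is true because 255 is above 1, but is_greater(0xff, 0x01) is false because, as a signed value, 0xff is -1.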
The jump instructions bring up another addressing mode: rip-relative.
rip-relative: The target is a memory address determined by adding an offset to the
current address in the rip register. This addressing mode is also called “pc-relative.”
used to store the offset. Note that the offset is stored in two’s complement format to
allow for negative jumps.
For example, if the offset will fit into eight bits the opcode for the je instruction is
74₁₆, and it is 0f84₁₆ if more than eight bits are required to store the offset (in which case
the offset is stored as a 32-bit value). The machine code is shown in Table 10.4
for four different target address offsets. Notice that the 32-bit offsets are stored in little
endian order in memory.
Table 10.4: Machine code for the je instruction. Four different distances to the jump
target address. Notice that the 32-bit offsets are stored in little endian order.
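The arithmetic behind these encodings can be sketched in C. The function below is an illustration only (je_target is a hypothetical name, not part of any tool): given the bytes of a je instruction and the address where it is stored, it computes the jump target. It assumes, as stated earlier, that rip holds the address of the next instruction when the offset is added.

```c
#include <stdint.h>

/* Returns the jump target of a je instruction located at address addr,
 * or 0 if the bytes do not start with one of the two je opcodes. */
uint64_t je_target(const uint8_t *code, uint64_t addr)
{
    if (code[0] == 0x74) {
        /* short form: opcode 74, then an 8-bit two's complement offset */
        int64_t offset = (int64_t)code[1] - ((code[1] & 0x80) ? 256 : 0);
        return addr + 2 + (uint64_t)offset;
    }
    if (code[0] == 0x0f && code[1] == 0x84) {
        /* near form: opcode 0f 84, then a 32-bit little endian offset */
        uint32_t raw = (uint32_t)code[2]
                     | ((uint32_t)code[3] << 8)
                     | ((uint32_t)code[4] << 16)
                     | ((uint32_t)code[5] << 24);
        int64_t offset = (int64_t)raw - ((raw & 0x80000000u) ? 4294967296LL : 0);
        return addr + 6 + (uint64_t)offset;
    }
    return 0;
}
```

The constants 2 and 6 are the lengths of the two instruction forms, so that the offset is applied to the address of the instruction that follows the jump.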
jmp label
jmp *register
jmp *memory
BE CAREFUL: The unconditional jump uses “*” for indirection, while all other instructions
use “(register).” It might be tempting to use something like “*(%rax).” Although the
(. . . ) are not an error here, they are superfluous. They have essentially the same effect as
something like (x) in an algebraic expression.
The three ways to use an unconditional jump are shown in Listing 10.2.
1 # jumps.s
2 # demonstrates unconditional jumps
3 # Bob Plantz - 12 June 2009
4 # global variable
5 .data
6 pointer:
7 .quad 0
8 format:
9 .string "The jump pattern is %x.\n"
10 # code
11 .text
12 .globl main
13 .type main, @function
14 main:
15 pushq %rbp # save frame pointer
16 movq %rsp, %rbp # set new frame pointer
17
1 .file "helloWorld1.c"
2 .section .rodata
3 .LC0:
4 .string "Hello World.\n"
5 .text
6 .globl main
7 .type main, @function
8 main:
9 pushq %rbp
10 movq %rsp, %rbp
11 subq $16, %rsp
12 movq $.LC0, -8(%rbp) # pointer to string
13 jmp .L2 # go to bottom of loop
14 .L3:
15 movq -8(%rbp), %rax # load pointer to string
16 movl $1, %edx # 3rd arg. - 1 character
17 movq %rax, %rsi # 2nd arg. - pointer
18 movl $1, %edi # 1st arg. - standard out
19 call write
20 addq $1, -8(%rbp) # aString++;
21 .L2:
22 movq -8(%rbp), %rax # load pointer
23 movzbl (%rax), %eax # get current character
24 testb %al, %al # is it NUL?
25 jne .L3 # no, go to top of loop
26 movl $0, %eax
27 leave
28 ret
29 .size main, .-main
30 .ident "GCC: (Ubuntu/Linaro 4.7.0-7ubuntu3) 4.7.0"
31 .section .note.GNU-stack,"",@progbits
Listing 10.3: Displaying a string one character at a time (gcc assembly language). Com-
ments added.
Both versions have exactly the same number of instructions. However, the unconditional
jump instruction, jmp, is executed every time through the loop when testing at the top
but is executed only once in the compiler’s version. Thus, the compiler’s version is
more efficient. The savings is probably insignificant in the vast majority of applications.
However, if a loop is nested within another loop or two, the difference could be important.
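The difference in jump counts can be demonstrated with a C sketch that uses goto to mimic the two loop structures. The function names and the jumps counter are hypothetical, added only to count how many unconditional jumps would be executed; the loop body simply advances through the string.

```c
#include <stddef.h>

/* Test at the top: an unconditional jump ends every pass through the body. */
size_t top_test_jumps(const char *s)
{
    size_t jumps = 0;               /* unconditional jumps executed */
top:
    if (*s == '\0') goto done;      /* conditional jump out of the loop */
    s++;                            /* loop body */
    jumps++;
    goto top;                       /* unconditional jump, every pass */
done:
    return jumps;
}

/* Test at the bottom: one unconditional jump into the loop, then only
 * conditional jumps. */
size_t bottom_test_jumps(const char *s)
{
    size_t jumps = 0;
    jumps++;
    goto test;                      /* unconditional jump, executed once */
body:
    s++;                            /* loop body */
test:
    if (*s != '\0') goto body;      /* conditional jump back to the body */
    return jumps;
}
```

For the 13-character string "Hello World.\n", the first function executes 13 unconditional jumps and the second executes only one.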
We also see another version of the mov instruction on line 22 of the compiler’s version:
22 movzbl (%rax), %eax
This instruction converts the data size from 8-bit to 32-bit, placing zeros in the high-order
24 bits, as it copies the byte from memory to the eax register. The memory address of
the copied byte is in the rax register. (Yes, this instruction writes over the address in
the register as it executes.)
The x86-64 architecture includes instructions for extending the size of a value by
adding more bits to the left. There are two ways to do this:
• Sign extend — copy the sign bit to each of the new high-order bits. For example,
when sign extending an 8-bit value to 16 bits, 85 would become ff85, but 75 would
become 0075.
• Zero extend — make each of the new high-order bits zero. When zero extending
85 to sixteen bits, it becomes 0085.
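The two kinds of extension can be written out in C as a minimal sketch. The function names are hypothetical, and the values are treated as raw bit patterns in unsigned types so that every step is explicit.

```c
#include <stdint.h>

/* Sign extend: copy bit 7 of the source into bits 8-15 of the result. */
uint16_t sign_extend_8_to_16(uint8_t value)
{
    if (value & 0x80) {
        return (uint16_t)(0xff00u | value);
    }
    return value;
}

/* Zero extend: the new high-order bits are all zero. */
uint16_t zero_extend_8_to_16(uint8_t value)
{
    return value;
}
```

With the hexadecimal values used above, sign_extend_8_to_16(0x85) gives 0xff85, sign_extend_8_to_16(0x75) gives 0x0075, and zero_extend_8_to_16(0x85) gives 0x0085.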
where s denotes the size of the source operand and d the size of the destination operand.
s   meaning    number of bits
b   byte       8
w   word       16
l   longword   32
q   quadword   64
(Use the s column for d.)
It can be used to move an 8-bit value from memory or a register into a 16-, 32-, or 64-bit
register; move a 16-bit value from memory or a register into a 32- or 64-bit register; or
move a 32-bit value from memory or a register into a 64-bit register. The “s” causes the rest of
the high-order bits in the destination register to be a copy of the sign bit in the source
value. It does not affect the condition codes in the rflags register.
In the Intel syntax the instruction is movsx. The size of the data is determined by the
operands, so the size characters (b, w, l, or q) are not appended to the instruction, and
the order of the operands is reversed.
Intel® Syntax: movsx destination, source
In some cases the Intel syntax is ambiguous. Intel-syntax assemblers use keywords to specify
the data size in such cases. For example, the nasm assembler uses
movsx destination, BYTE [source]
to move one byte and sign extend, and uses
movsx destination, WORD [source]
to move two bytes and sign extend.
where s denotes the size of the source operand and d the size of the destination operand.
s   meaning    number of bits
b   byte       8
w   word       16
l   longword   32
q   quadword   64
(Use the s column for d.)
It can be used to move an 8-bit value from memory or a register into a 16-, 32-, or 64-bit
register; or move a 16-bit value from memory or a register into a 32- or 64-bit register. The “z”
causes the rest of the high-order bits in the destination register to be set to zero. It does
not affect the condition codes in the rflags register. Recall that moving a 32-bit value
from memory or a register into a 64-bit register sets the high-order 32 bits to zero, so
there is no movzlq instruction.
In the Intel syntax the instruction is movzx. The size of the data is determined by the
operands, so the size characters (b, w, l, or q) are not appended to the instruction, and
the order of the operands is reversed.
Intel® Syntax: movzx destination, source
There is also a set of instructions that double the size of data in portions of the rax
register, shown in Table 10.5. The doubling operation includes sign extension into the
affected higher-order portion of the register.
Table 10.5: Instructions to double data size. These instructions do not specify any
operands, but they may change the rax register.
Notice that these instructions do not explicitly specify any operands, but they may
change the rax register. They do not affect the condition codes in the rflags register.
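The effect of these instructions on the rax register can be modeled in C. This sketch assumes the three instructions are cbtw, cwtl, and cltq, and it treats rax as a 64-bit unsigned value passed in and returned; the C functions are illustrations that borrow the instruction names, not library calls. Recall that writing a 32-bit result into eax sets the high-order 32 bits of rax to zero.

```c
#include <stdint.h>

/* cbtw: sign extend al into ax; the upper 48 bits of rax are unchanged. */
uint64_t cbtw(uint64_t rax)
{
    uint64_t ax = rax & 0xff;
    if (ax & 0x80) ax |= 0xff00;
    return (rax & ~(uint64_t)0xffff) | ax;
}

/* cwtl: sign extend ax into eax; writing eax zeros the upper 32 bits. */
uint64_t cwtl(uint64_t rax)
{
    uint64_t eax = rax & 0xffff;
    if (eax & 0x8000) eax |= 0xffff0000u;
    return eax;
}

/* cltq: sign extend eax into all of rax. */
uint64_t cltq(uint64_t rax)
{
    uint64_t result = rax & 0xffffffffu;
    if (result & 0x80000000u) result |= 0xffffffff00000000ULL;
    return result;
}
```

Starting with 0x1122334455667780 in rax, applying the three models in order gives 0x112233445566ff80, then 0x00000000ffffff80, and finally 0xffffffffffffff80.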
Returning to while loops, the general structure of a count-controlled while loop is
shown in Listing 10.4.
1 # generalWhile.s
2 # general structure of a while loop (not a program)
3 #
4 # count = 10;
5 # while (count > 0)
6 # {
7 # // loop body
8 # count--;
9 # }
10 #
11 # Bob Plantz - 10 June 2009
12
This is not a complete program or even a function. It simply shows the key elements of
a while loop.
Loops, of course, take the most execution time in a program. However, in almost all cases
code readability is more important than efficiency. You should determine that a loop is an
efficiency bottleneck before sacrificing its structure for efficiency. And then you should
generously comment what you have done.
Our assembly language version of a “Hello world” program in Listing 10.5 uses a
sentinel-controlled while loop.
1 # helloWorld3.s
2 # "hello world" program using the write() system call
3 # one character at a time.
4 # Bob Plantz - 12 June 2009
5
6 # Useful constants
7 .equ STDOUT,1
8 # Stack frame
9 .equ aString,-8
10 .equ localSize,-16
11 # Read only data
12 .section .rodata
13 theString:
14 .string "Hello world.\n"
15 # Code
16 .text
17 .globl main
18 .type main, @function
19 main:
20 pushq %rbp # save base pointer
21 movq %rsp, %rbp # set new base pointer
22 addq $localSize, %rsp # for local var.
23
(*aString != ’\0’)
In particular, you have to move the address into a register, then dereference it with the
“(register)” syntax.
Be careful not to confuse this with the indirection operator, “*”, used with the jmp instruction
that you saw in Section 10.1.3, especially since the assembly language indirection operator
is the same as the dereference operator in C/C++.
There are two common errors when using the assembly language syntax.
• The assembly language dereference operator does not work on variable names.
For example, you cannot use
cmpb $0, (ptr(%rbp)) # *** DOES NOT WORK ***
to dereference the ptr variable, nor does
cmpb $0, ($theString) # *** DOES NOT WORK ***
work to dereference the theString location. Unfortunately, the assembler may not
consider either of these to be a syntax error, just an unnecessary set of parentheses.
Therefore, you probably will not get an assembler error message, just incorrect
program behavior.
• Another common error is to forget to dereference the register once you get the
address stored in it:
cmpb $0, %esi # *** DOES NOT WORK ***
This would compare a byte in the esi register itself with the value zero. Since there
are four bytes in the esi register, this code will generate an assembler warning
message because it does not specify which byte.
BE CAREFUL: The C/C++ syntax for the NUL character, ’\0’, is not recognized by the GNU
assembler, as. From Table 2.3 we see that the bit pattern for the NUL character is 0x00, and
this value must be used in the GNU assembly language.
We also need to add one to the pointer variable so as to move it to the next character
in the string. Adding one is a common operation, so there is an operator that simply
adds one,
incs source
The inc instruction adds one to the source operand. The operand can be a register or a
memory location.
On line 35 of the program in Listing 10.5, incl is used to add one to the address
stored in memory:
BE CAREFUL: It is easy to think that the instruction ought to be incb since each character
is only one byte. The address in this program is 32 bits, so we have to use incl. And, of
course, when we use a 64-bit address, we need to use incq. Don’t forget that the value we
are adding one to is an address, not the value stored at that address.
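The distinction between adding one to an address and adding one to the value stored at that address may be easier to see in C. The two functions below are hypothetical helpers written only for this comparison.

```c
/* Adds one to the address: the pointer moves to the next character. */
const char *next_char(const char *ptr)
{
    ptr++;
    return ptr;
}

/* Adds one to the byte stored at the address: the character changes. */
void bump_char(char *ptr)
{
    (*ptr)++;
}
```

The first function corresponds to incrementing the full-width pointer variable. The second corresponds to a byte-sized increment of the memory that the pointer refers to, which is a different operation on different data.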
Subtracting one from a counter is also a common operation. The dec instruction
subtracts one from an operand and sets the rflags register accordingly. The operand
can be a register or a memory location.
decs source
A decl instruction is used on line 27 in Listing 10.6 to both subtract one from
the counter variable and to set the condition codes in the rflags register for the jg
instruction.
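In C, this "decrement, then jump if the result is greater than zero" pattern looks roughly like a do-while loop that tests the decremented counter. The sketch below is an assumption about the shape of such a loop, not a transcription of the listing; fill_stars is a hypothetical function that stores the characters in a buffer instead of writing them to the screen.

```c
/* Stores count '*' characters followed by NUL in buffer.
 * The caller must supply room for count + 1 bytes, and count must be
 * greater than zero because the test is made at the bottom of the loop. */
void fill_stars(char *buffer, int count)
{
    do {
        *buffer = '*';
        buffer++;
    } while (--count > 0);   /* decrement, then loop while the result > 0 */
    *buffer = '\0';
}
```

The expression --count > 0 does two jobs at once, just as the dec instruction both updates the counter and sets the condition codes that the conditional jump tests.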
1 # printStars.s
2 # prints 10 * characters on a line
3 # Bob Plantz - 12 June 2009
4
5 # Useful constants
6 .equ STDOUT,1
7 # Stack frame
8 .equ theChar,-1
9 .equ counter,-16
10 .equ localSize,-16
11 # Code
12 .text
13 .globl main
14 .type main, @function
15 main:
16 pushq %rbp # save base pointer
17 movq %rsp, %rbp # set new base pointer
18 addq $localSize, %rsp # for local var.
19
8 #include <unistd.h>
9
10 int main(void)
11 {
12 char *ptr;
13 char response;
14
25 if (response == ’y’)
26 {
27 ptr = "Changes saved.\n";
Let’s look at the flow of the program that the if-else controls.
1. The boolean expression (response == ’y’) is evaluated.
2. If the evaluation is true, the first block, the one that displays “Changes saved.”, is
executed.
3. If the evaluation is false, the second block, the one that displays “Changes dis-
carded.”, is executed.
4. In both cases the next statement to be executed is the return 0;
The program control flow of the if-else construct is illustrated in Figure 10.2.
[Flow chart; final node: Next instruction after if-then construct]
Figure 10.2: Flow chart of if-else construct. The large diamond represents a binary
decision that leads to two possible paths, “true” or “false.” Notice that
either the “then” block or the “else” block is executed, but not both. Each
leads to the end of the if-else construct.
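The flow described above can be sketched in C by replacing the if-else with goto statements, which correspond closely to jump instructions. The function name and labels are hypothetical; the messages are the ones used in this program.

```c
/* Returns the message selected by the user's response. */
const char *choose_message(char response)
{
    const char *ptr;

    if (response != 'y') goto else_part;   /* conditional jump */
    ptr = "Changes saved.\n";              /* "then" block */
    goto end_if;                           /* jump around the "else" block */
else_part:
    ptr = "Changes discarded.\n";          /* "else" block */
end_if:
    return ptr;
}
```

Without the goto end_if statement, program flow would fall through from the "then" block into the "else" block and both would be executed.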
We already know all the assembly language instructions needed to implement the
if-else in Listing 10.7. The important thing to note is that there must be an uncondi-
tional jump at the end of the “then” block to transfer program flow around the “else”
block. The assembly language generated for this program is shown in Listing 10.8.
1 .file "yesNo1.c"
2 .section .rodata
3 .LC0:
4 .string "Save changes? "
5 .LC1:
6 .string "Changes saved.\n"
7 .LC2:
8 .string "Changes discarded.\n"
9 .text
10 .globl main
11 .type main, @function
12 main:
13 pushq %rbp
14 movq %rsp, %rbp
15 subq $16, %rsp
16 movq $.LC0, -8(%rbp)
17 jmp .L2
18 .L3:
19 movq -8(%rbp), %rax
20 movl $1, %edx
21 movq %rax, %rsi
22 movl $1, %edi
23 call write
24 addq $1, -8(%rbp)
25 .L2:
26 movq -8(%rbp), %rax
27 movzbl (%rax), %eax
28 testb %al, %al
29 jne .L3
30 leaq -9(%rbp), %rax # place to store user response
31 movl $1, %edx
32 movq %rax, %rsi
33 movl $0, %edi
34 call read
35 movzbl -9(%rbp), %eax # get user response
36 cmpb $121, %al # response == ’y’ ?
37 jne .L4 # no, go to else part
38 movq $.LC1, -8(%rbp) # yes, write "Changes saved.\n"
39 jmp .L5
40 .L6:
41 movq -8(%rbp), %rax
42 movl $1, %edx
43 movq %rax, %rsi
44 movl $1, %edi
45 call write
46 addq $1, -8(%rbp)
47 .L5:
48 movq -8(%rbp), %rax
49 movzbl (%rax), %eax
50 testb %al, %al
51 jne .L6
52 jmp .L7 # jump around else part
53 .L4: # else part,
54 movq $.LC2, -8(%rbp) # write "Changes discarded.\n"
55 jmp .L8
56 .L9:
57 movq -8(%rbp), %rax
58 movl $1, %edx
59 movq %rax, %rsi
60 movl $1, %edi
61 call write
62 addq $1, -8(%rbp)
63 .L8:
64 movq -8(%rbp), %rax
65 movzbl (%rax), %eax
66 testb %al, %al
67 jne .L9
68 .L7: # after if-else statement
69 movl $0, %eax
70 leave
71 ret
72 .size main, .-main
73 .ident "GCC: (Ubuntu/Linaro 4.7.0-7ubuntu3) 4.7.0"
74 .section .note.GNU-stack,"",@progbits
Listing 10.8: Get yes/no response from user (gcc assembly language).
This is not a complete program or even a function. It simply shows the key elements of
an if-else construct.
Our assembly language version of the yes/no program in Listing 10.10 follows this
general pattern. It, of course, uses more meaningful labels than what the compiler
generated.
1 # yesNo2.s
2 # Prompts user to enter a y/n response.
3 # Bob Plantz - 12 June 2009
4
5 # Useful constants
6 .equ STDIN,0
7 .equ STDOUT,1
8 # Stack frame
9 .equ response,-1
10 .equ ptr,-16
11 .equ localSize,-16
12 # Read only data
13 .section .rodata
14 queryMsg:
15 .string "Save changes? "
16 saveMsg:
44 getResp:
45 movl $1, %edx # read one byte
46 leaq response(%rbp), %rsi # into this location
47 movl $STDIN, %edi # from keyboard
48 call read
49 # if (response == ’y’)
50 cmpb $’y’, response(%rbp) # was it ’y’?
51 jne noChange # no, there is no change
52
68 saveEnd:
87 allDone:
88 movl $0, %eax # return 0;
89 popq %rbx # restore reg.
90 movq %rbp, %rsp # restore stack pointer
91 popq %rbp # restore for caller
92 ret
Listing 10.10: Get yes/no response from user (programmer assembly language).
7 #include <unistd.h>
8
9 int main()
10 {
11 char response; // For user’s response
12 char* ptr; // For text messages
13
In particular, notice that the decision regarding whether the character entered by the
user is a numeral or not is made on the lines:
36 movzbl -9(%rbp), %eax # load numeral character
37 cmpb $57, %al # is numeral > ’9’?
38 jg .L4 # yes, go to else part
39 movzbl -9(%rbp), %eax # load numeral character
40 cmpb $47, %al # is numeral <= ’/’?
41 jle .L4 # yes, go to else part
42 movq $.LC1, -8(%rbp) # "then" part
Consulting Table 2.3 on page 22 we see that the program first compares the character
entered by the user with the ASCII code for the numeral “9” (57₁₀ = 39₁₆). If the character
is numerically greater, the program jumps to .L4, which is the beginning of the “else”
part. Then the character is compared to the ASCII code for the character “/”, which is
numerically one less than the ASCII code for the numeral “0” (48₁₀ = 30₁₆). If the character
is numerically less than or equal to this value, the program also jumps to .L4.
If neither of these conditions causes a jump to the “else” part, the program simply
continues on to execute the “then” part. At the end of the “then” part, the program skips
over the “else” part to the end of the program:
56 jmp .L7 # skip over "else" part
57 .L4: # "else" part
This is called short-circuit evaluation in C/C++. When connecting boolean tests with
the && and || operators, each of the boolean tests is executed one at a time from left to
right. If the overall result of the expression — true or false — is known before all the
tests are made, the remaining tests are not executed. This is one of the most important
reasons for not writing boolean expressions that include side effects; the operation that
produces a needed side effect may never get executed.
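A small C experiment can make short-circuit evaluation visible. In this sketch the two tests are wrapped in hypothetical functions that count how many times they are called; the counters and function names are assumptions made for the demonstration.

```c
/* Counters record how many times each test actually runs. */
static int first_calls = 0;
static int second_calls = 0;

static int not_above_nine(char c)
{
    first_calls++;
    return c <= '9';
}

static int not_below_zero(char c)
{
    second_calls++;
    return c >= '0';
}

/* The second test runs only when the first one is true. */
int is_numeral(char c)
{
    return not_above_nine(c) && not_below_zero(c);
}
```

Calling is_numeral('a') increments only first_calls, because the overall result is already known to be false after the first test. If the second function had a side effect that the program depended on, it would be skipped.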
where cc is a one- to three-letter sequence specifying the settings of the condition codes.
As with the conditional jump instructions, the conditional data move takes place if the
specified condition is true, and does not if it is false.
Possible letter sequences are the same as for the conditional jump instructions listed
in Table 10.1 on page 239. The source operand can be either a register or a memory
location, and the destination must be a register. Unlike other data movement instructions,
the cmovcc instruction does not use the operand size suffix; the size is implicitly specified
by the size of the destination register.
The conditional move instruction would allow the above assembly language to be
written with a cmove instruction, where the “e” means “equal” (see Table 10.1).
movl $discardMsg, %esi # load addresses of
movl $saveMsg, %edi # both messages
# if (response == ’y’)
cmpb $’y’, response(%rbp) # was it ’y’?
cmove %edi, %esi # yes, "save" message
movl %esi, ptr(%rbp) # point to message
msgLoop:
movl ptr(%rbp), %esi # current char in string
cmpb $0, (%esi) # null character?
je allDone # yes, leave while loop
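The same idea can be expressed in C as "load both candidates, then select one." The sketch below uses a hypothetical function and variable names. Whether a compiler actually translates the conditional expression into a cmov instruction depends on the compiler and the optimization level, so treat the comment as a possibility rather than a guarantee.

```c
/* Both addresses are loaded first; the condition then selects one of them. */
const char *select_message(char response)
{
    const char *discard_msg = "Changes discarded.\n";
    const char *save_msg = "Changes saved.\n";

    const char *ptr = discard_msg;              /* default choice */
    ptr = (response == 'y') ? save_msg : ptr;   /* candidate for cmove */
    return ptr;
}
```

There is no jump around an "else" block here: both values are available before the comparison, and the comparison only decides which one is kept.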
10.3.1 Instructions
data movement:
opcode source destination action page
cbtw convert byte to word, al → ax 246
cwtl convert word to long, ax → eax 246
cltq convert long to quad, eax → rax 246
cmovcc %reg/mem %reg conditional move 260
movs $imm/%reg %reg/mem move 156
movs mem %reg move 156
movsss $imm/%reg %reg/mem move, sign extend 245
movzss $imm/%reg %reg/mem move, zero extend 245
popw %reg/mem pop from stack 181
pushw $imm/%reg/mem push onto stack 181
s = b, w, l, q; w = l, q; cc = condition codes
arithmetic/logic:
opcode source destination action page
adds $imm/%reg %reg/mem add 214
adds mem %reg add 214
cmps $imm/%reg %reg/mem compare 237
cmps mem %reg compare 237
decs %reg/mem decrement 249
incs %reg/mem increment 248
leaw mem %reg load effective address 191
subs $imm/%reg %reg/mem subtract 215
subs mem %reg subtract 215
tests $imm/%reg %reg/mem test bits 238
tests mem %reg test bits 238
s = b, w, l, q; w = l, q
10.4 Exercises
10-1 (§10.1) Verify on paper that the machine instructions in Table 10.4 actually cause
a jump of the number of bytes shown (in decimal) when the jump is taken.
10-2 (§10.1) Enter the program in Listing 10.2 and verify that the jump to here1 uses
the rip-relative addressing mode, and the other two jumps use the direct address.
Hint: Produce a listing file for the program and use gdb to examine register and
memory contents.
10-3 (§10.1) Enter the program in Listing 10.5, changing the while loop to use eax as
a pointer:
movl $theString, %eax
whileLoop:
cmpb $0, (%eax) # null character?
je allDone # yes, all done
This would seem to be more efficient than reading the pointer from memory each
time through the loop. Use gdb to debug the program. Set a break point at the call
instruction and another break point at the incl instruction. Inspect the registers
each time the program breaks into gdb. What is happening to the value in eax?
Hint: Read what the “man 2 write” shell command has to say about the write
system call function. This exercise points out the necessity of understanding what
happens to registers when calling another function. In general, it is safer to use
local variables in the stack frame.
10-4 (§10.1) Assume that you do not know how many numerals there are, only that
the first one is ’0’ and the last one is ’9’ (the character “0” and character “9”).
Write a program in assembly language that displays all the numerals, 0 – 9, on the
screen, one character at a time. Use only one byte in the .data segment for storing
a character; do not allocate a separate byte for each numeral.
10-5 (§10.1) Assume that you do not know how many upper case letters there are,
only that the first one is ’A’ and the last one is ’Z’. Write a program in assembly
language that displays all the upper case letters, A – Z, on the screen, one character
at a time. Use only one byte in the .data segment for storing a character; do not
allocate a separate byte for each letter.
10-6 (§10.1) Assume that you do not know how many lower case letters there are,
only that the first one is ’a’ and the last one is ’z’. Write a program in assembly
language that displays all the lower case letters, a – z, on the screen, one character
at a time. Use only one byte in the .data segment for storing a character; do not
allocate a separate byte for each letter.
10-7 (§10.1) Enter the following C program and use the “-S” option to generate the
assembly language:
1 /*
2 * forLoop.c
3 * For loop multiplication.
4 *
5 * Bob Plantz - 21 June 2009
6 */
7
8 #include<stdio.h>
9
10 int main ()
11 {
12 int x, y, z;
13 int i;
14
Identify the loop that performs the actual multiplication. Write an equivalent C
program that uses a while loop instead of the for loop, and also generate the
assembly language for it. Do the loops differ? If so, how?
10-8 (§10.2) Enter the C program in Listing 10.7 and get it to work. Do you see any odd
behavior when the program terminates? Can you fix it? Hint: When the program
prompts the user, how many keys did you press? What was the second key press?
10-9 (§10.2) Enter the program in Listing 10.10 and get it to work.
10-10 (§10.2) Write a program in assembly language that displays all the printable
characters that are neither numerals nor letters on the screen, one character at a
time. Don’t forget that the space character, ’ ’, is printable. Do not display the
DEL character. Use only one byte for storing a character; do not allocate a separate
byte for each character.
Use only one while loop in this program. You will need an if-else construct with
a compound boolean conditional statement.
10-11 (§10.2) Write a program in assembly language that
a) prompts the user to enter a text string,
b) reads the user’s input into a char array,
c) echoes the user’s input string,
d) increments each character in the string to the next character in the ASCII
sequence, with the last printable character “wrapping around” to the first
printable character, and
e) displays the modified string.
10-12 (§10.2) Write a program in assembly language that
a) prompts the user to enter a text string,
b) reads the user’s input into a char array,
c) echoes the user’s input string,
d) decrements each character in the string to the previous character in the ASCII
sequence, with the first printable character “wrapping around” to the last
printable character, and
e) displays the modified string.
10-13 (§10.2) Write a program in assembly language that
a) instructs the user,
b) prompts the user to enter a character,
c) reads the user’s input into a char variable,
d) if the user enters a ’q’, the program terminates,
e) if the user enters a numeral, the program echoes the numeral the number of
times represented by the numeral plus one, and
f) any other printable character is echoed just once.
The program continues to run until the user enters a ’q’.
For example, a run of the program might look like (user input is boldface):
A single numeral, N, is echoed N+1 times, other characters are
echoed once. ’q’ ends program.
Enter a single character: a
You entered: a
Good software engineering practice generally includes breaking problems down into
functionally distinct subproblems. This leads to software solutions with many functions,
each of which solves a subproblem. This “divide and conquer” approach has some
distinct advantages:
• It is easier to solve a small subproblem.
1. Input. The data comes from another part of the program and is used by the function,
but is not modified by it.
2. Output. The function provides new data to another part of the program.
3. Update. The function modifies a data item that is held by another part of the
program. The new value is based on the value before the function was called.
All three interactions can be performed if the called function also knows the location
of the data item. This can be done by the calling function passing the address to the
called function or by making the address globally known to both functions. Updates
require that the address be known by the called function.
Outputs can also be implemented by placing the new data item in a location that is
accessible to both the called and the calling function. In C/C++ this is done by placing
the return value from a function in the eax register. And inputs can be implemented by
passing a copy of the data item to the called function. In both of these cases the called
function does not know the location of the original data item, and thus does not have
access to it.
In addition to global data, C syntax allows three ways for functions to exchange data:
• Pass by value — an input value is passed by making a copy of it available to the
function.
• Return value — an output value can be returned to the calling function.
• Pass by pointer — an output value can be stored for the calling function by passing
the address where the output value should be stored to the called function. This
can also be used to update a data item.
The last method, pass by pointer, can also be used to pass large inputs, or to pass inputs
that should be changed — also called updates. It is also the method by which C++
implements pass by reference.
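These three mechanisms can be illustrated with a few small C functions. The names are hypothetical and the arithmetic is deliberately trivial so that the data flow stands out.

```c
/* Pass by value and return value: a and b are copies, the sum is returned. */
int add(int a, int b)
{
    return a + b;
}

/* Pass by pointer used for an output: the sum is stored at the address. */
void add_to(int a, int b, int *sum)
{
    *sum = a + b;
}

/* Pass by pointer used for an update: the new value depends on the old. */
void add_in_place(int *total, int amount)
{
    *total = *total + amount;
}
```

A caller would write z = add(x, y); for the first form, add_to(x, y, &z); for the second, and add_in_place(&z, x); for the third. Only the last two give the called function access to the caller's variable z.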
When one function calls another, the information that is required to provide the
interface between the two is called an activation record. Since both the registers and
the call stack are common to all the functions within a program, both the calling function
and the called function have access to them. So arguments can be passed either in
registers or on the call stack. Of course, the called function must know exactly where
each of the arguments is located when program flow transfers to it.
In principle, the locations of arguments need only be consistent within a program. As
long as all the programmers working on the program observe the same rules, everything
should work. However, designing a good set of rules for any real-world project is a very
time-consuming process. Fortunately, the ABI [25] for the x86-64 architecture specifies
a good set of rules. The rules are very tedious because they are meant to cover all
possible situations. In this book we will consider only the simpler rules in order to get
an overall picture of how this works.
In 64-bit mode six of the general purpose registers and a portion of the call stack
are used for the activation record. The area of the stack used for the activation record
is called a stack frame. Within any function, the stack frame contains the following
information:
• Arguments (in excess of six) passed from the calling function.
• The return address back to the calling function.
• The calling function’s frame pointer.
• Local variables for the current function.
and often includes:
• Copies of arguments passed in registers.
• Copies of values in the registers that must be preserved by a function — rbx, r12 –
r15.
Some general memory usage rules (64-bit mode) are:
• Each argument is passed within an 8-byte unit. For example, passing three char
values requires three registers. This 8-byte rule also applies to arguments passed
on the stack.
11.1. OVERVIEW OF PASSING ARGUMENTS 269
• Local variables can be allocated to take up only the amount of memory they require.
For example, three char values can be accommodated in a three-byte memory area.
• The address in the frame pointer (rbp register) must always be a multiple of sixteen.
It should never be changed within a function, except during the prologue and
epilogue.
• The address in the stack pointer (rsp register) must always be a multiple of sixteen
before transferring program flow to another function.
We can see how this works by studying the program in Listing 11.1.
1 /*
2 * addProg.c
3 * Adds two integers
4 * Bob Plantz - 13 June 2009
5 */
6
7 #include <stdio.h>
8 #include "sumInts1.h"
9
10 int main(void)
11 {
12 int x, y, z;
13 int overflow;
14
22 return 0;
23 }
1 /*
2 * sumInts1.h
3 * Adds two integers and outputs their sum.
4 * Bob Plantz - 4 June 2008
5 */
6
7 #ifndef SUMINTS1_H
8 #define SUMINTS1_H
9 int sumInts(int, int, int *);
10 #endif
1 /*
2 * sumInts1.c
3 * Adds two integers and outputs their sum.
4 * Returns 0 if no overflow, else returns 1.
5 * Bob Plantz - 13 June 2009
6 */
7
8 #include "sumInts1.h"
9
14 *sum = a + b;
15
34 .L3:
35 movl $1, -4(%rbp)
36 .L4:
37 movl -4(%rbp), %eax # return overflow;
38 popq %rbp
39 ret
40 .size sumInts, .-sumInts
41 .ident "GCC: (Ubuntu/Linaro 4.7.0-7ubuntu3) 4.7.0"
42 .section .note.GNU-stack,"",@progbits
Listing 11.2: Accessing arguments in the sumInts function from Listing 11.1 (gcc
assembly language).
After saving the caller’s frame pointer and establishing its own frame pointer, this
function stores the argument values in the local variable area:
5 sumInts:
6 pushq %rbp
7 movq %rsp, %rbp
8 movl %edi, -20(%rbp) # save a
9 movl %esi, -24(%rbp) # save b
10 movq %rdx, -32(%rbp) # save pointer to sum
11 movl $0, -4(%rbp) # overflow = 0;
The arguments are in the following registers (see Table 8.2, page 174):
• a is in edi.
• b is in esi.
• The pointer to sum is in rdx.
Storing them in the local variable area frees up the registers so they can be used in this
function. Although this is not very efficient, the compiler does not need to work very
hard to optimize register usage within the function. The only local variable, overflow,
is initialized on line 11.
The observant reader will note that no memory has been allocated on the stack for
local variables or saving the arguments. The ABI [25] defines the 128 bytes beyond
the stack pointer — that is, the 128 bytes at addresses lower than the one in the rsp
register — as a red zone. The operating system is not allowed to use this area, so the
function can use it for temporary storage of values that do not need to be saved when
another function is called. In particular, leaf functions can store local variables in this
area without moving the stack pointer because they do not call other functions.
Notice that both the argument save area and the local variable area are aligned on
16-byte address boundaries. Figure 11.1 provides a pictorial view of where the three
arguments and the local variable are in the red zone.
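                 Address      Contents          Area
                 (rbp)-128                      (lowest address in the Red Zone)
                   ...
  sum      =     (rbp)-32     address           Argument Save Area
  b        =     (rbp)-24     value             Argument Save Area
  a        =     (rbp)-20     value             Argument Save Area
                 (rbp)-16     ?                 Local Variable Area
                 (rbp)-12     ?                 Local Variable Area
                 (rbp)-8      ?                 Local Variable Area
  overflow =     (rbp)-4      ?                 Local Variable Area
  rsp, rbp ->    (rbp)        Caller’s rbp
                 (rbp)+8      Return Address
  The Red Zone is the 128 bytes from (rbp)-128 up to (rbp)-1.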
Figure 11.1: Arguments and local variables in the stack frame, sumInts function. The
two input values and the address for the output are passed in registers,
then stored in the Argument Save Area by the called function. Since this is
a leaf function, the Red Zone is used for this function’s stack frame.
As you know, some functions take a variable number of arguments. In these functions,
the ABI [25] specifies the relative offsets of the register save area. The offsets are shown
in Table 11.1.
Register Offset
rdi 0
rsi 8
rdx 16
rcx 24
r8 32
r9 40
xmm0 48
xmm1 64
... ...
xmm15 288
Table 11.1: Argument register save area in stack frame. These relative offsets should
be used in functions with a variable number of arguments.
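The offsets in Table 11.1 follow a simple pattern: each of the six general purpose argument registers takes 8 bytes, and each xmm register takes 16 bytes, starting at offset 48. A minimal C sketch of that arithmetic follows. The function names are hypothetical.

```c
/* Offset of the i-th general purpose argument register
 * (0 = rdi, 1 = rsi, 2 = rdx, 3 = rcx, 4 = r8, 5 = r9)
 * within the register save area of Table 11.1. */
int gp_save_offset(int i)
{
    return 8 * i;
}

/* Offset of xmm register number i (0 through 15) within the same area.
 * The six 8-byte general purpose slots occupy the first 48 bytes. */
int xmm_save_offset(int i)
{
    return 48 + 16 * i;
}
```

For instance, gp_save_offset(5) gives 40 for r9, and xmm_save_offset(15) gives 288, matching the last row of the table.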
One of the problems with the C version of sumInts is that it requires a separate check
for overflow:
16 sumInts:
17 if (((a > 0) && (b > 0) && (*sum < 0)) ||
18 ((a < 0) && (b < 0) && (*sum > 0)))
19 {
20 overflow = 1;
21 }
Writing the function in assembly language allows us to directly check the overflow flag,
as shown in Listing 11.3.
1 # sumInts.s
2 # Adds two 32-bit integers. Returns 0 if no overflow
3 # else returns 1
4 # Bob Plantz - 13 June 2009
5 # Calling sequence:
6 # rdx <- address of output
7 # esi <- 2nd int to be added
8 # edi <- 1st int to be added
9 # call sumInts
10 # returns 0 if no overflow, else returns 1
11 # Read only data
12 .section .rodata
13 overflow:
14 .long 1 # 32 bits, because cmovo loads a full long into eax
15 # Code
16 .text
17 .globl sumInts
18 .type sumInts, @function
19 sumInts:
20 pushq %rbp # save caller’s frame pointer
21 movq %rsp, %rbp # establish our frame pointer
22
The code to perform the addition and overflow check is much simpler.
23 movl $0, %eax # assume no overflow
24 addl %edi, %esi # add values
25 cmovo overflow, %eax # overflow occurred
26 movl %esi, (%rdx) # output sum
The body of the function begins by assuming there will not be overflow, so 0 is stored
in eax, ready to be the return value. The value of the first argument is added to the
second, because the programmer realizes that the values in the argument registers do
not need to be saved. If this addition produces overflow, the cmovo instruction changes
the return value to 1. Finally, in either case the sum is stored at the memory location
whose address was passed to the function as the third argument.
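For comparison, a C version can avoid inspecting the result of a signed addition that has already overflowed, which the C standard leaves undefined. The sketch below tests the operands before adding. The name sum_ints_checked is hypothetical, and unlike sumInts it leaves *sum unchanged when overflow is detected.

```c
#include <limits.h>

/* Adds a and b. Returns 0 and stores the sum through sum if the result
 * fits in an int. Returns 1 and stores nothing if it would overflow. */
int sum_ints_checked(int a, int b, int *sum)
{
    /* Check against the limits first so the addition itself never overflows. */
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return 1;
    }
    *sum = a + b;
    return 0;
}
```

The assembly language version remains simpler, because the addl instruction records the overflow condition in the OF flag as a side effect of the addition.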
11.2. MORE THAN SIX ARGUMENTS, 64-BIT MODE 274
9 int main(void)
10 {
11 int total;
12 int a = 1;
13 int b = 2;
14 int c = 3;
15 int d = 4;
16 int e = 5;
17 int f = 6;
18 int g = 7;
19 int h = 8;
20 int i = 9;
21
1 /*
2 * sumNine1.h
3 * Computes sum of nine integers.
4 * Bob Plantz - 13 June 2009
5 */
6 #ifndef SUMNINE_H
7 #define SUMNINE_H
8 int sumNine(int one, int two, int three, int four, int five,
9 int six, int seven, int eight, int nine);
10 #endif
1 /*
2 * sumNine1.c
3 * Computes sum of nine integers.
4 * Bob Plantz - 13 June 2009
5 */
6 #include <stdio.h>
7 #include "sumNine1.h"
8
9 int sumNine(int one, int two, int three, int four, int five,
10 int six, int seven, int eight, int nine)
11 {
12 int x;
13
1 .file "sumNine1.c"
2 .section .rodata
3 .LC0:
4 .string "sumNine done."
5 .text
6 .globl sumNine
7 .type sumNine, @function
8 sumNine:
9 pushq %rbp
10 movq %rsp, %rbp
11 subq $48, %rsp
12 movl %edi, -20(%rbp) # save one
13 movl %esi, -24(%rbp) # save two
14 movl %edx, -28(%rbp) # save three
15 movl %ecx, -32(%rbp) # save four
16 movl %r8d, -36(%rbp) # save five
17 movl %r9d, -40(%rbp) # save six
18 movl -24(%rbp), %eax # load two
19 movl -20(%rbp), %edx # load one, subtotal
20 addl %eax, %edx # add two to subtotal
21 movl -28(%rbp), %eax # load three
22 addl %eax, %edx # add to subtotal
23 movl -32(%rbp), %eax # load four
24 addl %eax, %edx # add to subtotal
25 movl -36(%rbp), %eax # load five
26 addl %eax, %edx # add to subtotal
27 movl -40(%rbp), %eax # load six
28 addl %eax, %edx # add to subtotal
29 movl 16(%rbp), %eax # load seven
30 addl %eax, %edx # add to subtotal
31 movl 24(%rbp), %eax # load eight
32 addl %eax, %edx # add to subtotal
33 movl 32(%rbp), %eax # load nine
34 addl %edx, %eax # add to subtotal
35 movl %eax, -4(%rbp) # x <- total
36 movl $.LC0, %edi
37 call puts
38 movl -4(%rbp), %eax
39 leave
40 ret
41 .size sumNine, .-sumNine
42 .ident "GCC: (Ubuntu/Linaro 4.7.0-7ubuntu3) 4.7.0"
43 .section .note.GNU-stack,"",@progbits
Listing 11.5: Passing more than six arguments to a function (gcc assembly language).
(There are two files here.)
Before main calls sumNine the values of the second through sixth arguments, b –
f, are moved to the appropriate registers, and the first argument, a is loaded into a
temporary register:
21 movl -20(%rbp), %r9d # f is 6th argument
22 movl -24(%rbp), %r8d # e is 5th argument
23 movl -28(%rbp), %ecx # d is 4th argument
24 movl -32(%rbp), %edx # c is 3rd argument
25 movl -36(%rbp), %esi # b is 2nd argument
26 movl -40(%rbp), %eax # load a
Then the values of the seventh, eighth, and ninth arguments, g – i, are moved to their
appropriate locations on the call stack. Enough space was allocated at the beginning of
the function to allow for these arguments. They are moved into their correct locations
on lines 27 – 32:
27 movl -8(%rbp), %edi # load i
28 movl %edi, 16(%rsp) # insert on stack
29 movl -12(%rbp), %edi # load h
30 movl %edi, 8(%rsp) # insert on stack
31 movl -16(%rbp), %edi # load g
32 movl %edi, (%rsp) # insert on stack
The stack pointer, rsp, is used as the reference point for storing the arguments on the
stack here because the main function is starting a new stack frame for the function it is
about to call, sumNine.
Then the first argument, a, is moved to the appropriate register:
33 movl %eax, %edi # a is 1st argument
When program control is transferred to the sumNine function, the partial stack
frame appears as shown in Figure 11.2. Even though each argument is only four bytes
(int), each is passed in an 8-byte portion of stack memory. Compare this with passing
arguments in registers; only one data item is passed per register even if the data item
does not take up the entire eight bytes in the register. The return address is at the top
of the stack, immediately followed by the three arguments (beyond the six passed in
registers). Notice that each argument is in the same position on the stack as it would
have been if it had been pushed onto the stack just before the call instruction. Since the
address in the stack pointer (rsp) was 16-byte aligned before the call to this function,
and the call instruction pushed the 8-byte return address onto the stack, the address
in rsp is now only 8-byte aligned (an odd multiple of eight).
                           Address      Contents
                                        ????
  rsp ->                   (rsp)        Return Address
           seven =         (rsp)+8      7                  Stack Arguments
           eight =         (rsp)+16     8                  Stack Arguments
           nine  =         (rsp)+24     9                  Stack Arguments
Figure 11.2: Arguments 7 – 9 are passed on the stack to the sumNine function. State of
the stack when control is first transferred to this function.
The prologue of sumNine completes the stack frame. Then the function saves the
register arguments in the register save area of the stack frame:
9 pushq %rbp
10 movq %rsp, %rbp
11 subq $48, %rsp
12 movl %edi, -20(%rbp) # save one
13 movl %esi, -24(%rbp) # save two
14 movl %edx, -28(%rbp) # save three
15 movl %ecx, -32(%rbp) # save four
16 movl %r8d, -36(%rbp) # save five
17 movl %r9d, -40(%rbp) # save six
The state of the stack frame at this point is shown in Figure 11.3.
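                Address      Contents          Area
  rsp ->        (rbp)-48                       Argument Save Area
                (rbp)-44                       Argument Save Area
  six   =       (rbp)-40     6                 Argument Save Area
  five  =       (rbp)-36     5                 Argument Save Area
  four  =       (rbp)-32     4                 Argument Save Area
  three =       (rbp)-28     3                 Argument Save Area
  two   =       (rbp)-24     2                 Argument Save Area
  one   =       (rbp)-20     1                 Argument Save Area
                (rbp)-16                       Local Variable Area
                (rbp)-12                       Local Variable Area
                (rbp)-8                        Local Variable Area
  x     =       (rbp)-4                        Local Variable Area
  rbp ->        (rbp)        Caller’s rbp
                (rbp)+8      Return Address
  seven =       (rbp)+16     7                 Stack Arguments
  eight =       (rbp)+24     8                 Stack Arguments
  nine  =       (rbp)+32     9                 Stack Arguments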
Figure 11.3: Arguments and local variables in the stack frame, sumNine function. The
first six arguments are passed in registers but saved in the stack frame.
Arguments beyond six are passed in the portion of the stack frame that is
created by the calling function.
You may question why the compiler did not simply use the red zone. The sumNine
function is not a leaf function. It calls another function, which may require use of the
call stack. So space must be explicitly allocated on the call stack for local variables and
the register argument save areas.
By the way, the compiler has replaced this function call, a call to printf, with a call
to puts:
36 movl $.LC0, %edi
37 call puts
Since the only thing to be written to the screen is a text string, the puts function is
equivalent.
After the register arguments are safely stored in the argument save area, they can
be easily summed and the total saved in the local variable:
18 movl -24(%rbp), %eax # load two
19 movl -20(%rbp), %edx # load one, subtotal
20 addl %eax, %edx # add two to subtotal
21 movl -28(%rbp), %eax # load three
22 addl %eax, %edx # add to subtotal
23 movl -32(%rbp), %eax # load four
24 addl %eax, %edx # add to subtotal
25 movl -36(%rbp), %eax # load five
26 addl %eax, %edx # add to subtotal
27 movl -40(%rbp), %eax # load six
28 addl %eax, %edx # add to subtotal
29 movl 16(%rbp), %eax # load seven
30 addl %eax, %edx # add to subtotal
31 movl 24(%rbp), %eax # load eight
32 addl %eax, %edx # add to subtotal
33 movl 32(%rbp), %eax # load nine
34 addl %edx, %eax # add to subtotal
35 movl %eax, -4(%rbp) # x <- total
Notice that the seventh, eighth, and ninth arguments are accessed by positive offsets
from the frame pointer, rbp. They were stored in the stack frame by the calling function.
The called function “owns” the entire stack frame so it does not need to make additional
copies of these arguments.
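The positive offsets used for seven, eight, and nine follow directly from the layout of the stack frame. A minimal C sketch of the arithmetic follows. It assumes that every argument is an integer or pointer, so that the first six travel in registers. The function name is hypothetical.

```c
/* Offset from rbp of integer argument number n (n >= 7) after the
 * standard prologue: 8 bytes for the saved rbp, 8 bytes for the
 * return address, then one 8-byte slot per stack argument. */
int stack_arg_offset(int n)
{
    return 16 + 8 * (n - 7);
}
```

For the seventh, eighth, and ninth arguments this gives 16, 24, and 32, the same offsets that appear in the movl instructions that load seven, eight, and nine.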
It is important to realize that once the stack frame has been completed within a
function, that area of the call stack cannot be treated as a stack. That is, it cannot be
accessed through pushes and pops. It must be treated as a record. (You will learn more
about records in Section 13.2, page 333.)
If we were to recompile these functions with higher levels of optimization, many of
these assembly language operations would be removed (see Exercise 11-2). But the
point here is to examine the mechanisms that can be used to work with arguments and
to write easily read code, so we study the unoptimized code.
A version of this program written in assembly language is shown in Listing 11.6.
1 # nineInts2.s
2 # Demonstrate how integral arguments are passed in 64-bit mode.
3 # Bob Plantz - 13 June 2009
4 # Bob Plantz - 5 November 2013 - deleted unneeded register usage (lines 48 -
5
6 # Stack frame
7 # passing arguments on stack (rsp)
8 # need 3x8 = 24 -> 32 bytes
9 .equ seventh,0
10 .equ eighth,8
11 .equ ninth,16
12 # local vars (rbp)
13 # need 10x4 = 40 -> 48 bytes
14 .equ i,-4
15 .equ h,-8
16 .equ g,-12
17 .equ f,-16
18 .equ e,-20
19 .equ d,-24
20 .equ c,-28
21 .equ b,-32
22 .equ a,-36
23 .equ total,-40
24 .equ localSize,-80
25 # Read only data
26 .section .rodata
27 format:
28 .string "The sum is %i\n"
29 # Code
30 .text
31 .globl main
32 .type main, @function
33 main:
34 pushq %rbp # save caller’s base pointer
35 movq %rsp, %rbp # establish ours
36 addq $localSize, %rsp # space for local variables
37 # + argument passing
38 movl $1, a(%rbp) # initialize local variables
39 movl $2, b(%rbp) # etc...
40 movl $3, c(%rbp)
41 movl $4, d(%rbp)
42 movl $5, e(%rbp)
43 movl $6, f(%rbp)
44 movl $7, g(%rbp)
45 movl $8, h(%rbp)
46 movl $9, i(%rbp)
47
62
1 # sumNine2.s
2 # Sums nine integer arguments and returns the total.
3 # Bob Plantz - 13 June 2009
4
5 # Stack frame
6 # arguments already in stack frame
7 .equ seven,16
8 .equ eight,24
9 .equ nine,32
10 # local variables
11 .equ total,-4
12 .equ localSize,-16
13 # Read only data
14 .section .rodata
15 doneMsg:
16 .string "sumNine done"
17 # Code
18 .text
19 .globl sumNine
20 .type sumNine, @function
21 sumNine:
22 pushq %rbp # save caller’s base pointer
23 movq %rsp, %rbp # set our base pointer
24 addq $localSize, %rsp # for local variables
25
42 ret
Listing 11.6: Passing more than six arguments to a function (programmer assembly
language). (There are two files here.)
The assembly language programmer realizes that all nine integers can be summed in
the sumNine function before it calls another function. In addition, none of the values will
be needed after this summation. So there is no reason to store the register arguments
locally:
26 addl %esi, %edi # add two to one
27 addl %edx, %edi # plus three
28 addl %ecx, %edi # plus four
29 addl %r8d, %edi # plus five
30 addl %r9d, %edi # plus six
31 addl seven(%rbp), %edi # plus seven
32 addl eight(%rbp), %edi # plus eight
33 addl nine(%rbp), %edi # plus nine
However, the edi register will be needed for passing an argument to puts, so the
total is saved in a local variable in the stack frame:
34 movl %edi, total(%rbp) # save total
Then it is loaded into eax for return to the calling function:
39 movl total(%rbp), %eax # return total;
The overall pattern of a stack frame is shown in Figure 11.4. The rbp register serves
as the frame pointer to the stack frame. Once the frame pointer address has been
established in a function, its value must never be changed. The return address is always
located +8 bytes offset from the frame pointer. Arguments to the function are positive
offsets from the frame pointer, and local variables are negative offsets from the frame
pointer.
                Address      Contents
                             Memory Available For Use As A Stack By This Function
  rsp ->
                             Local Variables And Saved Register Contents
                (rbp)-8
  rbp ->        (rbp)        Caller’s rbp
                (rbp)+8      Return Address
                             Arguments Passed In Stack Frame
It is essential that you follow the register usage and argument passing disciplines
precisely. Any deviation can cause errors that are very difficult to debug.
1. In the calling function:
(a) Assume that the values in the rax, rcx, rdx, rsi, rdi and r8 – r11 registers
will be changed by the called function.
(b) The first six arguments are passed in the rdi, rsi, rdx, rcx, r8, and r9 registers
in left-to-right order.
(c) Arguments beyond six are stored on the stack as though they had been pushed
onto the stack in right-to-left order.
(d) Use the call instruction to invoke the function you wish to call.
2. Upon entering the called function:
(a) Save the caller’s frame pointer by pushing rbp onto the stack.
(b) Establish a new frame pointer at the current top of stack by copying rsp to
rbp.
(c) Allocate space on the stack for all the local variables, plus any required register
save space, by subtracting the number of bytes required from rsp; this value
must be a multiple of sixteen.
(d) If a called function changes any of the values in the rbx, rbp, rsp, or r12 – r15
registers, they must be saved in the register save area, then restored before
returning to the calling function.
(e) If the function calls another function, save the arguments passed in registers
on the stack.
3. Within the called function:
(a) rsp is pointing to the current bottom of the stack that is accessible to this
function. Observe the usual stack discipline (see §8.2). In particular, DO NOT
use the stack pointer to access arguments or local variables.
(b) Arguments passed in registers to the function and saved on the stack are
accessed by negative offsets from the frame pointer, rbp.
(c) Arguments passed on the stack to the function are accessed by positive offsets
from the frame pointer, rbp.
(d) Local variables are accessed by negative offsets from the frame pointer, rbp.
4. When leaving the called function:
(a) Place the return value, if any, in eax (or rax for a 64-bit value).
(b) Restore the values in the rbx, rbp, rsp, and r12 – r15 registers from the
register save area in the stack frame.
(c) Delete the local variable space and register save area by copying rbp to rsp.
(d) Restore the caller’s frame pointer by popping it off the stack into rbp.
(e) Return to calling function with ret.
The best way to design a stack frame for a function is to make a drawing on paper
following the pattern in Figure 11.3. Show all the local variables and arguments to the
function. To be safe, assume that all the register-passed arguments will be saved in the
function. Compute and write down all the offset values on your drawing. When writing
the source code for your function, use the .equ directive to give meaningful names to
each of the numerical offsets. If you do this planning before writing the executable code,
you can simply use the name(%rbp) syntax to access the value stored at name.
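When you add up the bytes needed for a stack frame, the total must be rounded up to a multiple of sixteen. The comments in Listing 11.6 do this by hand (24 -> 32 and 40 -> 48). The same step can be sketched in C. The function name is hypothetical.

```c
#include <stddef.h>

/* Round a byte count up to the next multiple of sixteen, as needed
 * when sizing the local variable and argument areas of a stack frame. */
size_t round_up_16(size_t bytes)
{
    return (bytes + 15) & ~(size_t)15;
}
```

For example, three 8-byte stack arguments need round_up_16(24), which is 32 bytes, and ten 4-byte local variables need round_up_16(40), which is 48 bytes. Together these account for the 80 bytes allocated by localSize in Listing 11.6.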
11.3. INTERFACE BETWEEN FUNCTIONS, 32-BIT MODE 284
1 .file "sumNine1.c"
2 .section .rodata
3 .LC0:
4 .string "sumNine done."
5 .text
6 .globl sumNine
7 .type sumNine, @function
8 sumNine:
9 pushl %ebp
10 movl %esp, %ebp
11 subl $40, %esp
12 movl 12(%ebp), %eax # load two
13 movl 8(%ebp), %edx # load one, subtotal
14 addl %eax, %edx # add two
15 movl 16(%ebp), %eax # load three
16 addl %eax, %edx # add to subtotal
17 movl 20(%ebp), %eax # load four
18 addl %eax, %edx # etc...
19 movl 24(%ebp), %eax # load five
20 addl %eax, %edx
21 movl 28(%ebp), %eax # load six
22 addl %eax, %edx
23 movl 32(%ebp), %eax # load seven
24 addl %eax, %edx
25 movl 36(%ebp), %eax # load eight
26 addl %eax, %edx
27 movl 40(%ebp), %eax # load nine
28 addl %edx, %eax # total
29 movl %eax, -12(%ebp) # x <- total
30 movl $.LC0, (%esp)
31 call puts
32 movl -12(%ebp), %eax # return x;
33 leave
34 ret
35 .size sumNine, .-sumNine
36 .ident "GCC: (Ubuntu/Linaro 4.7.0-7ubuntu3) 4.7.0"
37 .section .note.GNU-stack,"",@progbits
Listing 11.7: Passing more than six arguments to a function (gcc assembly language,
32-bit). (There are two files here.)
The argument passing sequence can be seen on lines 22 – 39 in the main function. Rather
than pushing each argument onto the stack, the compiler has used the technique of
allocating space on the stack for the arguments, then storing each argument directly in
the appropriate location. The result is the same as if they had been pushed onto the
stack, but the direct storage technique is more efficient.
I find it odd that the compiler writer has chosen to set up a base pointer in ebp but
not used it in this function. This is NOT a recommended technique when writing in
assembly language.
The state of the call stack just before calling the sumNine function is shown in
Figure 11.5. Comparing this with the 64-bit version in Figure 11.3, we see that the local
variables are treated in essentially the same way. But the 32-bit version differs in the
way it passes arguments:
• All the arguments are passed on the call stack, none in registers.
• Arguments are passed in 4-byte blocks.
                 Address       Contents
  esp ->  arg1 = (esp)+0       1          Arguments: beginning of called
          arg2 = (esp)+4       2          function’s stack frame
          arg3 = (esp)+8       3
          arg4 = (esp)+12      4
          arg5 = (esp)+16      5
          arg6 = (esp)+20      6
          arg7 = (esp)+24      7
          arg8 = (esp)+28      8
          arg9 = (esp)+32      9
                               ????
                               ????
                               ????
          a    = (ebp)-40      1          Local variables: belongs to this
          b    = (ebp)-36      2          function’s stack frame
          c    = (ebp)-32      3
          d    = (ebp)-28      4
          e    = (ebp)-24      5
          f    = (ebp)-20      6
          g    = (ebp)-16      7
          h    = (ebp)-12      8
          i    = (ebp)-8       9
                               (ecx)
  ebp ->                       Caller’s ebp
Figure 11.5: Calling function’s stack frame, 32-bit mode. Local variables are accessed
relative to the frame pointer (ebp register). In this example, they are all
4-byte values. Arguments are accessed relative to the stack pointer (esp
register). Arguments are passed in 4-byte blocks.
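Inside the called function the same arguments are reached from the frame pointer instead. A minimal C sketch of the offset arithmetic follows. It assumes 4-byte arguments and the usual pushl %ebp, movl %esp, %ebp prologue. The function name is hypothetical.

```c
/* Offset from ebp of argument number n (n >= 1) inside the called
 * function in 32-bit mode: 4 bytes for the saved ebp, 4 bytes for the
 * return address, then one 4-byte slot per argument. */
int arg_offset_32(int n)
{
    return 8 + 4 * (n - 1);
}
```

This gives 8 for the first argument and 40 for the ninth, which agrees with the 8(%ebp) and 40(%ebp) operands in Listing 11.7.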
11.4. INSTRUCTIONS INTRODUCED THUS FAR 287
11.4.1 Instructions
data movement:
opcode source destination action page
cbtw convert byte to word, al → ax 246
cwtl convert word to long, ax → eax 246
cltq convert long to quad, eax → rax 246
cmovcc %reg/mem %reg conditional move 260
movs $imm/%reg %reg/mem move 156
movs mem %reg move 156
movsss %reg/mem %reg move, sign extend 245
movzss %reg/mem %reg move, zero extend 245
popw %reg/mem pop from stack 181
pushw $imm/%reg/mem push onto stack 181
s = b, w, l, q; w = l, q; cc = condition codes
arithmetic/logic:
opcode source destination action page
adds $imm/%reg %reg/mem add 214
adds mem %reg add 214
cmps $imm/%reg %reg/mem compare 237
cmps mem %reg compare 237
decs %reg/mem decrement 249
incs %reg/mem increment 248
leaw mem %reg load effective address 191
subs $imm/%reg %reg/mem subtract 215
subs mem %reg subtract 215
tests $imm/%reg %reg/mem test bits 238
tests mem %reg test bits 238
s = b, w, l, q; w = l, q
11.5. EXERCISES 288
11.5 Exercises
11-1 (§11.2) Enter the program in Listing 11.6. Single-step through the program with
gdb and record the changes in the rsp and rip registers and the changes in the
stack on paper. Use drawings similar to Figure 11.3.
Note: Each of the two functions should be in its own source file. You can single-step
into the subfunction with gdb at the call instruction in main, then single-step back
Exercise Solutions
The solutions to most of the exercises in the book are in this Appendix. You should
attempt to work the exercise before looking at the solution. But don’t allow yourself to
get bogged down. If the solution does not come to you within a reasonable amount of
time, peek at the solution for a hint.
A word of warning: I have proofread these solutions many times. Each time has
turned up several errors. I am amazed at how difficult it is to make everything perfect.
If you find an error, please email me and I will try to correct the next printing.
When reading my programming solutions, be aware that my goal is to present simple,
easy-to-read code that illustrates the point. I have not tried to optimize for either size
or performance.
I am also aware that each of us has our own programming style. Yours probably
differs from mine. If you are working with an instructor, I encourage you to discuss
programming style with him or her. I probably will not change my style, but I support
other people’s desire to use their own style.
2 -1 a) 4567 c) fedc
b) 89ab d) 0250
2 -3 a) 32 d) 16
b) 48 e) 8
c) 4 f) 32
2 -4 a) 2 d) 3
b) 8 e) 5
c) 16 f) 2
2 -5 r = 10, n = 8, d7 = 2, d6 = 9, d5 = 4, d4 = 5, d3 = 8, d2 = 2, d1 = 5, d0 = 4.
r = 16, n = 8, d7 = 2, d6 = 9, d5 = 4, d4 = 5, d3 = 8, d2 = 2, d1 = 5, d0 = 4.
E.2. DATA STORAGE FORMATS 435
2 -6 a) 170 e) 128
b) 85 f) 99
c) 240 g) 123
d) 15 h) 255
2 -7 a) 43981 e) 32768
b) 4660 f) 1024
c) 65244 g) 65535
d) 2000 h) 12345
a) 160 e) 100
b) 80 f) 12
c) 255 g) 17
d) 137 h) 200
a) 40960 e) 34952
b) 65535 f) 400
c) 1024 g) 43981
d) 4369 h) 21845
2 -10 a) 64 e) ff
b) 7b f) 10
c) 0a g) 20
d) 58 h) 80
2 -12 Since there are 12 values, we need 4 bits. Any 4-bit code would work. For example,
code grade
0000 A
0001 A-
0010 B+
0011 B
0100 B-
0101 C+
0110 C
0111 C-
1000 D+
1001 D
1010 D-
1011 F
2 -13 The addressing in Figure 2.1 uses only four bits. This limits us to a 16-byte
addressing space. In order to increase our space to 17 bytes, we need another bit
for the address. The 17th byte would be number 10000.
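Both of the preceding answers rest on the same question: how many bits are needed to give each of a number of items its own code? A short C sketch of that calculation follows. The function name is hypothetical, and the code assumes that unsigned int is at least 32 bits wide.

```c
/* Smallest number of bits that provides at least count distinct codes. */
unsigned bits_needed(unsigned count)
{
    unsigned bits = 0;
    /* Keep adding bits until 2^bits codes are enough. The bound on bits
     * keeps the shift within the width of a 32-bit unsigned int. */
    while (bits < 32 && (1u << bits) < count) {
        bits++;
    }
    return bits;
}
```

Twelve grades need bits_needed(12), which is 4 bits. Sixteen bytes can still be addressed with 4 bits, but seventeen bytes need bits_needed(17), which is 5.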
2 -17 The range of 32-bit unsigned ints is 0 – 4,294,967,295, so four bytes will be
required. If the storage area begins at byte number 0x2fffeb96, the number will
also occupy bytes number 0x2fffeb97, 0x2fffeb98, 0x2fffeb99.
2 * echoDecHexAddr.c
3 * Asks user to enter a number in decimal or hexadecimal
4 * then echoes it in both bases, also showing where values
5 * are stored.
6 *
7 * Bob Plantz - 19 June 2009
8 */
9
10 #include <stdio.h>
11
12 int main(void)
13 {
14 int x;
15 unsigned int y;
16
17 while(1)
18 {
19 printf("Enter a decimal integer: ");
20 scanf("%i", &x);
21 if (x == 0) break;
22
29 y, y, &y);
30 }
31 printf("End of program.\n");
32
33 return 0;
34 }
2 -28
1 /*
2 * stringInHex.c
3 * displays "Hello world" in hex.
4 *
5 * Bob Plantz - 19 June 2009
6 */
7
8 #include <stdio.h>
9
10 int main(void)
11 {
12 char *stringPtr = "Hello world.\n";
13
23 return 0;
24 }
2 -29 Keyboard input is line buffered by the operating system and is not available to
the application program until the user presses the enter key. This action places
two characters in the keyboard buffer – the character key pressed and the end of
line character. (The “end of line” character differs in different operating systems.)
The call to the read function gets one character from the keyboard buffer – the
one corresponding to the key the user pressed. Since there is a breakpoint at the
instruction following the call to read, control returns to the debugger, gdb. But
the end of line character is still in the keyboard buffer, and the operating system
dutifully provides it to gdb.
The net result is the same as if you had pushed the enter key immediately in
response to gdb’s prompt. This causes gdb to execute the previous command,
which was the continue command. So the program immediately loops back to its
prompt.
Experiment with this. Try to enter more than one character before pressing the
enter key. It is all very consistent. You just have to think through exactly which
keys you are pressing when using the debugger to determine what your calls to
read are doing.
2 -30
1 /*
2 * echoString1.c
3 * Echoes a string entered by user.
4 *
5 * Bob Plantz - 19 June 2009
6 */
7
8 #include <unistd.h>
9 #include <string.h>
10
11 int main(void)
12 {
13 char aString[200];
14 char *stringPtr = aString;
15
37 return 0;
38 }
2 -31
1 /*
2 * echoString2.c
3 * Echoes a string entered by user. Converts input
4 * to C-style string.
5 * Bob Plantz - 19 June 2009
6 */
7
8 #include <stdio.h>
9 #include <unistd.h>
10 #include <string.h>
11
12 int main(void)
13 {
14 char aString[200];
15 char *stringPtr = aString;
16
31 return 0;
32 }
2 -32
1 /*
2 * echoString3.c
3 * Echoes a string entered by user.
4 *
5 * Bob Plantz - 19 June 2009
6 */
7
8 #include "readLn.h"
9 #include "writeStr.h"
10
11 int main(void)
12 {
13 char aString[STRLEN]; // limited to 5 for testing readLn
14 // change to 200 for use
15 writeStr("Enter a text string: ");
16 readLn(aString, STRLEN);
17 writeStr("You entered:\n");
18 writeStr(aString);
19 writeStr("\n");
20
21 return 0;
22 }
1 /*
2 * writeStr.h
3 * Writes a line to standard out.
4 *
5 * input:
6 * pointer to C-style text string
7 * output:
8 * to screen
9 * returns number of chars written
10 *
11 * Bob Plantz - 19 June 2009
12 */
13
14 #ifndef WRITESTR_H
15 #define WRITESTR_H
16 int writeStr(char *);
17 #endif
1 /*
2 * writeStr.c
3 * Writes a line to standard out.
4 *
5 * input:
6 * pointer to C-style text string
7 * output:
8 * to screen
9 * returns number of chars written
10 *
11 * Bob Plantz - 19 June 2009
12 */
13
14 #include <unistd.h>
15 #include "writeStr.h"
16
28 return count;
29 }
1 /*
2 * readLn.h
3 * Reads a line from standard in.
4 * Drops newline character. Eliminates
5 * excess characters from input buffer.
6 *
7 * input:
8 * from keyboard
9 * output:
10 * null-terminated text string
11 * returns number of chars in text string
E.3. COMPUTER ARITHMETIC 442
12 *
13 * Bob Plantz - 19 June 2009
14 */
15
16 #ifndef READLN_H
17 #define READLN_H
18 int readLn(char *, int);
19 #endif
1 /*
2 * readLn.c
3 * Reads a line from standard in.
4 * Drops newline character. Eliminates
5 * excess characters from input buffer.
6 *
7 * input:
8 * from keyboard
9 * output:
10 * null-terminated text string
11 * returns number of chars in text string
12 *
13 * Bob Plantz - 19 June 2009
14 */
15
16 #include <unistd.h>
17 #include "readLn.h"
18
35 return count;
36 }
3 -2 Store a digit in every four bits. Thus, the lowest-order digit would be stored in bits
3 – 0, the next lowest-order in 7 – 4, etc., with the highest-order digit in bits 31 –
28.
No, binary addition does not work. For example, let’s consider 48 + 27:
number        32 bits (hex)
   48   −→   00000048
  +27   −→   00000027
   75         0000006f
The binary sum, 6f, is not the code for 75.
number 4bits
(+4) −→ 0100
+ (+5) −→ 0101
(−7) ←− 1001
3 -5 No, it doesn’t work. The problem is that the range of 4-bit signed numbers in two’s
complement format is −8 ≤ x ≤ +7, and (−4) + (−5) exceeds this range.
number 4bits
(−4) −→ 1100
+ (−5) −→ 1011
(+7) ←− 0111
3 -6 Adding any nonzero number to its negative will set the CF to one and the OF to zero. The
sum is 2^n, where n is the number of bits used for representing the signed integer.
That is, the sum is one followed by n zeroes. The one gets recorded in the CF. Since
the CF is irrelevant in two’s complement arithmetic, the result, n zeroes, is
correct.
In two’s complement, zero does not have a representation of opposite sign. (-0.0
does exist in IEEE 754 floating point.) Also, −2^(n−1) does not have a representation
of opposite sign.
3 -7 a) +85 e) -128
b) -86 f) +99
c) -16 g) +123
d) +15
3 -8 a) +4660 e) -32768
b) -4660 f) +1024
c) -292 g) -1
d) +2000 h) +30767
3 -9 a) 64 e) 7f
b) ff f) f0
c) f6 g) e0
d) 58 h) 80
3 -11 a) ff d) de
CF = 0 ⇒ unsigned right CF = 0 ⇒ unsigned right
OF = 0 ⇒ signed right OF = 1 ⇒ signed wrong
b) 45 e) 0e
CF = 1 ⇒ unsigned wrong CF = 1 ⇒ unsigned wrong
OF = 0 ⇒ signed right OF = 0 ⇒ signed right
c) fb f) 00
CF = 0 ⇒ unsigned right CF = 1 ⇒ unsigned wrong
OF = 0 ⇒ signed right OF = 1 ⇒ signed wrong
3 -14
1 /*
2 * hexTimesTen.c
3 * Multiplies a hex number by 10.
4 * Bob Plantz - 19 June 2009
5 */
6
7 #include "readLn.h"
8 #include "writeStr.h"
9 #include "hex2int.h"
10 #include "int2hex.h"
11
12 int main(void)
13 {
14 char aString[9];
15 unsigned int x;
16
26 return 0;
27 }
1 /*
2 * hex2int.h
3 *
4 * Converts a hexadecimal text string to corresponding
5 * unsigned int format.
6 * Assumes text string is valid hex chars.
7 *
8 * input:
9 * pointer to null-terminated text string
10 * output:
11 * returns the unsigned int.
12 *
13 * Bob Plantz - 19 June 2009
14 */
15
16 #ifndef HEX2INT_H
17 #define HEX2INT_H
18
21 #endif
1 /*
2 * hex2int.c
3 *
4 * Converts a hexadecimal text string to corresponding
5 * unsigned int format.
6 * Assumes text string is valid hex chars.
7 *
8 * input:
9 * pointer to null-terminated text string
10 * output:
11 * returns the unsigned int.
12 *
13 * Bob Plantz - 19 June 2009
14 */
15
16 #include "hex2int.h"
17
23 x = 0; // initialize result
24 while (*hexString != '\0') // end of string?
25 {
26 x = x << 4; // make room for next four bits
27 aChar = *hexString;
39 return x;
40 }
1 /*
2 * int2hex.h
3 *
4 * Converts an unsigned int to corresponding
5 * hex text string format.
6 * Assumes char array is big enough.
7 *
8 * input:
9 * unsigned int
10 * output:
11 * null-terminated text string
12 *
13 * Bob Plantz - 19 June 2009
14 */
15
16 #ifndef INT2HEX_H
17 #define INT2HEX_H
18
21 #endif
1 /*
2 * int2hex.c
3 *
4 * Converts an unsigned int to corresponding
5 * hex text string format.
6 * Assumes char array is big enough.
7 *
8 * input:
9 * unsigned int
10 * output:
11 * null-terminated text string
12 *
13 * Bob Plantz - 19 June 2009
14 */
15
16 #include "int2hex.h"
17
2 * binTimesTen.c
3 * Multiplies a binary number by 10.
4 *
5 * Bob Plantz - 19 June 2009
6 */
7
8 #include "readLn.h"
9 #include "writeStr.h"
10 #include "bin2int.h"
11 #include "int2bin.h"
12
13 int main(void)
14 {
15 char aString[33];
16 unsigned int x;
17
27 return 0;
28 }
1 /*
2 * bin2int.h
3 *
4 * bin2int.c
5 * Converts a binary text string to corresponding
6 * unsigned int format.
7 * Assumes text string contains valid binary chars.
8 *
9 * input:
10 * pointer to null-terminated text string
11 * output:
12 * returns the unsigned int.
13 *
14 * Bob Plantz - 19 June 2009
15 */
16
17 #ifndef BIN2INT_H
18 #define BIN2INT_H
19
22 #endif
1 /*
2 * bin2int.c
3 * Converts a binary text string to corresponding
4 * unsigned int format.
5 * Assumes text string contains valid binary chars.
6 *
7 * input:
8 * pointer to null-terminated text string
9 * output:
10 * returns the unsigned int.
11 *
12 * Bob Plantz - 19 June 2009
13 */
14
15 #include "bin2int.h"
16
22 x = 0; // initialize result
23 while (*binString != '\0') // end of string?
24 {
25 x = x << 1; // make room for next bit
26 aChar = *binString;
27 x |= (0x1 & aChar); // sift out the bit
28 binString++;
29 }
30
31 return x;
32 }
1 /*
2 * int2bin.h
3 *
4 * Converts an unsigned int to corresponding
5 * binary text string format.
6 * Assumes char array is big enough.
7 *
8 * input:
9 * unsigned int
10 * output:
11 * null-terminated text string
12 *
13 * Bob Plantz - 19 June 2009
14 */
15
16 #ifndef INT2BIN_H
17 #define INT2BIN_H
18
21 #endif
1 /*
2 * int2bin.c
3 *
4 * Converts an unsigned int to corresponding
5 * binary text string format.
6 * Assumes char array is big enough.
7 *
8 * input:
9 * unsigned int
10 * output:
11 * null-terminated text string
12 *
13 * Bob Plantz - 19 June 2009
14 */
15
16 #include "int2bin.h"
17
2 * uDecTimesTen.c
3 * Multiplies a decimal number by 10.
4 * Bob Plantz - 20 June 2009
5 */
6
7 #include "readLn.h"
8 #include "writeStr.h"
9 #include "uDec2int.h"
10 #include "int2bin.h"
11
12 int main(void)
13 {
14 char aString[33];
15 unsigned int x;
16
26 return 0;
27 }
1 /*
2 * uDec2int.h
3 *
4 * Converts a decimal text string to corresponding
5 * unsigned int format.
6 * Assumes text string is valid decimal chars.
7 *
8 * input:
9 * pointer to null-terminated text string
10 * output:
11 * returns the unsigned int.
12 *
13 * Bob Plantz - 19 June 2009
14 */
15
16 #ifndef UDEC2INT_H
17 #define UDEC2INT_H
18
21 #endif
1 /*
2 * uDec2int.c
3 *
4 * Converts a decimal text string to corresponding
5 * unsigned int format.
6 * Assumes text string is valid decimal chars.
7 *
8 * input:
9 * pointer to null-terminated text string
10 * output:
11 * returns the unsigned int.
12 *
13 * Bob Plantz - 19 June 2009
14 */
15
16 #include "uDec2int.h"
17
23 x = 0; // initialize result
24 while (*decString != '\0') // end of string?
25 {
26 x *= 10;
27 aChar = *decString;
28 x += (0xf & aChar);
29 decString++;
30 }
31
32 return x;
33 }
See above for int2bin. See Section E.2 for writeStr and readLn.
3 -17
1 /*
2 * sDecTimesTen.c
3 * Multiplies a signed decimal number by 10
4 * and shows result in binary.
5 * Bob Plantz - 21 June 2009
6 */
7
8 #include "readLn.h"
9 #include "writeStr.h"
10 #include "sDec2int.h"
11 #include "int2bin.h"
12
13 int main(void)
14 {
15 char aString[33];
16 int x;
17
27 return 0;
28 }
1 /*
2 * sDec2int.h
3 *
4 * Converts a decimal text string to corresponding
5 * signed int format.
6 * Assumes text string is valid decimal chars.
7 *
8 * input:
9 * pointer to null-terminated text string
10 * output:
11 * returns the signed int.
12 *
13 * Bob Plantz - 19 June 2009
14 */
15
16 #ifndef SDEC2INT_H
17 #define SDEC2INT_H
18
21 #endif
1 /*
2 * sDec2int.c
3 *
4 * Converts a decimal text string to corresponding
5 * signed int format.
6 * Assumes text string is valid decimal chars.
7 *
8 * input:
9 * pointer to null-terminated text string
10 * output:
11 * returns the signed int.
12 *
13 * Bob Plantz - 19 June 2009
E.4. LOGIC GATES 453
14 */
15
16 #include "uDec2int.h"
17 #include "sDec2int.h"
18
24 if (*decString == '-')
25 {
26 negative = 1;
27 decString++;
28 }
29 else
30 {
31 if (*decString == '+')
32 decString++;
33 }
34
35 x = uDec2int(decString);
36
37 if (negative)
38 x *= -1;
39
40 return x;
41 }
See above for int2bin and uDec2int. See Section E.2 for writeStr and readLn.
x  1  x·1      x  0  x+0
0  1   0       0  0   0
1  1   1       1  0   1

x  0  x·0      x  1  x+1
0  0   0       0  1   1
1  0   0       1  1   1

x  x′  x·x′    x  x′  x+x′
0  1    0      0  1    1
1  0    0      1  0    1

x  x  x·x      x  x  x+x
0  0   0       0  0   0
1  1   1       1  1   1

x y z  x·(y+z)  x·y+x·z      x y z  x+y·z  (x+y)·(x+z)
0 0 0     0        0         0 0 0    0         0
0 0 1     0        0         0 0 1    0         0
0 1 0     0        0         0 1 0    0         0
0 1 1     0        0         0 1 1    1         1
1 0 0     0        0         1 0 0    1         1
1 0 1     1        1         1 0 1    1         1
1 1 0     1        1         1 1 0    1         1
1 1 1     1        1         1 1 1    1         1
z = 0   m0  m2  m6  m4
z = 1   m1  m3  m7  m5
4 -10 Minterms:
F(x, y, z)
            xz
        00  01  11  10
y = 0   m0  m1  m5  m4
y = 1   m2  m3  m7  m6
4 -11 The prime numbers correspond to the minterms m2, m3, m5, and m7. The minterms
m10, m11, m12, m13, m14, m15 cannot occur so are marked “don’t care” on the Karnaugh
map.
F(w, x, y, z)
              yz
          00  01  11  10
wx = 00   m0  m1   1   1
wx = 01   m4   1   1  m6
wx = 11    ×   ×   ×   ×
wx = 10   m8  m9   ×   ×
F(w, x, y, z) = x · z + x′ · y
x1 x0 y1 y0 F
0 0 0 0 0
0 0 0 1 1
0 0 1 0 1
0 0 1 1 1
0 1 0 0 0
0 1 0 1 0
0 1 1 0 1
0 1 1 1 1
1 0 0 0 0
1 0 0 1 0
1 0 1 0 0
1 0 1 1 1
1 1 0 0 0
1 1 0 1 0
1 1 1 0 0
1 1 1 1 0
x1 x0 y1 y0
          Enable = 0                  Enable = 1
Current   Next                        Next
n1 n0     n1 n0  J1 K1  J0 K0         n1 n0  J1 K1  J0 K0
0  0      0  0   0  1   0  1          0  1   0  1   1  0
0  1      0  1   0  1   1  0          1  0   1  0   0  1
1  0      1  0   1  0   0  1          1  1   1  0   1  0
1  1      1  1   1  0   1  0          0  0   0  1   0  1
This leads to the following equations for the inputs to the JK flip-flops (using “E”
for “Enable”):
J0(E, n1, n0)                  K0(E, n1, n0)
          n1 n0                          n1 n0
        00 01 11 10                    00 01 11 10
E = 0    0  1  1  0            E = 0    1  0  0  1
E = 1    1  0  0  1            E = 1    0  1  1  0

J1(E, n1, n0)                  K1(E, n1, n0)
          n1 n0                          n1 n0
        00 01 11 10                    00 01 11 10
E = 0    0  0  1  1            E = 0    1  1  0  0
E = 1    0  1  0  1            E = 1    1  0  1  0

J0 = E′ · n0 + E · n0′
K0 = E′ · n0′ + E · n0
J1 = E′ · n1 + n1 · n0′ + E · n1′ · n0
K1 = E′ · n1′ + n1′ · n0′ + E · n1 · n0
E.6. CENTRAL PROCESSING UNIT 457
5 -4 Four-bit up counter.
[Figure: four T flip-flops in a chain, driven by CLK, with outputs n0, n1, n2, and n3.]
2 * endian.c
3 * Determines endianness. If endianness cannot be determined
4 * from input value, defaults to "big endian"
5 * Bob Plantz - 22 June 2009
6 */
7
8 #include <stdio.h>
9
10 int main(void)
11 {
33 return 0;
34 }
6 -6
1 /*
2 * endianReg.c
3 * Stores user int in memory then copies to register var.
4 * Use gdb to observe endianness.
5 * Bob Plantz - 22 June 2009
6 */
7
8 #include <stdio.h>
9
10 int main(void)
11 {
12 int x;
13 register int y;
14
18 y = x;
19 printf("You entered %i\n", y);
20
21 return 0;
22 }
When I ran this program with the input -1985229329, I got the results:
which shows the value stored in rcx (used as the y variable) is in regular order,
and the value stored in memory (the x variable) is in little endian order.
5 .text
6 .globl f
7 .type f, @function
8 f:
9 pushq %rbp # save caller’s frame pointer
10 movq %rsp, %rbp # establish ours
11
7 -2
1 # g.s
2 # Does nothing but return to caller.
3 # Bob Plantz - 22 June 2009
4
5 .text
6 .globl g
7 .type g, @function
8 g:
9 pushq %rbp # save caller’s frame pointer
10 movq %rsp, %rbp # establish ours
11
7 -3
1 # h.s
2 # Does nothing but return 123 to caller.
3 # Bob Plantz - 22 June 2009
E.10. PROGRAM FLOW CONSTRUCTS 475
the second byte in the jmp here1 instruction is 03, which is the number of bytes to
the here1 location.
Single-stepping through the program with gdb and examining the contents of rax,
rip, and pointer shows that jmp *%rax and jmp *pointer use the full address,
not just an offset.
10 -3 The program will probably crash. When the write function is called, it returns
the number of characters written. Return values are placed in eax. Hence, the
address is overwritten. In general, it is safer to use variables in the stack frame if
their values must remain the same after another function is called.
10 -4
1 # numerals.s
4 # useful constant
5 .equ STDOUT,1
6 # stack frame
7 .equ theNumeral,-1
8 .equ localSize,-16
9 # read only data
10 .section .rodata
11 newline:
12 .byte '\n'
13 # code
14 .text
15 .globl main
16 .type main, @function
17 main:
18 pushq %rbp # save caller’s base pointer
19 movq %rsp, %rbp # establish ours
20 addq $localSize, %rsp # local vars.
21
33 allDone:
34 movl $1, %edx # do a newline for user
35 movl $newline, %esi
36 movl $STDOUT, %edi
37 call write
38
10 -5
1 # alphaUpper.s
2 # Displays the upper case alphabet on screen
3 # Bob Plantz - 27 June 2009
4 # useful constant
5 .equ STDOUT,1
6 # stack frame
7 .equ theLetter,-1
8 .equ localSize,-16
9 # read only data
10 .section .rodata
11 newline:
12 .byte '\n'
13 # code
14 .text
15 .globl main
16 .type main, @function
17 main:
18 pushq %rbp # save caller’s base pointer
19 movq %rsp, %rbp # establish ours
20 addq $localSize, %rsp # local vars.
21
33 allDone:
34 movl $1, %edx # do a newline for user
35 movl $newline, %esi
36 movl $STDOUT, %edi
37 call write
38
10 -6
1 # alphaLower.s
2 # Displays the lower case alphabet on screen
3 # Bob Plantz - 27 June 2009
4 # useful constant
5 .equ STDOUT,1
6 # stack frame
7 .equ theLetter,-1
8 .equ localSize,-16
9 # read only data
10 .section .rodata
11 newline:
12 .byte '\n'
13 # code
14 .text
15 .globl main
16 .type main, @function
17 main:
18 pushq %rbp # save caller’s base pointer
19 movq %rsp, %rbp # establish ours
20 addq $localSize, %rsp # local vars.
21
33 allDone:
34 movl $1, %edx # do a newline for user
35 movl $newline, %esi
36 movl $STDOUT, %edi
37 call write
38
10 -7
1 /*
2 * whileLoop.c
3 * While loop multiplication.
4 *
5 * Bob Plantz - 27 June 2009
6 */
7
8 #include<stdio.h>
9
10 int main ()
11 {
12 int x, y, z;
13 int i;
14
With version 4.7.0 of gcc and no optimization (-O0), they both use the same assembly
language for the loop:
28 jmp .L2
29 .L3:
30 movl -16(%rbp), %eax
31 addl %eax, -8(%rbp)
32 addl $1, -4(%rbp)
33 .L2:
34 movl -12(%rbp), %eax
35 cmpl %eax, -4(%rbp)
36 jl .L3
10 -8 After the program executes, the system prompt is displayed twice because the
newline character from the “return key” is still in the standard input buffer. This
can be fixed by reading two characters.
1 /*
2 * yesNo1a.c
3 * Prompts user to enter a y/n response.
4 *
5 * Bob Plantz - 27 June 2009
6 */
7
8 #include <unistd.h>
9
12 int main(void)
13 {
14 register char *ptr;
15
26 if (*response == 'y')
27 {
28 ptr = "Changes saved.\n";
29 while (*ptr != '\0')
30 {
31 write(STDOUT_FILENO, ptr, 1);
32 ptr++;
33 }
34 }
35 else
36 {
37 ptr = "Changes discarded.\n";
38 while (*ptr != '\0')
39 {
40 write(STDOUT_FILENO, ptr, 1);
41 ptr++;
42 }
43 }
44 return 0;
45 }
10 -10
1 # others.s
3 # and letters.
4 # Bob Plantz - 27 June 2009
5 # useful constants
6 .equ STDOUT,1
7 .equ SPACE,' ' # lowest printable character
8 .equ SQUIGGLE,'~' # highest printable character
9 # stack frame
10 .equ theChar,-1
11 .equ localSize,-16
12 # read only data
13 .section .rodata
14 newline:
15 .byte '\n'
16 # code
17 .text
18 .globl main
19 .type main, @function
20 main:
21 pushq %rbp # save caller’s base pointer
22 movq %rsp, %rbp # establish ours
23 addq $localSize, %rsp # local vars.
24
51 allDone:
52 movl $1, %edx # do a newline for user
53 movl $newline, %esi
54 movl $STDOUT, %edi
55 call write
56
10 -11
1 # incChars.s
5 # useful constants
6 .equ STDIN,0
7 .equ STDOUT,1
8 .equ SPACE,' ' # lowest printable character
9 .equ SQUIGGLE,'~' # highest printable character
10 # stack frame
11 .equ theString,-256
12 .equ localSize,-256
13 # read only data
14 .section .rodata
15 prompt:
20 .byte '\n'
21 # code
22 .text
23 .globl main
24 .type main, @function
25 main:
40 getString:
41 leaq theString(%rbp), %rsi # place to put user input
42 movl $1, %edx # one character
43 movl $STDIN, %edi
44 call read
45 readLup:
46 cmpb $'\n', (%rsi) # end of input?
47 je incChars # yes, process the string
48 incq %rsi # next char
49 movl $1, %edx # one character
50 movl $STDIN, %edi
51 call read
52 jmp readLup # check at top of loop
53
54 incChars:
55 movb $0, (%rsi) # null character for C string
56 leaq theString(%rbp), %rsi # pointer to the string
57 incLoop:
58 cmpb $0, (%rsi) # end of string?
59 je doDisplay # yes, display the results
60 incb (%rsi) # change character
61 cmpb $SQUIGGLE, (%rsi) # did we go too far?
62 jbe okay # no
63 movb $SPACE, (%rsi) # yes, wrap to beginning
64 okay:
65 incq %rsi # next char
66 jmp incLoop # check at top of loop
67
68 doDisplay:
69 movl $msg, %esi # print message for user
70 dispLoop:
71 cmpb $0, (%esi) # end of string?
72 je showString # yes, show results
73 movl $1, %edx # no, one character
74 movl $STDOUT, %edi
75 call write
76 incl %esi # next char
77 jmp dispLoop # check at top of loop
78
79 showString:
80 leaq theString(%rbp), %rsi # pointer to the string
81 showLoop:
82 cmpb $0, (%rsi) # end of string?
83 je allDone # yes, get user input
84 movl $1, %edx # no, one character
85 movl $STDOUT, %edi
86 call write
87 incq %rsi # next char
88 jmp showLoop # check at top of loop
89
90 allDone:
91 movl $1, %edx # do a newline for user
92 movl $newline, %esi
93 movl $STDOUT, %edi
94 call write
95
10 -12
1 # decChars.s
5 # useful constants
6 .equ STDIN,0
7 .equ STDOUT,1
8 .equ SPACE,' ' # lowest printable character
9 .equ SQUIGGLE,'~' # highest printable character
10 # stack frame
11 .equ theString,-256
12 .equ localSize,-256
13 # read only data
14 .section .rodata
15 prompt:
20 .byte '\n'
21 # code
22 .text
23 .globl main
24 .type main, @function
25 main:
40 getString:
41 leaq theString(%rbp), %rsi # place to put user input
42 movl $1, %edx # one character
43 movl $STDIN, %edi
44 call read
45 readLup:
46 cmpb $'\n', (%rsi) # end of input?
47 je decChars # yes, process the string
48 incq %rsi # next char
54 decChars:
55 movb $0, (%rsi) # null character for C string
56 leaq theString(%rbp), %rsi # pointer to the string
57 decLoop:
58 cmpb $0, (%rsi) # end of string?
59 je doDisplay # yes, display the results
60 decb (%rsi) # change character
61 cmpb $SPACE, (%rsi) # did we go too far?
62 jae okay # no
63 movb $SQUIGGLE, (%rsi) # yes, wrap to end
64 okay:
65 incq %rsi # next char
66 jmp decLoop # check at top of loop
67
68 doDisplay:
69 movl $msg, %esi # print message for user
70 dispLoop:
71 cmpb $0, (%esi) # end of string?
72 je showString # yes, show results
73 movl $1, %edx # no, one character
74 movl $STDOUT, %edi
75 call write
76 incl %esi # next char
77 jmp dispLoop # check at top of loop
78
79 showString:
80 leaq theString(%rbp), %rsi # pointer to the string
81 showLoop:
82 cmpb $0, (%rsi) # end of string?
83 je allDone # yes, get user input
84 movl $1, %edx # no, one character
85 movl $STDOUT, %edi
86 call write
87 incq %rsi # next char
88 jmp showLoop # check at top of loop
89
90 allDone:
91 movl $1, %edx # do a newline for user
92 movl $newline, %esi
93 movl $STDOUT, %edi
94 call write
95
10 -13
1 # echoN.s
6 # useful constants
7 .equ STDIN,0
8 .equ STDOUT,1
9 # stack frame
10 .equ count,-8
11 .equ response,-4
12 .equ localSize,-16
13 # read only data
14 .section .rodata
15 instruct:
25 .byte '\n'
26 # code
27 .text
28 .globl main
29 .type main, @function
30 main:
45 runLoop:
46 movl $prompt, %esi # prompt user
47 promptLup:
48 cmpb $0, (%esi) # end of string?
49 je getChar # yes, get user input
50 movl $1, %edx # no, one character
51 movl $STDOUT, %edi
52 call write
53 incl %esi # next char
54 jmp promptLup # check at top of loop
55
56 getChar:
57 leaq response(%rbp), %rsi # place to put user input
58 movl $2, %edx # include newline
59 movl $STDIN, %edi
60 call read
61
85 doChar:
86 movl $1, %edx # one character
87 leaq response(%rbp), %rsi # in this mem location
88 movl $STDOUT, %edi
89 call write
90
100 allDone:
101 movl $bye, %esi # ending message
102 doneLup:
103 cmpb $0, (%esi) # end of string?
E.11. WRITING YOUR OWN FUNCTIONS 487
111 cleanUp:
112 movl $0, %eax # return 0;
113 movq %rbp, %rsp # delete local vars.
114 popq %rbp # restore caller’s base pointer
115 ret # return to caller
5 hiworld:
6 .string "Hello, world!\n"
7
8 .text
9 .globl main
10
11 main:
12 pushq %rbp # save caller base pointer
13 movq %rsp, %rbp # establish our base pointer
14
1 # writeStr.s
2 # Writes a C-style text string to the standard output (screen).
3 # Bob Plantz - 27 June 2009
4
5 # Calling sequence:
6 # rdi <- address of string to be written
7 # call writeStr
8 # returns number of characters written
9
10 # Useful constant
11 .equ STDOUT,1
12 # Stack frame, showing local variables and arguments
13 .equ stringAddr,-16
14 .equ count,-4
15 .equ localSize,-16
16
17 .text
18 .globl writeStr
19 .type writeStr, @function
20 writeStr:
21 pushq %rbp # save base pointer
22 movq %rsp, %rbp # new base pointer
23 addq $localSize, %rsp # local vars. and arg.
24
11 -4
1 # echoString.s
2 # Prompts user to enter a string, then echoes it.
3 # Bob Plantz - 27 June 2009
4 # stack frame
5 .equ theString,-256
6 .equ localSize,-256
7 # read only data
8 .data
9 usrprmpt:
10 .string "Enter a text string:\n"
11 usrmsg:
12 .string "You entered:\n"
13 newline:
14 .string "\n"
15 # code
16 .text
17 .globl main
18 .type main, @function
19 main:
20 pushq %rbp # save caller base pointer
21 movq %rsp, %rbp # establish our base pointer
1 # readLnSimple.s
2 # Reads a line (through the '\n' character) from standard input. Deletes
3 # the '\n' and creates a C-style text string.
4 # Bob Plantz - 27 June 2009
5
6 # Calling sequence:
7 # rdi <- address of place to store string
8 # call readLn
9 # returns number of characters read (not including NUL)
10
11 # Useful constant
12 .equ STDIN,0
13 # Stack frame, showing local variables and arguments
14 .equ stringAddr,-16
15 .equ count,-4
16 .equ localSize,-16
17
18 .text
19 .globl readLn
20 .type readLn, @function
21 readLn:
22 pushq %rbp # save base pointer
23 movq %rsp, %rbp # new base pointer
24 addq $localSize, %rsp # local vars. and arg.
25
32 call read
33 readLoop:
34 movq stringAddr(%rbp), %rax # get pointer
35 cmpb $'\n', (%rax) # return key?
36 je endOfString # yes, mark end of string
37 incq stringAddr(%rbp) # no, move pointer to next byte
38 incl count(%rbp) # count++;
39 movl $1, %edx # get another character
40 movq stringAddr(%rbp), %rsi # into storage area
41 movl $STDIN, %edi # from keyboard
42 call read
43 jmp readLoop # and look at it
44
45 endOfString:
46 movq stringAddr(%rbp), %rax # current pointer
47 movb $0, (%rax) # mark end of string
48
11 -5 Note: Some students will try to create a nested loop, with the outer loop executed
twice. But the display messages are then not nearly as nice, unless the student uses some
“goto” statements. In my opinion, two separate change-case loops are better software
engineering because they allow maximum flexibility in the user messages. The user
will generally complain about what is seen on the screen, not the cleverness of the
code.
1 # changeCase.s
2 # Prompts user to enter a string, echoes it, changes case of alpha
3 # characters, displays them, changes them back, then displays result.
4 # Bob Plantz - 27 June 2009
5
6 # Stack frame
7 .equ response,-256
8 .equ localSize,-256
9 .data
10 usrprmpt:
11 .string "Enter a text string:\n"
12 usrmsg:
13 .string "You entered:\n"
14 chngmsg:
15 .string "Changing the case gives:\n"
16 newline:
17 .string "\n"
18
19 .text
20 .globl main
21 .type main, @function
22 main:
61 showChange:
62 movl $chngmsg, %edi # tell user about it
63 call writeStr
64
89 showOrig:
90 movl $usrmsg, %edi # show original version
91 call writeStr
92
22 main:
23 pushq %rbp # save caller base pointer
24 movq %rsp, %rbp # establish our base pointer
25 addq $localSize, %rsp # local vars.
26
1 # readLn.s
2 # Reads a line (through the '\n' character) from standard input. Deletes
3 # the '\n' and creates a C-style text string.
4 # Bob Plantz - 27 June 2009
5
6 # Calling sequence:
7 # rsi <- length of char array
8 # rdi <- address of place to store string
9 # call readLn
10 # returns number of characters read (not including NUL)
11
12 # Useful constant
13 .equ STDIN,0
14 # Stack frame, showing local variables and arguments
15 .equ maxLength,-24
16 .equ stringAddr,-16
17 .equ count,-4
18 .equ localSize,-32
19
20 .text
21 .globl readLn
22 .type readLn, @function
23 readLn:
24 pushq %rbp # save base pointer
25 movq %rsp, %rbp # new base pointer
26 addq $localSize, %rsp # local vars. and arg.
27
E.12. BIT OPERATIONS; MULTIPLICATION AND DIVISION 494
55 endOfString:
56 movq stringAddr(%rbp), %rax # current pointer
57 movb $0, (%rax) # mark end of string
58
6 # Stack frame
7 .equ theInt,-40
8 .equ buffer,-36
9 .equ localSize,-48
10 # Read only data
11 .section .rodata