Computer Organization
1 author:
Will Tracz
Lockheed Martin Corporation
All content following this page was uploaded by Will Tracz on 07 January 2014.
January 2011
Copyright notice
Copyright ©2008, ©2009, ©2010, ©2011 by Robert G. Plantz. All rights reserved.
This book may be reproduced and distributed in its entirety (including this authorship, copyright, and permission
notice), provided that no charge is made for the document itself (except for the cost of the printing or copying service),
without the author’s written consent. This includes “fair use” excerpts like reviews and advertising and derivative
works like translations. You may print or copy individual pages for your own use.
Instructors are encouraged to use this book in their classes. The author would appreciate being notified of such
usage.
The author has used his best efforts in preparing this book. The author makes no warranty of any kind, expressed
or implied, with regard to the programs or the documentation contained in this book. The author shall not be liable in
any event from incidental or consequential damages in connection with, or arising out of, the furnishing, performance,
or use of these programs.
All products or services mentioned in this book are the trademarks or service marks of their respective companies
or organizations. Eclipse is a trademark of Eclipse Foundation, Inc.
Contents
Preface xvi
1 Introduction 1
1.1 Computer Subsystems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 How the Subsystems Interact . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
3 Computer Arithmetic 28
3.1 Addition and Subtraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.2 Arithmetic Errors — Unsigned Integers . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.3 Arithmetic Errors — Signed Integers . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.4 Overflow and Signed Decimal Integers . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.4.1 The Meaning of CF and OF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.5 C/C++ Basic Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.5.1 C/C++ Shift Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.5.2 C/C++ Bit Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.5.3 C/C++ Data Type Conversions . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.6 Other Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.6.1 BCD Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.6.2 Gray Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4 Logic Gates 55
4.1 Boolean Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.2 Canonical (Standard) Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.3 Boolean Function Minimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.3.1 Minimization Using Algebraic Manipulations . . . . . . . . . . . . . . . . . . 61
4.3.2 Minimization Using Graphic Tools . . . . . . . . . . . . . . . . . . . . . . . . 63
4.4 Crash Course in Electronics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.4.1 Power Supplies and Batteries . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.4.2 Resistors, Capacitors, and Inductors . . . . . . . . . . . . . . . . . . . . . . . 70
4.4.3 CMOS Transistors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.5 NAND and NOR Gates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
4.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
5 Logic Circuits 82
5.1 Combinational Logic Circuits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
5.1.1 Adder Circuits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
5.1.2 Ripple-Carry Addition/Subtraction Circuits . . . . . . . . . . . . . . . . . . . 85
5.1.3 Decoders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
5.1.4 Multiplexers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
5.2 Programmable Logic Devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
5.2.1 Programmable Logic Array (PLA) . . . . . . . . . . . . . . . . . . . . . . . . 91
5.2.2 Read Only Memory (ROM) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
5.2.3 Programmable Array Logic (PAL) . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.3 Sequential Logic Circuits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.3.1 Clock Pulses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.3.2 Latches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.3.3 Flip-Flops . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
5.4 Designing Sequential Circuits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
5.5 Memory Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
5.5.1 Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
5.5.2 Shift Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
5.5.3 Static Random Access Memory (SRAM) . . . . . . . . . . . . . . . . . . . . . 112
5.5.4 Dynamic Random Access Memory (DRAM) . . . . . . . . . . . . . . . . . . . 114
5.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
16 Input/Output 352
16.1 Memory Timing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 352
16.2 I/O Device Timing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
16.3 Bus Timing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
16.4 I/O Interfacing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
16.5 I/O Ports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355
16.6 Programming Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355
16.7 Interrupt-Driven I/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365
16.8 I/O Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366
16.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366
Bibliography 485
Index 486
List of Figures
8.6 Local variable stack area in the program from Listing 8.5. . . . . . . . . . . . . . . 168
9.1 Assembler listing file for the function shown in Listing 9.7. . . . . . . . . . . . . . . 198
9.2 General format of instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
9.3 REX prefix byte. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
9.4 ModRM byte. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
9.5 SIB byte. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
9.6 Machine code for the mov from a register to a register instruction. . . . . . . . . . . 201
9.7 Machine code for the mov immediate data to a register instruction. . . . . . . . . . 202
9.8 Machine code for the add immediate data to the A register . . . . . . . . . . . . . . 203
9.9 Machine code for the add immediate data to a register . . . . . . . . . . . . . . . . 203
9.10 Machine code for the add immediate data to a register instruction. . . . . . . . . . 203
9.11 Machine code for the add register to register instruction. . . . . . . . . . . . . . . . 204
11.1 Arguments and local variables in the stack frame, sumInts function. . . . . . . . . 241
11.2 Arguments 7 – 9 are passed on the stack to the sumNine function. . . . . . . . . . . 246
11.3 Arguments and local variables in the stack frame, sumNine function. . . . . . . . . 247
11.4 Overall layout of the stack frame. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
11.5 Calling function’s stack frame, 32-bit mode. . . . . . . . . . . . . . . . . . . . . . . . 254
13.1 Memory allocation for the variables x and y from the C program in Listing 13.6. . 298
List of Tables
3.1 Correspondence between binary, hexadecimal, and unsigned decimal values for
the hexadecimal digits. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2 Four-bit signed integers, two’s complement notation. . . . . . . . . . . . . . . . . . . 35
3.3 Sizes of some C/C++ data types in 32-bit and 64-bit modes. . . . . . . . . . . . . . . 43
3.4 Hexadecimal characters and corresponding int. . . . . . . . . . . . . . . . . . . . . 48
3.5 BCD code for the decimal digits. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.6 Sign codes for packed BCD. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.7 Gray code for 4 bits. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
7.1 Effect on other bits in a register when less than 64 bits are changed. . . . . . . . . 141
12.1 Bit patterns (in binary) of the ASCII numerals and the corresponding 32-bit ints. 274
12.2 Register usage for the mul instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . 275
12.3 Register usage for the div instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . 281
15.1 Some system call codes for the syscall instruction. . . . . . . . . . . . . . . . . . . 347
Listings
10.5 Displaying a string one character at a time (programmer assembly language). . . 218
10.6 A do-while loop to print 10 characters. . . . . . . . . . . . . . . . . . . . . . . . . . 220
10.7 Get yes/no response from user (C). . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
10.8 Get yes/no response from user (gcc assembly language). . . . . . . . . . . . . . . . 223
10.9 General structure of an if-else construct. . . . . . . . . . . . . . . . . . . . . . . . 224
10.10 Get yes/no response from user (programmer assembly language). . . . . . . . . . 225
10.11 Compound boolean expression in an if-else construct (C). . . . . . . . . . . . . . 227
10.12 Compound boolean expression in an if-else construct (gcc assembly language). 228
10.13 Simple for loop to perform multiplication. . . . . . . . . . . . . . . . . . . . . . . . 234
11.1 Passing arguments to a function (C). . . . . . . . . . . . . . . . . . . . . . . . . . . 238
11.2 Accessing arguments in the sumInts function from Listing 11.1 (gcc assembly language). . . . 239
11.3 Accessing arguments in the sumInts function from Listing 11.1 (programmer assembly language). . . . 241
11.4 Passing more than six arguments to a function (C). . . . . . . . . . . . . . . . . . . 243
11.5 Passing more than six arguments to a function (gcc assembly language). . . . . . 244
11.6 Passing more than six arguments to a function (programmer assembly language). 249
11.7 Passing more than six arguments to a function (gcc assembly language, 32-bit). . 252
12.1 Convert letters to upper/lower case (C). . . . . . . . . . . . . . . . . . . . . . . . . . 260
12.2 Convert letters to upper/lower case (gcc assembly language). . . . . . . . . . . . . 262
12.3 Convert letters to upper/lower case (programmer assembly language). . . . . . . 266
12.4 Shifting bits (C). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
12.5 Shifting bits (gcc assembly language). . . . . . . . . . . . . . . . . . . . . . . . . . 271
12.6 Shifting bits (programmer assembly language). . . . . . . . . . . . . . . . . . . . . 272
12.7 Convert decimal text string to int (C). . . . . . . . . . . . . . . . . . . . . . . . . . 277
12.8 Convert decimal text string to int (gcc assembly language). . . . . . . . . . . . . 278
12.9 Convert decimal text string to int (programmer assembly language). . . . . . . . 279
12.10 Convert unsigned int to decimal text string (C). . . . . . . . . . . . . . . . . . . . 282
12.11 Convert unsigned int to decimal text string (gcc assembly language). . . . . . . . 283
12.12 Convert unsigned int to decimal text string (programmer assembly language). . 285
13.1 Storing a value in one element of an array (C). . . . . . . . . . . . . . . . . . . . . 291
13.2 Storing a value in one element of an array (gcc assembly language). . . . . . . . . 292
13.3 Clear an array (C). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
13.4 Clear an array (gcc assembly language). . . . . . . . . . . . . . . . . . . . . . . . . 294
13.5 Clear an array (programmer assembly language). . . . . . . . . . . . . . . . . . . 295
13.6 Two struct variables (C). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
13.7 Two struct variables (gcc assembly language). . . . . . . . . . . . . . . . . . . . . 298
13.8 Two struct variables (programmer assembly language). . . . . . . . . . . . . . . . 299
13.9 Passing struct variables (C). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
13.10 Passing struct variables (gcc assembly language). . . . . . . . . . . . . . . . . . . 303
13.11 Passing struct variables — assembly language version. . . . . . . . . . . . . . . . 305
13.12 Add 1 to user’s fraction (C++). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
13.13 Add 1 to user’s fraction (C). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 310
13.14 Add 1 to user’s fraction (programmer assembly language). . . . . . . . . . . . . . 314
14.1 Fixed point addition. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320
14.2 Converting a fraction to a float. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
14.3 Converting a fraction to a float (gcc assembly language, 64-bit). . . . . . . . . . . 329
14.4 Converting a fraction to a float (gcc assembly language, 32-bit). . . . . . . . . . . 333
14.5 Use float for Loop Control Variable? . . . . . . . . . . . . . . . . . . . . . . . . . . . 338
14.6 Are floats accurate? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
14.7 Casting integer to float in C. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340
14.8 Casting integer to float in assembly language. . . . . . . . . . . . . . . . . . . . . . 340
15.1 Using syscall to cat a file. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 346
16.1 Sketch of basic I/O functions using memory-mapped I/O — C version. . . . . . . . 356
16.2 Memory-mapped I/O in assembly language. . . . . . . . . . . . . . . . . . . . . . . 358
16.3 Sketch of basic I/O functions, isolated I/O — C version. . . . . . . . . . . . . . . . 361
Preface
This book introduces the concepts of how computer hardware works from a programmer’s point
of view. A programmer’s job is to design a sequence of instructions that will cause the hardware
to perform operations that solve a problem. This book looks at these instructions by exploring
how C/C++ language constructs are implemented at the instruction set architecture level.
The specific architecture presented in this book is the x86-64 that has evolved over the years
from the Intel 8086 processor. The GNU programming environment is used, and the operating
system kernel is Linux.
The basic guidelines I followed in creating this book are:
• One should avoid writing in assembly language except when absolutely necessary.
• “Real world” hardware and software make a more interesting platform for learning theo-
retical concepts.
• The tools used for teaching should be inexpensive and readily available.
It may seem strange that I would recommend against assembly language programming in
a book largely devoted to the subject. Well, C was developed in the early 1970s specifically for low-level
programming. C code is much easier to write and to maintain than assembly language. C
compilers have evolved to a point where they produce better machine code than all but the best
assembly language programmers can. In addition, hardware technology has advanced so far
that there is seldom any significant advantage in writing the most efficient machine code. In
short, it is hardly ever worth the effort to write in assembly language.
You might well ask why you should study assembly language, given that I think you should
avoid writing in it. I believe very strongly that the best programmers have a good understanding
of how computer hardware works. I think this principle holds in most fields: the best drivers
understand how automobiles work; the best musicians understand how their instrument works;
etc.
So this is not a book on how to write programs in assembly language. Most of the programs
you will be asked to write will be in assembly language, but they are very simple programs
intended to illustrate the concepts. I believe that this book will help you to become a better
programmer in any programming language, even if you never write another line of assembly
language.
Two issues arise immediately when studying assembly language:
• I/O interaction with a user through even the keyboard and screen is a very complex problem,
well beyond the programming expertise of a beginner.
• The instruction set is very large, far too large for a beginner to learn all at once.
There are several ways to deal with these problems in a textbook. Some books use a simple
operating system for I/O, e.g., MS-DOS. Others provide libraries of I/O functions that are specific
for the examples in the book. Several textbooks deal with the instruction set issue by presenting
a simplified “idealized” architecture with a small number of instructions that is intended to
illustrate the concepts.
In keeping with the “real world” criterion of this book, it deals with these two issues by:
1. showing you how to call the I/O functions already available in the C Standard Library, and
2. presenting only a small subset of the available instructions.
This has the additional advantage of not requiring additional software to be installed. In general,
all the programming discussed in the book can be done on any of the common Linux distributions
that have been set up for software development, with few or no changes.
Stand-alone assembly language programs.
Readers who wish to write assembly language programs that do not use the C runtime environment
should read Sections 8.5 (page 177) and 15.6 (page 345).
If you do decide to write more complex programs in assembly language there are several
other excellent books on that topic; see the Bibliography on page 485. And, of course, you would
want the manufacturer’s programming manuals; see for example [2] – [6] and [14] – [18]. The
goal here is to provide you with an introductory “look under the hood” of a high-level language
at the hardware that lies below.
This book also provides an introduction to computer hardware architecture. The view is
from a programmer’s eye. Other excellent books provide implementation details. You need to
understand many of the implementation details, e.g., pipelining, caches, in order to write highly
optimized programs. This book provides the introduction that prepares you for learning about
more advanced architectural concepts.
This is not the place to argue about operating systems. I could rationalize my choice of
GNU/Linux, but I could also rationalize using others. Therefore, I will simply state that I
believe that GNU/Linux provides an excellent environment for studying programming in an
academic setting. One of the more important features of the GNU programming environment
with respect to the goals of this book is the close integration of C/C++ and assembly language.
In addition, I like GNU/Linux.
I wish to comment on my use of “GNU/Linux” instead of the simpler “Linux.” Much has
been written about these names. A good source of the various arguments can be found at
www.wikipedia.org. The two main points are that (a) Linux is only the kernel, and (b) all
general-purpose distributions rely on many GNU components for the remaining systems software.
Although “Linux” has become essentially a synonym for “GNU/Linux,” this book could
not exist without the GNU components, e.g., the assembler (as), the link editor (ld), the make
program, etc. Therefore, I wish to acknowledge the importance of the GNU project by using the
full “GNU/Linux” name.
In some ways, the x86-64 instruction set architecture is not the best choice for studying
computer architecture. It maintains backwards compatibility and is thus somewhat more com-
plicated at the instruction set level. However, it is by far the most widely deployed architecture
on the desktop and one of the least expensive ways to set up a system where these concepts can
be studied.
Assembly language is my favorite subject in computer science, but I have taught the subject
to enough students to know that, realistically, it probably will not be the same for you. However,
please keep your eye on the long term. I am confident that the material presented in this book will
help you to become a better programmer, and if you do enjoy assembly language, you will have
a good introduction to a more advanced study of it.
Assumed Background
You should have taken an introductory class in programming, preferably in C, C++, or Java.
The high-level language used in this book is C; however, all the C programming is simple. I
am confident that the C programming examples in Chapters 2 and 3 will provide sufficient C
programming concepts to make the rest of the book very usable, regardless of the language you
learned in your introductory class.
I believe that more experienced programmers who wish to write for the x86-64 architecture
can also benefit from reading this book. In principle, these programmers can learn everything
they need to know from reading the appropriate manuals. However, I have found that it is
usually helpful to have an overview of a new architecture before tackling the manuals. This
book should provide that overview. In this sense, I believe that this book can provide a good
“introduction” to using the manuals.
Development Environment
Most developers use an Integrated Development Environment (IDE), which hides the process of
building a program from source code. In this book we use the component programs individually
so that you can see what is taking place.
The examples in this book were compiled or assembled on a computer running Ubuntu 9.04.
The development programs used were:
• as version 2.19.1
In most cases compilation was done with no optimization (-O0) because the goal is to study
concepts, not create the most efficient code.
The examples should work in any x86_64 GNU development environment with gcc and as
(binutils) installed. However, the machine code generated by the compiler may differ depending
on its specific configuration and version. You will begin looking at compiler-generated assembly
language in Chapter 7. What you see in your environment may differ from the examples in this
book, but the differences should be consistent as you continue through the rest of the book.
You should also keep in mind that the programs used for development may have bugs. Yes,
nobody is perfect. For example, when I upgraded my Ubuntu system from 9.04 to 9.10, the
GNU assembler was upgraded from 2.19 to 2.20. The newer version had a bug that caused the
line numbering in a particular listing file to start from 0 instead of 1. (It affected the C source
code in Listing 7.6 on page 145; the numbers have been corrected in this listing.) Fortunately,
this bug did not affect the quality of the final program, but it could cause some confusion to the
programmer.
If the book is being used for a software-only course, the instructor could consider skipping over
these two chapters.
Chapter 6 introduces the central processing unit (CPU) and its relationship to memory and
I/O. There is a description of how to use the gdb debugger to view the registers in the CPU. The
basic set of registers used by programmers in the x86-64 architecture is given in this chapter.
Assembly language programming is introduced in Chapter 7. The topic is introduced by
showing how to create a file containing the assembly language generated by the gcc compiler
from C code. The basic assembly language template for a function is introduced, both for 64-bit
and 32-bit mode. There is an overall sketch of how assemblers and linkers work.
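The workflow described for Chapter 7 can be sketched with a very small C file. The file name sum.c and the function sum_two are hypothetical, chosen only for illustration. The -S and -O0 options are standard gcc options, but the assembly language you see will depend on your compiler version and configuration.

```c
/* sum.c: a hypothetical file used only to illustrate the workflow.
 *
 * To see the assembly language that gcc generates, without optimization:
 *     gcc -S -O0 sum.c        (writes the file sum.s)
 * To assemble that file into an object file with the GNU assembler:
 *     as -o sum.o sum.s
 */

/* Returns the sum of its two arguments. */
int sum_two(int a, int b)
{
    return a + b;
}
```

Reading sum.s next to sum.c is a reasonable first exercise, since the function is small enough that each generated instruction can be matched to the C source.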
In Chapter 8 we see how automatic variables are allocated on the stack, how values are
assigned to them, and how functions are called. Argument passing, both in registers and on the
stack, is discussed. The chapter shows how to call the write, read, printf, and scanf C Standard
Library functions for user I/O. There is also a section on writing standalone programs that do
not use the C environment and use the syscall instruction for direct operating system I/O.
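As a rough sketch of the kind of user I/O that Chapter 8 calls from assembly language, the two functions below send text to standard output, one with write and one with printf. The function names put_with_write and put_with_printf are hypothetical and are not taken from the book; the sketch assumes a GNU/Linux system where write is declared in unistd.h.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Sends msg to standard output with write, which passes the bytes
 * to the operating system without formatting.
 * Returns the number of bytes written, or -1 on error. */
ssize_t put_with_write(const char *msg)
{
    assert(msg != NULL);
    return write(STDOUT_FILENO, msg, strlen(msg));
}

/* Sends msg to standard output with printf, which formats the text
 * and buffers it before it reaches the operating system.
 * Returns the number of characters printed, or a negative value on error. */
int put_with_printf(const char *msg)
{
    assert(msg != NULL);
    return printf("%s", msg);
}
```

Because printf buffers its output and write does not, mixing the two in one program can make the text appear in an unexpected order.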
Chapter 9 gives an introduction to machine code. There is a discussion of the REX codes
used in 64-bit mode. Two instructions, mov and add, are used as examples.
Program control flow, specifically repetition and binary decision, is covered in Chapter
10. Conditional jumps are discussed in this chapter.
Chapter 11 discusses how to write your own functions and use the arguments passed to them.
Both the 64-bit and 32-bit function interface techniques are described.
Bit-level logical and shift operations are covered in Chapter 12. The multiplication and
division instructions are also discussed.
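The bit-level operations mentioned for Chapter 12 can be illustrated in C before looking at the corresponding instructions. This is a minimal sketch, assuming ASCII text; the names CASE_BIT, to_upper_ascii, to_lower_ascii, and times_eight are hypothetical and are not the book's own listings.

```c
#include <assert.h>
#include <stddef.h>

/* In ASCII, bit 5 (0x20) is the only difference between 'a'..'z' and 'A'..'Z'. */
#define CASE_BIT 0x20

/* Converts the ASCII lowercase letters in s to uppercase, in place,
 * by clearing the case bit with a bitwise AND. */
void to_upper_ascii(char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (*s >= 'a' && *s <= 'z') {
            *s = (char)(*s & ~CASE_BIT);
        }
    }
}

/* Converts the ASCII uppercase letters in s to lowercase, in place,
 * by setting the case bit with a bitwise OR. */
void to_lower_ascii(char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (*s >= 'A' && *s <= 'Z') {
            *s = (char)(*s | CASE_BIT);
        }
    }
}

/* Multiplies x by 8 with a left shift of three bit positions. */
unsigned int times_eight(unsigned int x)
{
    return x << 3;
}
```

The range checks keep digits and punctuation unchanged, since only letters differ by the case bit alone.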
Arrays and structs are discussed in Chapter 13. This chapter includes a discussion of how
simple C++ objects are implemented at both the C and the assembly language level.
Until this point in the book we have been using integers. In Chapter 14 we introduce formats
for storing fractional values, including some IEEE 754 formats. In 64-bit mode the gcc compiler
uses SSE2 instructions for floating point, but x87 instructions are used in 32-bit mode. The
chapter gives an introduction to both instruction sets.
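One reason the storage format for fractional values matters can be sketched in a few lines of C. The functions sum_tenths and nearly_equal are hypothetical names used only for illustration, and the sketch assumes IEEE 754 double arithmetic, which is what gcc uses on x86-64.

```c
#include <assert.h>

/* Adds 0.1 to a running total n times.
 * 0.1 has no exact binary representation, so rounding error
 * accumulates with each addition. */
double sum_tenths(int n)
{
    double sum = 0.0;
    int i;

    assert(n >= 0);
    for (i = 0; i < n; i++) {
        sum += 0.1;
    }
    return sum;
}

/* Returns 1 if a and b differ by no more than tol, and 0 otherwise.
 * Comparing within a tolerance is usually safer than using ==. */
int nearly_equal(double a, double b, double tol)
{
    double diff = a - b;

    if (diff < 0.0) {
        diff = -diff;
    }
    return diff <= tol;
}
```

Under these assumptions sum_tenths(10) is not exactly equal to 1.0, although nearly_equal(sum_tenths(10), 1.0, 1e-9) reports that the two values are close. This is why a floating point value is a risky choice for a loop control variable.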
Exceptions and interrupts are discussed in Chapter 15. Chapter 16 is an introduction to
hardware level I/O. Since most students will never do I/O at this level, this is another chapter
that could be skipped.
A summary of the instructions used in this book is provided in Appendix A.5. At this point,
there is only a list of the instructions. Eventually, there will be a description of each of them.
Appendix B is a highly simplified discussion of the fundamental concepts of the make facility.
Appendix C provides a very brief tutorial on using gdb for assembly language programs.
Appendix D gives a very brief introduction to the gcc syntax for embedding assembly lan-
guage in a C function.
Almost all the solutions to the chapter exercises are provided in Appendix E. These can be
useful for students who wish to use the exercises for self study; if you find yourself getting stuck
on a problem, peek at the solution for some hints. Instructors are encouraged to discuss these
solutions with their students. There is much to be learned from looking at another person’s
solution and thinking about how you might do it better.
The Bibliography lists a small fraction of the many books I have consulted when learning
this material. I urge you to look at this list of books. I believe that you will want at least some
of them in your reference library.
Suggested Usage
• Our course at Sonoma State University covers each chapter approximately in the book’s
order. The programming exercises in Chapters 2 and 3 get the students used to using the
lab right from the beginning of the course. Hardware simulators are used in the lab for
Chapters 4 and 5.
• A pure assembly language course could easily omit Chapters 4 and 5.
• In a curriculum where binary numbers are covered in another course, Chapters 2 and 3
could be skimmed. I recommend covering the C coding examples in Chapters 2 and 3 for
students who have not programmed in the language. This would provide an introduction
to C that should be adequate for the rest of the book.
• Experienced programmers who are using this book to learn x86-64 assembly language
on their own should be able to skim the first five chapters. I believe that the remaining
chapters would provide a good “primer” for reading the appropriate manuals.
Acknowledgements
I would like to thank the many students who have taken assembly language from me. They
have asked many questions that caused me to think about the subject and how I can better
explain it. They are the main reason I have written this book.
My special thanks go to David Tran, a student who used this book in a class taught by
Michael Lyle at Santa Rosa Junior College in Fall 2010. David caught many of my typos and
errors, and gave me many helpful suggestions for clarifying my writing. I am very grateful for
his careful reading of the book and the time he spent providing me with his comments. It is
definitely a better book as a result of his diligence.
I wish to thank Richard Gordon, Lynn Stauffer, Allan B. Cruse, Michael Lyle, and Suzanne
Rivoire for their thorough proofreading and critique of the previous versions of this book. By
teaching from this book they have caught many of my errors and provided many excellent sug-
gestions for clarifying the presentation.
In addition, I would like to thank my partner, João Barretto, for encouraging me to write this
book and putting up with my many hours spent at my computer.
Chapter 1
Introduction
My goal is to make this book available as inexpensively as possible, but I would appreciate
being paid for the work I did to write and produce it. As you know, a textbook like this
would ordinarily cost $50 – $100 if it were published through a mainstream publisher.
The author would probably get $5 – $15 of that cost. I am trying a different way to get
paid a “royalty” here.
I have made the book freely available in pdf format at bob.cs.sonoma.edu. Corrections,
updates, etc. for the book will also be posted there. As you can see from my copyright
notice above, you can only be charged the cost of the printing or copying service for a print
copy. I am leaving it up to you to decide how much of a “royalty” this book is worth to you
and how much you can afford to pay.
If you wish to pay me a “royalty” for my work please send it to my personal email account,
[email protected], using either
Unlike most assembly language books, this one does not emphasize writing programs in
assembly language. Higher-level languages, e.g., C, C++, Java, are much better for that. You
should avoid writing in assembly language whenever possible.
You may wonder why you should study assembly language at all. The usual reasons given
are:
1. Assembly language is more efficient. This does not always hold. Modern compilers are
excellent at optimizing the machine code that is generated. Only a very good assembly
language programmer can do better, and only in some situations. Assembly language
programming is very tedious, even for the best programmers. Hence, it is very expensive.
The possible gains in efficiency are seldom worth the added expense.
2. There are situations where it must be used. This is more difficult to evaluate. How do you
know whether assembly language is required or not?
Both these reasons presuppose that you know the assembly language equivalent of the trans-
lation that your compiler does. Otherwise, you would have no way of deciding whether you can
write a more efficient program in assembly language, and you would not know the machine level
limitations of your higher-level language. So this book begins with the fundamental high-level
language concepts and “looks under the hood” to see how they are implemented at the assembly
language level.
There is a more important reason for reading this book. The interface to the hardware from
a programmer’s view is the instruction set architecture (ISA). This book is a description of the
ISA of the x86 architecture as it is used by the C/C++ programming languages. Higher-level
languages tend to hide the ISA from the programmer, but good programmers need to understand
it. This understanding is bound to make you a better programmer, even if you never write a
single assembly language statement after reading this book.
Some of you will enjoy assembly language programming and wish to carry on. If your inter-
ests take you into systems programming, e.g., writing parts of an operating system, writing a
compiler, or even designing another higher-level language, an understanding of assembly lan-
guage is required. There are many challenging opportunities in programming embedded sys-
tems, and much of the work in this area demands at least an understanding of the ISA. This
book serves as an introduction to assembly language programming and prepares you to move
on to the intermediate and advanced levels.
In his book The Design and Evolution of C++[32] Bjarne Stroustrup nicely lists the purposes
of a programming language:
It is assumed that you have had at least an introduction to programming that covered the
first five items on the list. This book focuses on the first item — instructing machines — by
studying assembly language programming of a 64-bit x86 architecture computer. We will use C
as an example higher-level language and study how it instructs the computer at the assembly
language level. Since there is a one-to-one correspondence between assembly language and
machine language, this amounts to a study of how C is used to instruct a machine (computer).
You have already learned that a compiler (or interpreter) translates a program written in a
higher-level language into machine language, which the computer can execute. But what does
this mean? For example, you might wonder:
• What happens when one function calls another function? How does the computer know
how to return to the statement following the function call statement?
• How is a computer instructed to display a simple character string — for example, “Hello,
world” — on the screen?
It is the goal of this book to answer these and many other questions. The specific higher-level
programming language concepts that are addressed in this book include:
1.1. COMPUTER SUBSYSTEMS 3
This book assumes that you are familiar with these programming concepts in C, C++, and/or
Java.
Figure 1.1: Subsystems of a computer. The CPU, Memory, and I/O subsystems communicate
with one another via the three buses: the Data Bus, the Address Bus, and the Control Bus.
Central Processing Unit (CPU) controls most of the activities of the computer, performs the
arithmetic and logical operations, and contains a small amount of very fast memory.
Memory provides storage for the instructions for the CPU and the data they manipulate.
Input/Output (I/O) communicates with the outside world and with mass storage devices (e.g.,
disks).
When you create a new program, you use an editor program to write your new program in
a high-level language, for example, C, C++, or Java. The editor program sees the source code
for your new program as data, which is typically stored in a file on the disk. Then you use
a compiler program to translate the high-level language statements into machine instructions
that are stored in a disk file. Just as with the editor program, the compiler program sees both
your source code and the resulting machine code as data.
When it comes time to execute the program, the instructions are read from the machine
code disk file into memory. At this point, the program is a sequence of instructions stored in
memory. Most programs include some constant data that are also stored in memory. The CPU
executes the program by fetching each instruction from memory and executing it. The data are
also fetched as needed by the program.
This computer model — both the program instructions and data are stored in a memory unit
that is separate from the processing unit — is referred to as the von Neumann architecture. It
was described in 1945 by John von Neumann [35], although other computer science pioneers of
the day were working with the same concepts. This is in contrast to a fixed-program computer,
e.g., a calculator. A compiler illustrates one of the benefits of the von Neumann architecture. It
is a program that treats the source file as data, which it translates into an executable binary file
that is also treated as data. But the executable binary file can also be run as a program.
A downside of the von Neumann architecture is that a program can be written to view it-
self as data, thus enabling a self-modifying program. GNU/Linux, like most modern, general
purpose operating systems, prohibits applications from modifying themselves.
Most programs also access I/O devices, and each access must also be programmed. I/O de-
vices vary widely. Some are meant to interact with humans, for example, a keyboard, a mouse,
a screen. Others are meant for machine readable I/O. For example, a program can store a file
on a disk or read a file from a network. These devices all have very different behavior, and their
timing characteristics differ drastically from one another. Since I/O device programming is diffi-
cult, and every program makes use of them, the software to handle I/O devices is included in the
operating system. GNU/Linux provides a rich set of functions that an applications programmer
can use to perform I/O actions, and we will call upon these services of GNU/Linux to perform our
I/O operations. Before tackling I/O programming, you need to gain a thorough understanding of
how the CPU executes programs and interacts with memory.
The goal of this book is to study how programs are executed by the computer. We will focus on
how the program and data are stored in memory and how the CPU executes instructions. We
leave I/O programming to more advanced books.
different memories, each with its own bus connected to the CPU. This makes it possible for the
CPU to access both program instructions and data simultaneously. The issues should become
clearer to you in Chapter 6.
In modern computers the bus connecting the CPU to external memory modules cannot keep
up with the execution speed of the CPU. The slowdown of the bus is called the von Neumann
bottleneck. Almost all modern CPU chips include some cache memory, which is connected to
the other CPU components with much faster internal buses. The cache memory closest to the
CPU commonly has a Harvard architecture configuration to achieve higher throughput of data
processing.
CPU interaction with I/O devices is essentially the same as with memory. If the CPU is
instructed to read a piece of data from an input device, the particular device is specified on the
address bus and a “read” signal is placed on the control bus. The device responds by placing the
data item on the data bus. And the CPU can send data to an output device by placing the data
item on the data bus, specifying the device on the address bus, and placing a “write” signal on
the control bus. Since the timing of various I/O devices varies drastically from CPU and memory
timing, special programming techniques must be used. Chapter 16 provides an introduction to
I/O programming techniques.
These few paragraphs are intended to provide you a very general overall view of how com-
puter hardware works. The rest of the book will explore many of these concepts in more depth.
Most of the discussion is at the ISA level, but we will also take a peek at the hardware imple-
mentation. In Chapter 4 we will even look at some transistor circuits. The goal of the book is to
provide you with an introduction to computer architecture as seen from a software point of view.
Chapter 2
Data Storage Formats
In this chapter, we begin exploring how data is encoded for storage in memory and write some
programs in C to explore these concepts. One way to look at a modern computer is that it is
made up of:
• Millions, perhaps billions, of two-state switches. Each of the switches is always in one
state or the other, and it stays in that state until the control unit changes its state or the
power is turned off. and
There is also provision for communicating with the world outside the computer — input and
output.
We need a more concise notation, which leads us to use numbers. When dealing with numbers,
you are most familiar with the decimal system, which is based on ten, and thus uses ten digits.
Decimal digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
Two number systems are useful when talking about the states of switches — the binary system,
which is based on two,
Binary digits: 0, 1
Hexadecimal digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f
Octal digits: 0, 1, 2, 3, 4, 5, 6, 7
2.1. BITS AND GROUPS OF BITS 7
“Binary digit” is commonly shortened to “bit.” It is common to bypass the fact that a bit
represents the state of a switch, and simply call the switches “bits.” Using bits (binary digits),
we can greatly simplify the previous statement about switches as 1101, which you can think of
as representing “on, on, off, on.” It does not matter whether we use 1 to represent “on” and 0 as
“off,” or 0 as “on” and 1 as “off.” We simply need to be consistent. You will see that this will occur
naturally; it will not be an issue.
Hexadecimal is commonly used as a shorthand notation to specify bit patterns. Since there
are sixteen hexadecimal digits, each one can be used to specify uniquely a group of four bits.
Table 2.1 shows the correspondence between each possible group of four bits and one hexadecimal
digit. Thus, the above English statement specifying the state of four switches can be written
with a single hexadecimal digit, d.
When it is not clear from the context, we will indicate the base of a number in this text with
a subscript. For example, $100_{10}$ is written in decimal, $100_{16}$ is written in hexadecimal, and
$100_2$ is written in binary.
Hexadecimal digits are especially convenient when we need to specify the state of a group of,
say, 16 or 32 switches. In place of each group of four bits, we can write one hexadecimal digit.
For example,
$$0110\ 1100\ 0010\ 1010_2 = \mathrm{6c2a}_{16}$$
and
$$0000\ 0001\ 0010\ 0011\ 1010\ 1011\ 1100\ 1101_2 = \mathrm{0123\ abcd}_{16}$$
A single bit has limited usefulness when we want to store data. We usually need to use
a group of bits to store a data item. This grouping of bits is so common that most modern
computers only allow a program to access bits in groups of eight. Each of these groups is called
a byte.
byte: A contiguous group of bits, usually eight.
Historically, the number of bits in a byte has varied depending on the hardware and the operat-
ing system. For example, the CDC 6000 series of scientific mainframe computers used a six-bit
byte. Nearly everyone uses “byte” to mean eight bits today.
Another important reason to learn hexadecimal is that the programming language may not
allow you to specify a value in binary. Prefixing a number with 0x (zero, lower-case ex) in C/C++
means that the number is expressed in hexadecimal. At the time of writing there is no standard
C/C++ syntax for writing a number in binary (a 0b prefix was later standardized in C++14 and
C23, and GCC accepts it as an extension). The syntax for specifying bit patterns in C/C++ is
shown in Table 2.2. (The 32-bit pattern for the decimal value 123 will become clear after you
read Sections 2.2 and 2.3.) Although the GNU assembler, as, includes a notation for specifying
bit patterns in binary, it is usually more convenient to use the C/C++ notation.
              Prefix   Example   32-bit pattern (binary)
Decimal:      none     123       0000 0000 0000 0000 0000 0000 0111 1011
Hexadecimal:  0x       0x123     0000 0000 0000 0000 0000 0001 0010 0011
Octal:        0        0123      00 000 000 000 000 000 000 000 001 010 011
Table 2.2: C/C++ syntax for specifying literal numbers. Octal bits grouped by three for readability.
$$1 \times 100 + 2 \times 10 + 3 \times 1$$
or
$$1 \times 10^2 + 2 \times 10^1 + 3 \times 10^0$$
The right-most digit (3 in this example) is the least significant digit because it “counts” the least
in the total value of this number. The left-most digit (1 in this example) is the most significant
digit because it “counts” the most in the total value of this number.
The base or radix of the decimal number system is ten. There are ten symbols for representing
the digits: 0, 1, ..., 9. Moving a digit one place to the left increases its value by a factor of
ten, and moving it one place to the right decreases its value by a factor of ten. The positional
notation generalizes to any radix, r:
$$d_{n-1} \times r^{n-1} + d_{n-2} \times r^{n-2} + \cdots + d_1 \times r^1 + d_0 \times r^0 \qquad (2.1)$$
where there are n digits in the number and each $d_i = 0, 1, \ldots, r-1$. The radix in the binary
number system is 2, so there are only two symbols for representing the digits: $d_i = 0, 1$. We can
specialize Equation 2.1 for the binary number system as
$$d_{n-1} \times 2^{n-1} + d_{n-2} \times 2^{n-2} + \cdots + d_1 \times 2^1 + d_0 \times 2^0 \qquad (2.2)$$
For example, the eight-digit binary number 1010 0101 is evaluated as
$$1 \times 2^7 + 0 \times 2^6 + 1 \times 2^5 + 0 \times 2^4 + 0 \times 2^3 + 1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0$$
$$128 + 0 + 32 + 0 + 0 + 4 + 0 + 1 = 165_{10}$$
2.3. UNSIGNED DECIMAL TO BINARY CONVERSION 9
This example illustrates the method for converting a number from the binary number system
to the decimal number system. It is stated in Algorithm 2.1.
Algorithm 2.1: Convert binary to unsigned decimal.
input : An integer expressed in binary.
output: Decimal expression of the integer.
1 Compute the value of each power of 2 in Equation 2.2 in decimal.
2 Multiply each power of two by its corresponding $d_i$.
3 Sum the terms in Equation 2.2.
Be careful to distinguish the binary number system from writing the state of a bit in binary.
Each switch in the computer can be represented by a bit (binary digit), but the entity that it
represents may not even be a number, much less a number in the binary number system. For
example, the bit pattern 0011 0010 represents the character “2” in the ASCII code for characters.
But in the binary number system $0011\ 0010_2 = 50_{10}$.
See Exercises 2-8 and 2-9 for converting hexadecimal to decimal.
Example 2-a
There are times in some programs when it is more natural to specify a bit pattern rather
than a decimal number. We have seen that it is possible to easily convert between the number
bases, so you could convert the bit pattern to a decimal value, then use that. It is usually much
easier to think of the bits in groups of four, then convert the pattern to hexadecimal.
For example, if your algorithm required the use of zeros alternating with ones:
0101 0101 0101 0101 0101 0101 0101 0101
The address of a particular byte never changes. That is, the 957th byte from the beginning
of memory will always remain the 957th byte. However, the state of each of the bits — either 0
or 1 — in any given byte can be changed.
2.4. MEMORY — A PLACE TO STORE DATA (AND OTHER THINGS) 11
Computer scientists typically express the address of each byte in memory in hexadecimal.
So we would say that the 957th byte is at address 0x3bc.
From the discussion of hexadecimal in Section 2.1 (page 6) we can see that the first sixteen
bytes in memory have the addresses 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, and f. Using the
notation
we show the (possible) contents (the state of the bits) of each of the first sixteen bytes of memory
in Figure 2.1.
Figure 2.1: Possible contents of the first sixteen bytes of memory; addresses shown in hexadeci-
mal, contents shown in binary. Note that the addresses are shown as 32-bit values.
(The contents shown here are arbitrary.)
The state of each bit is indicated by a binary digit (bit) and is arbitrary in Figure 2.1. The
bits have been grouped by four for readability. The grouping of the memory bits also shows that
we can use two hexadecimal digits to indicate the state of the bits in each byte, as shown in
Figure 2.2. For example, the contents of memory location 0000000b are 3c. That means the eight
bits that make up the twelfth byte in memory are set to the bit pattern 0011 1100.
Address     Contents    Address     Contents
00000000:   6a          00000008:   f0
00000001:   f0          00000009:   02
00000002:   5e          0000000a:   33
00000003:   00          0000000b:   3c
00000004:   ff          0000000c:   c3
00000005:   51          0000000d:   3c
00000006:   cf          0000000e:   55
00000007:   18          0000000f:   aa
Figure 2.2: Repeat of Figure 2.1 with contents shown in hex. Two hexadecimal characters are
required to specify one byte.
Once a bit (switch) in memory is set to either zero or one, it stays in that state until the
control unit actively changes it or the power is turned off. There is an exception. Computers
also contain memory in which the bits are permanently set. Such memory is called Read Only
Memory or ROM.
Read Only Memory (ROM) : Each bit is permanently set to either zero or one. The control
unit can read the state of each bit but cannot change it.
You have probably heard the term “RAM” used for memory that can be changed by the control
unit. RAM stands for Random Access Memory. The terminology used here is inconsistent.
“Random access” means that it takes the same amount of time to access any byte in the memory.
This is in contrast to memory that is sequentially accessible, e.g., tape. The length of time it
takes to access a byte on tape depends upon the physical location of the byte with respect to the
current tape position.
Random Access Memory (RAM) : The control unit can read the state of each bit and can
change it.
A bit can be used to store data. For example, we could use a single bit to indicate whether a
student passes a course or not. We might use 0 for “not passed” and 1 for “passed.” A single bit
allows only two possible values of a data item. We cannot, for example, use a single bit to store
a course letter grade: A, B, C, D, or F.
How many bits would we need to store a letter grade? Consider all possible combinations of
two bits:
00
01
10
11
Since there are only four possible bit combinations, we cannot represent all five letter grades
with only two bits. Let’s add another bit and look at all possible bit combinations:
000
001
010
011
100
101
110
111
There are eight possible bit patterns, which is more than sufficient to store any one of the five
letter grades. For example, we may choose to use the code
Letter Grade Bit Pattern
A 000
B 001
C 010
D 011
F 100
This example illustrates two issues that a programmer must consider when storing data in
memory in addition to its location(s):
How many bits are required to store the data? In order to answer this we need to know
how many different values are allowed for the particular data item. Study the two ex-
amples above — two bits and three bits — and you can see that adding a bit doubles the
number of possible values. Also, notice that we might not use all the possible bit patterns.
What is the code for storing the data? Most of the data we deal with in everyday life is not
expressed in terms of zeros and ones. In order to store it in computer memory, the program-
mer must decide upon a code of zeros and ones to use. In the above (three bit) example we
used 000 to represent a letter grade of A, 001 to represent B, etc.
Thus, in the grade example, a programmer may choose to store the letter grade at byte
number bffffed0 in memory. If the grade is “A”, the programmer would set the bit pattern
at location bffffed0 to $00_{16}$. If the grade is “C”, the programmer would set the bit pattern at
location bffffed0 to $02_{16}$. In this example, one of the jobs of an assembly language programmer
would be to determine how to set the bit pattern at byte number bffffed0 to the appropriate bit
pattern.
High-level languages use data types to determine the number of bits and the storage code.
For example, in C you may choose to store the letter grades in the above example in a char
variable and use the characters 'A', 'B', ..., 'F' to indicate the grade. In Section 2.7 you will learn
that the compiler would use the following storage formats:
2.5. USING C PROGRAMS TO EXPLORE DATA FORMATS 13
And programming languages, even assembly language, allow programmers to create sym-
bolic names for memory addresses. The compiler (or assembler) determines the correspondence
between the programmer’s symbolic name and the numerical address. The programmer can
refer to the address by simply using the symbolic name.
We will use the C programming language to illustrate these concepts because it takes care of
the memory allocation problem, yet still allows us to get reasonably close to the hardware. You
probably learned to program in the higher-level, object-oriented paradigm using either C++ or
Java. C does not support the object-oriented paradigm.
C is a procedural programming language. The program is divided into functions. Since there
are no classes in C, there is no such thing as a member function. The programmer focuses on
the algorithms used in each function, and all data items are explicitly passed to the functions.
We can see how this works by exploring the C Standard Library functions, printf and scanf,
which are used to write to the screen and read from the keyboard. We will develop a program
in C using printf and scanf to illustrate the concepts discussed in the previous sections. The
header file required by either of these functions is:
#include <stdio.h>
which includes the prototype statements for the printf and scanf functions:
int printf(const char *format, ...);
int scanf(const char *format, ...);
printf is used to display text on the screen. The first argument, format, controls the text display.
At its simplest, format is simply an explicit text string in double quotes.1 For example,
printf("Hello, world.\n");
would display
Hello, world.
If there are additional arguments, the format string must specify how each of these argu-
ments is to be converted for display. This is accomplished by inserting a conversion code within
the format string at the point where the argument value is to be displayed. Each conversion
code is introduced by the ’%’ character. For example, Listing 2.1 shows how to display both an
int variable and a float variable.
 1 /*
 2  * intAndFloat.c
 3  * Using printf to display an integer and a float.
 4  * Bob Plantz - 4 June 2009
 5  */
 6 #include <stdio.h>
 7
 8 int main(void)
 9 {
10     int anInt = 19088743;
11     float aFloat = 19088.743;
12
13     printf("The integer is %i and the float is %f\n",
14            anInt, aFloat);   /* format assumed from the program output */
15     return 0;
16 }
1 The text string is a null-terminated array of characters as described in Section 2.7 (page 19). This is not the C++
string class.
A run of the program in Listing 2.1 on my computer gave (user input is boldface):
bob$ ./intAndFloat
The integer is 19088743 and the float is 19088.742188
bob$
Yes, the float really is that far off. This will be explained in Chapter 14.
Some common conversion codes are d or i for integer, f for float, x for hexadecimal. The
conversion codes may include other characters to specify properties like the field width of the
display, whether the value is left or right justified within the field, etc. We will not cover the
details here. You should read man page 3 for printf to learn more.
scanf is used to read from the keyboard. The format string typically includes only conversion
codes that specify how to convert each value as it is entered from the keyboard and stored in
the following arguments. Since the values will be stored in variables, it is necessary to pass the
address of the variable to scanf. For example, we can store keyboard-entered values in x (an int
variable) and y (a float variable) as follows:
scanf("%i %f", &x, &y);
The use of printf and scanf is illustrated in the C program in Listing 2.2, which will allow
us to explore the mathematical equivalence of the decimal and hexadecimal number systems.
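/*
 * echoDecHex.c
 * Asks user to enter a number in decimal and one
 * in hexadecimal then echoes both in both bases
 * Bob Plantz - 4 June 2009
 */

#include <stdio.h>

int main(void)
{
    int x;
    unsigned int y;

    while(1)
    {
        printf("Enter a decimal integer (0 to quit): ");
        scanf("%i", &x);
        if (x == 0) break;

        /* Assumed reconstruction of the hexadecimal prompt and the
         * echo statements, based on the header comment above. */
        printf("Enter a bit pattern in hexadecimal (0 to quit): ");
        scanf("%x", &y);
        if (y == 0) break;

        printf("%i is stored as %#010x, and\n", x, (unsigned int)x);
        printf("%#010x represents the decimal integer %u\n\n", y, y);
    }
    printf("End of program.\n");

    return 0;
}
Listing 2.2: C program showing the mathematical equivalence of the decimal and hexadecimal
number systems.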
• li lineNumber: lists ten lines of the source code, centered at the specified line number.
• break sourceFilename:lineNumber: sets a breakpoint at the specified line in the source
file. Control will return to gdb when the line number is encountered.
• run: begins execution of a program that has been loaded under control of gdb.
• printf "format", var1, var2,...: displays the values of the vars, using the format
specified in the format string.
We will use the program in Listing 2.1 to see how gdb can be used to explore the concepts
in more depth. Here is a screen shot of how I compiled the program then used gdb to control
the execution of the program and observe the memory contents. My typing is boldface and
the session is annotated in italics. Note that you will probably see different addresses if you
replicate this example on your own (Exercise 2-27).
The “-g” option is required. It tells the compiler to include debugger information in
the executable program.
The li command lists ten lines of source code. The display ends with the (gdb) prompt.
Pushing the return key will repeat the previous command, and li is smart enough to
display the next (up to) ten lines.
(gdb) br 13
Breakpoint 1 at 0x400523: file intAndFloat.c, line 13.
I set a breakpoint at line 13. When the program is executing, if it ever gets to this state-
ment, execution will pause before the statement is executed, and control will return to
gdb.
(gdb) run
Starting program: /home/bob/intAndFloat
The run command causes the program to start execution from the beginning. When it
reaches our breakpoint, control returns to gdb.
The print command displays the value currently stored in the named variable. There
is a round off error in the float value. As mentioned above, this will be explained in
Chapter 14.
The printf command can be used to format the displayed values. The formatting
string is essentially the same as for the printf function in the C Standard Library.
Take a moment and convert the hexadecimal values to decimal. The value of anInt is
correct, but the value of aFloat is $19088_{10}$. The reason for this odd behavior is that the
x formatting character in the printf function first converts the value to an int, then
displays that int in hexadecimal. In C/C++, conversion from float to int truncates
the fractional part.
Fortunately, gdb provides another command for examining the contents of memory
directly — that is, the actual bit patterns. In order to use this command, we need
to determine the actual memory addresses where the anInt and aFloat variables are
stored.
The address-of operator (&) can be used to print the address of a variable. Notice that
the addresses are very large. The system is in 64-bit mode, which uses 64-bit addresses.
(gdb does not display leading zeros.)
(gdb) help x
Examine memory: x/FMT ADDRESS.
ADDRESS is an expression for the memory address to examine.
FMT is a repeat count followed by a format letter and a size letter.
Format letters are o(octal), x(hex), d(decimal), u(unsigned decimal),
t(binary), f(float), a(address), i(instruction), c(char) and s(string).
Size letters are b(byte), h(halfword), w(word), g(giant, 8 bytes).
The specified number of objects of the specified size are printed
according to the format.
Defaults for format and size letters are those previously used.
Default count is 1. Default address is following last thing printed
with this command or "print".
The x command is used to examine memory. Its help message is very brief, but it tells
you everything you need to know.
The x command can be used to display the values in their stored data type.
The display of the aFloat variable in hexadecimal simply looks wrong. This is due to
the storage format of floats, which is very different from ints. It will be explained in
Chapter 14.
The byte by byte display of the aFloat variable in hexadecimal also shows that it is
stored in little endian order.
2.7. ASCII CHARACTER CODE 19
(gdb) cont
Continuing.
The integer is 19088743 and the float is 19088.742188
Finally, I continue to the end of the program. Notice that gdb is still running and I
have to quit the gdb program.
This example illustrates a property of the x86 processors. Data is stored in memory with the
least significant byte in the lowest-numbered address. This is called little endian storage. Look
again at the display of the four bytes beginning at 0x7fff56597b58 above. We can rearrange this
display to show the bit patterns at each of the four locations:
7fff86b6ddfc: 67
7fff86b6ddfd: 45
7fff86b6ddfe: 23
7fff86b6ddff: 01
Yet when we look at the entire 32-bit value in hexadecimal the bytes seem to be arranged in the
proper order:
7fff86b6ddfc: 01234567
When we examine memory one byte at a time, each byte is displayed in numerically ascend-
ing addresses. At first glance, the value appears to be stored backwards.
We should note here that many processors, e.g., the PowerPC architecture, use big endian
storage. As the name suggests, the most significant (“biggest”) byte is stored in the first (lowest-
numbered) memory address. If we ran the program above on a big endian computer, we would
see (assuming the variable is located at the same address):
7fff86b6ddfc: 01
7fff86b6ddfd: 23
7fff86b6ddfe: 45
7fff86b6ddff: 67
Generally, you do not need to worry about endianness in a program. It becomes a concern when
data is stored as one data type, then accessed as another.
printf("Hello world\n");
and in C++:
When translating either of these statements into machine code, the compiler must do two things:
• store each of the characters in a location in memory where the control unit can access
them, and
Table 2.3: ASCII code for representing characters. The bit patterns (bit pat.) are shown in
hexadecimal.
We start by considering how a single character is stored in memory. There are many codes for
representing characters, but the most common one is the American Standard Code for Informa-
tion Interchange — ASCII (pronounced “ask’ e”). It uses seven bits to represent each character.
Table 2.3 shows the bit patterns for each character in hexadecimal.
It is not the sort of table that you would memorize. However, you should become familiar
with some of its general characteristics. In particular, notice that the numerical characters, ‘0’
. . . ‘9’, are in a contiguous sequence in the code, 0x30 . . . 0x39. The same is true of the lower case
alphabetic characters, ‘a’ . . . ‘z’, and of the upper case characters, ‘A’ . . . ‘Z’. Notice that the lower
case alphabetic characters are numerically higher than the upper case. (On GNU/Linux, the
“man ascii” command displays this table.)
The codes in the left-hand column of Table 2.3 (00 through 1f) define control characters. The
ASCII code was developed in the 1960s for transmitting data from a sender to a receiver. If you
read some of the names of the control characters, you can imagine how they could be used to control
the “dialog” between the sender and receiver. They are generated on a keyboard by holding the
control key down while pressing an alphabetic key. For example, ctrl-d generates an EOT (End
of Transmission) character.
ASCII codes are usually stored in the rightmost seven bits of an eight-bit byte. The eighth bit
(the highest-order bit) is called the parity bit. It can be used for error detection in the following
way. The sender and receiver would agree ahead of time whether to use even parity or odd parity.
2.7. ASCII CHARACTER CODE 21
Even parity means that an even number of ones is always transmitted in each character; odd
means that an odd number of ones is transmitted. Before transmitting a character in the ASCII
code, the sender would adjust the eighth bit such that the total number of ones matched the
even or odd agreement. When the code was received, the receiver would count the ones in
each eight-bit byte. If the sum did not match the agreement, the receiver knew that one of
the bits in the byte had been received incorrectly. Of course, if two bits had been incorrectly
received, the error would pass undetected, but the chances of this double error are remarkably
small. Modern communication systems are much more reliable, and parity is seldom used when
sending individual bytes.
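The even parity scheme described above can be sketched in C as follows. This is only an illustration under the assumption that the seven-bit code sits in the low-order bits of an unsigned char; the names count_ones, add_even_parity, and even_parity_ok are hypothetical.

```c
/* Count the one bits in a byte. */
static int count_ones(unsigned char b)
{
    int n = 0;
    while (b != 0) {
        n += b & 1;
        b >>= 1;
    }
    return n;
}

/* Sender: adjust the high-order bit so the byte holds an even number of ones. */
unsigned char add_even_parity(unsigned char code)
{
    code &= 0x7f;                    /* keep the seven ASCII bits */
    if (count_ones(code) % 2 != 0)
        code |= 0x80;                /* parity bit makes the count even */
    return code;
}

/* Receiver: returns 1 if the byte has an even number of ones, 0 otherwise. */
int even_parity_ok(unsigned char received)
{
    return count_ones(received) % 2 == 0;
}
```

Flipping any single bit of a byte produced by add_even_parity makes even_parity_ok return 0, while flipping two bits goes undetected, as the text notes.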
In some environments the high-order bit is used to provide a code for special characters. A little thought
will show you that even all eight bits will not support all languages, e.g., Greek, Russian, Chinese. The
Unicode character coding has recently been adopted to support documents that use other characters.
Java uses Unicode, and C libraries that support Unicode are also available.
A computer system that uses an ASCII video system (most modern computers) can be pro-
grammed to send a byte to the screen. The video system interprets the bit pattern as an ASCII
code (from Table 2.3) and displays the corresponding character on the screen.
Getting back to the text string, “Hello world\n”, the compiler would store this as a constant
char array. There must be a way to specify the length of the array. In a C-style string this is
accomplished by using the sentinel character NUL at the end of the string. So the compiler must
allocate thirteen bytes for this string. An example of how this string is stored in memory is
shown in Figure 2.3. Notice that C uses the LF character as a single newline character even
though the C syntax requires that the programmer write two characters: ’\n’. The area of
memory shown includes the three bytes immediately following the text string.
(Margin note: C-style strings are terminated with a NUL character, not a newline.)
Address Contents
4004a1: 48
4004a2: 65
4004a3: 6c
4004a4: 6c
4004a5: 6f
4004a6: 20
4004a7: 77
4004a8: 6f
4004a9: 72
4004aa: 6c
4004ab: 64
4004ac: 0a
4004ad: 00
4004ae: 25
4004af: 73
4004b0: 00
Figure 2.3: A text string stored in memory by a C compiler, including three “garbage” bytes
after the string. Values are shown in hexadecimal. A different compilation will
likely place the string in a different memory location.
In Pascal the length of the string is specified by the first byte in the string. It is taken to be an 8-bit
unsigned integer. So C-style strings are typically processed by sentinel-controlled loops, and count-
controlled string processing loops are more common in Pascal.
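The two loop styles can be contrasted with a short sketch. It assumes a Pascal-style string is laid out as one unsigned length byte followed by the characters, as described above; the function names c_string_length and pascal_to_c are illustrative only.

```c
#include <stddef.h>

/* Sentinel-controlled loop: count bytes until the NUL terminator is found. */
size_t c_string_length(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/*
 * Count-controlled loop: the first byte of a Pascal-style string is its
 * length. Copy the characters into out and add the C sentinel.
 * The caller must provide room for length + 1 bytes.
 */
void pascal_to_c(const unsigned char *p, char *out)
{
    size_t length = p[0];            /* 8-bit unsigned length byte */
    for (size_t i = 0; i < length; i++)
        out[i] = (char)p[i + 1];
    out[length] = '\0';
}
```

Applied to "Hello world\n", c_string_length returns 12; the thirteenth byte is the NUL sentinel.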
The C++ string class has additional features, but the actual text string is stored as a C-style text string
within the C++ string instance.
22 CHAPTER 2. DATA STORAGE FORMATS
1. STDOUT_FILENO is defined in the system header file, unistd.h.3 It is the GNU/Linux file
descriptor for standard out (usually the screen). GNU/Linux sees all devices as files. When
a program is started the operating system opens a path to standard out and assigns it as
file descriptor number 1.
2. &aLetter is a memory address. The sequence of one-byte bit patterns starting at this
address will be sent to standard out.
3. 1 (one) is the number of bytes that will be sent (to standard out) as a result of this call to
write.
#include <unistd.h>

int main(void)
{
    char aLetter = 'A';
    write(STDOUT_FILENO, &aLetter, 1); // STDOUT_FILENO is
                                       // defined in unistd.h
    return 0;
}
Now let’s consider a program that echoes each character entered from the keyboard. We will
allocate a single char variable, read one character into the variable, and then echo the character
for the user with a message. The program will repeat this sequence one character at a time until
the user hits the return key. The program is shown in Listing 2.4.
A run of this program gave:
bob$ ./echoChar1
Enter one character: a
You entered: abob$
bob$
which probably looks like the program is not working correctly to you.
(Margin note: When testing your programs, read the screen very carefully.)
3 It is generally better to use symbolic names instead of plain numbers. The names provide implicit documentation,
Look more carefully at the program behavior. It illustrates some important issues when
using the read function. First, how many keys did the user hit? There were actually two
keystrokes, the “a” key and the return key. In fact, the program waits until the user hits the
return key. The user could have used the delete key to change the character before hitting the
return key.
This shows that keyboard input is line buffered. Even though the application program is
requesting only one character, the operating system does not honor this request until the user
hits the return key, thus entering the entire line. Since the line is buffered, the user can edit
the line before entering it.
Next, the program correctly echoes the first key hit then terminates. Upon program termi-
nation the shell prompt, bob$, is displayed. But the return character is still in the input buffer,
and the shell program reads it. The result is the same as if the user had simply pressed the
return key in response to the shell prompt.
1 /*
2 * echoChar1.c
3 * Echoes a character entered by the user.
4 * Bob Plantz - 4 June 2009
5 */
6
7 #include <unistd.h>
8
9 int main(void)
10 {
11 char aLetter;
12
18 return 0;
19 }
Here is another run where I entered three characters before hitting the return key:
bob$ ./echoChar1
Enter one character: abc
You entered: abob$ bc
bc 1.06.94
Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY. For details type ‘warranty’.
Again, the program correctly echoes the first character, but the two characters bc remain in the
input line buffer. When echoChar1 terminates the shell program reads the remaining characters
from the line buffer and interprets them as a command. In this case, bc is a program, so the
shell executes that program.
An important point of the program in Listing 2.4 is to illustrate the simplistic behavior of
the write and read functions. They work at a very low level. It is your responsibility to design
your program to interpret each byte that is written to the screen or read from the keyboard.
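The behavior discussed above can be reproduced with a small function built on read and write. This is a hedged sketch, not the author's listing: the function name echo_one_char and its file descriptor parameters are assumptions, and the prompt strings are taken from the sample runs.

```c
#include <unistd.h>

/*
 * Prompt on out_fd, read a single byte from in_fd, and echo it.
 * Returns the number of bytes read (1 on success), or -1 on a write error.
 * Any other bytes on the input line are left unread.
 */
ssize_t echo_one_char(int in_fd, int out_fd)
{
    char aLetter;
    ssize_t n;

    if (write(out_fd, "Enter one character: ", 21) < 0)
        return -1;
    n = read(in_fd, &aLetter, 1);        /* only one byte is requested */
    if (n == 1) {
        if (write(out_fd, "You entered: ", 13) < 0)
            return -1;
        if (write(out_fd, &aLetter, 1) < 0)
            return -1;
    }
    return n;
}
```

Calling echo_one_char(STDIN_FILENO, STDOUT_FILENO) and typing abc followed by return echoes only the a; the bytes b, c, and the newline stay in the input buffer for whichever program reads next.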
2.9 Exercises
2-1 (§2.1) Express the following bit patterns in hexadecimal.
a) 83af c) aaaa
b) 9001 d) 5555
2-3 (§2.1) How many bits are represented by each of the following?
a) ffffffff d) $1111_{16}$
b) 7fff58b7def0 e) $00000000_2$
c) $1111_2$ f) $00000000_{16}$
2-4 (§2.1) How many hexadecimal digits are required to represent each of the following?
2-5 (§2.2) Referring to Equation 2.1, what are the values of r, n and each $d_i$ for the decimal
number 29458254? The hexadecimal number 29458254?
2-6 (§2.2) Convert the following 8-bit numbers to decimal by hand:
a) 10101010 e) 10000000
b) 01010101 f) 01100011
c) 11110000 g) 01111011
d) 00001111 h) 11111111
a) 1010101111001101 e) 1000000000000000
b) 0001001000110100 f) 0000010000000000
c) 1111111011011100 g) 1111111111111111
d) 0000011111010000 h) 0011000000111001
2-8 (§2.2) In Section 2.2 we developed an algorithm for converting from binary to decimal.
Develop a similar algorithm for converting from hexadecimal to decimal. Use your new
algorithm to convert the following 8-bit numbers to decimal by hand:
a) a0 e) 64
b) 50 f) 0c
c) ff g) 11
d) 89 h) c8
2-9 (§2.2) In Section 2.2 we developed an algorithm for converting from binary to decimal.
Develop a similar algorithm for converting from hexadecimal to decimal. Use your new
algorithm to convert the following 16-bit numbers to decimal by hand:
a) a000 e) 8888
b) ffff f) 0190
c) 0400 g) abcd
d) 1111 h) 5555
2-10 (§2.3) Convert the following unsigned, decimal integers to 8-bit hexadecimal representa-
tion.
a) 100 e) 255
b) 123 f) 16
c) 10 g) 32
d) 88 h) 128
2-11 (§2.3) Convert the following unsigned, decimal integers to 16-bit hexadecimal representa-
tion.
a) 1024 e) 256
b) 1000 f) 65635
c) 32768 g) 2005
d) 32767 h) 43981
2-12 (§2.3) Invent a code that would allow us to store letter grades with plus or minus. That is,
the grades A, A-, B+, B, B-, . . . , D, D-, F. How many bits are required for your code?
2-13 (§2.3) We have shown how to write only the first sixteen addresses in hexadecimal in
Figure 2.1. How would you write the address of the seventeenth byte (byte number sixteen)
in hexadecimal? Hint: If we started with zero in the decimal number system we would use
a ‘9’ to represent the tenth item. How would you represent the eleventh item in the decimal
system?
2-14 (§2.3) Redo the table in Figure 2.2 such that it shows the memory contents in decimal.
2-15 (§2.3) Redo the table in Figure 2.2 such that it shows each of the sixteen bytes containing
its byte number. That is, byte number 0 contains zero, number 1 contains one, etc. Show
the contents in binary.
2-16 (§2.3) Redo the table in Figure 2.2 such that it shows each of the sixteen bytes containing
its byte number. That is, byte number 0 contains zero, number 1 contains one, etc. Show
the contents in hexadecimal.
2-17 (§2.4) You want to allocate an area in memory for storing any number between 0 and
4,000,000,000. This memory area will start at location 0x2fffeb96. Give the addresses of
each byte of memory that will be required.
2-18 (§2.4) You want to allocate an area in memory for storing an array of 30 bytes. The first
byte will have the value 0x00 stored in it, the second 0x01, the third 0x02, etc. This memory
area will start at location 0x001000. Show what this area of memory looks like.
2-19 (§2.4) In Section 2.4 we invented a binary code for representing letter grades. Referring to
that code, express each of the grades as an 8-bit unsigned decimal integer.
2-20 (§2.5) Enter the program in Listing 2.2 and check your answers for Exercise 2-6. Note that
printf and scanf do not have a conversion for binary. Check the answers in hexadecimal.
2-21 (§2.5) Enter the program in Listing 2.2 and check your answers for Exercise 2-7. Note that
printf and scanf do not have a conversion for binary. Check the answers in hexadecimal.
2-22 (§2.5) Enter the program in Listing 2.2 and check your answers for Exercise 2-8.
2-23 (§2.5) Enter the program in Listing 2.2 and check your answers for Exercise 2-9.
2-24 (§2.5) Enter the program in Listing 2.2 and check your answers for Exercise 2-10.
2-25 (§2.5) Enter the program in Listing 2.2 and check your answers for Exercise 2-11.
2-26 (§2.5) Modify the program in Listing 2.2 so that it also displays the addresses of the x and
y variables. Note that addresses are typically displayed in hexadecimal. How many bytes
does the compiler allocate for each of the ints?
2-27 (§2.6) Enter the program in Listing 2.1. Follow through the program with gdb as in the
example in Section 2.6. Using the numbers you get, explain where the variables anInt and
aFloat are stored in memory and what is stored in each location.
2-28 (§2.7) Write a program in C that creates a display similar to Figure 2.3. Hints: use a char*
variable to process the string one character at a time; use %08x to format the display of the
address.
2-29 (§2.6) Enter the program in Listing 2.4. Explain why there seems to be an extra prompt
in the program. Set breakpoints at both the read statement and at the following write
statement. Examine the contents of the aLetter variable before the read and after it.
Notice that the behavior of gdb seems very strange when dealing with the read statement.
Explain the behavior. Hint: Both gdb and the program you are debugging use the same
keyboard for input.
2-30 (§2.8) Modify the program in Listing 2.4 so that it prompts the user to enter an entire line,
reads the line, then echoes the entire line. Read only one byte at a time from the keyboard.
2-31 (§2.8) This is similar to Exercise 2-30 except that when the newline character is read from
the keyboard (and stored in memory), the program replaces the newline character with
a NUL character. The program has now read a line from the keyboard and stored it as a
C-style text string. If your algorithm is correct, you will be able to read the text string
using the read low-level function and display it with the printf library function thusly
(assuming the variable where the string is stored is named theString),
printf("%s\n", theString);
and have only one newline. Notice that this program discards the newline generated when
the user hits the return key. This is the same behavior you would see if you used
scanf("%s", theString);
in C, or
cin >> theString;
• writeStr takes one argument, a pointer to the string to be displayed and it returns
the number of characters actually displayed. It uses the write system call function to
write characters to the screen.
• readLn takes two arguments, one that points to the char array where the characters
are to be stored and one that specifies the maximum number of characters to store in
the char array. Additional keystrokes entered by the user should be read from the OS
input buffer and discarded. readLn should return the number of characters actually
stored in the char array. readLn should not store the newline character (’\n’). It uses
the read system call function to read characters from the keyboard.
Chapter 3
Computer Arithmetic
We next turn our attention to a code for storing decimal integers. Since all storage in a computer
is by means of on/off switches, we cannot simply store integers as decimal digits. Exercises 3-1
and 3-2 should convince you that it will take some thought to come up with a good code that
uses simple on/off switches to represent decimal numbers.
Another very important issue when talking about computer arithmetic was pointed out in
Section 2.3 (page 9). Namely, the programmer must decide how many bits will be used for
storing the numbers before performing any arithmetic operations. This raises the possibility
that some results will not fit into the allocated number of bits. As you will see in Section 9.2
(page 189), the computer hardware provides for this possibility with the Carry Flag (CF) and
Overflow Flag (OF) in the rflags register located in the CPU. Depending on what you intend
the bit patterns to represent, either the Carry Flag or the Overflow Flag (not both) will indicate
the correctness of the result. However, most high level languages, including C and C++, do not
check the CF and OF after performing arithmetic operations.
1 1 ←− carries
67 ←− x
+ 79 ←− y
46 ←− sum
We start by working from the right, adding the two decimal digits in the ones place. 7 + 9
exceeds 10 by 6. We show this by placing a 6 in the ones place in the sum and carrying a 1 to
the tens place. Next we add the three decimal digits in the tens place, 1 (the carry into the tens
place from the ones place) + 6 + 7. The sum of these three digits exceeds 10 by 4, which we show
by placing a 4 in the tens place in the sum and recording the fact that there is an ultimate carry
of one. Recall that we had decided to use only two digits, so there is no hundreds place. Using
the notation of Equation 2.1 (page 8), we describe addition of two decimal integers in Algorithm 3.1.
1 Most computer architectures provide arithmetic operations in other number systems, but these are somewhat spe-
3.1. ADDITION AND SUBTRACTION 29
Algorithm 3.1: Add fixed-width decimal integers.
given: N, number of digits.
Starting in the ones place, with carry ⇐ 0:
for i = 0 to (N-1) do
    total ⇐ x_i + y_i + carry;
    sum_i ⇐ total % 10;    // mod operation
    carry ⇐ total / 10;    // div operation
Notice that:
• Algorithm 3.1 works because we use a positional notation when writing numbers — a digit
one place to the left counts ten times more.
• Carry from the current position one place to the left is always 0 or 1.
• The reason we use 10 in the / and % operations is that there are exactly ten digits in the
decimal number system : 0, 1, 2, . . . , 9.
• Since we are working in an N-digit system, we must restrict our result to N digits. The
final carry (0 or 1) must be stated in addition to the N-digit result.
By changing “10” to “2” we get Algorithm 3.2 for addition in the binary number system. The
only difference is that a digit one place to the left counts two times more.
Algorithm 3.2: Add fixed-width binary integers.
given: N, number of bits.
Starting in the ones place, with carry ⇐ 0:
for i = 0 to (N-1) do
    total ⇐ x_i + y_i + carry;
    sum_i ⇐ total % 2;    // mod operation
    carry ⇐ total / 2;    // div operation
Example 3-a
ones place:
sum0 = (1 + 1) % 2 = 0
carry = (1 + 1) / 2 = 1
twos place:
sum1 = (1 + 1 + 0) % 2 = 0
carry = (1 + 1 + 0) / 2 = 1
fours place:
sum2 = (1 + 0 + 1) % 2 = 0
carry = (1 + 0 + 1) / 2 = 1
eights place:
sum3 = (1 + 1 + 1) % 2 = 1
carry = (1 + 1 + 1) / 2 = 1
sixteens place:
sum4 = (1 + 0 + 0) % 2 = 1
carry = (1 + 0 + 0) / 2 = 0
thirty-twos place:
sum5 = (0 + 1 + 0) % 2 = 1
carry = (0 + 1 + 0) / 2 = 0
sixty-fours place:
sum6 = (0 + 0 + 1) % 2 = 1
carry = (0 + 0 + 1) / 2 = 0
one hundred twenty-eights place:
sum7 = (0 + 1 + 0) % 2 = 1
carry = (0 + 1 + 0) / 2 = 0
In this eight-bit example the result is 1111 1000, and there is no carry beyond the eight
bits. The lack of carry is recorded in the rflags register by setting the CF bit to zero.
It should not surprise you that this algorithm also works for hexadecimal. In fact, it works
for any radix, as shown in Algorithm 3.3.
Algorithm 3.3: Add fixed-width integers in any radix.
given: N, number of digits.
Starting in the ones place, with carry ⇐ 0:
for i = 0 to (N-1) do
    total ⇐ x_i + y_i + carry;
    sum_i ⇐ total % radix;    // mod operation
    carry ⇐ total / radix;    // div operation
For hexadecimal:
• A digit one place to the left counts sixteen times more.
• We use 16 in the / and % operations because there are sixteen digits in the hexadecimal
number system: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f.
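Algorithm 3.3 can be sketched in C as shown here. The sketch assumes each number is held as an array of digit values with the ones place at index 0, and it includes the carry from the previous place in each step, as the worked examples do. The name add_digits and its parameter list are illustrative.

```c
/*
 * Add two n-digit numbers in the given radix.
 * Digits are stored least significant first: x[0] is the ones place.
 * Returns the final carry (0 or 1).
 */
int add_digits(const int *x, const int *y, int *sum, int n, int radix)
{
    int carry = 0;
    for (int i = 0; i < n; i++) {
        int total = x[i] + y[i] + carry;
        sum[i] = total % radix;   /* mod gives the digit for this place */
        carry  = total / radix;   /* div gives the carry into the next place */
    }
    return carry;
}
```

With radix 10 and the two-digit values 67 and 79 this produces the digits 4 and 6 with a final carry of 1; the same function handles binary and hexadecimal by changing only the radix argument.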
Addition in hexadecimal brings up a notational issue. For example,
d + 9 = ?? Oops, how do we write this?
Although it is certainly possible to perform all the computations using hexadecimal notation,
most people find it a little awkward. After you have memorized Table 3.1 it is much easier to:
• convert the (hexadecimal) digit to its equivalent decimal value
• apply our algorithm
• convert the results back to hexadecimal
Actually, we did this when applying the algorithm to binary addition. Since the conversion of
binary digits to decimal digits is trivial, you probably did not think about it. But the conversion
of hexadecimal digits to decimal is not as trivial. To see how it works, first recall that the
conversion from hexadecimal to binary is straightforward. (You should have memorized Table
2.1 by now.) So we will consider conversion from binary to decimal.
As mentioned above, the relative position of each bit has significance. The rightmost bit
represents the ones place, the next one to the left the twos place, then the fours place, etc. In
other words, each bit represents $2^n$, where n = 0, 1, 2, 3, ... and we start from the right. So the
binary number 1011 represents:
$1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 1 \times 2^0$
This is easily converted to decimal by simply working out the arithmetic in decimal:
$1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 1 \times 2^0 = 8 + 0 + 2 + 1 = 11$
From Table 2.1 on page 7 we see that $1011_2 = b_{16}$, and we conclude that $b_{16} = 11_{10}$. We can add
a “decimal” column to the table, giving Table 3.1.
Table 3.1: Correspondence between binary, hexadecimal, and unsigned decimal values for the
hexadecimal digits.
Example 3-b
1 011 ←− carries
abcd ←− x
+ 6089 ←− y
0c56 ←− sum
Now we can see how Algorithm 3.3 with radix = 16 was applied in order to add the hexadeci-
mal numbers, abcd and 6089. Having memorized Table 3.1, we will convert between hexadecimal
and decimal “in our heads.”
ones place:
sum0 = (d + 9) % 16 = 6
carry = (d + 9) / 16 = 1
sixteens place:
sum1 = (1 + c + 8) % 16 = 5
carry = (1 + c + 8) / 16 = 1
two hundred fifty-sixes place:
sum2 = (1 + b + 0) % 16 = c
carry = (1 + b + 0) / 16 = 0
four thousand ninety-sixes place:
sum3 = (0 + a + 6) % 16 = 0
carry = (0 + a + 6) / 16 = 1
This four-digit example has an ultimate carry of 1, which is recorded in the rflags register
by setting the CF to one. The arithmetic was performed by first converting each digit to decimal.
It is then a simple matter to convert each decimal value back to hexadecimal (see Table 3.1) to
express the final answer in hexadecimal.
Let us now turn to the subtraction operation. As you recall from subtraction in the decimal
number system, you must sometimes borrow from the next higher-order digit in the minuend.
This is shown in Algorithm 3.4.
Algorithm 3.4: Subtract fixed-width integers in any radix.
given: N, number of digits.
Starting in the ones place, subtract Y from X:
for i = 0 to (N-1) do
    if y_i ≤ x_i then
        difference_i ⇐ x_i − y_i;
        borrow ⇐ 0;
    else
        j ⇐ i + 1;
        while x_j = 0 do          // find the nearest nonzero digit to the left
            j ⇐ j + 1;
        while j > i do            // pass the borrow back down to position i
            x_j ⇐ x_j − 1;
            j ⇐ j − 1;
            x_j ⇐ x_j + radix;
        difference_i ⇐ x_i − y_i;
Example 3-c
ones place:
difference0 = 1 - 1 = 0
twos place:
Borrow from the fours place in the minuend.
The borrow becomes 2 in the twos place.
difference1 = 2 - 1 = 1
fours place:
Since we borrowed 1 from here, the minuend has a 0 left.
difference2 = 0 - 0 = 0
eights place:
difference3 = 1 - 1 = 0
sixteens place:
difference4 = 0 - 0 = 0
thirty-twos place:
Borrow from the sixty-fours place in the minuend.
The borrow becomes 2 in the thirty-twos place.
difference5 = 2 - 1 = 1
sixty-fours place:
Since we borrowed 1 from here, the minuend has a 0 left.
difference6 = 0 - 0 = 0
one hundred twenty-eights place:
3.2. ARITHMETIC ERRORS — UNSIGNED INTEGERS 33
difference7 = 1 - 1 = 0
This, of course, also works for hexadecimal, but remember that a digit one place to the left
counts sixteen times more. For example, consider x = 0x6089 and y = 0xab5d:
1 101 ←− borrows
6089 ←− x
− ab5d ←− y
b52c ←− difference
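The hexadecimal subtraction above can be checked with a short C sketch. Note that it is not a transcription of Algorithm 3.4: instead of searching leftward for a digit to borrow from, it carries a one-digit borrow from place to place, which yields the same digits. The name subtract_digits and the least-significant-first array layout are assumptions made for the illustration.

```c
/*
 * Subtract y from x, both n digits in the given radix,
 * least significant digit first.
 * Returns the final borrow: 1 if a borrow from beyond the width was needed.
 */
int subtract_digits(const int *x, const int *y, int *difference, int n, int radix)
{
    int borrow = 0;
    for (int i = 0; i < n; i++) {
        int d = x[i] - y[i] - borrow;
        if (d < 0) {
            d += radix;       /* borrow one unit of the next place */
            borrow = 1;
        } else {
            borrow = 0;
        }
        difference[i] = d;
    }
    return borrow;
}
```

For x = 0x6089 and y = 0xab5d the digits come out as b52c and the returned borrow is 1.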
Notice in this second example that we had to borrow from “beyond the width” of the two
values. That is, the two values are each sixteen bits wide, and the result must also be sixteen
bits. Whether there is borrow “from outside” to the high-order digit is recorded in the CF of the
rflags register whenever a subtract operation is performed:
and CF = 0.
So far, the binary number system looks reasonable. Let’s try two larger four-bit numbers:
and CF = 1. The result, 2, is arithmetically incorrect. The problem here is that the addition
has produced carry beyond the fourth bit. Since this is not taken into account in the result, the
answer is wrong.
Now consider subtraction of the two numbers:
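$$\begin{array}{rcl}
0100_2 &=& 0 \times 2^3 + 1 \times 2^2 + 0 \times 2^1 + 0 \times 2^0 = 4_{10} \\
-\;1110_2 &=& 1 \times 2^3 + 1 \times 2^2 + 1 \times 2^1 + 0 \times 2^0 = -\;14_{10} \\
\hline
0110_2 &=& 0 \times 2^3 + 1 \times 2^2 + 1 \times 2^1 + 0 \times 2^0 = 6_{10}
\end{array}$$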
and CF = 1.
The result, 6, is arithmetically incorrect. The problem in this case is that the subtraction has
had to borrow from beyond the fourth bit. Since this is not taken into account in the result, the
answer is wrong.
From the discussion in Section 3.1 (page 28) you should be able to convince yourself that
these four-bit arithmetic examples generalize to any size arithmetic performed by the computer.
After adding two numbers, the Carry Flag will always be set to zero if there is no ultimate carry,
or it will be set to one if there is ultimate carry. Subtraction will set the Carry Flag to zero if
no borrow from the “outside” is required, or one if borrow is required. These examples illustrate
the principle:
• When adding or subtracting two unsigned integers, the result is arithmetically correct if
and only if the Carry Flag (CF) is set to zero.
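Since C does not expose the Carry Flag, the principle can be illustrated by modeling four-bit arithmetic explicitly. The following sketch is an assumption-laden model, not a description of the hardware: the functions add4 and sub4 and the cf output parameter are hypothetical names.

```c
/* Add two four-bit unsigned values; *cf receives the carry out of bit 3. */
unsigned add4(unsigned x, unsigned y, unsigned *cf)
{
    unsigned total = (x & 0xf) + (y & 0xf);
    *cf = (total >> 4) & 1;
    return total & 0xf;              /* only four bits are kept */
}

/* Subtract y from x in four bits; *cf is 1 if a borrow from outside was needed. */
unsigned sub4(unsigned x, unsigned y, unsigned *cf)
{
    x &= 0xf;
    y &= 0xf;
    *cf = (x < y);
    return (x - y) & 0xf;            /* wraps modulo 16 when x < y */
}
```

For instance, sub4(4, 14, &cf) returns 6 with cf set to 1, matching the subtraction example above, so the caller knows the four-bit result is not arithmetically correct.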
It is important to realize that the CF and OF bits in the rflags register are always set to the
appropriate value, 0 or 1, each time an addition or subtraction is performed by the CPU. In
particular, the CPU will not ignore the CF when there is no carry; it will actively set the CF to
zero.
$$\begin{array}{rcl}
0010_2 &=& (+2)_{10} \\
+\;1010_2 &=& +\;(-2)_{10} \\
\hline
1100_2 &=& (-4)_{10}
\end{array}$$
The result, -4, is arithmetically incorrect. We should note here that the problem is the way
in which the computer does addition — it performs binary addition on the bit patterns that
in themselves have no inherent meaning. There are computers that use this particular code
for storing signed decimal integers. They have a special “signed add” instruction. By the way,
notice that such computers have both a +0 and a -0!
Most computers, including the x86, use another code for representing signed decimal integers
— the two’s complement code. To see how this code works, we start with an example using the
decimal number system.
Say that you have a cassette player and wish to represent both positive and negative posi-
tions on the tape. It would make sense to somehow fast-forward the tape to its center and call
that point “zero.” Most cassette players have a four decimal digit counter that represents tape
position. The counter, of course, does not give actual tape position, but a “coded” representation
of the tape position. Since we wish to call the center of the tape “zero,” we push the counter reset
button to set it to 0000.
Now, moving the tape forward — the positive direction — will cause the counter to increment.
And moving the tape backward — the negative direction — will cause the counter to decrement.
In particular, if we start at zero and move to “+1” the “code” on the tape counter will show 0001.
On the other hand, if we start at zero and move to “-1” the “code” on the tape counter will show
9999.
Using our tape code system to perform the arithmetic in the previous example — (+2) + (-2):
The counter shows 0000, but there is a carry. (9998 + 2 = 0000 with carry = 1.) If we ignore the
carry, the answer is correct. This example illustrates the principle:
• When adding two signed integers in the two’s complement notation, carry is irrelevant.
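The tape counter code can be modeled in a few lines of C. This is a sketch under the assumption that positions are small enough in magnitude (less than 10000) for the four-digit counter; the names counter_encode and counter_add are illustrative.

```c
/* Code a signed tape position as a four-digit counter reading (0000..9999). */
int counter_encode(int position)
{
    return (position + 10000) % 10000;   /* -1 becomes 9999, +1 stays 0001 */
}

/* Add two counter readings; *carry receives 1 if the counter rolled over. */
int counter_add(int a, int b, int *carry)
{
    int total = a + b;
    *carry = total / 10000;
    return total % 10000;                /* the counter shows only four digits */
}
```

Adding the readings for +2 and -2, that is 0002 and 9998, gives 0000 with a carry of 1; ignoring the carry leaves the correct answer.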
The two’s complement code uses this pattern for representing signed decimal integers in bit
patterns. The correspondence between signed decimal (two’s complement), hexadecimal, and
binary for four-bit values is shown in Table 3.2.
• However, changing the sign of (negating) a number is more complicated than simply chang-
ing the high-order bit.
• The code allows for one more negative number than positive numbers.
• The range of integers, x, that can be represented in this code (with four bits) is
$-8_{10} \le x \le +7_{10}$
or
$-2^{(4-1)} \le x \le +(2^{(4-1)} - 1)$
$-2^{(n-1)} \le x \le +(2^{(n-1)} - 1)$
$x + (-x) = 2^n$ (3.1)
Notice that $2^n$ written in binary is “1” followed by n zeros. That is, it requires n+1 bits to
represent. Another way of saying this is, “in the n-bit two’s complement code adding a number
to its negative produces n zeros and carry.”
We now derive a method for computing the negative of a number in the two’s complement
code. Solving Equation 3.1 for −x, we get:
$-x = 2^n - x$ (3.2)
For example, if we wish to compute -1 in binary (in the two’s complement code) in 8 bits, we
perform the arithmetic:
$-1_{10} = 100000000_2 - 00000001_2 = 11111111_2$
or in hexadecimal:
$-1_{16} = 100_{16} - 01_{16} = ff_{16}$
This subtraction is error prone, so let’s perform a few algebraic manipulations on Equation
3.2, which defines the negation operation. First, we subtract one from both sides:
$-x - 1 = 2^n - x - 1$ (3.3)
Rearranging a little:
$-x - 1 = 2^n - 1 - x = (2^n - 1) - x$ (3.4)
Now, consider the quantity $(2^n - 1)$. Since $2^n$ is written in binary as one (1) followed by n
zeros, $(2^n - 1)$ is written as n ones. For example, for n = 8:
$2^8 - 1 = 11111111_2$ (3.5)
Thus, we can express the right-hand side of Equation 3.4 as
or in hexadecimal:
$ff_{16} - 01_{16} = fe_{16}$
The value of the right-hand side of Equation 3.7 is called the reduced radix complement of x.
Since the radix is two, it is common to call this the one’s complement of x. From Equation 3.4 we
see that this computation — the reduced radix complement of x — gives
This leads us to Algorithm 3.5 for negating any integer stored in the two’s complement, n-bit
code.
Algorithm 3.5: Negate a number in binary (compute 2’s complement).
We use x’ to denote the complement of x.
    x ⇐ x’;
    x ⇐ x + 1;
(Margin note: Negate in binary.)
3.3. ARITHMETIC ERRORS — SIGNED INTEGERS 37
This process — computing the one’s complement, then adding one — is called computing the
two’s complement.
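In C the two steps map directly onto the bitwise complement operator and an addition. The sketch below assumes an 8-bit width and uses the fixed-width type uint8_t from stdint.h; the function name negate8 is illustrative.

```c
#include <stdint.h>

/* Negate an 8-bit value held in two's complement: complement, then add one. */
uint8_t negate8(uint8_t x)
{
    x = (uint8_t)~x;         /* one's complement: flip every bit */
    x = (uint8_t)(x + 1);    /* add one; any carry out of bit 7 is dropped */
    return x;
}
```

Negating 0x01 gives 0xff, and negating 0xff gives 0x01 again. The pattern 0x80 negates to itself, which reflects the fact that the code has one more negative number than positive numbers.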
Be Careful!
• “In two’s complement” describes the storage code.
• “Taking the two’s complement” is an active computation. If the value the computation is applied
to is an integer stored in the two’s complement notation, this computation is mathematically
equivalent to negating the number.
Combining Algorithm 3.5 with observations about Table 3.2 above, we can easily compute
the decimal equivalent of any integer stored in the two’s complement notation by applying Al-
gorithm 3.6.
Algorithm 3.6: Signed binary-to-decimal conversion.
if the high-order bit is zero then
    compute the decimal equivalent of the number;
else
    take the two’s complement (negate the number);
    compute the decimal equivalent of this result;
    place a minus sign in front of the decimal equivalent;
(Margin note: Convert binary to signed decimal.)
Example 3-d
The 16-bit integer $5678_{16}$ is stored in two’s complement notation. Convert it to a signed, decimal
integer.
Since the high-order bit is zero, we simply compute the decimal equivalent:
Example 3-e
The 16-bit integer $8765_{16}$ is stored in two’s complement notation. Convert it to a signed, decimal
integer.
Since the high-order bit is one, we first negate the number in the two’s complement format.
Place a minus sign in front of the number (since we negated it in the two’s complement domain).
$8765_{16} = -30875_{10}$
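The steps of Algorithm 3.6 can be followed literally in C for 16-bit patterns. This is a sketch: the function name decode16 is hypothetical, and a 32-bit return type is assumed so that every 16-bit result fits with its sign.

```c
#include <stdint.h>

/* Interpret a 16-bit pattern as a two's complement signed integer. */
int32_t decode16(uint16_t bits)
{
    if ((bits & 0x8000) == 0)
        return bits;                              /* high-order bit is zero */
    uint16_t magnitude = (uint16_t)(~bits + 1);   /* take the two's complement */
    return -(int32_t)magnitude;                   /* place a minus sign in front */
}
```

The patterns from Examples 3-d and 3-e, 0x5678 and 0x8765, decode to 22136 and -30875.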
Algorithm 3.7 shows how to convert a signed decimal number to two’s complement binary.
Algorithm 3.7: Signed decimal-to-binary conversion.
if the number is positive then
    simply convert it to binary;
else
    negate the number;
    convert the result to binary;
    compute the two’s complement of result in the binary domain;
(Margin note: Convert signed decimal to binary.)
Example 3-f
Convert the signed, decimal integer +31693 to a 16-bit integer in two’s complement notation.
Give the answer in hexadecimal.
Since this is a positive number, we simply convert it. The answer is to be given in hexadecimal,
so we will repetitively divide by 16 to get the answer.
So the answer is
$31693_{10} = 7bcd_{16}$
Example 3-g
Convert the signed, decimal integer -250 to a 16-bit integer in two’s complement notation. Give
the answer in hexadecimal.
Since this is a negative number, we first negate it, giving +250. Then we convert this value. The
answer is to be given in hexadecimal, so we will repetitively divide by 16 to get the answer.
This gives us
$250_{10} = 00fa_{16}$
$-250_{10} = ff06_{16}$
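Algorithm 3.7 can be sketched in C in the same spirit. The function name encode16 is an assumption, and the sketch assumes the argument lies in the 16-bit signed range, -32768 through +32767.

```c
#include <stdint.h>

/* Code a signed integer in -32768..+32767 as a 16-bit two's complement pattern. */
uint16_t encode16(int32_t value)
{
    if (value >= 0)
        return (uint16_t)value;                   /* positive: simply convert */
    uint16_t magnitude = (uint16_t)(-value);      /* negate, then convert */
    return (uint16_t)(~magnitude + 1);            /* two's complement of the result */
}
```

Printing the results with the %04x conversion shows 7bcd for +31693 and ff06 for -250, the answers found in Examples 3-f and 3-g.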
3.4. OVERFLOW AND SIGNED DECIMAL INTEGERS 39
In this example, there is a carry of zero and a penultimate (next to last) carry of one. The OF
flag is equal to the exclusive or of carry and penultimate carry:
OF = CF ^ penultimate carry
OF = 0 ^ 1 = 1
Case 1: The two numbers are of opposite sign. We will let x be the negative number and y
the positive number. Then we can express x and y in binary as:
x = 1...
y = 0...
That is, the high-order bit of one number is 1 and the high-order bit of the other is 0,
regardless of what the other bits are. Now, if we add x and y, there are two possible results
with respect to carry:
We conclude that adding two integers of opposite sign always yields 0 for the overflow flag.
Next, notice that since y is positive and x negative:
$0 \le y \le +(2^{(n-1)} - 1)$ (3.11)
$-2^{(n-1)} \le x < 0$ (3.12)
x = 0...
y = 0...
That is, the high-order bit is 0, regardless of what the other bits are. Now, if we add x and
y, there are two possible results with respect to carry:
1. If the penultimate carry is zero:
carry −→ 0 0 ←− penultimate carry
0 . . . ←− x
+ 0 . . . ←− y
0 . . . ←− sum
this addition would produce OF = 0 ^ 0 = 0. The high-order bit of the sum is zero, so it
is a positive number, and the sum is within range.
2. If the penultimate carry is one:
carry −→ 0 1 ←− penultimate carry
0 . . . ←− x
+ 0 . . . ←− y
1 . . . ←− sum
this addition would produce OF = 0 ^ 1 = 1. The high-order bit of the sum is one, so it is
a negative number. Adding two positive numbers cannot yield a negative sum, so this
sum has exceeded the allocated range.
Case 3: Both numbers are negative. Since both are negative, we can express x and y in binary as:
x = 1...
y = 1...
That is, the high-order bit is 1, regardless of what the other bits are. Now, if we add x and
y, there are two possible results with respect to carry:
1. If the penultimate carry is zero:
carry −→ 1 0 ←− penultimate carry
1 . . . ←− x
+ 1 . . . ←− y
0 . . . ←− sum
this addition would produce OF = 1 ^ 0 = 1. The high-order bit of the sum is zero, so it
is a positive number. Adding two negative numbers cannot yield a positive sum, so
this sum has exceeded the allocated range.
2. If the penultimate carry is one:
carry −→ 1 1 ←− penultimate carry
1 . . . ←− x
+ 1 . . . ←− y
1 . . . ←− sum
this addition would produce OF = 1 ^ 1 = 0. The high-order bit of the sum is one, so it
is a negative number, and the sum is within range.
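The rule OF = CF ^ penultimate carry can be checked in C. The following is a minimal sketch for 8-bit operands; the struct and function names are illustrative assumptions and are not part of the text.

```c
#include <stdint.h>

/* Hypothetical record of an 8-bit addition and the flags it produces. */
struct add_result {
    uint8_t sum;
    int cf;   /* carry out of the high-order bit */
    int of;   /* CF ^ penultimate carry */
};

struct add_result add8(uint8_t x, uint8_t y)
{
    struct add_result r;
    unsigned int full = (unsigned int)x + (unsigned int)y;  /* 9-bit result */
    unsigned int low7 = (unsigned int)(x & 0x7f) + (unsigned int)(y & 0x7f);
    int penultimate = (int)((low7 >> 7) & 1u);  /* carry into the high-order bit */

    r.sum = (uint8_t)full;
    r.cf = (int)((full >> 8) & 1u);
    r.of = r.cf ^ penultimate;
    return r;
}
```

For instance, add8(0x63, 0x7b) adds two positive values whose true sum (222) does not fit in eight signed bits, so it should report cf = 0 and of = 1.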
Be Careful! Do not confuse positive signed numbers with unsigned numbers. The range for
unsigned 32-bit integers is 0 – 4294967295, and for signed 32-bit integers the range is
-2147483648 – +2147483647.
The codes used for both unsigned integers and signed integers are circular in nature. That
is, for a given number of bits, each code “wraps around.” This can be seen pictorially in the
“Decoder Ring” shown in Figure 3.1 for three-bit numbers.
Example 3-h
Using the “Decoder Ring” (Figure 3.1), add the unsigned integers 3 + 4.
Working only in the inner ring, start at the tic mark for 3, which corresponds to the bit pat-
tern 011. The bit pattern corresponding to 4 is 100, which is four tic marks CW from zero. So
move four tic marks CW from the 3 tic mark. This places us at the tic mark labeled 111, which
corresponds to 7. Since we did not pass the tic mark at the top of the Decoder Ring, CF = 0. Thus,
the result is correct.
Example 3-i
Using the “Decoder Ring” (Figure 3.1), add the unsigned integers 5 + 6.
Working only in the inner ring, start at the tic mark for 5, which corresponds to the bit pattern
101. The bit pattern corresponding to 6 is 110, which is six tic marks CW from zero. So move six
tic marks CW from the 5 tic mark. This places us at the tic mark labeled 011, which corresponds
to 3. Since we have crossed the tic mark at the top of the Decoder Ring, the CF becomes 1. Thus,
the result is incorrect.
Figure 3.1: “Decoder Ring” for three-bit signed and unsigned integers. Move clockwise when
adding numbers, counter-clockwise when subtracting. Crossing over 000 sets the CF
to one, indicating an error for unsigned integers. Crossing over 100 sets the OF to
one, indicating an error for signed integers.
3.5. C/C++ BASIC DATA TYPES 43
Example 3-j
Using the “Decoder Ring” (Figure 3.1), add the signed integers (+1) + (+2).
Working only in the outer ring, start at the tic mark for +1, which corresponds to the bit pattern
001. The bit pattern corresponding to +2 is 010, which is two tic marks CW from zero. So move
two tic marks CW from the +1 tic mark. This places us at the tic mark labeled 011, which
corresponds to +3. Since we did not pass the tic mark at the bottom of the Decoder Ring, OF = 0.
Thus, the result is correct.
Example 3-k
Using the “Decoder Ring” (Figure 3.1), add the signed integers (+3) + (-4).
Working only in the outer ring, start at the tic mark for +3, which corresponds to the bit pattern
011. The bit pattern corresponding to -4 is 100, which is four tic marks CCW from zero. So move
four tic marks CCW from the +3 tic mark. This places us at the tic mark labeled 111, which
corresponds to -1. Since we did not pass the tic mark at the bottom of the Decoder Ring, OF = 0.
Thus, the result is correct.
Example 3-l
Using the “Decoder Ring” (Figure 3.1), add the signed integers (+3) + (+1).
Working only in the outer ring, start at the tic mark for +3, which corresponds to the bit pattern
011. The bit pattern corresponding to +1 is 001, which is one tic mark CW from zero. So move one
tic mark CW from the +3 tic mark. This places us at the tic mark labeled 100, which corresponds
to -4. Since we did pass the tic mark at the bottom of the Decoder Ring, OF = 1. Thus, the result
is incorrect.
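Moving around the Decoder Ring is arithmetic modulo 8 on the three-bit patterns. The sketch below models that under stated assumptions: patterns are the integers 0 through 7, and the names ring_move and ring_signed are hypothetical.

```c
/* Move `steps` tic marks from `pattern`; positive is clockwise (add),
 * negative is counter-clockwise (subtract). Patterns are 0..7. */
unsigned int ring_move(unsigned int pattern, int steps)
{
    int position = ((int)(pattern & 7u) + steps) % 8;
    if (position < 0) {
        position += 8;      /* C's % can yield a negative remainder */
    }
    return (unsigned int)position;
}

/* Read a pattern on the outer (signed) ring: 100..111 are -4..-1. */
int ring_signed(unsigned int pattern)
{
    pattern &= 7u;
    return (pattern >= 4u) ? (int)pattern - 8 : (int)pattern;
}
```

With these helpers, ring_move(5, 6) lands on 3 as in Example 3-i, and ring_signed(ring_move(3, 1)) gives -4 as in Example 3-l.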
Table 3.3: Sizes (in bits) of some C/C++ data types in 32-bit and 64-bit modes. The size of a long
depends on the mode. Pointers (addresses) are 32 bits in 32-bit mode and can be 32
or 64 bits in 64-bit mode.
are taken from the System V Application Binary Interface specifications, reference [33] for 32-
bit and reference [25] for 64-bit, and are used by the gcc compiler for the x86-64 architecture.
Language specifications tend to be more permissive in order to accommodate other hardware
architectures. For example, see reference [10] for the specifications for C.
A given “real world” value can usually be represented in more than one data type. For
example, most people would think of “123” as representing “one hundred twenty-three.” This
value could be stored in a computer in int format or as a text string. An int in our C/C++
environment is stored in 32 bits, and the bit pattern would be
0x0000007b
As a C-style text string, it would also require four bytes of memory, but their bit patterns would be
0x31 0x32 0x33 0x00
The int format is easier to use in arithmetic and logical expressions, but the interface with
the outside world through the screen and the keyboard uses the char format. If a user entered
123 from the keyboard, the operating system would read the individual characters, each in char
format. The text string must be converted to int format. After the numbers are manipulated,
the result must be converted from the int format to char format for display on the screen.
C programmers use functions in the stdio library and C++ programmers use functions in
the iostream library to do these conversions between the int and char formats. For example,
the C code sequence
scanf("%i", &x);
x += 100;
printf("%i", x);
or the C++ code sequence
cin >> x;
x += 100;
cout << x;
• reads characters from the keyboard and converts the character sequence into the corresponding int format.
• adds 100 to the int.
• converts the resulting int into a character sequence and displays it on the screen.
The C or C++ I/O library functions in the code segments above do the necessary conversions
between character sequences and the int storage format. Underneath these conversions, they
ultimately call the read system call function to read bytes from the keyboard and
the write system call function to write bytes to the screen. As shown in Figure 3.2, an
application program can call the read and write functions directly to transfer bytes.
When using the read and write system call functions for I/O, it is the programmer’s respon-
sibility to do the conversions between the char type used for I/O and the storage formats used
within the program. We will soon be writing our own functions in assembly language to convert
between the character format used for screen display and keyboard input, and the internal stor-
age format of integers in the binary number system. The purpose of writing our own functions
is to gain a thorough understanding of how data is represented internally in the computer.
Aside: If the numerical data are used primarily for display, with few arithmetic operations, it makes
more sense to store numerical data in character format. Indeed, this is done in many business data
processing environments. But this makes arithmetic operations more complicated.
[Diagram, top to bottom: application; C I/O libraries; write and read; OS; screen/keyboard]
Figure 3.2: Relationship of I/O libraries to application and operating system. An application
can use functions in the I/O libraries to convert between keyboard/screen chars and
basic data types, or it can directly use the read/write system calls to transfer raw
bytes.
corresponding int storage format as shown in Algorithm 3.8. This conversion algorithm involves
manipulating data at the bit level.
Algorithm 3.8: Read hexadecimal value from keyboard.
1 x ⇐ 0;
2 Read character from keyboard;
3 while more characters do
4 x ⇐ x shifted left four bit positions;
5 y ⇐ new character converted to an int;
6 x ⇐ x + y;
7 Read character from keyboard;
8 Display the integer;
Let us examine this algorithm. Each character read from the keyboard represents a hex-
adecimal digit. That is, each character is one of ‘0’, . . . ,‘9’,‘a’, . . . ,‘f’. (We assume that the user
does not make mistakes.) Since a hexadecimal digit represents four bits, we need to shift the
accumulated integer four bits to the left in order to make room for the new four-bit value.
You should recognize that shifting an integer four bits to the left multiplies it by 16. As
you will see in Sections 12.3 and 12.4 (pages 273 and 280), multiplication and division are
complicated operations, and they can take a great deal of processor time. Using left/right shifts
to effect multiplication/division by powers of two is very efficient. More importantly, the four-bit
shift is more natural in this application.
The C/C++ operator for shifting bits to the left is << (see footnote 2). For example, if x is an int, the statement
x = x << 4;
shifts the value in x four bits to the left, thus multiplying it by sixteen. Similarly, the C/C++
operator for shifting bits to the right is >>. For example, if x is an int, the statement
x = x >> 3;
shifts the value in x three bits to the right, thus dividing it by eight when x is non-negative. Note
that the three rightmost bits are lost, so this is an integer div operation. The program in Listing 3.1
illustrates the use of the C shift operators to multiply and divide by powers of two.
2 In C++ the >> and << operators have been overloaded for use with the input and output streams.
/*
 * mulDiv.c
 * Asks user to enter an integer. Then prompts user to enter
 * a power of two to multiply the integer, then another power
 * of two to divide. Assumes that user does not request more
 * than 32 as the power of 2.
 * Bob Plantz - 4 June 2009
 */

#include <stdio.h>

int main(void)
{
    int x;
    int leftShift, rightShift;

    return 0;
}
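The shift-based multiply and divide described above can be sketched as two small functions. This is an illustration under stated assumptions, not the author's listing: the names mul_pow2 and div_pow2 are hypothetical, unsigned int is assumed to be 32 bits wide, and the shift count is assumed to be less than 32.

```c
/* Multiply x by 2 raised to `power` with a left shift.
 * Assumes power < 32; bits shifted past the high-order end are lost. */
unsigned int mul_pow2(unsigned int x, unsigned int power)
{
    return x << power;
}

/* Divide x by 2 raised to `power` with a right shift.
 * Assumes power < 32; the low-order bits (the remainder) are lost. */
unsigned int div_pow2(unsigned int x, unsigned int power)
{
    return x >> power;
}
```

Unsigned operands are used here because shifting a negative signed value to the right is implementation-defined in C, and shifting by the full width of the type is undefined.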
Figure 3.3: Truth table for adding two bits with carry from a previous bit addition. x[i] is the
ith bit of x; carry[(i-1)] is the carry from adding the (i-1)th bits.
The bitwise logical operators act on the corresponding bits of two operands as shown in
Figure 3.4.
x[i]   ~x[i] (complement)
0      1
1      0
Figure 3.4: Truth tables showing bitwise C/C++ operations. x[i] is the ith bit in the variable x.
Example 3-m
Let int x = 0x1234abcd. Compute the and, or, and xor with 0xdcba4321.
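The three results asked for in Example 3-m can be computed directly with the C bitwise operators. The sketch below uses uint32_t rather than int so that 0xdcba4321 fits without a sign change; the function names are hypothetical.

```c
#include <stdint.h>

/* Bitwise AND, OR, and XOR of two 32-bit patterns. */
uint32_t bit_and(uint32_t a, uint32_t b) { return a & b; }
uint32_t bit_or(uint32_t a, uint32_t b)  { return a | b; }
uint32_t bit_xor(uint32_t a, uint32_t b) { return a ^ b; }

/* With a = 0x1234abcd and b = 0xdcba4321:
 *   a & b is 0x10300301
 *   a | b is 0xdebeebed
 *   a ^ b is 0xce8ee8ec
 */
```

Working one hexadecimal digit at a time gives the same answers, for example 1 & d = 0001 & 1101 = 0001.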
Make sure that you distinguish these bitwise logical operators from the C/C++ logical operators,
&&, ||, and !. The logical operators work on groups of bits organized into integral data types
rather than individual bits. For comparison, the truth tables for the C/C++ logical operators are
shown in Figure 3.5.
x          y          x && y (and)
0          0          0
0          non-zero   0
non-zero   0          0
non-zero   non-zero   1

x          y          x || y (or)
0          0          0
0          non-zero   1
non-zero   0          1
non-zero   non-zero   1

x          !x (complement)
0          1
non-zero   0
Figure 3.5: Truth tables showing C/C++ logical operations. x and y are variables of integral data
type.
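The difference between the bitwise and the logical operators shows up as soon as the operands are non-zero values that share no 1 bits. A minimal sketch, with hypothetical function names:

```c
/* Bitwise AND works on each pair of corresponding bits. */
int bitwise_and(int a, int b) { return a & b; }

/* Logical AND treats each operand as a whole: zero or non-zero. */
int logical_and(int a, int b) { return a && b; }

/* Bitwise complement flips every bit; logical NOT yields 0 or 1. */
int bitwise_not(int a) { return ~a; }
int logical_not(int a) { return !a; }
```

For example, 5 is 101 and 2 is 010 in binary, so 5 & 2 is 0 while 5 && 2 is 1.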
Table 3.4: Hexadecimal characters and corresponding int. Note the change in pattern from ‘9’
to ‘a’.
Well, we still have an 8-bit value (with the four high-order bits zero), but we will work on this
in a moment.
Next consider the alphabetic hexadecimal digits in Table 3.4. Notice that the low-order four
bits are the same whether the character is upper case or lower case. We can use the same &
operation to obtain these four bits, then add 9 to the result:
Conversion from the 8-bit char type to the 32-bit int type is accomplished by a type cast in C.
3.6. OTHER CODES 49
The resulting program is shown in Listing 3.2. Notice that we use the printf function to
display the resulting stored value, both in hexadecimal and decimal. The conversion from stored
int format to hexadecimal display is left as an exercise (Exercise 3-14).
/*
 * convertHex.c
 * Asks user to enter a number in hexadecimal
 * then echoes it in hexadecimal and in decimal.
 * Assumes that user does not make mistakes.
 * Bob Plantz - 4 June 2009
 */

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int x;
    unsigned char aChar;

    x = 0;                                // initialize result
    read(STDIN_FILENO, &aChar, 1);        // get first character
    while (aChar != '\n')                 // look for return key
    {
        x = x << 4;                       // make room for next four bits
        if (aChar <= '9')
        {
            x = x + (int)(aChar & 0x0f);
        }
        else
        {
            aChar = aChar & 0x0f;
            aChar = aChar + 9;
            x = x + (int)aChar;
        }
        read(STDIN_FILENO, &aChar, 1);
    }

    return 0;
}
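The same shift-and-add conversion can be packaged as a function that works on a text string instead of reading the keyboard, which makes it easy to try on known values. This is a sketch under stated assumptions: the name hex_to_uint is hypothetical, the characters are ASCII, and, like the listing, the input is assumed to contain only valid hexadecimal digits.

```c
/* Hypothetical helper: converts a string of hexadecimal digits
 * ('0'-'9', 'a'-'f', 'A'-'F') to its unsigned value. Conversion
 * stops at the end of the string or at a newline. */
unsigned int hex_to_uint(const char *text)
{
    unsigned int x = 0;

    while (*text != '\0' && *text != '\n') {
        unsigned char c = (unsigned char)*text;
        x = x << 4;                      /* make room for next four bits */
        if (c <= '9') {
            x += c & 0x0fu;              /* '0'..'9': low four bits are the value */
        } else {
            x += (c & 0x0fu) + 9u;       /* 'a'..'f' or 'A'..'F': low bits plus 9 */
        }
        text++;
    }
    return x;
}
```

An unsigned result is used so that eight-digit inputs with the high-order bit set do not shift into a sign bit.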
BCD code as
0001 0010 0011 0100 # BCD
and in binary as
0000 0100 1101 0010 # binary
From Table 3.5 we can see that six bit patterns are “wasted.” The effect of this inefficiency is
that a 16-bit storage location has a range of 0 – 9999 if we use BCD, but the range is 0 – 65535
if we use binary.
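The contrast between the two encodings can be reproduced in C. The sketch below packs a value into four BCD digits, one per four-bit group; the function name to_bcd16 is hypothetical and the input is assumed to be in the range 0 to 9999.

```c
#include <stdint.h>

/* Hypothetical helper: packs a value 0..9999 into four BCD digits,
 * least significant decimal digit in the low-order four bits. */
uint16_t to_bcd16(unsigned int value)
{
    uint16_t bcd = 0;

    for (int shift = 0; shift < 16; shift += 4) {
        bcd = (uint16_t)(bcd | ((value % 10u) << shift));
        value /= 10u;
    }
    return bcd;
}
```

For the decimal value 1234 this gives the pattern 0x1234, whereas the same value stored in binary is 0x04d2.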
BCD is important in specialized systems that deal primarily with numerical data. There
are I/O devices that deal directly with numbers in BCD without converting to/from a character
code, for example, ASCII. The COBOL programming language supports a packed BCD format
where two BCD characters are stored in each 8-bit byte. The last (4-bit) digit is used to store the
sign of the number as shown in Table 3.6. The specific codes used depend upon the particular
implementation.
one where there is only one bit that differs between any two adjacent values. As you will see in
Section 4.3, this property also allows for a very useful visual tool for simplifying Boolean algebra
expressions.
The Gray code is easily constructed. Start with one bit:
decimal Gray code
0 0
1 1
then add a zero to the beginning of each of the original bit patterns and a 1 to each of the
reflected ones:
decimal Gray code
0 00
1 01
2 11
3 10
Let us repeat these two steps to add another bit. Reflect the pattern:
Gray code
00
01
11
10
10
11
01
00
then add a zero to the beginning of each of the original bit patterns and a 1 to each of the
reflected ones:
decimal Gray code
0 000
1 001
2 011
3 010
4 110
5 111
6 101
7 100
The Gray code for four bits is shown in Table 3.7. Notice that the pattern of only changing
one bit between adjacent values also holds when the bit pattern “wraps around.” That is, only
one bit is changed when going from the highest value (15 for four bits) to the lowest (0).
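The reflect-and-prefix construction produces the binary-reflected Gray code, which can also be computed with a well-known shortcut: exclusive-or the value with itself shifted right one bit. The sketch below uses that shortcut rather than the construction in the text; the function names are hypothetical.

```c
/* Binary-reflected Gray code of n, computed as n XOR (n >> 1). */
unsigned int to_gray(unsigned int n)
{
    return n ^ (n >> 1);
}

/* Number of bit positions in which a and b differ. */
int bits_differing(unsigned int a, unsigned int b)
{
    unsigned int diff = a ^ b;
    int count = 0;

    while (diff != 0) {
        count += (int)(diff & 1u);
        diff >>= 1;
    }
    return count;
}
```

Applying to_gray to 0 through 7 reproduces the three-bit table above, and bits_differing(to_gray(15), to_gray(0)) is 1, matching the wrap-around property for four bits.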
3.7 Exercises
3-1 (§3.1) How many bits are required to store a single decimal digit?
3-2 (§3.1) Using the answer from Exercise 3-1, invent a code for storing eight decimal digits in
a thirty-two bit register. Using your new code, does binary addition produce the correct
results?
3-3 (§3.3) Select several pairs of signed integers from Table 3.2, convert each to binary using
the table, perform the binary addition, and check the results. Does this code always work?
3-4 (§3.3) If you did not select them in Exercise 3-3, add +4 and +5 using the four-bit, two’s
complement code (from Table 3.2). What answer do you get?
3-5 (§3.3) If you did not select them in Exercise 3-3, add -4 and -5 using the four-bit, two’s
complement code (from Table 3.2). What answer do you get?
3-6 (§3.3) Select any positive integer from Table 3.2. Add the binary representation for the
positive value to the binary representation for the negative value. What is the four-bit
result? What is the value of the CF? The OF? If you do the addition “on paper” (that is,
you can use as many digits as you wish), how could you express, in English, the result of
adding the positive representation of an integer to its negative representation in the two’s
complement notation? The negative representation to the positive representation? Which
two integers do not have a representation of the opposite sign?
3-7 (§3.3) The following 8-bit hexadecimal values are stored in two’s complement format. What
are the equivalent signed decimal numbers?
a) 55 e) 80
b) aa f) 63
c) f0 g) 7b
d) 0f
3.7. EXERCISES 53
3-8 (§3.3) The following 16-bit hexadecimal values are stored in two’s complement format.
What are the equivalent signed decimal numbers?
a) 1234 e) 8000
b) edcc f) 0400
c) fedc g) ffff
d) 07d0 h) 782f
3-9 (§3.3) Show how each of the following signed, decimal integers would be stored in 8-bit
two’s complement format. Give your answer in hexadecimal.
a) 100 e) 127
b) -1 f) -16
c) -10 g) -32
d) 88 h) -128
3-10 (§3.3) Show how each of the following signed, decimal integers would be stored in 16-bit
two’s complement format. Give your answer in hexadecimal.
a) 1024 e) -256
b) -1024 f) -32768
c) -1 g) -32767
d) 32767 h) -128
3-11 (§3.4) Perform binary addition of the following pairs of 8-bit numbers (shown in hexadeci-
mal) and indicate whether your result is “right” or “wrong.” First treat them as unsigned
values, then as signed values (stored in two’s complement format). Thus, you will have two
“right/wrong” answers for each sum. Note that the computer performs only one addition,
setting both the CF and OF according to the results of the addition. It is up to the program
to test the appropriate flag depending on whether the numbers are being considered as
unsigned or signed in the program.
a) 55 + aa d) 63 + 7b
b) 55 + f0 e) 0f + ff
c) 80 + 7b f) 80 + 80
3-12 (§3.4, 3.5) Perform binary addition of the following pairs of 16-bit numbers (shown in hex-
adecimal) and indicate whether your result is “right” or “wrong.” First treat them as un-
signed values, then as signed values (stored in two’s complement format). Thus, you will
have two “right/wrong” answers for each sum. Note that the computer performs only one
addition, setting both the CF and OF according to the results of the addition. It is up to
the program to test the appropriate flag depending on whether the numbers are being
considered as unsigned or signed in the program.
3-13 (§3.5) Enter the program in Listing 3.1 and get it to work. Use the program to compute 1
(one) multiplied by 2 raised to the 31st power. What result do you get for 1 (one) multiplied
by 2 raised to the 32nd power? Explain the results.
3-14 (§3.5) Write a C program that prompts the user to enter a hexadecimal value, multiplies
it by ten, then displays the result in hexadecimal. Your main function should
a) declare a char array,
b) call the readLn function to read from the keyboard,
c) call a function to convert the input text string to an int,
d) multiply the int by ten,
e) call a function to convert the int to its corresponding hexadecimal text string,
f) call writeStr to display the resulting hexadecimal text string.
Use the readLn and writeStr functions from Exercise 2-32 to read from the keyboard and
display on the screen. Place the functions to perform the conversions in separate files.
Hint: review Figure 3.2.
3-15 (§3.5) Write a C program that prompts the user to enter a binary value, multiplies it by
ten, then displays the result in binary. (“Binary” here means that the user communicates
with the program in ones and zeros.) Your main function should
a) declare a char array,
b) call the readLn function to read from the keyboard,
c) call a function to convert the input text string to an int,
d) multiply the int by ten,
e) call a function to convert the int to its corresponding binary text string,
f) call writeStr to display the resulting binary text string.
Use the readLn and writeStr functions from Exercise 2-32 to read from the keyboard and
display on the screen. Your functions to convert from a binary text string to an int and
back should be placed in separate functions.
3-16 (§3.5) Write a C program that prompts the user to enter an unsigned decimal integer,
multiplies it by ten, then displays the result in decimal. Your main function should
a) declare a char array,
b) call the readLn function to read from the keyboard,
c) call a function to convert the input text string to an int,
d) multiply the int by ten,
e) call a function to convert the int to its corresponding decimal text string,
f) call writeStr to display the resulting decimal text string.
Use the readLn and writeStr functions from Exercise 2-32 to read from the keyboard and
display on the screen. Your function to convert from a decimal text string to an int should
be placed in a separate function. Hint: this problem cannot be solved by simply shifting
bit patterns. Think carefully about the mathematical equivalence of shifting bit patterns
left or right.
3-17 (§3.5) Modify the program in Exercise 3-16 so that it works with signed decimal integers.
Chapter 4
Logic Gates
This chapter provides an overview of the hardware components that are used to build a com-
puter. We will limit the discussion to electronic computers, which use transistors to switch
between two different voltages. One voltage represents 0, the other 1. The hardware devices
that implement the logical operations are called logic gates.
• AND — a binary operator; the result is 1 if and only if both operands are 1; otherwise the
result is 0. We will use ’·’ to designate the AND operation. It is also common to use the
’∧’ symbol or simply “AND”. The hardware symbol for the AND gate is shown in Figure
4.1. The inputs are x and y. The resulting output, x · y, is shown in the truth table in this
figure.
x y x·y
0 0 0
0 1 0
1 0 0
1 1 1
We can see from the truth table that the AND operator follows rules similar to those of
multiplication in elementary algebra.
56 CHAPTER 4. LOGIC GATES
• OR — a binary operator; the result is 1 if at least one of the two operands is 1; otherwise
the result is 0. We will use ’+’ to designate the OR operation. It is also common to use the
’∨’ symbol or simply “OR”. The hardware symbol for the OR gate is shown in Figure 4.2.
The inputs are x and y. The resulting output, x + y, is shown in the truth table in this
figure.
x y x+y
0 0 0
0 1 1
1 0 1
1 1 1
From the truth table we can see that the OR operator follows the same rules as addition in
elementary algebra, except that
1 + 1 = 1
in Boolean algebra. Unlike elementary algebra, there is no carry from the OR operation.
Since addition of integers can produce a carry, you will see in Section 5.1 that implementing
addition requires more than a simple OR gate.
• NOT — a unary operator; the result is 1 if the operand is 0, or 0 if the operand is 1. Other
names for the NOT operation are complement and invert. We will use x′ to designate the
NOT operation. It is also common to use ¬x, or x̄ (x with an overbar). The hardware symbol for the
NOT gate is shown in Figure 4.3. The input is x. The resulting output, x′, is shown in the truth table
in this figure.
x x′
0 1
1 0
The NOT operation has no analog in elementary algebra. Be careful to notice that in-
version of a value in elementary algebra is a division operation, which does not exist in
Boolean algebra.
Two-state variables can be combined into expressions with these three operators in the same
way that you would use the C/C++ operators &&, ||, and ! to create logical expressions commonly
used to control if and while statements. We now examine some Boolean algebra properties for
manipulating such expressions. As you read through this material, keep in mind that the same
techniques can be applied to logical expressions in programming languages.
These properties are commonly presented as theorems. They are easily proved from applica-
tion of truth tables.
There is a duality between the AND and OR operators. In any equality you can interchange
AND and OR along with the constants 0 and 1, and the equality still holds. Thus the properties
will be presented in pairs that illustrate their duality. We first consider properties that are the
same as in elementary algebra.
x · (y · z) = (x · y) · z (4.1)
x + (y + z) = (x + y) + z (4.2)
It is straightforward to prove these equations with truth tables. For example, for Equation
4.1:
4.1. BOOLEAN ALGEBRA 57
x y z (y · z) (x · y) x · (y · z) = (x · y) · z
0 0 0 0 0 0 0
0 0 1 0 0 0 0
0 1 0 0 0 0 0
0 1 1 1 0 0 0
1 0 0 0 0 0 0
1 0 1 0 0 0 0
1 1 0 0 1 0 0
1 1 1 1 1 1 1
x y z (y + z) (x + y) x + (y + z) = (x + y) + z
0 0 0 0 0 0 0
0 0 1 1 0 1 1
0 1 0 1 1 1 1
0 1 1 1 1 1 1
1 0 0 0 1 1 1
1 0 1 1 1 1 1
1 1 0 1 1 1 1
1 1 1 1 1 1 1
Now we consider properties where Boolean algebra differs from elementary algebra.
• AND and OR are commutative:
x·y = y·x (4.5)
x+y = y+x (4.6)
This is easily proved by looking at the second and third lines of the respective truth tables.
In elementary algebra, only the addition and multiplication operators are commutative.
• AND and OR have a null value:
x·0 = 0 (4.7)
x+1 = 1 (4.8)
The null value for the AND is the same as multiplication in elementary algebra. But
addition in elementary algebra does not have a null constant, while OR in Boolean algebra
does.
• AND and OR have a complement value:
x · x′ = 0 (4.9)
x + x′ = 1 (4.10)
Complement does not exist in elementary algebra.
• AND and OR are idempotent:
x·x = x (4.11)
x+x = x (4.12)
That is, repeated application of either operator to the same value does not change it. This
differs considerably from elementary algebra — repeated application of addition is equiv-
alent to multiplication and repeated application of multiplication is the power operation.
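Because each variable has only two possible values, the properties above can be confirmed by trying every combination, which is what a truth-table proof does. The sketch below does this in C, using & for AND, | for OR, and ! for NOT on values restricted to 0 and 1; the function name is hypothetical.

```c
/* Returns 1 if the associative, commutative, null, complement, and
 * idempotent properties hold for every combination of 0/1 inputs. */
int boolean_properties_hold(void)
{
    for (int x = 0; x <= 1; x++) {
        int nx = !x;                                       /* x' */
        if ((x & 0) != 0 || (x | 1) != 1) return 0;        /* null value */
        if ((x & nx) != 0 || (x | nx) != 1) return 0;      /* complement */
        if ((x & x) != x || (x | x) != x) return 0;        /* idempotent */
        for (int y = 0; y <= 1; y++) {
            if ((x & y) != (y & x)) return 0;              /* commutative AND */
            if ((x | y) != (y | x)) return 0;              /* commutative OR */
            for (int z = 0; z <= 1; z++) {
                if ((x & (y & z)) != ((x & y) & z)) return 0;  /* Eq. 4.1 */
                if ((x | (y | z)) != ((x | y) | z)) return 0;  /* Eq. 4.2 */
            }
        }
    }
    return 1;
}
```

The function returns 1 only if no combination of inputs violates any of the listed equations.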
literal: A presence of a variable or its complement in an expression. For example, the expression
x · y + x′ · z + x′ · y′ · z′
contains seven literals.
From the context of the discussion you should be able to tell which meaning of “literal” is in-
tended and when the “·” operator is omitted.
A Boolean expression is created from the numbers 0 and 1, and literals. Literals can be com-
bined using either the “·” or the “+” operators, which are multiplicative and additive operations,
respectively. We will use the following terminology.
product term: A term in which the literals are connected with the AND operator. AND is
multiplicative, hence the use of “product.”
minterm or standard product: A product term that contains each of the variables in the
problem, either in its complemented or uncomplemented form. For example, if a problem
involves three variables (say, x, y, and z), x · y · z, x′ · y · z′, and x′ · y′ · z′ are all minterms,
but x · y is not.
sum of products (SoP): One or more product terms connected with OR operators. OR is addi-
tive, hence the use of “sum.”
sum of minterms (SoM) or canonical sum: An SoP in which each product term is a minterm.
Since all the variables are present in each minterm, the canonical sum is unique for a given
problem.
When first defining a problem, starting with the SoM ensures that the full effect of each
variable has been taken into account. This often does not lead to the best implementation. In
Section 4.3 we will see some tools to simplify the expression, and hence, the implementation.
It is common to index the minterms according to the values of the variables that would cause
that minterm to evaluate to 1. For example, x′ · y′ · z′ = 1 when x = 0, y = 0, and z = 0, so this
would be m0. The minterm x′ · y · z′ evaluates to 1 when x = 0, y = 1, and z = 0, so is m2. Table
4.1 lists all the minterms for a three-variable expression.
minterm             x y z
m0 = x′ · y′ · z′   0 0 0
m1 = x′ · y′ · z    0 0 1
m2 = x′ · y · z′    0 1 0
m3 = x′ · y · z     0 1 1
m4 = x · y′ · z′    1 0 0
m5 = x · y′ · z     1 0 1
m6 = x · y · z′     1 1 0
m7 = x · y · z      1 1 1
Table 4.1: Minterms for three variables. mi is the ith minterm. The x, y, and z values cause the
corresponding minterm to evaluate to 1.
A convenient notation for expressing a sum of minterms is to use the Σ symbol with a
numerical list of the minterm indexes. For example,
F(x, y, z) = x′ · y′ · z′ + x′ · y′ · z + x · y′ · z + x · y · z′
= m0 + m1 + m5 + m6
= Σ(0, 1, 5, 6) (4.18)
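The minterm index is just the three input values read as a binary number, so a list of minterm indexes can be stored as a bit mask and looked up directly. The following is a minimal sketch, assuming inputs restricted to 0 and 1; the function names and the mask representation are illustrative choices, not part of the text.

```c
/* Evaluates a three-variable sum of minterms. Bit i of minterm_mask
 * is 1 when minterm i is part of the sum. */
int eval_sum_of_minterms(unsigned int minterm_mask, int x, int y, int z)
{
    unsigned int index = ((unsigned int)(x & 1) << 2)
                       | ((unsigned int)(y & 1) << 1)
                       |  (unsigned int)(z & 1);
    return (int)((minterm_mask >> index) & 1u);
}

/* F(x, y, z) from Equation 4.18, written out term by term. */
int f_direct(int x, int y, int z)
{
    return (!x & !y & !z) | (!x & !y & z) | (x & !y & z) | (x & y & !z);
}
```

For the list (0, 1, 5, 6) the mask has bits 0, 1, 5, and 6 set, which is 0x63, so eval_sum_of_minterms(0x63, x, y, z) should agree with f_direct(x, y, z) for all eight input combinations.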
As you might expect, each of the terms defined above has a dual definition.
sum term: A term in which the literals are connected with the OR operator. OR is additive,
hence the use of “sum.”
maxterm or standard sum: A sum term that contains each of the variables in the problem,
either in its complemented or uncomplemented form. For example, if an expression involves
three variables, x, y, and z, (x + y + z), (x′ + y + z′), and (x′ + y′ + z′) are all maxterms, but
(x + y) is not.
product of sums (PoS): One or more sum terms connected with AND operators. AND is mul-
tiplicative, hence the use of “product.”
product of maxterms (PoM) or canonical product: A PoS in which each sum term is a max-
term. Since all the variables are present in each maxterm, the canonical product is unique
for a given problem.
It also follows that any Boolean function can be uniquely expressed as a product of maxterms
(PoM), one for each combination of variable values at which the function evaluates to 0.
Starting with the product of maxterms ensures that the full effect of each variable has been
taken into account. Again, this often does not lead to the best
implementation, and in Section 4.3 we will see some tools to simplify PoMs.
It is common to index the maxterms according to the values of the variables that would cause
that maxterm to evaluate to 0. For example, x + y + z = 0 when x = 0, y = 0, and z = 0, so this
would be M0. The maxterm x′ + y + z′ evaluates to 0 when x = 1, y = 0, and z = 1, so is M5.
Table 4.2 lists all the maxterms for a three-variable expression.
Maxterm             x y z
M0 = x + y + z      0 0 0
M1 = x + y + z′     0 0 1
M2 = x + y′ + z     0 1 0
M3 = x + y′ + z′    0 1 1
M4 = x′ + y + z     1 0 0
M5 = x′ + y + z′    1 0 1
M6 = x′ + y′ + z    1 1 0
M7 = x′ + y′ + z′   1 1 1
Table 4.2: Maxterms for three variables. Mi is the ith maxterm. The x, y, and z values cause
the corresponding maxterm to evaluate to 0.
The similar notation for expressing a product of maxterms is to use the Π symbol with a
numerical list of the maxterm indexes. For example (and see Exercise 4-8),
The names “minterm” and “maxterm” may seem somewhat arbitrary. But consider the two
functions,
F1 (x, y, z) = x · y · z
F2 (x, y, z) = x + y + z
There are eight ($2^3$) permutations of the three variables, x, y, and z. F1 has one minterm and
evaluates to 1 for only one of the permutations, x = y = z = 1. F2 has one maxterm and
evaluates to 1 for all permutations except when x = y = z = 0. This is shown in the following
truth table:
4.3. BOOLEAN FUNCTION MINIMIZATION 61
minterm maxterm
x y z F1 = (x · y · z) F2 = (x + y + z)
0 0 0 0 0
0 0 1 0 1
0 1 0 0 1
0 1 1 0 1
1 0 0 0 1
1 0 1 0 1
1 1 0 0 1
1 1 1 1 1
ORing more minterms to an SoP expression expands the number of cases where it evaluates
to 1, and ANDing more maxterms to a PoS expression reduces the number of cases where it
evaluates to 1.
[Circuit diagram with inputs x and y and output (x · y′) + (x′ · y) + (x · y)]
which you recognize as the simple OR operation. It is easy to see that this is a minimal sum of
products for this function. We can implement Equation 4.20 with a single OR gate — see Figure
4.2 on page 56. This is clearly a less expensive, faster circuit than the one shown in Figure 4.4.
To illustrate how a product of sums expression can be minimized, consider the function:
The expression on the right-hand side is a PoM. The circuit for this function is shown in Figure
4.5. It requires three OR gates, one AND gate, and two NOT gates.
[Circuit diagram with inputs x and y and output (x + y′) · (x′ + y) · (x′ + y′)]
We will use the distributive property (Equation 4.14) on the right two factors and recognize
the complement (Equation 4.9):
Now, use the distributive (Equation 4.13) and complement (Equation 4.9) properties to obtain:
F2 (x, y, z) = x · x′ + x′ · y ′ (4.28)
= x′ · y ′ (4.29)
Thus, the function can be implemented with two NOT gates and a single AND gate, which is
clearly a minimal product of sums. Again, with a little algebraic manipulation we have arrived
at a much simpler solution.
Example 4-a
F (w, x, y, z) = w′ · x′ · y ′ · z ′ + w′ · x′ · y · z ′ + w′ · x · y ′ · z ′ + w′ · x · y · z ′
+w · x′ · y ′ · z ′ + w · x′ · y · z ′ + w · x · y ′ · z ′ + w · x · y · z ′
F (w, x, y, z) = z ′ · (w′ · x′ · y ′ + w′ · x′ · y + w′ · x · y ′ + w′ · x · y
+w · x′ · y ′ + w · x′ · y + w · x · y ′ + w · x · y)
= z ′ · (w′ · (x′ · y ′ + x′ · y + x · y ′ + x · y) + w · (x′ · y ′ + x′ · y + x · y ′ + x · y))
= z ′ · (w′ + w) · (x′ · y ′ + x′ · y + x · y ′ + x · y)
= z ′ · (w′ + w) · (x′ · (y ′ + y) + x · (y ′ + y))
= z ′ · (w′ + w) · (x′ + x) · (y ′ + y)
F (w, x, y, z) = z′
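An algebraic simplification like this one can be double-checked by brute force, since four variables give only sixteen cases. The following Python sketch is one way to do so; the function names are assumptions made for illustration.

```python
from itertools import product

def NOT(a):
    """Complement of a single bit."""
    return 1 - a

def f_original(w, x, y, z):
    """The eight-minterm sum of products that Example 4-a starts from."""
    return (NOT(w) & NOT(x) & NOT(y) & NOT(z)
            | NOT(w) & NOT(x) & y & NOT(z)
            | NOT(w) & x & NOT(y) & NOT(z)
            | NOT(w) & x & y & NOT(z)
            | w & NOT(x) & NOT(y) & NOT(z)
            | w & NOT(x) & y & NOT(z)
            | w & x & NOT(y) & NOT(z)
            | w & x & y & NOT(z))

def f_minimal(w, x, y, z):
    """The simplified result: the function depends only on z."""
    return NOT(z)

# Compare the two forms on every combination of the four inputs.
equivalent = all(f_original(*bits) == f_minimal(*bits)
                 for bits in product((0, 1), repeat=4))
```

If the algebra is correct, `equivalent` is true.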
F (x, y) y
0 1
0 m0 m1
x
1 m2 m3
each row is shown by the number (0 or 1) immediately to the left of the row, and the value of y
for each column appears at the top of the column.
The procedure for simplifying an SoP expression using a Karnaugh map is:
1. Place a 1 in each cell that corresponds to a minterm that evaluates to 1 in the expression.
2. Combine cells with 1s in them and that share edges into the largest possible groups. Larger
groups result in simpler expressions. The number of cells in a group must be a power of
2. The edges of the Karnaugh map are considered to wrap around to the other side, both
vertically and horizontally.
3. Groups may overlap. In fact, this is common. However, no group should be fully enclosed
by another group.
4. The result is the sum of the product terms that represent each group.
64 CHAPTER 4. LOGIC GATES
The simplification comes from the fact that the number of variables needed to specify a group
of cells is reduced by log2 ng, where ng is the number of cells in the group. Thus the number of
variables required to specify an entire group of cells in an n-variable Karnaugh map is:
where:
Let us use a Karnaugh map to find a minimal sum of products for Equation 4.20 (repeated
here):
F1 (x, y) = x · y ′ + x′ · y + x · y
We start by placing a 1 in each cell corresponding to a minterm that appears in the equation as
shown in Figure 4.7.

F1 (x, y)     y
            0   1
  x    0        1
       1    1   1

It is easy to see two groups of two cells each. They are circled in Figure 4.8. The group in
the bottom row represents the product term x, and the one in the right-hand column represents y.
Notice that the two encircled groups overlap with the x · y minterm. This is the term that
we added to the function in Equation 4.21 when performing the algebraic simplification. The
Karnaugh map provides a graphical means to find the same simplification as the algebraic
manipulations (see Equation 4.24). Many people find it easier to spot simplification patterns on
a Karnaugh map.
Although it is not obvious in a two-variable Karnaugh map, the cells must be arranged such
that only one variable changes between two cells that share an edge. This is called the adjacency
property. We can see this in a three-variable Karnaugh map. Table 4.1 (page 59) lists all the
minterms for three variables, x, y, and z, numbered from 0 to 7. A total of eight cells are needed,
so we will draw it four cells wide and two high. Our Karnaugh map will be drawn with y and z
on the horizontal axis, and x on the vertical. Figure 4.9 shows how the three-variable minterms
map onto a Karnaugh map. Notice the order of the bit patterns along the top of the Karnaugh
map. It is the same as a two-variable Gray code (Table 3.7, page 52). That is, the order of the
columns is such that the yz values follow the Gray code.
F(x,y,z) yz
00 01 11 10
0 m0 m1 m3 m2
x
1 m4 m5 m7 m6
A four-variable Karnaugh map is shown in Figure 4.10. The y and z variables are on the
horizontal axis, w and x on the vertical. From this four-variable Karnaugh map we see that the
order of the rows is such that the wx values also follow the Gray code.
F(w,x,y,z) yz
00 01 11 10
00 m0 m1 m3 m2
01 m4 m5 m7 m6
wx
11 m12 m13 m15 m14
10 m8 m9 m11 m10
Other axis labeling schemes also work. The only requirement is that entries in adjacent cells
differ by only one bit (which is a property of the Gray code). See Exercises 4-9 and 4-10.
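The adjacency property can be checked mechanically. The sketch below generates a Gray code with the common formula i XOR (i shifted right by one), confirms that neighboring labels (including the wrap-around pair) differ in exactly one bit, and rebuilds the minterm layout of the four-variable map. The function names are illustrative assumptions.

```python
def gray_code(n_bits):
    """Return the n-bit reflected Gray code as a list of integers."""
    return [i ^ (i >> 1) for i in range(2 ** n_bits)]

def differs_by_one_bit(a, b):
    """True when a and b differ in exactly one bit position."""
    return bin(a ^ b).count("1") == 1

labels = gray_code(2)  # 00, 01, 11, 10

# Check every neighboring pair, including the wrap from the last label to the first.
adjacent_ok = all(differs_by_one_bit(labels[i], labels[(i + 1) % len(labels)])
                  for i in range(len(labels)))

# Minterm numbers for a four-variable map: rows are wx, columns are yz.
kmap = [[(wx << 2) | yz for yz in labels] for wx in labels]
```

Each row of `kmap` lists the minterm indexes from left to right, so the third row holds the indexes for wx = 11.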
Example 4-b
F (x, y, z) = x′ · y ′ · z ′ + x′ · y ′ · z + x′ · y · z ′
+ x · y′ · z ′ + x · y · z ′ + x · y · z (4.32)
F (x, y, z)        yz
            00  01  11  10
   x   0     1   1       1
       1     1       1   1
Several groupings are possible. Keep in mind that groupings can wrap around. We will work
with
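[Karnaugh map of Example 4-b with three circled groups: the four cells in columns 00 and 10, which wrap around; cells m0 and m1; and cells m7 and m6.]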
F (x, y, z) = z ′ + x′ · y ′ + x · y
1. place a 0 in each cell of the Karnaugh map corresponding to a missing minterm in the
expression,
we will have the desired expression expressed as a product of sums. Let us use the previous
example to illustrate.
Example 4-c
F (x, y, z)        yz
            00  01  11  10
   x   0             0
       1         0
F ′ (x, y, z) = x′ · y · z + x · y ′ · z
F (x, y, z) = (x + y ′ + z ′ ) · (x′ + y + z ′ )
Example 4-d
F (w, x, y, z) = w′ · x′ · y′ · z′ + w′ · x′ · y · z′ + w′ · x · y′ · z
+ w′ · x · y · z + w · x · y′ · z + w · x · y · z
+ w · x′ · y′ · z′ + w · x′ · y · z′ (4.33)
F (w, x, y, z)          yz
                 00  01  11  10
         00       1           1
   wx    01           1   1
         11           1   1
         10       1           1
F (w, x, y, z) = x′ · z ′ + x · z
Not only have we greatly reduced the number of AND and OR gates, we see that the two vari-
ables w and y are not needed. By the way, you have probably encountered a circuit that imple-
ments this function. A light controlled by two switches typically does this.
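As a check on this result, the following Python sketch compares the minimal expression against the eight minterms of Equation 4.33, identified here by their indexes (the value of wxyz read as a 4-bit number). The names used are assumptions for illustration.

```python
# Indexes of the eight minterms in Equation 4.33.
minterms = {0, 2, 5, 7, 8, 10, 13, 15}

def f_minimal(w, x, y, z):
    """x'.z' + x.z; w and y are accepted but never used."""
    return ((1 - x) & (1 - z)) | (x & z)

def bits(i):
    """Split a 4-bit index into (w, x, y, z)."""
    return (i >> 3) & 1, (i >> 2) & 1, (i >> 1) & 1, i & 1

# Indexes where the minimal form disagrees with the minterm list.
mismatches = [i for i in range(16) if f_minimal(*bits(i)) != (i in minterms)]
```

An empty `mismatches` list means the two forms agree on all sixteen input combinations.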
As you probably expect by now, a Karnaugh map also works when a function is specified as a
product of sums. The differences are:
1. maxterms are numbered 0 for uncomplemented variables and 1 for complemented, and
2. a 0 is placed in each cell of the Karnaugh map that corresponds to a maxterm.
To see how this works let us first compare the Karnaugh maps for two functions,
F1 (x, y, z) = (x′ · y ′ · z ′ )
F2 (x, y, z) = (x + y + z)
F1 is a sum of products with only one minterm, and F2 is a product of sums with only one
maxterm. Figure 4.11(a) shows how the minterm appears on a Karnaugh map, and Figure
4.11(b) shows the maxterm.

F1 (x, y, z)        yz            F2 (x, y, z)        yz
             00  01  11  10                    00  01  11  10
    x   0     1                       x   0     0
        1                                 1
             (a)                               (b)
Figure 4.11: Comparison of one minterm (a) versus one maxterm (b) on a Karnaugh map.
Figure 4.12 shows how three-variable maxterms map onto a Karnaugh map. As with minterms,
x is on the vertical axis, y and z on the horizontal. To use the Karnaugh map for maxterms, place
a 0 in each cell corresponding to a maxterm.
F (x, y, z) yz
00 01 11 10
0 M0 M1 M3 M2
x
1 M4 M5 M7 M6
A four-variable Karnaugh map of maxterms is shown in Figure 4.13. The w and x variables
are on the vertical axis, y and z on the horizontal.
F (w, x, y, z) yz
00 01 11 10
00 M0 M1 M3 M2
01 M4 M5 M7 M6
wx
11 M12 M13 M15 M14
10 M8 M9 M11 M10
Example 4-e
Find a minimal product of sums for the function of Equation 4.25. That function is
F (x, y, z) = (x + y + z) · (x + y + z ′ ) · (x + y ′ + z ′ )
· (x′ + y + z) · (x′ + y ′ + z ′ )
So this expression includes maxterms 0, 1, 3, 4, and 7. These appear in a Karnaugh map:
F (x, y, z)        yz
            00  01  11  10
   x   0     0   0   0
       1     0       0
Next we encircle the largest adjacent blocks, where the number of cells in each block is a power
of two. Notice that maxterm M0 appears in two groups.
[The same Karnaugh map with three circled groups: M0 and M1; M0 and M4; M3 and M7.]
From this Karnaugh map it is very easy to write the function as a minimal product of sums:
F (x, y, z) = (x + y) · (y + z) · (y ′ + z ′ )
which is the same as we found in Equation 4.28.
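A product of sums can be verified the same way as a sum of products, except that we look for the input cases that produce 0. This Python sketch (with assumed names) collects the indexes at which the minimal expression evaluates to 0.

```python
def f_pos(x, y, z):
    """Minimal product of sums from Example 4-e: (x + y) . (y + z) . (y' + z')."""
    return (x | y) & (y | z) & ((1 - y) | (1 - z))

# Index i encodes xyz as a 3-bit number; collect the cases where F is 0.
zeros = {i for i in range(8)
         if f_pos((i >> 2) & 1, (i >> 1) & 1, i & 1) == 0}
```

The set `zeros` should contain exactly the maxterm indexes that appear in the original expression.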
There are situations where some minterms (or maxterms) are irrelevant in a function. This
might occur, say, if certain input conditions are impossible in the design. As an example, assume
that you have an application where the exclusive or (XOR) operation is required. The symbol for
the operation and its truth table are shown in Figure 4.14. The minterms required to implement
x  y  x ⊕ y
0  0    0
0  1    1
1  0    1
1  1    0
This is the simplest form of the XOR operation. It requires two AND gates, two NOT gates, and
an OR gate for realization.
But let us say that we have the additional information that the two inputs, x and y can never
be 1 at the same time. Then we can draw a Karnaugh map with an “×” for the minterm that
cannot exist as shown in Figure 4.15. The “×” represents a “don’t care” cell — we don’t care
whether this cell is grouped with other cells or not.
F (x, y)     y
           0   1
  x   0        1
      1    1   ×
Figure 4.15: A “don’t care” cell on a Karnaugh map. Since x and y cannot both be 1 at the same
time, we don’t care if the cell xy = 11 is included in our groupings or not.
Since the cell that represents the minterm x · y is a “don’t care”, we can include it in our
minimization groupings, leading to the two groupings shown in Figure 4.16. We easily recognize
F (x, y)     y
           0   1
  x   0        1
      1    1   ×
[Two circled groups: the bottom row and the right-hand column.]
Figure 4.16: Karnaugh map for xor function if we know x = y = 1 cannot occur.
this Karnaugh map as being realizable with a single OR gate, which saves two AND gates and
two NOT gates.
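The claim that OR can stand in for XOR under this restriction is easy to confirm. The Python sketch below (names are assumptions) compares the two operations on every input pair except the "don't care" case.

```python
inputs = [(x, y) for x in (0, 1) for y in (0, 1)]
dont_care = {(1, 1)}  # the input combination assumed never to occur

# XOR and OR must agree on every input that can actually occur.
agree = all((x ^ y) == (x | y) for x, y in inputs if (x, y) not in dont_care)

# The only disagreement is on the excluded input.
differ_only_on_dont_care = [(x, y) for x, y in inputs
                            if (x ^ y) != (x | y)] == [(1, 1)]
```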
Voltage is a difference in electrical potential between two points in an electrical circuit. One
volt is defined as the potential difference between two points on a conductor when one
ampere of current flowing through the conductor dissipates one watt of power.
• Active elements that switch between various combinations of the power source, passive
elements, and other active elements.
We will look at how each of these three categories of electronic components behaves.
Figure 4.17: AC/DC power supply. (Plots of voltage versus time: AC at the input of the power supply, DC at the output.)
• Capacitance stores energy in an electric field. Voltage across a capacitance cannot change
instantaneously.
All three of these electro-magnetic properties are distributed throughout any electronic circuit.
In computer circuits they tend to limit the speed at which the circuit can operate and to consume
power, collectively known as impedance. Analyzing their effects can be quite complicated and is
beyond the scope of this book. Instead, to get a feel for the effects of each of these properties,
we will consider the electronic devices that are used to add one of these properties to a specific
location in a circuit; namely, resistors, capacitors, and inductors. Each of these circuit devices
has a different relationship between the voltage difference across the device and the current
flowing through it.
A resistor irreversibly transforms electrical energy into heat. It does not store energy. The
relationship between voltage and current for a resistor is given by the equation
v = iR (4.34)
where v is the voltage difference across the resistor at time t, i is the current flowing through it
at time t, and R is the value of the resistor. Resistor values are specified in ohms. The circuit
shown in Figure 4.18 shows two resistors connected in series through a switch to a battery. The
battery supplies 2.5 volts. The Greek letter Ω is used to indicate ohms, and kΩ indicates 10^3
ohms. Since current can only flow in a closed path, none flows until the switch is closed.
4.4. CRASH COURSE IN ELECTRONICS 71
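[Figure 4.18: a 2.5 v battery connected through a switch to a 1.0 kΩ resistor (between points A and B) in series with a 1.5 kΩ resistor (between points B and C); the current is i.]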
Both resistors are in the same path, so when the switch is closed the same current flows
through each of them. The resistors are said to be connected in series. The total resistance in
the path is their sum:
R = 1.0 kΩ + 1.5 kΩ
  = 2.5 × 10^3 ohms
The amount of current can be determined from the application of Equation 4.34. Solving for i,
i = v / R
  = 2.5 volts / (2.5 × 10^3 ohms)
  = 1.0 × 10^-3 amps
  = 1.0 ma
where “ma” means “milliamps.”
We can now use Equation 4.34 to determine the voltage difference between points A and B.
vAB = iR
    = 1.0 × 10^-3 amps × 1.0 × 10^3 ohms
    = 1.0 volts
Similarly, the voltage difference between points B and C is
vBC = iR
    = 1.0 × 10^-3 amps × 1.5 × 10^3 ohms
    = 1.5 volts
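[Figure 4.19: the 2.5 v battery connected through a switch to the 1.0 kΩ and 1.5 kΩ resistors in parallel. The total current it divides at point A into i1 through the 1.0 kΩ resistor and i2 through the 1.5 kΩ resistor.]
Figure 4.19 shows the same two resistors connected in parallel. In this case, the voltage
across the two resistors is the same: 2.5 volts when the switch is closed. The current in each one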
depends upon its resistance. Thus,
i1 = v / R1
   = 2.5 volts / (1.0 × 10^3 ohms)
   = 2.5 × 10^-3 amps
   = 2.5 ma
and
i2 = v / R2
   = 2.5 volts / (1.5 × 10^3 ohms)
   = 1.67 × 10^-3 amps
   = 1.67 ma
The total current, it , supplied by the battery when the switch is closed is divided at point A to
supply both the resistors. It must equal the sum of the two currents through the resistors,
it = i1 + i2
= 2.5 ma + 1.67 ma
= 4.17 ma
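The series and parallel calculations above can be reproduced in a few lines of Python. This is only a sketch that applies Equation 4.34 to the component values in the two figures; the variable names are assumptions.

```python
V = 2.5        # battery voltage, volts
R1 = 1.0e3     # ohms
R2 = 1.5e3     # ohms

# Series connection: one current flows through both resistors.
r_series = R1 + R2
i_series = V / r_series          # amps
v_ab = i_series * R1             # volts across the 1.0 kOhm resistor
v_bc = i_series * R2             # volts across the 1.5 kOhm resistor

# Parallel connection: the full battery voltage is across each resistor.
i1 = V / R1
i2 = V / R2
i_total = i1 + i2                # current supplied by the battery
```

In the series case the two voltage drops add up to the battery voltage; in the parallel case the two currents add up to the total current.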
A capacitor stores energy in the form of an electric field. It reacts slowly to voltage changes,
requiring time for the electric field to build. The voltage across a capacitor changes with time
according to the equation
v = (1/C) ∫_0^t i dt (4.35)
where C is the value of the capacitor in farads.
Figure 4.20 shows a 1.0 microfarad capacitor being charged through a 1.0 kiloohm resistor.
This circuit is a rough approximation of the output of one transistor connected to the input of
another. (See Section 4.4.3.) The output of the first transistor has resistance, and the input to
the second transistor has capacitance. The switching behavior of the second transistor depends
upon the voltage across the (equivalent) capacitor, vBC .
[Circuit: a 2.5 v battery connected through a switch to a 1.0 kΩ resistor (between points A and B) in series with a 1.0 µf capacitor (between points B and C); the current is i.]
Figure 4.20: Capacitor in series with a resistor; vAB is the voltage across the resistor and vBC is
the voltage across the capacitor.
Assuming the voltage across the capacitor, vBC , is 0.0 volts when the switch is first closed,
current flows through the resistor and capacitor. The voltage across the resistor plus the voltage
across the capacitor must be equal to the voltage available from the battery. That is,
vAB + vBC = 2.5 volts
If we assume that the voltage across the capacitor, vBC , is 0.0 volts when the switch is first
closed, the full voltage of the battery, 2.5 volts, will appear across the resistor. Thus, the initial
current flow in the circuit will be
iinitial = 2.5 volts / 1.0 kΩ
         = 2.5 ma
As the voltage across the capacitor increases, according to Equation 4.35, the voltage across the
resistor, vAB , decreases. This results in an exponentially decreasing build up of voltage across
the capacitor. When it finally equals the voltage of the battery, the voltage across the resistor
is 0.0 volts and current flow in the circuit becomes zero. The rate of the exponential decrease is
given by the product RC, called the time constant.
Thus, assuming the capacitor in Figure 4.20 has 0.0 volts across it when the switch is closed,
the voltage that develops over time is given by
vBC = 2.5 (1 − e^(−t / 10^-3)) (4.37)
This is shown in Figure 4.21. At the time t = 1.0 millisecond (one time constant), the voltage
Figure 4.21: Capacitor charging over time in the circuit in Figure 4.20. The left-hand y-axis
shows voltage across the capacitor (vBC, volts), the right-hand voltage across the resistor
(vAB, volts); the x-axis is time in milliseconds.
After 6 time constants have passed, the voltage across the capacitor has reached
vBC = 2.5 (1 − e^(−6 × 10^-3 / 10^-3))
    = 2.5 (1 − e^-6)
    = 2.5 × 0.9975
    = 2.49 volts
At this time the voltage across the resistor is essentially 0.0 volts and current flow is very low.
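The charging curve of Equation 4.37 can be evaluated numerically. The Python sketch below assumes the component values of Figure 4.20 as defaults; the function name and parameter names are illustrative.

```python
import math

def capacitor_voltage(t, v_battery=2.5, r=1.0e3, c=1.0e-6):
    """Voltage across the capacitor t seconds after the switch closes.

    Assumes the capacitor starts at 0.0 volts. The time constant is R * C.
    """
    tau = r * c
    return v_battery * (1.0 - math.exp(-t / tau))

tau = 1.0e3 * 1.0e-6                      # one time constant: 1.0 millisecond
v_after_1_tau = capacitor_voltage(tau)
v_after_6_tau = capacitor_voltage(6 * tau)
v_resistor_after_6_tau = 2.5 - v_after_6_tau
```

After six time constants the result is about 2.49 volts, matching the hand calculation, and only a few millivolts remain across the resistor.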
Inductors are not used in logic circuits. In the typical PC, they are found as part of the CPU
power supply circuitry. If you have access to the inside of a PC, you can probably see a small (∼1
cm. in diameter) donut-shaped device with wire wrapped around it on the motherboard near
the CPU. This is an inductor used to smooth the power supplied to the CPU.
An inductor stores energy in the form of a magnetic field. It reacts slowly to current changes,
requiring time for the magnetic field to build. The relationship between voltage at time t across
an inductor and current flow through it is given by the equation
v = L (di/dt) (4.38)
where L is the value of the inductor in henrys.
Figure 4.22 shows an inductor connected in series with a resistor. When the switch is open
no current flows through this circuit. Upon closing the switch, the inductor initially impedes
the flow of current, taking time for a magnetic field to be built up in the inductor.
[Figure 4.22: a 2.5 v battery connected through a switch to a 1.0 µh inductor (between points A and B) in series with a 1.0 kΩ resistor (between points B and C); the current is i.]
At this initial point no current is flowing through the resistor, so the voltage across it, vBC , is
0.0 volts. The full voltage of the battery, 2.5 volts, appears across the inductor, vAB . As current
begins to flow through the inductor the voltage across the resistor, vBC , grows. This results in
an exponentially decreasing voltage across the inductor. When it finally reaches 0.0 volts, the
voltage across the resistor is 2.5 volts and current flow in the circuit is 2.5 ma.
The rate of the exponential voltage decrease is given by the time constant L/R. Using the
values of R and L in Figure 4.22 we get
L/R = 1.0 × 10^-6 henrys / (1.0 × 10^3 ohms)
    = 1.0 × 10^-9 seconds
When the switch is closed, the voltage that develops across the inductor over time is given by
vAB = 2.5 × e^(−t / 10^-9) (4.39)
This is shown in Figure 4.23. Note that after about 6 nanoseconds (6 time constants) the voltage
across the inductor is essentially equal to 0.0 volts. At this time the full voltage of the battery is
across the resistor and a steady current of 2.5 ma flows.
Figure 4.23: Inductor building a magnetic field over time in the circuit in Figure 4.22. The left-
hand y-axis shows voltage across the inductor (vAB, volts), the right-hand voltage across the
resistor (vBC, volts); the x-axis is time in nanoseconds.
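Equation 4.39 can be evaluated in the same way as the capacitor equation. This Python sketch assumes the component values of Figure 4.22; the names are illustrative.

```python
import math

V_BATTERY = 2.5      # volts
L = 1.0e-6           # henrys
R = 1.0e3            # ohms
TAU = L / R          # time constant, seconds

def inductor_voltage(t):
    """Voltage across the inductor t seconds after the switch closes."""
    return V_BATTERY * math.exp(-t / TAU)

def resistor_voltage(t):
    """The remainder of the battery voltage appears across the resistor."""
    return V_BATTERY - inductor_voltage(t)

v_inductor_6_tau = inductor_voltage(6 * TAU)
steady_current = V_BATTERY / R   # amps, once the inductor voltage has decayed
```

At six time constants the inductor voltage has fallen to a few millivolts, and the current approaches its steady value.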
The circuit in Figure 4.22 illustrates how inductors are used in a CPU power supply. The
battery in this circuit represents the computer power supply, and the resistor represents the load
provided by the CPU. The voltage produced by a power supply includes noise, which consists of
small, high-frequency fluctuations added to the DC level. As can be seen in Figure 4.23, the
voltage supplied to the CPU, vBC , changes little over short periods of time.
[Circuit: the input drives the transistor gate; the drain is connected to the output and, through resistor R, to VDD; the source is connected to VSS.]
Figure 4.24: A single n-type MOSFET transistor switch.
device. The input terminal is called the gate. The terminal connected to the output is the drain,
and the terminal connected to VSS is the source. In this circuit the drain is connected to positive
(high) voltage of a DC power supply, VDD , through a resistor, R. The source is connected to the
zero voltage, VSS .
When the input voltage to the transistor is high, the gate acquires an electrical charge, thus
turning the transistor on. The path between the drain and the source of the transistor essentially
becomes a closed switch. This causes the output to be at the low voltage. The transistor
acts as a pull down device.
The resulting circuit is equivalent to Figure 4.25(a). In this circuit current flows from VDD to
VSS through the resistor R. The output is connected to VSS , that is, 0.0 volts. The current flow
through the resistor and transistor is
i = (VDD − VSS) / R (4.40)
Figure 4.25: Single transistor switch equivalent circuit; (a) switch closed (input = high); (b) switch open (input = low).
The problem with this current flow is that it uses power just to keep the output low.
If the input is switched to the low voltage, the transistor turns off, resulting in the equivalent
circuit shown in Figure 4.25(b). The output is typically connected to another transistor’s input
(its gate), which draws essentially no current, except during the time it is switching from one
state to the other. In the steady state condition the output connection does not draw current.
Since no current flows through the resistor, R, there is no voltage change across it. So the output
connection will be at VDD volts, the high voltage. The resistor is acting as the pull up device.
These two states can be expressed in the truth table
input output
low high
high low
which is the logic required of a NOT gate.
There is another problem with this hardware design. Although the gate of a MOSFET tran-
sistor draws essentially no current in order to remain in either an on or off state, current is
required to cause it to change state. The gate of the transistor that is connected to the output
must be charged. The gate behaves like a capacitor during the switching time. This charging
requires a flow of current over a period of time. The problem here is that the resistor, R, reduces
the amount of current that can flow, thus taking longer to charge the transistor gate. (See
Section 4.4.2.)
From Equation 4.40, the larger the resistor, the lower the current flow. So we have a dilemma
— the resistor should be large to reduce power consumption, but it should be small to increase
switching speed.
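The dilemma can be made concrete with numbers. The Python sketch below uses Equation 4.40 for the steady current when the output is low and the RC time constant from Section 4.4.2 for the charging time. The 1.0 picofarad gate capacitance and the three resistor values are assumptions chosen only for illustration; they do not come from this book.

```python
VDD = 2.5            # volts
VSS = 0.0            # volts
C_GATE = 1.0e-12     # farads; assumed capacitance of the next transistor's gate

def static_power(r):
    """Power dissipated while the output is held low (watts)."""
    current = (VDD - VSS) / r          # Equation 4.40
    return (VDD - VSS) * current

def time_constant(r):
    """RC time constant for charging the next gate through the resistor (seconds)."""
    return r * C_GATE

resistors = [1.0e3, 10.0e3, 100.0e3]   # ohms; illustrative values
table = [(r, static_power(r), time_constant(r)) for r in resistors]
```

Reading down `table`, each tenfold increase in R cuts the wasted power by a factor of ten but also makes the time constant ten times longer.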
This problem is solved with Complementary Metal Oxide Semiconductor (CMOS) technology.
This technology packages a p-type MOSFET together with each n-type. The p-type works in the
opposite way — a high value on the gate turns it off, and a low value turns it on. The circuit in
Figure 4.26 shows a NOT gate using a p-type MOSFET as the pull up device.
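input  output
  0      1
  1      0
[Figure 4.26: CMOS NOT gate. A p-type MOSFET connects the output to VDD and an n-type MOSFET connects the output to VSS; the input drives both gates.]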
Figure 4.27(a) shows the equivalent circuit with a high voltage input. The pull up transistor
(a p-type) is off, and the pull down transistor (an n-type) is on. This results in the output being
pulled down to the low voltage. In Figure 4.27(b) a low voltage input turns the pull up transistor
on and the pull down transistor off. The result is the output is pulled up to the high voltage.
Figure 4.27: CMOS inverter equivalent circuit; (a) pull up open and pull down closed; (b) pull
up closed and pull down open.
Figure 4.28 shows an AND gate implemented with CMOS transistors. (See Exercise 4-12.)
Notice that the signal at point A is NOT(x AND y). The circuit from point A to the output is
a NOT gate. It requires two fewer transistors than the AND operation. We will examine the
implications of this result in Section 4.5.
x  y  A  output
0  0  1    0
0  1  1    0
1  0  1    0
1  1  0    1
[Figure 4.28: AND gate implemented with CMOS transistors, with inputs x and y, intermediate point A, and the output.]
4.5. NAND AND NOR GATES 77
• NAND — a binary operator; the result is 0 if and only if both operands are 1; otherwise
the result is 1. We will use (x · y)′ to designate the NAND operation. It is also common to
use the ’↑’ symbol or simply “NAND”. The hardware symbol for the NAND gate is shown
in Figure 4.29. The inputs are x and y. The resulting output, (x · y)′ , is shown in the truth
table in this figure.
x  y  (x · y)′
0  0     1
0  1     1
1  0     1
1  1     0
[Figure 4.29: NAND gate symbol with inputs x and y and output (x · y)′.]
• NOR — a binary operator; the result is 0 if at least one of the two operands is 1; otherwise
the result is 1. We will use (x + y)′ to designate the NOR operation. It is also common to
use the ’↓’ symbol or simply “NOR”. The hardware symbol for the NOR gate is shown in
Figure 4.30. The inputs are x and y. The resulting output, (x + y)′ , is shown in the truth
table in this figure.
x  y  (x + y)′
0  0     1
0  1     0
1  0     0
1  1     0
[Figure 4.30: NOR gate symbol with inputs x and y and output (x + y)′.]
The small circle at the output of the NAND and NOR gates signifies “NOT”, just as with the
NOT gate (see Figure 4.3). Although we have explicitly shown NOT gates when inputs to gates
are complemented, it is common to simply use these small circles at the input. For example,
Figure 4.31 shows an OR gate with both inputs complemented. As the truth table in this figure
shows, this is an alternate way to draw a NAND gate. See Exercise 4-14 for an alternate way to
draw a NOR gate.
x  y  (x′ + y′)
0  0     1
0  1     1
1  0     1
1  1     0
[Figure 4.31: OR gate symbol with both inputs complemented; the output is (x′ + y′).]
One of the interesting properties about NAND gates is that it is possible to build AND, OR,
and NOT gates from them. That is, the NAND gate is sufficient to implement any Boolean
function. In this sense, it can be thought of as a universal gate.
First, we construct a NOT gate. To do this, simply connect the signal to both inputs of a
NAND gate, as shown in Figure 4.32.
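[Figure 4.32: NAND gate with both inputs connected to x; the output is (x · x)′ = x′.]
(x · y)′ = x′ + y′
(x′ + y′)′ = (x′)′ · (y′)′
           = x · y
[Figure 4.33: two NAND gates; the first produces (x · y)′ and the second, connected as a NOT gate, produces x · y.]
(x′ · y′)′ = (x′)′ + (y′)′
           = x + y
we use three NAND gates connected as shown in Figure 4.34 to create an OR gate.
[Figure 4.34: three NAND gates with inputs x and y and output x + y.]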
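The three constructions can be modeled in Python by treating a NAND gate as a function and building the other gates only from calls to it. This is a sketch; the function names (with trailing underscores to avoid Python keywords) are assumptions.

```python
def nand(x, y):
    """The only primitive gate: 0 if and only if both inputs are 1."""
    return 1 - (x & y)

def not_(x):
    """One NAND gate with the signal connected to both inputs."""
    return nand(x, x)

def and_(x, y):
    """Two NAND gates: a NAND followed by a NAND wired as a NOT."""
    return not_(nand(x, y))

def or_(x, y):
    """Three NAND gates: complement each input, then NAND the results."""
    return nand(not_(x), not_(y))

# Exhaustive check against Python's built-in bit operators.
all_match = all(and_(x, y) == (x & y) and or_(x, y) == (x | y)
                for x in (0, 1) for y in (0, 1))
```

The OR construction is DeMorgan's law in action: (x′ · y′)′ = x + y.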
It may seem like we are creating more complexity in order to build circuits from NAND gates.
But consider the function
F (w, x, y, z) = (w · x) + (y · z) (4.41)
Without knowing how logic gates are constructed, it would be reasonable to implement this
function with the circuit shown in Figure 4.35. Using the involution property (Equation 4.15) it
is clear that the circuit in Figure 4.36 is equivalent to the one in Figure 4.35.
Figure 4.35: The function in Equation 4.41 using two AND gates and one OR gate.
Figure 4.36: The function in Equation 4.41 using two AND gates, one OR gate and four NOT
gates.
Next, comparing the AND-gate/NOT-gate combination with Figure 4.29, we see that each
is simply a NAND gate. Similarly, comparing the NOT-gates/OR-gate combination with Figure
4.31, it is also a NAND gate. Thus we can also implement the function in Equation 4.41 with
three NAND gates as shown in Figure 4.37.
Figure 4.37: The function in Equation 4.41 using only three NAND gates.
From simply viewing the circuit diagrams, it may seem that we have not gained anything
in this circuit transformation. But we saw in Section 4.4.3 that a NAND gate requires fewer
transistors than an AND gate or OR gate due to the signal inversion properties of transistors.
Thus, the NAND gate implementation is a less expensive and faster implementation.
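The equivalence of the AND/OR circuit and the three-NAND circuit for Equation 4.41 can be confirmed by checking all sixteen input combinations. The Python sketch below uses assumed function names.

```python
from itertools import product

def nand(x, y):
    """NAND gate: 0 if and only if both inputs are 1."""
    return 1 - (x & y)

def f_and_or(w, x, y, z):
    """Equation 4.41 with two AND gates and one OR gate."""
    return (w & x) | (y & z)

def f_nand_only(w, x, y, z):
    """The same function using only three NAND gates."""
    return nand(nand(w, x), nand(y, z))

circuits_equivalent = all(f_and_or(*bits) == f_nand_only(*bits)
                          for bits in product((0, 1), repeat=4))
```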
The conversion from an AND/OR/NOT gate design to one that uses only NAND gates is
straightforward:
2. Convert the products (AND terms) and the final sum (OR) to NANDs.
3. Add a NAND gate for any product with only a single literal.
As with software, hardware design is an iterative process. Since there usually is not a unique
solution, you often need to develop several designs and analyze each one within the context of
the available hardware. The example above shows that two solutions that look the same on
paper may be dissimilar in hardware.
In Chapter 6 we will see how these concepts can be used to construct the heart of a computer
— the CPU.
4.6 Exercises
4-1 (§4.1) Prove the identity property expressed by Equations 4.3 and 4.4.
4-2 (§4.1) Prove the commutative property expressed by Equations 4.5 and 4.6.
4-3 (§4.1) Prove the null property expressed by Equations 4.7 and 4.8.
4-4 (§4.1) Prove the complement property expressed by Equations 4.9 and 4.10.
4-5 (§4.1) Prove the idempotent property expressed by Equations 4.11 and 4.12.
4-6 (§4.1) Prove the distributive property expressed by Equations 4.13 and 4.14.
F (x, y, z) xy
00 01 11 10
0
z
1
4-10 (§4.3.2) Show where each minterm is located with this Karnaugh map axis labeling using
the notation of Figure 4.9.
F (x, y, z) xz
00 01 11 10
0
y
1
4-11 (§4.3.2) Design a logic function that detects the prime single-digit numbers. Assume that
the numbers are coded in 4-bit BCD (see Section 3.6.1, page 50). The function is 1 for each
prime number.
4-12 (§4.4.3) Using drawings similar to those in Figure 4.27, verify that the logic circuit in Figure
4.28 is an AND gate.
4-13 (§4.5) Show that the gate in Figure 4.31 is a NAND gate.
4-14 (§4.5) Give an alternate way to draw a NOR gate, similar to the alternate NAND gate in
Figure 4.31.
4-15 (§4.5) Design a circuit using NAND gates that detects the “below” condition for two 2-bit
values. That is, given two 2-bit variables x and y, F (x, y) = 1 when the unsigned integer
value of x is less than the unsigned integer value of y.
a) Give a truth table for the output of the circuit, F.
b) Find a minimal sum of products for F.
c) Implement F using NAND gates.
Chapter 5
Logic Circuits
In this chapter we examine how the concepts in Chapter 4 can be used to build some of the logic
circuits that make up a CPU, Memory, and other devices. We will not describe an entire unit,
only a few small parts. The goal is to provide an introductory overview of the concepts. There
are many excellent books that cover the details. For example, see [20], [23], or [24] for circuit
design details and [28], [31], [34] for CPU architecture design concepts.
Logic circuits can be classified as either
• Combinational Logic Circuits — the output(s) depend only on the input(s) at any spe-
cific time and not on any previous input(s).
• Sequential Logic Circuits — the output(s) depend both on previous and current in-
put(s).
An example of the two concepts is a television remote control. You can enter a number
and the output (a particular television channel) depends only on the number entered. It does
not matter which channels have been viewed previously. So the relationship between the input (a
number) and the output is combinational.
The remote control also has inputs for stepping either up or down one channel. When using
this input method, the channel selected depends on what channel has been previously selected
and the sequence of up/down button pushes. The channel up/down buttons illustrate a sequen-
tial input/output relationship.
Although a more formal definition will be given in Section 5.3, this television example also
illustrates the concept of state. My television remote control has a button I can push that will
show the current channel setting. If I make a note of the beginning channel setting, and keep
track of the sequence of channel up and down button pushes, I will know the ending channel
setting. It does not matter how I originally got to the beginning channel setting. The channel
setting is the state of the channel selection mechanism because it tells me everything I need
to know in order to select a new channel by using a sequence of channel up and down button
pushes.
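The remote control example can be sketched in Python. A plain function models the combinational behavior, and an object with a stored value models the sequential behavior. The names below are assumptions for illustration, and channel limits and wrap-around are ignored.

```python
def select_by_number(number):
    """Combinational: the channel depends only on the number entered now."""
    return number

class ChannelSelector:
    """Sequential: the result of a button push depends on the stored state."""

    def __init__(self, channel):
        self.channel = channel   # the state of the selection mechanism

    def up(self):
        self.channel += 1
        return self.channel

    def down(self):
        self.channel -= 1
        return self.channel

# Two selectors reach channel 7 by different histories.
a = ChannelSelector(select_by_number(7))
b = ChannelSelector(5)
b.up()
b.up()

# From the same state, the same sequence of pushes gives the same result.
for selector in (a, b):
    selector.up()
    selector.down()
    selector.down()
```

Both selectors end on the same channel, because only the current state and the button sequence matter, not how the state was originally reached.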
5.1. COMBINATIONAL LOGIC CIRCUITS 83
0 + 0 = 00
0 + 1 = 01
1 + 0 = 01
1 + 1 = 10
As you have seen in Section 3.1 (page 28), the 1 in the last sum (1 + 1 = 10) is carried to the
next higher-order bit when performing multi-bit addition. This implies that addition of the next
higher-order bits must be able to add three bits:
0+0+0 = 00
0+0+1 = 01
0+1+0 = 01
0+1+1 = 10
1+0+0 = 01
1+0+1 = 10
1+1+0 = 10
1+1+1 = 11
xi yi Carryi+1 Sumi
0 0 0 0
0 1 0 1
1 0 0 1
1 1 1 0
full adder: A combinational logic device that has three 1-bit inputs, Carryi , xi , and yi , and two
outputs that are related:
Carryi+1 is the carry from adding the next-lower significant bits, xi , yi , and Carryi .
The terms half adder and full adder come from the fact that a full adder can be constructed
from two half adders, with the addition of a carry input.
We begin with a half adder. Looking at the sum in the definition of half adder, it is easy to
see that this is simply the XOR of the two inputs. The carry is the AND of the two inputs. This
leads to the circuit in Figure 5.1.
[Figure 5.1: half adder. An XOR gate produces Sumi and an AND gate produces Carryi+1 from inputs xi and yi.]
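The half adder translates directly into Python's bitwise operators. In this sketch the function name and the order of the returned pair are assumptions.

```python
def half_adder(x, y):
    """Return (carry, sum) for two 1-bit inputs: carry is AND, sum is XOR."""
    return x & y, x ^ y

# Rows are (x, y, carry, sum), in the same order as the half adder truth table.
truth_table = [(x, y) + half_adder(x, y) for x in (0, 1) for y in (0, 1)]
```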
The full adder is not as obvious. First, let us look at the Karnaugh map for the sum:
84 CHAPTER 5. LOGIC CIRCUITS
Sumi              xi yi
              00  01  11  10
Carryi   0         1       1
         1     1       1
The diagonal pattern suggests that the XOR operation can be used in our solution, and the
half adder used an XOR operation. But the pattern is difficult to see, because we have a map
that shows minterms and there are no obvious groupings.
So let us try an algebraic approach. We can write the function as a sum of product terms
from the Karnaugh map.
a′ · b + a · b ′
where
a = Carryi
b = xi ⊕ yi
Thus, we conclude:
Carryi+1          xi yi
              00  01  11  10
Carryi   0             1
         1         1   1   1
[The same Karnaugh map with one circled group: the two cells in column 11.]
You should be able to see two other possible groupings on this Karnaugh map and may wonder
why they are not circled here. The two ungrouped minterms, Carry_i · x_i′ · y_i and Carry_i · x_i · y_i′,
form a pattern that suggests an exclusive or operation.
This grouping yields a three-term function that defines when Carry_{i+1} = 1:
Carry_{i+1} = x_i · y_i + Carry_i · x_i′ · y_i + Carry_i · x_i · y_i′
            = x_i · y_i + Carry_i · (x_i ⊕ y_i)          (5.4)
Notice that the first product term in Equation 5.4, x_i · y_i, is generated by the Carry portion of
a half-adder, and that the exclusive or portion, x_i ⊕ y_i, of the second product term is generated
by the Sum portion. A logic gate implementation of a full adder is shown in Figure 5.2. You can
[Figure 5.2: full adder circuit. Inputs x_i, y_i, and Carry_i; outputs Sum_i and Carry_{i+1}.]
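The construction of a full adder from two half adders can be sketched in Python, modelling each gate with a bitwise operator. This is only an illustration: the function names are hypothetical, and the OR that merges the two half-adder carries is taken from the sum in Equation 5.4.

```python
def half_adder(x, y):
    """Return (carry, sum) for two 1-bit inputs: carry is AND, sum is XOR."""
    return x & y, x ^ y


def full_adder(carry_in, x, y):
    """Return (carry_out, sum) built from two half adders and one OR gate."""
    c1, s1 = half_adder(x, y)            # x·y and x ⊕ y
    c2, s = half_adder(carry_in, s1)     # Carry·(x ⊕ y) and Carry ⊕ (x ⊕ y)
    return c1 | c2, s                    # the two carries are ORed together


# Check all eight rows of the three-bit addition table.
for c in (0, 1):
    for x in (0, 1):
        for y in (0, 1):
            carry_out, s = full_adder(c, x, y)
            assert 2 * carry_out + s == c + x + y
```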
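[Figure 5.3 diagram: four full adders linked by the carries c1, c2, and c3, with final carry c4 and sum bits s3 s2 s1 s0.]
s = x + y
CF = c4
OF = c3 ⊕ c4
Figure 5.3: Four-bit adder.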
Addition begins with the full adder on the right receiving the two lowest-order bits, x0 and y0 . Since this
is the lowest-order bit there is no carry and c0 = 0. The bit sum is s0 , and the carry from this
addition, c1 , is connected to the carry input of the next full adder to the left, where it is added to
x1 and y1 .
So the ith full adder adds the two ith bits of the operands, plus the carry (which is either 0
or 1) from the (i − 1)th full adder. Thus, each full adder handles one bit (often referred to as a
“slice”) of the total width of the values being added, and the carry “ripples” from the lowest-order
place to the highest-order.
The final carry from the highest-order full adder, c4 in the 4-bit adder of Figure 5.3, is stored
in the CF bit of the Flags register (see Section 6.2). And the exclusive or of the final carry and
penultimate carry, c4 ⊕ c3 in the 4-bit adder of Figure 5.3, is stored in the OF bit.
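The ripple of the carry through the bit slices, and the way CF and OF fall out of the last two carries, can be traced with a short Python sketch. The helper names and the list-of-bits representation (index 0 is the lowest-order bit) are assumptions made for illustration only.

```python
def full_adder(carry_in, x, y):
    """One bit slice: returns (carry_out, sum)."""
    s = carry_in ^ x ^ y
    carry_out = (x & y) | (carry_in & (x ^ y))
    return carry_out, s


def ripple_add(x_bits, y_bits, c0=0):
    """Add two equal-length bit lists (index 0 = lowest-order bit).

    Returns (sum_bits, CF, OF), where CF is the final carry and OF is the
    XOR of the final and penultimate carries.
    """
    carries = [c0]
    sum_bits = []
    for x, y in zip(x_bits, y_bits):
        carry, s = full_adder(carries[-1], x, y)
        carries.append(carry)
        sum_bits.append(s)
    return sum_bits, carries[-1], carries[-1] ^ carries[-2]


def to_bits(value, width=4):
    """Split an integer into a list of bits, lowest-order bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Rebuild an integer from a list of bits, lowest-order bit first."""
    return sum(b << i for i, b in enumerate(bits))


# 0111 + 0001 = 1000: no carry out, but signed overflow.
bits, cf, of = ripple_add(to_bits(7), to_bits(1))
print(from_bits(bits), cf, of)   # prints "8 0 1"
```

Adding 1111 and 0001 with the same function gives a sum of 0000 with CF = 1 and OF = 0, the unsigned-overflow case.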
Recall that in the 2’s complement code for storing integers a number is negated by taking its
2’s complement. So we can subtract y from x by doing:
x − y = x + (2’s complement of y)
      = x + ((y with every bit complemented) + 1)
Thus, subtraction can be performed with our adder in Figure 5.3 if we complement each yi
and set the initial carry in to 1 instead of 0. Each y_i can be complemented by XOR-ing it with 1.
This leads to the 4-bit circuit in Figure 5.4 that will add two 4-bit numbers when func = 0 and
subtract them when func = 1.
[Figure 5.4 diagram: inputs x3 y3, x2 y2, x1 y1, x0 y0; carries c1, c2, c3, c4; outputs s3 s2 s1 s0.]
if (func == 0)
    s = x + y
else // func == 1
    s = x − y
CF = c4
OF = c3 ⊕ c4
Figure 5.4: Four-bit adder/subtracter.
There is, of course, a time delay as the sum is computed from right to left. The computation
time can be significantly reduced through more complex circuit designs that pre-compute the
carry.
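The adder/subtracter idea (XOR each y_i with func and use func as the initial carry) can be checked with the following Python sketch. The function name and the integer packing of the 4-bit values are assumptions for illustration; CF and OF are computed exactly as labelled in Figure 5.4.

```python
def add_sub4(x, y, func):
    """Add (func = 0) or subtract (func = 1) two 4-bit values.

    Returns (s, CF, OF) with CF = c4 and OF = c3 XOR c4.
    """
    carries = [func]                      # initial carry in equals func
    s = 0
    for i in range(4):
        x_i = (x >> i) & 1
        y_i = ((y >> i) & 1) ^ func       # complemented when func = 1
        c_i = carries[-1]
        s |= (x_i ^ y_i ^ c_i) << i       # sum bit of this slice
        carries.append((x_i & y_i) | (c_i & (x_i ^ y_i)))
    return s, carries[4], carries[3] ^ carries[4]


print(add_sub4(5, 3, 0))   # prints "(8, 0, 1)"
print(add_sub4(5, 3, 1))   # prints "(2, 1, 0)"
```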
5.1.3 Decoders
Each instruction must be decoded by the CPU before the instruction can be carried out. In the
x86-64 architecture the instruction for copying the 64 bits of one register to another register is
0100 1s0d 1000 1001 11ss sddd
where “ssss” specifies the source register and “dddd” specifies the destination register. (Yes, the
bits that specify the registers are distributed through the instruction in this manner. You will
learn more about this seemingly odd coding pattern in Chapter 9.) For example,
0100 1001 1000 1001 1100 0101
causes the ALU to copy the 64-bit value in register 0000 to register 1101. You will see in Chapter
9 that this instruction is written in assembly language as:
movq %rax, %r13
The Control Unit must select the correct two registers based on these two 4-bit patterns in the
instruction. It uses a decoder circuit to perform this selection.
decoder: A device with n binary inputs and 2^n binary outputs. Each bit pattern at the input
causes exactly one of the 2^n outputs to equal 1.
A decoder can be thought of as converting an n-bit input to a 2^n-bit output. But while the input can
be an arbitrary bit pattern, each corresponding output value has only one bit set to 1.
In some applications not all the 2^n outputs are used. For example, Table 5.1 is a truth table
that shows how a decoder can be used to convert a BCD value to its corresponding decimal
numeral display. A 1 in a “display” column means that is the numeral that is selected by the
corresponding 4-bit input value. There are six other possible outputs corresponding to the input
values 1010 – 1111. But these input values are illegal in BCD, so these outputs are simply
ignored.
    input                        display
x3 x2 x1 x0    '9' '8' '7' '6' '5' '4' '3' '2' '1' '0'
 0  0  0  0     0   0   0   0   0   0   0   0   0   1
 0  0  0  1     0   0   0   0   0   0   0   0   1   0
 0  0  1  0     0   0   0   0   0   0   0   1   0   0
 0  0  1  1     0   0   0   0   0   0   1   0   0   0
 0  1  0  0     0   0   0   0   0   1   0   0   0   0
 0  1  0  1     0   0   0   0   1   0   0   0   0   0
 0  1  1  0     0   0   0   1   0   0   0   0   0   0
 0  1  1  1     0   0   1   0   0   0   0   0   0   0
 1  0  0  0     0   1   0   0   0   0   0   0   0   0
 1  0  0  1     1   0   0   0   0   0   0   0   0   0
Table 5.1: BCD decoder. The 4-bit input causes the numeral with a 1 in its column to be
displayed.
It is common for decoders to have an additional input that is used to enable the output. The
truth table in Table 5.2 shows a decoder with a 3-bit input, an enable line, and an 8-bit (2^3)
output.
enable  x2 x1 x0    y7 y6 y5 y4 y3 y2 y1 y0
   0     –  –  –     0  0  0  0  0  0  0  0
   1     0  0  0     0  0  0  0  0  0  0  1
   1     0  0  1     0  0  0  0  0  0  1  0
   1     0  1  0     0  0  0  0  0  1  0  0
   1     0  1  1     0  0  0  0  1  0  0  0
   1     1  0  0     0  0  0  1  0  0  0  0
   1     1  0  1     0  0  1  0  0  0  0  0
   1     1  1  0     0  1  0  0  0  0  0  0
   1     1  1  1     1  0  0  0  0  0  0  0
Table 5.2: Truth table for a 3 × 8 decoder with enable. If enable = 0, y = 0. If enable = 1,
x = i ⇒ yi = 1 and yj = 0 for all j ≠ i. “–” indicates that the value does not matter.
The output is 0 whenever enable = 0. When enable = 1, the ith output bit is 1 if and
only if the binary value of the input is equal to i. For example, when enable = 1 and x = 011₂,
y = 00001000₂. That is,
y3 = x2′ · x1 · x0
   = m3
This clearly generalizes such that we can give the following description of a decoder:
1. For n input bits (excluding an enable bit) there are 2^n output bits.
2. The ith output bit is equal to the ith minterm for the n input bits.
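The two-point description above maps directly onto eight 4-input AND operations. The following Python sketch is one way to express that; the function name and the choice to return the outputs as a list indexed by i are assumptions for illustration.

```python
def decoder_3x8(enable, x2, x1, x0):
    """Return [y0, y1, ..., y7]; y_i = 1 only if enable = 1 and x = i."""
    n2, n1, n0 = 1 - x2, 1 - x1, 1 - x0    # complemented inputs
    return [
        enable & n2 & n1 & n0,   # y0 = m0
        enable & n2 & n1 & x0,   # y1 = m1
        enable & n2 & x1 & n0,   # y2 = m2
        enable & n2 & x1 & x0,   # y3 = m3
        enable & x2 & n1 & n0,   # y4 = m4
        enable & x2 & n1 & x0,   # y5 = m5
        enable & x2 & x1 & n0,   # y6 = m6
        enable & x2 & x1 & x0,   # y7 = m7
    ]


# x = 011 selects y3 when the decoder is enabled.
print(decoder_3x8(1, 0, 1, 1))   # prints "[0, 0, 0, 1, 0, 0, 0, 0]"
```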
The 3 × 8 decoder specified in Table 5.2 can be implemented with 4-input AND gates as shown
in Figure 5.5.
[Figure 5.5: 3 × 8 decoder with enable. Inputs enable, x2, x1, x0 and their complements feed eight 4-input AND gates with outputs y0 through y7.]
Decoders are more versatile than it might seem at first glance. Each possible input can be
seen as a minterm. Since each output is one only when a particular minterm evaluates to one,
a decoder can be viewed as a “minterm generator.” We know that any logical expression can be
represented as the OR of minterms, so it follows that we can implement any logical expression
by ORing the output(s) of a decoder.
For example, let us rewrite Equation 5.1 for the Sum expression of a full adder using minterm
notation (see Section 4.3.2):
Sum_i(Carry_i, x_i, y_i) = m1 + m2 + m4 + m7
Carry_{i+1}(Carry_i, x_i, y_i) = m3 + m5 + m6 + m7
where the subscripts on x, y, and Carry refer to the bit slice and the subscripts on m are part of
the minterm notation. We can implement a full adder with a 3 × 8 decoder and two 4-input OR
gates, as shown in Figure 5.6.
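A decoder used as a minterm generator can be sketched in Python as follows. The function names and the input ordering used to number the minterms are assumptions; because Sum and Carry are symmetric in their three inputs, the same minterm numbers result from any ordering.

```python
def minterms(a, b, c):
    """3 x 8 decoder (always enabled): m[i] = 1 only for the selected minterm."""
    index = (a << 2) | (b << 1) | c
    return [1 if index == i else 0 for i in range(8)]


def full_adder_from_decoder(x, y, carry_in):
    """Full adder built from a decoder and two 4-input OR gates."""
    m = minterms(x, y, carry_in)
    s = m[1] | m[2] | m[4] | m[7]            # odd number of 1s at the inputs
    carry_out = m[3] | m[5] | m[6] | m[7]    # two or more 1s at the inputs
    return carry_out, s


print(full_adder_from_decoder(1, 0, 1))   # prints "(1, 0)"
```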
5.1.4 Multiplexers
There are many places in the CPU where one of several signals must be selected to pass onward.
For example, as you will see in Chapter 9, a value to be added by the ALU may come from a CPU
register, come from memory, or actually be stored as part of the instruction itself. The device
that allows this selection is essentially a switch.
multiplexer: A device that selects one of multiple inputs to be passed on as the output based
on one or more selection lines. Up to 2^n inputs can be selected by n selection lines. Also
called a mux.
[Figure 5.6 diagram: a 3 × 8 decoder with inputs x_i, y_i, Carry_i, and Enable; its minterm outputs m0 through m7 are ORed to produce Sum_i and Carry_{i+1}.]
Figure 5.6: Full adder implemented with 3 × 8 decoder. This is for one bit slice. An n-bit adder
would require n of these circuits.
Figure 5.7 shows a multiplexer that can switch between two different inputs, x and y. The select
input, s, determines which of the sources, either x or y, is passed on to the output. The action of
this 2-way multiplexer is most easily seen in a truth table:
s Output
1 x
0 y
[Figure 5.7 diagram: inputs x and y, select input s, and a single Output.]
Figure 5.7: A 2-way multiplexer.
Here is a truth table for a multiplexer that can switch between four inputs, w, x, y, and z:
s1 s0 Output
0 0 w
0 1 x
1 0 y
1 1 z
That is,
Output = s1′ · s0′ · w + s1′ · s0 · x + s1 · s0′ · y + s1 · s0 · z
which is implemented as shown in Figure 5.8. The symbol for this multiplexer is shown in
Figure 5.9. Notice that the selection input, s, must be 2 bits in order to select between four
inputs. In general, a 2^n-way multiplexer requires an n-bit selection input.
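The 4-way truth table can be written as a sum of four product terms, one per input, each gated by its own combination of the selection bits. The Python sketch below is an illustration only; the function name and argument order are assumptions.

```python
def mux4(w, x, y, z, s1, s0):
    """4-way multiplexer: selects w, x, y, or z for s1 s0 = 00, 01, 10, 11."""
    n1, n0 = 1 - s1, 1 - s0              # complemented selection bits
    return (n1 & n0 & w) | (n1 & s0 & x) | (s1 & n0 & y) | (s1 & s0 & z)


# With s1 s0 = 10 the output follows y, whatever the other inputs are.
print(mux4(0, 0, 1, 0, 1, 0))   # prints "1"
```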
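[Figure 5.8 diagram: inputs w, x, y, z; selection inputs s0, s1; a single Output.]
Figure 5.8: A 4-way multiplexer.
[Figure 5.9: symbol for a 4-way multiplexer. Inputs w, x, y, z are labeled 0 through 3; the Sel input carries S0, S1.]
[Figure 5.10 diagram: AND and OR gates with fused inputs producing F1 (x, y) and F2 (x, y).]
Figure 5.10: Simplified circuit for a programmable logic array. The “S” shaped lines at the inputs
to each gate represent fuses. The fuses are “blown” to remove that input.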
5.2 Programmable Logic Devices
form, are inputs to AND gates through fuses. (The “S” shaped lines in the circuit diagram
represent fuses.) The fuses can be “blown” or left in place in order to program each AND gate to
output a product. Since every input, plus its complement, is input to each AND gate, any of the
AND gates can be programmed to output a minterm.
The products produced by the array of AND gates are all connected to OR gates, also through
fuses. Thus, depending on which OR-gate fuses are left in place, the output of each OR gate is a
sum of products. There may be additional logic circuitry to select between the different outputs.
We have already seen that any Boolean function can be expressed as a sum of products, so this
logic device can be programmed by “blowing” the fuses to implement any Boolean function.
PLDs come in many configurations. Some are pre-programmed at the time of manufacture.
Others are programmed by the manufacturer. And there are types that can be programmed by
a user. Some can even be erased and reprogrammed. Programming technologies range from
specifying the manufacturing mask (for the pre-programmed devices) to inexpensive electronic
programming systems. Some devices use “antifuses” instead of fuses. They are normally open.
Programming such devices consists of completing the connection instead of removing it.
There are three general categories of PLDs:
Programmable Logic Array (PLA): Both the AND gate plane and the OR gate plane are
programmable.
Read Only Memory (ROM): Only the OR gate plane is programmable.
Programmable Array Logic (PAL): Only the AND gate plane is programmable.
We will now look at each category in more detail.
[Figure 5.11 diagram: an AND gate plane feeding an OR gate plane with outputs F1, F2, F3.]
Figure 5.11: Programmable logic array schematic. The horizontal lines to the AND gate inputs
represent multiple wires, one for each input variable and its complement. The
vertical lines to the OR gate inputs also represent multiple wires, one for each
AND gate output. The dots represent connections.
This diagram deserves some explanation. Note in Figure 5.10 that each input variable and its
complement is connected to the inputs of all the AND gates through a fuse. The AND gates have
multiple inputs, one for each variable and its complement. Thus, the horizontal lines leading
to the inputs of the AND gates represent multiple wires. The diagram of Figure 5.11 has four
input variables. So each AND gate has eight inputs, and the horizontal lines each represent the
eight wires coming from the inputs and their complements.
The dots at the intersections of the vertical and horizontal lines represent places where the
fuses have been left intact. For example, the three dots on the topmost horizontal line indicate
that there are three inputs to that AND gate. The output of the topmost AND gate is
w′ · y · z
Referring again to Figure 5.10, we see that the output from each AND gate is connected to
each of the OR gates. Therefore, the OR gates also have multiple inputs — one for each AND
gate — and the vertical lines leading to the OR gate inputs represent multiple wires. The PLA
in Figure 5.11 has been programmed to provide the three functions:
F1 (w, x, y, z) = w′ · y · z + w · x · z ′
F2 (w, x, y, z) = w′ · x′ · y ′ · z ′
F3 (w, x, y, z) = w′ · y · z + w · x · z ′
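The two programmable planes can be modelled as data: the AND plane lists which literals remain connected to each AND gate, and the OR plane lists which AND gate outputs remain connected to each OR gate. The Python sketch below encodes the three functions given above; the names AND_PLANE, OR_PLANE, and pla, and the dictionary encoding itself, are assumptions for illustration.

```python
# AND plane: one entry per AND gate. 1 means the variable is connected,
# 0 means its complement is connected; missing variables have blown fuses.
AND_PLANE = [
    {"w": 0, "y": 1, "z": 1},             # w'·y·z
    {"w": 1, "x": 1, "z": 0},             # w·x·z'
    {"w": 0, "x": 0, "y": 0, "z": 0},     # w'·x'·y'·z'
]

# OR plane: which AND gate outputs stay connected to each OR gate.
OR_PLANE = {
    "F1": [0, 1],
    "F2": [2],
    "F3": [0, 1],
}


def pla(inputs):
    """Evaluate every output function for a dict of input values."""
    products = [
        int(all(inputs[name] == bit for name, bit in term.items()))
        for term in AND_PLANE
    ]
    return {
        output: int(any(products[i] for i in terms))
        for output, terms in OR_PLANE.items()
    }


print(pla({"w": 0, "x": 0, "y": 1, "z": 1}))   # prints "{'F1': 1, 'F2': 0, 'F3': 1}"
```

Reprogramming the device in this model means editing the two tables, which mirrors blowing a different set of fuses.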
[Figure 5.12 diagram: a fixed AND gate plane (connections marked ×) and a programmable OR gate plane with outputs d7 d6 d5 d4 d3 d2 d1 d0.]
Figure 5.12: Eight-byte Read Only Memory (ROM). The “×” connections represent permanent
connections. Each AND gate can be thought of as producing an address. The eight
OR gates produce one byte. The connections (dots) in the OR plane represent the
bit pattern stored at the address.
And the OR gate plane has been programmed to store the four characters (in ASCII code):
minterm      address   contents
a1′ · a0′      00        '0'
a1′ · a0       01        '1'
a1 · a0′       10        '2'
a1 · a0        11        '3'
You can see from this that the terminology “Read Only Memory” is perhaps a bit misleading.
It is actually a combinational logic circuit. Strictly speaking, memory has a state that can be
changed by inputs. (See Section 5.3.)
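Treating the ROM as pure combinational logic can be illustrated with a Python sketch: a fixed AND plane decodes the address into one minterm, and each data bit is the OR of the stored bits gated by those minterms. The names ROM_CONTENTS and rom_read are hypothetical; the stored characters follow the table above.

```python
# Programmed OR plane: the byte stored at each of the four addresses.
ROM_CONTENTS = [ord("0"), ord("1"), ord("2"), ord("3")]


def rom_read(a1, a0):
    """Return the byte at address a1 a0 using only AND and OR operations."""
    n1, n0 = 1 - a1, 1 - a0
    # Fixed AND plane: one minterm (address line) per stored byte.
    select = [n1 & n0, n1 & a0, a1 & n0, a1 & a0]
    value = 0
    for bit in range(8):                     # one OR gate per data bit d0..d7
        d = 0
        for address, word in enumerate(ROM_CONTENTS):
            d |= select[address] & ((word >> bit) & 1)
        value |= d << bit
    return value


print(chr(rom_read(1, 0)))   # prints "2"
```

No state is kept between calls: the output depends only on the current address inputs.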
F1 (w, x, y, z) = w · x′ · z + w′ · x + w · x · y ′ + w′ · x′ · y ′ · z ′
F2 (w, x, y, z) = w′ · y · z + w · x · z ′ + w · x · y · z + w · x · y ′ · z ′
[Figure 5.13 diagram: inputs w, x, y, z; outputs F1 and F2.]
Figure 5.13: Two-function Programmable Array Logic (PAL). The “×” connections represent
permanent connections. Each AND gate can be thought of as producing an address.
The eight OR gates produce one byte. The connections (dots) in the OR plane
represent the bit pattern stored at the address.
state: The state of a system is the description of the system such that knowing
uniquely determines
This definition means that knowing the state of a system at a given time tells you everything
you need to know in order to specify its behavior from that time on. How it got into this state is
irrelevant.
This definition implies that the system has memory in which the state is stored. Since there
are a finite number of states, the term finite state machine (FSM) is commonly used. Inputs to
the system can cause the state to change.
5.3. SEQUENTIAL LOGIC CIRCUITS 95
If the output(s) depend only on the state of the FSM, it is called a Moore machine. And if the
output(s) depend on both the state and the current input(s), it is called a Mealy machine.
The most commonly used sequential circuits are synchronous — their action is controlled by
a sequence of clock pulses. The clock pulses are created by a clock generator circuit. The clock
pulses are applied to all the sequential elements, thus causing them to operate in synchrony.
Asynchronous sequential circuits are not based on a clock. They depend upon a timing delay
built into the individual elements. Their behavior depends upon the order in which inputs are
applied. Hence, they are difficult to analyze and will not be discussed in this book.
[Figure 5.14 diagram: three clock waveforms plotted against time, with arrows marking the triggering edges. Panel (c): negative-edge trigger.]
Figure 5.14: Clock signals. (a) For level-triggered circuits. (b) For positive-edge triggering. (c)
For negative-edge triggering.
In Figure 5.14(a), the circuit operations take place during the entire time the clock is at the
1 level. As will be explained below, this can lead to unreliable circuit behavior. In order to
achieve more reliable behavior, most circuits are designed such that a transition of the clock
signal triggers the circuit elements to start their respective operations. Either a positive-going
(Figure 5.14(b)) or negative-going (Figure 5.14(c)) transition may be used. The clock frequency
must be slow enough such that all the circuit elements have time to complete their operations
before the next clock transition (in the same direction) occurs.
5.3.2 Latches
A latch is a storage device that can be in one of two states. That is, it stores one bit. It can be
constructed from two or more gates connected such that feedback maintains the state as long as
power is applied. The most fundamental latch is the SR (Set-Reset).
A simple implementation using NOR gates is shown in Figure 5.15. When Q = 1 (⇔ Q′ = 0)
it is in the Set state. When Q = 0 (⇔ Q′ = 1) it is in the Reset state.
There are four possible input combinations.
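[Figure 5.15: NOR gate implementation of an SR latch. The upper NOR gate has input S and output Q′; the lower NOR gate has input R and output Q.]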
S = 0, R = 0: Keep current state. If Q = 0 and Q′ = 1, the output of the upper NOR gate is
(0 + 0)′ = 1, and the output of the lower NOR gate is (1 + 0)′ = 0.
If Q = 1 and Q′ = 0, the output of the upper NOR gate is (0 + 1)′ = 0, and the output of the
lower NOR gate is (0 + 0)′ = 1.
Thus, the cross feedback between the two NOR gates maintains the state — Set or Reset
— of the latch.
S = 1, R = 0: Set. If Q = 1 and Q′ = 0, the output of the upper NOR gate is (1 + 1)′ = 0, and the
output of the lower NOR gate is (0 + 0)′ = 1. The latch remains in the Set state.
If Q = 0 and Q′ = 1, the output of the upper NOR gate is (1 + 0)′ = 0. This is fed back
to the input of the lower NOR gate to give (0 + 0)′ = 1. The feedback from the output of
the lower NOR gate to the input of the upper keeps the output of the upper NOR gate at
(1 + 1)′ = 0. The latch has moved into the Set state.
S = 0, R = 1: Reset. If Q = 1 and Q′ = 0, the output of the lower NOR gate is (0 + 1)′ = 0. This
causes the output of the upper NOR gate to become (0 + 0)′ = 1. The feedback from the
output of the upper NOR gate to the input of the lower keeps the output of the lower NOR
gate at (1 + 1)′ = 0. The latch has moved into the Reset state.
If Q = 0 and Q′ = 1, the output of the lower NOR gate is (1 + 1)′ = 0, and the output of the
upper NOR gate is (0 + 0)′ = 1. The latch remains in the Reset state.
S = 1, R = 1: Not allowed. If Q = 0 and Q′ = 1, the output of the upper NOR gate is (1+0)′ = 0.
This is fed back to the input of the lower NOR gate to give (0 + 1)′ = 0 as its output. The
feedback from the output of the lower NOR gate to the input of the upper maintains its
output as (1 + 0)′ = 0. Thus, Q = Q′ = 0, which is not allowed.
If Q = 1 and Q′ = 0, the output of the lower NOR gate is (0 + 1)′ = 0. This is fed back to
the input of the upper NOR gate to give (1 + 0)′ = 0 as its output. The feedback from the
output of the upper NOR gate to the input of the lower maintains its output as (0 + 1)′ = 0.
Thus, Q = Q′ = 0, which is not allowed.
The state table in Table 5.3 summarizes the behavior of a NOR-based SR latch.
          Current   Next
S    R    State     State
0    0      0         0
0    0      1         1
0    1      0         0
0    1      1         0
1    0      0         1
1    0      1         1
1    1      0         X
1    1      1         X
Table 5.3: SR latch state table. “X” indicates an indeterminate state. A circuit using this latch
must be designed to prevent this input combination.
The inputs to a NOR-based SR latch are normally held at 0, which maintains the current state, Q. Its
current state is available at the output. Momentarily changing S or R to 1 causes the state to
change to Set or Reset, respectively, as shown in the “Next State” column.
Notice that placing 1 on both the Set and Reset inputs at the same time causes a problem.
Then the outputs of both NOR gates would become 0. In other words, Q = Q′ = 0, which is
logically impossible. The circuit must be designed to prevent this input combination.
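The feedback walk-through for the NOR-based latch can be reproduced with a small Python simulation that re-evaluates the two gates until the outputs stop changing. This is a simplified model: the function names are hypothetical, and evaluating the upper gate before the lower one on each pass is an assumption that ignores real gate delays.

```python
def nor(a, b):
    """2-input NOR gate."""
    return 1 - (a | b)


def sr_latch(s, r, q, q_bar):
    """Settle a NOR-based SR latch; returns the stable (Q, Q')."""
    for _ in range(4):                     # a few passes are enough to settle
        new_q_bar = nor(s, q)              # upper gate: inputs S and Q
        new_q = nor(new_q_bar, r)          # lower gate: inputs Q' and R
        if (new_q, new_q_bar) == (q, q_bar):
            break
        q, q_bar = new_q, new_q_bar
    return q, q_bar


print(sr_latch(1, 0, 0, 1))   # Set from the Reset state: prints "(1, 0)"
print(sr_latch(0, 1, 1, 0))   # Reset from the Set state: prints "(0, 1)"
print(sr_latch(1, 1, 0, 1))   # not allowed: prints "(0, 0)"
```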
The behavior of an SR latch can also be shown by the state diagram in Figure 5.16.
[Figure 5.16 diagram: states 0 and 1. Input SR = 10 moves from state 0 to state 1; SR = 01 moves from state 1 to state 0; 00 and 01 loop on state 0; 00 and 10 loop on state 1.]
Figure 5.16: State diagram for an SR latch. There are two possible inputs, 00 or 01, that cause
the latch to remain in state 0. Similarly, 00 or 10 cause it to remain in state 1. Since
the output is simply the state, it is not shown in this state diagram. Notice that
the input 11 is not allowed, so it is not shown on the diagram.
A state diagram is a directed graph. The circles show the possible states. Lines with arrows show the
possible transitions between the states and are labeled with the input that causes the transition.
The two circles in Figure 5.16 show the two possible states of the SR latch — 0 or 1. The
labels on the lines show the two-bit inputs, SR, that cause each state transition. Notice that
when the latch is in state 0 there are two possible inputs, SR = 00 and SR = 01, that cause
it to remain in that state. Similarly, when it is in state 1 either of the two inputs, SR = 00 or
SR = 10, causes it to remain in that state.
The output of the SR latch is simply the state so is not shown separately on this state dia-
gram. In general, if the output of a circuit is dependent on the input, it is often shown on the
directed lines of the state diagram in the format “input/output.” If the output is dependent on
the state, it is more common to show it in the corresponding state circle in “state/output” format.
NAND gates are more commonly used than NOR gates, and it is possible to build an SR
latch from NAND gates. Recalling that NAND and NOR have complementary properties, we
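will think ahead and use S′ and R′ as the inputs, as shown in Figure 5.17. Consider the four
possible input combinations.
[Figure 5.17: NAND gate implementation of an S’R’ latch. The upper NAND gate has input S′ and output Q; the lower NAND gate has input R′ and output Q′.]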
S’ = 1, R’ = 1: Keep current state. If Q = 0 and Q′ = 1, the output of the upper NAND gate is
(1 · 1)′ = 0, and the output of the lower NAND gate is (0 · 1)′ = 1.
If Q = 1 and Q′ = 0, the output of the upper NAND gate is (1 · 0)′ = 1, and the output of
the lower NAND gate is (1 · 1)′ = 0.
Thus, the cross feedback between the two NAND gates maintains the state — Set or Reset
— of the latch.
S’ = 0, R’ = 1: Set. If Q = 1 and Q′ = 0, the output of the upper NAND gate is (0 · 0)′ = 1, and
the output of the lower NAND gate is (1 · 1)′ = 0. The latch remains in the Set state.
If Q = 0 and Q′ = 1, the output of the upper NAND gate is (0 · 1)′ = 1. This causes the
output of the lower NAND gate to become (1 · 1)′ = 0. The feedback from the output of the
lower NAND gate to the input of the upper keeps the output of the upper NAND gate at
(0 · 0)′ = 1. The latch has moved into the Set state.
S’ = 1, R’ = 0: Reset. If Q = 0 and Q′ = 1, the output of the lower NAND gate is (0 · 0)′ = 1, and
the output of the upper NAND gate is (1 · 1)′ = 0. The latch remains in the Reset state.
If Q = 1 and Q′ = 0, the output of the lower NAND gate is (1 · 0)′ = 1. This is fed back to
the input of the upper NAND gate to give (1 · 1)′ = 0. The feedback from the output of the
upper NAND gate to the input of the lower keeps the output of the lower NAND gate at
(0 · 0)′ = 1. The latch has moved into the Reset state.
S’ = 0, R’ = 0: Not allowed. If Q = 0 and Q′ = 1, the output of the upper NAND gate is (0 · 1)′ = 1.
This is fed back to the input of the lower NAND gate to give (1 · 0)′ = 1 as its output.
The feedback from the output of the lower NAND gate to the input of the upper maintains
its output as (0 · 1)′ = 1. Thus, Q = Q′ = 1, which is not allowed.
If Q = 1 and Q′ = 0, the output of the lower NAND gate is (1 · 0)′ = 1. This is fed back to
the input of the upper NAND gate to give (0 · 1)′ = 1 as its output. The feedback from the
output of the upper NAND gate to the input of the lower maintains its output as (1 · 0)′ = 1.
Thus, Q = Q′ = 1, which is not allowed.
Figure 5.18 shows the behavior of a NAND-based S’R’ latch. The inputs to a NAND-based
S’R’ latch are normally held at 1, which maintains the current state, Q. Its current state is
available at the output. Momentarily changing S ′ or R′ to 0 causes the state to change to Set or
Reset, respectively, as shown in the “Next State” column.
            Current   Next
S′    R′    State     State
1     1       0         0
1     1       1         1
1     0       0         0
1     0       1         0
0     1       0         1
0     1       1         1
0     0       0         X
0     0       1         X
[State diagram: states 0 and 1. Input S’R’ = 01 moves from state 0 to state 1; S’R’ = 10 moves from state 1 to state 0; 11 and 10 loop on state 0; 11 and 01 loop on state 1.]
Figure 5.18: State table and state diagram for an S’R’ latch. There are two possible inputs, 11 or
10, that cause the latch to remain in state 0. Similarly, 11 or 01 cause it to remain in
state 1. Since the output is simply the state, it is not shown in this state diagram.
Notice that the input 00 is not allowed, so it is not shown on the diagram.
Notice that placing 0 on both the Set and Reset inputs at the same time causes a problem.
Then the outputs of both NAND gates would become 1. In other words, Q = Q′ = 1, which is
logically impossible. The circuit must be designed to prevent this input combination.
So the S’R’ latch implemented with two NAND gates can be thought of as the complement of
the NOR gate SR latch. The state is maintained by holding both S′ and R′ at 1. S′ = 0 causes the
state to be 1 (Set), and R′ = 0 causes the state to be 0 (Reset). Signals such as S′ and R′, which
activate the circuit when they are 0, are usually called active-low signals.
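The complementary behavior of the NAND-based latch can be checked with the same kind of Python simulation used for gate feedback. As before, this is a simplified model: the function names are hypothetical, and evaluating the upper gate first on each pass is an assumption.

```python
def nand(a, b):
    """2-input NAND gate."""
    return 1 - (a & b)


def sr_bar_latch(s_bar, r_bar, q, q_bar):
    """Settle a NAND-based S'R' latch; returns the stable (Q, Q')."""
    for _ in range(4):                       # a few passes are enough to settle
        new_q = nand(s_bar, q_bar)           # upper gate: inputs S' and Q'
        new_q_bar = nand(new_q, r_bar)       # lower gate: inputs Q and R'
        if (new_q, new_q_bar) == (q, q_bar):
            break
        q, q_bar = new_q, new_q_bar
    return q, q_bar


print(sr_bar_latch(0, 1, 0, 1))   # S' = 0 sets the latch: prints "(1, 0)"
print(sr_bar_latch(1, 0, 1, 0))   # R' = 0 resets the latch: prints "(0, 1)"
print(sr_bar_latch(0, 0, 0, 1))   # not allowed: prints "(1, 1)"
```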
You have already seen that ones and zeros are represented by either a high or low voltage
in electronic logic circuits. A given logic device may be activated by combinations of the two
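voltages. To show which is used to cause activation at any given input, the following definitions
are used:
active-high signal: The higher voltage represents 1.
active-low signal: The lower voltage represents 1.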
Warning! The definitions of active-high versus active-low signals vary in the literature. Make sure
that you and the people you are working with have a clear agreement on the definitions you are using.
An active-high signal can be connected to an active-low input, but the hardware designer
must take the difference into account. For example, say that the required logical input is 1 to
an active-low input. Since it is active-low, that means the required voltage is the lower of the
two. If the signal to be connected to this input is active-high, then a logical 1 is the higher of the
two voltages. So this signal must first be complemented in order to be interpreted as a 1 at the
active-low input.
We can get better control over the SR latch by adding two NAND gates to provide a Control
input, as shown in Figure 5.19. In this circuit the outputs of both the control NAND gates
remain at 1 as long as Control = 0. Table 5.4 shows the state behavior of the SR latch with
control.
[Figure 5.19: SR latch with Control input. Inputs S, Control, and R; outputs Q and Q′.]
Current Next
Control S R State State
0 − − 0 0
0 − − 1 1
1 0 0 0 0
1 0 0 1 1
1 0 1 0 0
1 0 1 1 0
1 1 0 0 1
1 1 0 1 1
1 1 1 0 X
1 1 1 1 X
Table 5.4: SR latch with Control state table. “–” indicates that the value does not matter. “X”
indicates an indeterminate state. A circuit using this latch must be designed to prevent
this input combination.
It would clearly be better if we could find a design that eliminates the possibility of the “not allowed”
inputs. Table 5.5 is a state table for a D latch. It has two inputs, one for control, the other for
data, D. D = 1 sets the latch to 1, and D = 0 resets it to 0.
Current Next
Control D State State
0 − 0 0
0 − 1 1
1 0 0 0
1 0 1 0
1 1 0 1
1 1 1 1
Table 5.5: D latch with Control state table. “–” indicates that the value does not matter.
The D latch can be implemented as shown in Figure 5.20. The one data input, D, is fed to
the “S” side of the SR latch; the complement of the data value is fed to the “R” side.
[Figure 5.20: D latch built from gates G1 through G4. Inputs D and Control; internal signals S and R; outputs Q and Q′.]
Now we have a circuit that can store one bit of data, using the D input, and can be
synchronized with a clock signal, using the Control input. Although this circuit is reliable by itself,
the issue is whether it is reliable when connected with other circuit elements. The D signal
almost certainly comes from an interconnection of combinational and sequential logic circuits.
If it changes while the Control is still 1, the state of the latch will be changed.
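The level-sensitive behavior described above can be seen in a behavioral Python sketch of the D latch. It follows Table 5.5 rather than the individual gates, and the function name is hypothetical.

```python
def d_latch_next(control, d, q):
    """Next state of a D latch with Control, following Table 5.5."""
    s = control & d            # D feeds the "S" side
    r = control & (1 - d)      # the complement of D feeds the "R" side
    if s:
        return 1
    if r:
        return 0
    return q                   # Control = 0: keep the current state


# While Control stays at 1 the latch follows every change on D.
q = 0
trace = []
for control, d in [(1, 1), (1, 0), (0, 1)]:
    q = d_latch_next(control, d, q)
    trace.append(q)
print(trace)   # prints "[1, 0, 0]"
```

The second step shows the problem: D changed while Control was still 1, so the stored bit changed with it.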
Each electronic element in a circuit takes time to activate. It is a very short period of time,
but it can vary slightly depending upon precisely how the other logic elements are intercon-
nected and the state of each of them when they are activated. The problem here is that the
Control input is being used to control the circuit based on the clock signal level. The clock level
must be maintained for a time long enough to allow all the circuit elements to complete their
activity, which can vary depending on what actions are being performed. In essence, the circuit
timing is determined by the circuit elements and their actions instead of the clock. This makes
it very difficult to achieve a reliable design.
It is much easier to design reliable circuits if the time when an activity can be triggered is
made very short. The solution is to use edge-triggered logic elements. The inputs are applied
and enough time is allowed for the electronics to settle. Then the next clock transition activates
the circuit element. This scheme provides concise timing under control of the clock instead of
timing determined more or less by the particular circuit design.
5.3.3 Flip-Flops
Although the terminology varies somewhat in the literature, it is generally agreed that (see
Figure 5.14):
A latch responds to the level of the clock signal.
A flip-flop is triggered by a clock signal edge.
At each “tick” of the clock, there are four possible actions that might be taken on a single bit —
store 0, store 1, complement the bit (also called toggle), or leave it as is.
A D flip-flop is a common device for storing a single bit. We can turn the D latch into a D
flip-flop by using two D latches connected in a master/slave configuration as shown in Figure
5.21. Let us walk through the operation of this circuit.
[Figure 5.21: D flip-flop built from a Master D latch and a Slave D latch. Inputs D and CK; output Q′ shown.]
The bit to be stored, 0 or 1, is applied to the D input of the Master D latch. The clock signal
is applied to the CK input. It is normally 0. When the clock signal makes a transition from 0 to
1, the Master D latch will either Reset or Set, following the D input of 0 or 1, respectively.
While the CK input is at the 1 level, the control signal to the Slave D latch is 1, which
deactivates this latch. Meanwhile, the output of this flip-flop, the output of the Slave D latch, is
probably connected to the input of another circuit, which is activated by the same CK. Since the
state of the Slave does not change during this clock half-cycle, the second circuit has enough time
to read the current state of the flip-flop connected to its input. Also during this clock half-cycle,
the state of the Master D latch has ample time to settle.
When the CK input transitions back to the 0 level, the control signal to the Master D latch
becomes 1, deactivating it. At the same time, the control input to the Slave D latch goes to 0,
thus activating the Slave D latch to store the appropriate value, 0 or 1. The new input will be
applied to the Slave D latch during the second clock half-cycle, after the circuit connected to its
output has had sufficient time to read its previous state. Thus, signals travel along a path of
logic circuits in lock step with a clock signal.
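The two-phase operation walked through above can be modelled behaviorally in Python. This sketch follows only the narrative (the Master follows D while CK is 1, and the Slave takes the Master's value once CK returns to 0). The class and method names are hypothetical, and it makes no claim about the polarity of the internal control signals in Figure 5.21.

```python
class MasterSlaveDFlipFlop:
    """Behavioral model of two D latches in a master/slave arrangement."""

    def __init__(self):
        self.master = 0
        self.slave = 0            # the Slave's state is the flip-flop output

    def step(self, ck, d):
        """Apply one clock level and D value; return the output."""
        if ck == 1:
            self.master = d       # Master follows D; Slave holds its state
        else:
            self.slave = self.master   # Slave stores the Master's value
        return self.slave


ff = MasterSlaveDFlipFlop()
print(ff.step(1, 1))   # CK high: output unchanged, prints "0"
print(ff.step(0, 0))   # CK low: output takes the stored 1, prints "1"
```

In the second call D has already changed to 0, yet the output becomes 1, because the Master stopped following D when CK went low.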
There are applications where a flip-flop must be set to a known value before the clocking
begins. Figure 5.22 shows a D flip-flop with an asynchronous preset input added to it. When a 1
is applied to the PR input, Q becomes 1 and Q′ becomes 0, regardless of what the other inputs are, even
CK. It is also common to have an asynchronous clear input that sets the state (and output) to
0.
[Figure 5.22: D flip-flop with asynchronous preset. Inputs PR, D, and CK; outputs Q and Q′.]
There are more efficient circuits for implementing edge-triggered D flip-flops, but this discus-
sion serves to show that they can be constructed from ordinary logic gates. They are economical
and efficient, so are widely used in very large scale integration circuits. Rather than draw the
details for each D flip-flop, circuit designers use the symbols shown in Figure 5.23. The various
inputs and outputs are labeled in this figure. Hardware designers typically use Q̄ (Q with an
overbar) instead of Q′.
[Figure 5.23 diagram: two D flip-flop symbols, Q1 and Q2, each with inputs D, CK, CLR, and PR and outputs Q and Q̄.]
Figure 5.23: Symbols for D flip-flops. Includes asynchronous clear (CLR) and preset (PR). (a)
Positive-edge triggering; (b) Negative-edge triggering.
It is common to label the circuit as “Qn,” with n = 1, 2, . . . for identification. The small circle
at the clock input in Figure 5.23(b) means that this D flip-flop is triggered by a negative-going
clock transition. The D flip-flop circuit in Figure 5.21 can be changed to a negative-going trigger
by simply removing the first NOT gate at the CK input.
The flip-flop that simply complements its state, a T flip-flop, is easily constructed from a D
flip-flop. The state table and state diagram for a T flip-flop are shown in Figure 5.24.
       Current   Next
T      State     State
0        0         0
0        1         1
1        0         1
1        1         0
[State diagram: states 0 and 1. Input T = 0 keeps the current state; T = 1 moves to the other state.]
Figure 5.24: T flip-flop state table and state diagram. Each clock tick causes a state transition,
with the next state depending on the current state and the value of the input, T .
To determine the value that must be presented to the D flip-flop in order to implement a T
flip-flop, we add a column for D to the state table as shown in Table 5.6. By simply looking in
the “Next State” column we can see what the input to the D flip-flop must be in order to obtain
the correct state. These values are entered in the D column. (We will generalize this design
procedure in Section 5.4.)
       Current   Next
T      State     State    D
0        0         0      0
0        1         1      1
1        0         1      1
1        1         0      0
Table 5.6: T flip-flop state table showing the D flip-flop input required to place the T flip-flop in
the next state.
From Table 5.6 it is easy to write the equation for D:
D = T′ · Q + T · Q′
  = T ⊕ Q                (5.8)
Figure 5.25: T flip-flop. (a) Circuit using a D flip-flop. (b) Symbol for a T flip-flop.
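Equation 5.8 can be checked in software. The following C sketch models the next-state behavior only (no clock edges or gate delays), and the function names are chosen here purely for illustration.

```c
#include <assert.h>

/* Next state of a D flip-flop: at the clock tick, Q takes the value of D. */
int d_flip_flop_next(int d)
{
    assert(d == 0 || d == 1);
    return d;
}

/* T flip-flop built from a D flip-flop, using D = T xor Q (Equation 5.8). */
int t_flip_flop_next(int t, int q)
{
    assert(t == 0 || t == 1);
    assert(q == 0 || q == 1);
    int d = t ^ q;              /* equals T'.Q + T.Q' for single bits */
    return d_flip_flop_next(d);
}
```

Calling t_flip_flop_next with each of the four (T, Current State) pairs should reproduce the Next State column of Table 5.6.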
Implementing all four possible actions — set, reset, keep, toggle — requires two inputs, J
and K, which leads us to the JK flip-flop. The state table and state diagram for a JK flip-flop
are shown in Figure 5.26.
5.3. SEQUENTIAL LOGIC CIRCUITS 103
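J   K   Current State   Next State
0   0         0              0
0   0         1              1
0   1         0              0
0   1         1              0
1   0         0              1
1   0         1              1
1   1         0              1
1   1         1              0
[Figure 5.26: JK flip-flop state table. State diagram: two states, 0 and 1. JK = 00 keeps the current state, 01 resets it to 0, 10 sets it to 1, and 11 toggles it.]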
In order to determine the value that must be presented to the D flip-flop, we add a column for
D to the state table, as shown in Table 5.7.
Current Next
J K State State D
0 0 0 0 0
0 0 1 1 1
0 1 0 0 0
0 1 1 0 0
1 0 0 1 1
1 0 1 1 1
1 1 0 1 1
1 1 1 0 0
Table 5.7: JK flip-flop state table showing the D flip-flop input required to place the JK flip-flop
in the next state.
D = J′ · K′ · Q + J · K′ · Q′ + J · K′ · Q + J · K · Q′
  = J · Q′ · (K′ + K) + K′ · Q · (J + J′)
  = J · Q′ + K′ · Q                (5.9)
Figure 5.27: JK flip-flop. (a) Circuit using a D flip-flop. (b) Symbol for a JK flip-flop with
asynchronous CLR and PR inputs.
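The simplification in Equation 5.9 can be confirmed by brute force over the eight rows of Table 5.7. This is a minimal C sketch with illustrative function names; it compares the unsimplified sum of minterms with the simplified form.

```c
#include <assert.h>

/* Simplified form: D = J.Q' + K'.Q (Equation 5.9). Arguments are single bits. */
int jk_to_d(int j, int k, int q)
{
    assert(j >= 0 && j <= 1 && k >= 0 && k <= 1 && q >= 0 && q <= 1);
    return (j & !q) | (!k & q);
}

/* Unsimplified form: one product term for each row of Table 5.7 where D = 1. */
int jk_to_d_minterms(int j, int k, int q)
{
    assert(j >= 0 && j <= 1 && k >= 0 && k <= 1 && q >= 0 && q <= 1);
    return (!j & !k & q) | (j & !k & !q) | (j & !k & q) | (j & k & !q);
}
```

If the algebra is right, the two functions agree for every combination of J, K, and Q, and both match the D column of the table.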
Design a counter that has an Enable input. When Enable = 1 it increments through the
sequence 0, 1, 2, 3, 0, 1,. . . with each clock tick. Enable = 0 causes the counter to remain in
its current state.
1. First we create a state table and state diagram:
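            Enable = 0   Enable = 1
Current     Next         Next
n           n            n
0           0            1
1           1            2
2           2            3
3           3            0
[State diagram: states 0, 1, 2, and 3. Enable = 1 advances to the next state, with 3 returning to 0; Enable = 0 stays in the current state.]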
At each clock tick the counter increments by one if Enable = 1. If Enable = 0 it remains in
the current state. We have only shown the inputs because the output is equal to the state.
2. A reasonable choice is to use the binary numbering system for each state. With four states
we need two bits. We will let n = n1 n0 , giving the state table:
Enable = 0 Enable = 1
Current Next Next
n1 n0 n1 n0 n1 n0
0 0 0 0 0 1
0 1 0 1 1 0
1 0 1 0 1 1
1 1 1 1 0 0
1 I wish to thank Dr. Lynn Stauffer for her valuable suggestions for this section.
5.4. DESIGNING SEQUENTIAL CIRCUITS 105
Notice the “don’t care” entries in the state table. Since the JK flip-flop is so versatile,
including the “don’t cares” helps find simpler circuit realizations. (See Exercise 5-3.)
5. We use Karnaugh maps, using E for Enable.
J0(E, n1, n0)                 K0(E, n1, n0)
         n1 n0                         n1 n0
 E    00  01  11  10           E    00  01  11  10
 0     0   X   X   0           0     X   0   0   X
 1     1   X   X   1           1     X   1   1   X

J1(E, n1, n0)                 K1(E, n1, n0)
         n1 n0                         n1 n0
 E    00  01  11  10           E    00  01  11  10
 0     0   0   X   X           0     X   X   0   0
 1     0   1   X   X           1     X   X   1   0
J0 (E, n1 , n0 ) = E
K0 (E, n1 , n0 ) = E
J1 (E, n1 , n0 ) = E · n0
K1 (E, n1 , n0 ) = E · n0
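[Circuit diagram: two JK flip-flops, Q0 and Q1, share the clock CLK and implement the equations above. Enable is the input; the flip-flop outputs are n0 and n1.]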
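The four equations for J0, K0, J1, and K1 can be exercised in a small simulation. This C sketch assumes the JK next-state rule Q+ = J · Q′ + K′ · Q from Equation 5.9; the function names and the packing of n1 n0 into one integer are choices made here for illustration.

```c
#include <assert.h>

/* Next state of one JK flip-flop: Q+ = J.Q' + K'.Q */
int jk_next(int j, int k, int q)
{
    return (j & !q) | (!k & q);
}

/* One clock tick of the 2-bit counter. state holds n1 in bit 1, n0 in bit 0. */
unsigned counter_tick(unsigned state, int enable)
{
    assert(state < 4);
    assert(enable == 0 || enable == 1);

    int n0 = (int)(state & 1u);
    int n1 = (int)((state >> 1) & 1u);

    int j0 = enable;            /* J0 = E      */
    int k0 = enable;            /* K0 = E      */
    int j1 = enable & n0;       /* J1 = E . n0 */
    int k1 = enable & n0;       /* K1 = E . n0 */

    int next_n0 = jk_next(j0, k0, n0);
    int next_n1 = jk_next(j1, k1, n1);
    return (unsigned)((next_n1 << 1) | next_n0);
}
```

Starting from state 0 and calling counter_tick repeatedly with enable = 1 should step through 1, 2, 3, and back to 0; with enable = 0 the state should not change.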
The timing of the binary counter is shown here when counting through the sequence 3, 0, 1, 2,
3 (11, 00, 01, 10, 11).
[Timing diagram: traces of n1, Q1.JK, n0, Q0.JK, and CLK as the state Q1 Q0 passes through 11, 00, 01, 10, 11.]
Qi .JK is the input to the ith JK flip-flop, and ni is its output. (Recall that J = K in this design.)
When the ith input, Qi .JK, is applied to its JK flip-flop, remember that the state of the flip-flop
does not change until the second half of the clock cycle. This can be seen when comparing the
trace for the corresponding output, ni , in the figure.
Note the short delay after a clock transition before the value of each ni actually changes.
This represents the time required for the electronics to completely settle to the new values.
Except for very inexpensive microcontrollers, most modern CPUs execute instructions in
stages. An instruction passes through each stage in an assembly-line fashion, called a pipeline.
The action of the first stage is to fetch the instruction from memory, as will be explained in
Chapter 6.
After an instruction is fetched from memory, it passes onto the next stage. Simultaneously,
the first stage of the CPU fetches the next instruction from memory. The result is that the
CPU is working on several instructions at the same time. This provides some parallelism, thus
improving execution speed.
Almost all programs contain conditional branch points — places where the next instruction
to be fetched can be in one of two different memory locations. Unfortunately, the decision of
which of the two instructions to fetch is not known until the decision-making instruction has
moved several stages into the pipeline. In order to maintain execution speed, as soon as a
conditional branch instruction has passed on from the fetch stage, the CPU needs to predict
where to fetch the next instruction from.
In this next example we will design a branch prediction circuit.
Example 5-b
Design a circuit that predicts whether a conditional branch is taken or not. The predictor
continues to predict the same outcome, take the branch or do not take the branch, until it
makes two mistakes in a row.
1. We use “Yes” to indicate when the branch is taken and “No” to indicate when it is not. The
state diagram shows four states:
[State diagram: four states, No, fromNo, fromYes, and Yes, each labeled with its state name over its output. Arcs are labeled N (branch not taken) or Y (branch taken).]
Let us begin in the “No” state. The prediction is that the next branch will also not be
taken. The notation in the state bubbles is state over output, showing that the output in
this state is also “No.”
The input to the circuit is whether or not the branch was actually taken. The arc labeled
“N” shows the transition when the branch was not taken. It loops back to the “No” state,
with the prediction (the output) that the branch will not be taken the next time. If the
branch is taken, the “Y” arc shows that the circuit moves into the “fromNo” state, but still
predicting no branch the next time.
From the “fromNo” state, if the branch is not taken (the prediction is correct), the circuit
returns to the “No” state. However, if the branch is taken, the “Y” shows that the circuit
moves into the “Yes” state. This means that the circuit predicted incorrectly twice in a row,
so the prediction is changed to “Yes.”
You should be able to follow this state diagram for the other cases and convince yourself
that both the “fromNo” and “fromYes” states are required.
Next we look at the state table:
Actual = No Actual = Yes
Current Next Next
State Prediction State Prediction State Prediction
No No No No fromNo No
fromNo No No No Yes Yes
fromYes Yes No No Yes Yes
Yes Yes fromYes Yes Yes Yes
2. Since there are four states, we need two bits. We will let 0 represent “No” and 1 represent
“Yes.” The input is whether the branch is actually taken (1) or not (0). And the output is
the prediction of whether it will be taken (1) or not (0).
We choose a binary code for the state, s1 s0 , such that the high-order bit represents the
prediction, and the low-order bit what the last input was. That is:
State     Prediction   s1 s0
No        No           0  0
fromNo    No           0  1
fromYes   Yes          1  0
Yes       Yes          1  1

            Input = 0   Input = 1
Current     Next        Next
s1 s0       s1 s0       s1 s0
0  0        0  0        0  1
0  1        0  0        1  1
1  0        0  0        1  1
1  1        1  0        1  1
4. Next we add columns to the binary state table showing the JK inputs required in order to
cause the correct state transitions.
           Input = 0                 Input = 1
Current    Next                      Next
s1 s0      s1 s0   J1 K1 J0 K0       s1 s0   J1 K1 J0 K0
0  0       0  0    0  X  0  X        0  1    0  X  1  X
0  1       0  0    0  X  X  1        1  1    1  X  X  0
1  0       0  0    X  1  0  X        1  1    X  0  1  X
1  1       1  0    X  0  X  1        1  1    X  0  X  0
J0(In, s1, s0)                K0(In, s1, s0)
         s1 s0                         s1 s0
 In   00  01  11  10           In   00  01  11  10
 0     0   X   X   0           0     X   1   1   X
 1     1   X   X   1           1     X   0   0   X

J1(In, s1, s0)                K1(In, s1, s0)
         s1 s0                         s1 s0
 In   00  01  11  10           In   00  01  11  10
 0     0   0   X   X           0     X   X   0   1
 1     0   1   X   X           1     X   X   0   0
J0(In, s1, s0) = In
K0(In, s1, s0) = In′
J1(In, s1, s0) = In · s0
K1(In, s1, s0) = In′ · s0′
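[Circuit diagram: two JK flip-flops, Q0 and Q1, share the clock CLK and implement the equations above. The input is Actual; the output of Q1 is s1 = Prediction.]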
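One way to gain confidence in the four JK equations is to simulate them and compare the result against the binary state table. The C sketch below does this; it assumes the JK next-state rule Q+ = J · Q′ + K′ · Q from Equation 5.9, and its function names are illustrative only.

```c
#include <assert.h>

/* Next state of one JK flip-flop: Q+ = J.Q' + K'.Q */
int jk_next(int j, int k, int q)
{
    return (j & !q) | (!k & q);
}

/* One clock tick of the predictor. state holds s1 in bit 1 and s0 in bit 0.
   actual is 1 if the branch was taken and 0 if it was not. */
unsigned predictor_tick(unsigned state, int actual)
{
    assert(state < 4);
    assert(actual == 0 || actual == 1);

    int s0 = (int)(state & 1u);
    int s1 = (int)((state >> 1) & 1u);

    int j0 = actual;             /* J0 = In        */
    int k0 = !actual;            /* K0 = In'       */
    int j1 = actual & s0;        /* J1 = In . s0   */
    int k1 = !actual & !s0;      /* K1 = In'. s0'  */

    int next_s0 = jk_next(j0, k0, s0);
    int next_s1 = jk_next(j1, k1, s1);
    return (unsigned)((next_s1 << 1) | next_s0);
}

/* The prediction is the high-order state bit, s1. */
int prediction(unsigned state)
{
    assert(state < 4);
    return (int)((state >> 1) & 1u);
}
```

Beginning in state 00 (“No”), a single taken branch should leave the prediction at 0, and a second consecutive taken branch should change it to 1, which is the two-mistakes-in-a-row behavior the example asks for.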
5.5. MEMORY ORGANIZATION 109
5.5.1 Registers
Registers are used in places where small amounts of very fast memory are required. Many are
found in the CPU where they are used for numerical computations, temporary data storage, etc.
They are also used in the hardware that serves to interface between the CPU and other devices
in the computer system.
We begin with a simple 4-bit register, which allows us to store four bits. Figure 5.28 shows
a design for implementing a 4-bit register using D flip-flops. As described above, each time the
Figure 5.28: A 4-bit register. A D flip-flop is used to hold each bit. The state of the ith bit is
set by the value of di at each clock tick. The 4-bit value stored in the register is
r = r3 r2 r1 r0 .
clock cycles the state of each of the D flip-flops is set according to the value of d = d3 d2 d1 d0 .
The problem with this circuit is that any changes in any of the di s will change the state of the
corresponding bit in the next clock cycle, so the contents of the register are essentially valid for
only one clock cycle.
One-cycle buffering of a bit pattern is sufficient for some applications, but there is also a
need for registers that will store a value until it is explicitly changed, perhaps billions of clock
cycles later. The circuit in Figure 5.29 adds a load signal and feedback from the output of
each bit. When load = 1 each bit is set according to its corresponding input, di . When load = 0
the output of each bit, ri , is used as the input, giving no change. So this register can be used to
store a value for as many clock cycles as desired. The value will not be changed until load is set
to 1.
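The effect of the load signal can be modeled bit by bit. The C sketch below assumes that the input to each D flip-flop is formed as load · di + load′ · ri, which is one straightforward way to realize the selection described above; the gate-level details of Figure 5.29 may differ, and the function name is illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* One clock tick of a 4-bit register with load.
   r is the current contents, d is the data input, both in the low 4 bits. */
uint8_t register_tick(uint8_t r, uint8_t d, int load)
{
    assert(r < 16 && d < 16);
    assert(load == 0 || load == 1);

    uint8_t next = 0;
    for (int i = 0; i < 4; i++) {
        int di = (d >> i) & 1;
        int ri = (r >> i) & 1;
        /* Assumed selection logic: D = load.di + load'.ri */
        int D = (load & di) | (!load & ri);
        next |= (uint8_t)(D << i);
    }
    return next;
}
```

With load = 0 the function returns r unchanged no matter what d is, which models holding a value for many clock cycles.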
Most computers need many general purpose registers. When two or more registers are
grouped together, the unit is called a register file. A mechanism must be provided for addressing
one of the registers in the register file.
Consider a register file composed of eight 4-bit registers, r0 – r7. We could build eight copies
of the circuit shown in Figure 5.29. Let the 4-bit data input, d, be connected in parallel to all of
Figure 5.29: A 4-bit register with load. The storage portion is the same as in Figure 5.28. When
load = 1 each bit is set according to its corresponding input, di . When load = 0 the
output of each bit, ri , is used as the input, giving no change.
the corresponding data pins, d3 d2 d1 d0 , of each of the eight registers. Three bits are required to
address one of the registers (2^3 = 8). If the eight outputs of a 3 × 8 decoder are connected to the
load inputs of the eight registers, one output per register, d will be loaded into one, and only one, of the registers
during the next clock cycle. All the other registers will have load = 0, and they will simply
maintain their current state. Selecting the output from one of the eight registers can be done
with four 8-input multiplexers. One such multiplexer is shown in Figure 5.30. The inputs r0i
– r7i are the ith bits from each of eight registers, r0 – r7. One of the eight registers is selected
Figure 5.30: 8-way mux to select output of register file. This only shows the output of the ith
bit. n of these are required for n-bit registers. Reg_Sel is a 3-bit signal that selects one of the
eight inputs.
for the 1-bit output, Reg_Outi , by the 3-bit input Reg_Sel. Keep in mind that four of these
output circuits would be required for 4-bit registers. The same Reg_Sel would be applied to all
four multiplexers simultaneously in order to output all four bits of the same register. Larger
registers would, of course, require correspondingly more multiplexers.
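The register file just described can be sketched in C as a behavioral model. The type and function names below are hypothetical, and the model captures only the selection logic (decoder on the write side, multiplexers on the read side), not clocking or timing.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_REGS 8

typedef struct {
    uint8_t r[NUM_REGS];        /* eight 4-bit registers, r0 - r7 */
} RegisterFile;

/* 3 x 8 decoder: exactly one of the eight output bits is 1. */
uint8_t decode_3x8(unsigned addr)
{
    assert(addr < NUM_REGS);
    return (uint8_t)(1u << addr);
}

/* One clock tick: d is presented to every register, but only the register
   whose load input is asserted by the decoder stores it. */
void regfile_write(RegisterFile *rf, unsigned addr, uint8_t d)
{
    uint8_t load_lines = decode_3x8(addr);
    for (unsigned i = 0; i < NUM_REGS; i++) {
        int load = (load_lines >> i) & 1;
        if (load)
            rf->r[i] = d & 0xF;
    }
}

/* One 8-way multiplexer: selects bit i of the register chosen by reg_sel. */
int mux8_bit(const RegisterFile *rf, unsigned reg_sel, unsigned i)
{
    assert(reg_sel < NUM_REGS && i < 4);
    return (rf->r[reg_sel] >> i) & 1;
}

/* Four multiplexers share the same Reg_Sel to output all four bits. */
uint8_t regfile_read(const RegisterFile *rf, unsigned reg_sel)
{
    uint8_t out = 0;
    for (unsigned i = 0; i < 4; i++)
        out |= (uint8_t)(mux8_bit(rf, reg_sel, i) << i);
    return out;
}
```

Writing a value with regfile_write and reading the same register number back with regfile_read should return that value, while the other seven registers keep their contents.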
There is another important feature of this design that follows from the master/slave property
of the D flip-flops. The state of the slave portion does not change until the second half of the
clock cycle. So the circuit connected to the output of this register can read the current state
during the first half of the clock cycle, while the master portion is preparing to change the state
to the new contents.
Figure 5.31: Four-bit serial-to-parallel shift register. A D flip-flop is used to hold each bit. Bits
arrive at the input, si , one at a time. The last four input bits are available in
parallel at r3 – r0 .
is input at si . At each clock tick, the output of Q0 is applied to the input of Q1 , thus copying
the previous value of r0 to the new r1 . The state of Q0 changes to the value of the new si , thus
copying this to be the new value of r0 . The serial stream of bits continues to ripple through the
four bits of the shift register. At any time, the last four bits in the serial stream are available in
parallel at the four outputs, r3 ,. . . ,r0 .
The same circuit could be used to provide a time delay of four clock ticks in a serial bit
stream. Simply use r3 as the serial output.
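The shifting action is easy to model in C. In this sketch the four flip-flop outputs are packed into one integer with r0 in bit 0 and r3 in bit 3; that packing and the function name are choices made here for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* One clock tick of the 4-bit shift register.
   Each r(i) takes the previous value of r(i-1), and r0 takes si. */
uint8_t shift_tick(uint8_t r, int si)
{
    assert(r < 16);
    assert(si == 0 || si == 1);
    return (uint8_t)(((r << 1) | si) & 0xF);
}
```

After four ticks, bit 3 of the result holds the first bit that was shifted in, which is the four-tick delay mentioned above.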
to the input, is connected to whatever circuit element follows the tri-state buffer. But when
Enable = 0, the output is essentially disconnected. Be careful to realize that this is different
from 0; being disconnected means it has no effect on the circuit element to which it is connected.
A 4-way multiplexer using a 2 × 4 decoder and four tri-state buffers is illustrated in Figure
5.33. Compare this design with the 4-way multiplexer shown in Figure 5.8, page 90. The tri-
Figure 5.33: Four-way multiplexer built from tri-state buffers. Output = w, x, y, or z, depending
on which one is selected by s1 s0 fed into the decoder. Compare with Figure 5.8,
page 90.
state buffer design may not be an advantage for small multiplexers. But an n-way multiplexer
without tri-state buffers requires an n-input OR gate, which presents some technical electronic
problems.
Figure 5.34 shows how tri-state buffers can be used to implement a single memory cell.
This circuit shows only one 4-bit memory cell so you can compare it with the register design
in Figure 5.28, but it scales to much larger memories. Write is asserted to store data in the
D flip-flops. Read enables the output tri-state buffer in order to connect the single output line
to Mem_data_out. The address decoder is also used to enable the tri-state buffers to connect a
memory cell to the output, r3 r2 r1 r0 .
This type of memory is called Static Random Access Memory (SRAM). “Static” because the
memory retains its stored values as long as power to the circuit is maintained. “Random access”
because it takes the same length of time to access the memory at any address.
Figure 5.34: 4-bit memory cell. Each is output through a tri-state buffer. addri is one output
from a decoder corresponding to an address.
A 1 MB memory requires a 20-bit address. This requires a 20 × 2^20 address decoder as shown
in Figure 5.35. Recall from Section 5.1.3 (page 86) that an n × 2^n decoder requires 2^n AND
Figure 5.35: Addressing 1 MB of memory with one 20 × 2^20 address decoder. The short line
through the connector lines indicates the number of bits traveling in parallel in
that connection.
gates. We can simplify the circuitry by organizing memory into a grid of rows and columns as
shown in Figure 5.36. Although two decoders are required, each requires 2^(n/2) AND gates, for a
total of 2 × 2^(n/2) = 2^((n/2)+1) AND gates for the decoders. Of course, memory cell access is slightly
more complex, and some complexity is added in order to split the 20-bit address into two 10-bit
portions.
Figure 5.36: Addressing 1 MB of memory with two 10 × 2^10 address decoders.
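To put numbers on the savings, the two gate counts can be computed for n = 20. The C sketch below also shows one possible way to split a 20-bit address into two 10-bit portions; treating the high-order 10 bits as the row and the low-order 10 bits as the column is an assumption made here, as are the function names.

```c
#include <assert.h>
#include <stdint.h>

/* AND gates in a single n x 2^n decoder. */
uint64_t gates_single_decoder(unsigned n)
{
    assert(n < 64);
    return (uint64_t)1 << n;
}

/* AND gates in two (n/2) x 2^(n/2) decoders, for even n. */
uint64_t gates_two_decoders(unsigned n)
{
    assert(n % 2 == 0 && n < 64);
    return 2 * ((uint64_t)1 << (n / 2));
}

/* Split a 20-bit address into a 10-bit row and a 10-bit column.
   Which half selects the row is an assumption of this sketch. */
void split_address(uint32_t addr, uint32_t *row, uint32_t *col)
{
    assert(addr < (1u << 20));
    *row = addr >> 10;          /* high-order 10 bits */
    *col = addr & 0x3FF;        /* low-order 10 bits  */
}
```

For n = 20 the single decoder needs 2^20 = 1,048,576 AND gates, while the two smaller decoders together need 2^11 = 2,048.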
When the “Row Address Select” line is asserted all the transistors in that row are turned on,
thus connecting the respective capacitor to the Data Latch. The value stored in the capacitor,
high voltage or low voltage, is stored in the Data Latch. There, it is available to be read from
the memory. Since this action tends to discharge the capacitors, they must be refreshed from
the values stored in the Data Latch.
When new data is to be stored in DRAM, the current values are first stored in the Data
Latch, just as in a read operation. Then the appropriate changes are made in the Data Latch
before the capacitors are refreshed.
These operations take more time than simply switching flip-flops, so DRAM is appreciably
slower than SRAM. In addition, capacitors lose their charge over time. So each row of capacitors
must be read and refreshed on the order of every 60 msec. This requires additional circuitry and
further slows memory access. But the much lower cost of DRAM compared to SRAM warrants
the slower access time.
This has been only an introduction to how switching transistors can be connected into circuits
to create a CPU. We leave the details to more advanced books, e.g., [20], [23], [24], [28], [31], [34].
5.6 Exercises
The greatest benefit will be derived from these exercises if you build the circuits either in
hardware or with a simulation program. Several free circuit simulation applications are
available that run under GNU/Linux.
5-1 (§5.1) Build a four-bit adder.
5-2 (§5.1) Build a four-bit adder/subtractor.
5-3 (§5.4) Redesign the 2-bit counter of Example 5-a using only the “set” and “reset” inputs of
the JK flip-flops. So your state table will not have any “don’t cares.”
5-4 (§5.4) Design a 4-bit up counter — 0, 1, 2,. . . ,15, 0,. . .
5-5 (§5.4) Design a 4-bit down counter — 15, 14, 13,. . . ,0, 15,. . .
5-6 (§5.4) Design a decimal counter — 0, 1, 2,. . . ,9, 0,. . .
5-7 (§5.5) Build the register file described in Section 5.5.1. It has eight 4-bit registers. A 3 × 8
decoder is used to select a register to be loaded. Four 8-way multiplexers are used to select
the four bits from one register to be output.
Chapter 6
In this chapter we move on to consider a programmer’s view of the Central Processing Unit
(CPU) and how it interacts with memory. X86-64 CPUs can be used with either a 32-bit or a 64-
bit operating system. The CPU features available to the programmer depend on the operating
mode of the CPU. The modes of interest to the applications programmer are summarized in
Table 6.1. With a 32-bit operating system, the CPU behaves essentially the same as an x86-32
CPU.
Table 6.1: X86-64 operating modes. Intel manuals use the term “IA-32e” and AMD manuals
use “Long” when running a 64-bit operating system. Both manuals use the same
terminology for the two sub-modes. Adapted from Table 1-1 in [2].
In this book we describe the view of the CPU when running a 64-bit operating system. Intel
manuals call this the IA-32e mode and the AMD manuals call it the long mode. The CPU can
run in one of two sub-modes under a 64-bit operating system. Both manuals use the same
terminology for the two sub-modes.
• Compatibility mode – Most programs compiled for a 32-bit or 16-bit environment can be
run without re-compiling.
6.1. CPU OVERVIEW 117
[Block diagram: Bus Interface, L1 Cache Memory, Instruction Pointer, Instruction Register, Control Unit, Registers, Arithmetic/Logic Unit, and Flags Register.]
Figure 6.1: CPU block diagram. The CPU communicates with the Memory and I/O subsystems
via the Address, Data, and Control buses. See Figure 1.1 (page 3).
We will now describe briefly each of the subsystems in Figure 6.1. The descriptions provided
here are generic and apply to most CPUs. Components that are of particular interest to a
programmer are described within the context of the x86 ISA later in this chapter.
Bus Interface: This is the means for the CPU to communicate with the rest of the computer
system — Memory and I/O Devices. It contains circuitry to place addresses on the address
bus, read and write data on the data bus, and read and write signals on the control bus.
The bus interface on many CPUs interfaces with external bus control units that in turn
interface with memory and with different types of I/O buses, e.g., SATA, PCI-E, etc. The
external control units are transparent to the programmer.
L1 Cache Memory: Although it could be argued that this is not a part of the CPU, most mod-
ern CPUs include very fast cache memory on the CPU chip. As you will see in Section 6.4,
each instruction must be fetched from memory. The CPU can execute instructions much
faster than they can be fetched. The interface with memory makes it more efficient to fetch
several instructions at one time, storing them in L1 cache where the CPU has very fast
access to them. Many modern CPUs use two L1 cache memories organized in a Harvard
architecture: one for instructions, the other for data. (See Section 1.2, page 4.) Its use is
generally transparent to an applications programmer.
Instruction Pointer: This is a 64-bit register that always contains the address of the next
instruction to be executed. See Section 6.2 for more details.
Instruction Register: This register contains the instruction that is currently being executed.
Its bit pattern determines what the Control Unit is causing the CPU to do. Once that
action has been completed, the bit pattern in the instruction register can be changed, and
the CPU will perform the operation specified by this next bit pattern.
118 CHAPTER 6. CENTRAL PROCESSING UNIT
Most modern CPUs use an instruction queue that is built into the chip. Several instructions are
waiting in the queue, ready to be executed. Separate electronic circuitry keeps the instruction
queue full while the regular control unit is executing the instructions. But this is simply an
implementation detail that allows the control unit to run faster. The essence of how the control
unit executes a program is represented by the single instruction register model.
Control Unit: The bits in the Instruction Register are decoded in the Control Unit. It gener-
ates the signals that control the other subsystems in the CPU to carry out the action(s)
specified by the instruction. It is typically implemented as a finite-state machine and con-
tains Decoders (Section 5.1.3), Multiplexers (Section 5.1.4), and other logic components.
Arithmetic Logic Unit (ALU): A device that performs arithmetic and logic operations on groups
of bits. The logic circuitry to perform addition is discussed in Section 5.1.1.
Flags Register: Each operation performed by the ALU results in various conditions that must
be recorded. For example, addition can produce a carry. One bit in the Flags Register will
be set to either zero (no carry) or one (carry) after the ALU has completed any operation
that may produce a carry.
We will now look at how the logic circuits discussed in Chapter 4 can be used to implement some
of these subsystems.
• Byte – the low-order 8 bits [7 – 0] (and in four registers bits [15 – 8]).
The assembler uses a different name for each group of bits in a register. The assembler
names for the groups of the bits are given in Table 6.3. In 64-bit mode, writing to an 8-bit or
16-bit portion of a register does not affect the other 56 or 48 bits in the register. However, when
writing to the low-order 32 bits, the high-order 32 bits are set to zero.
(Margin note: Storing 32 bits sets top half of register to zero.)
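These rules for partial register writes can be modeled with masks on a 64-bit integer. The C sketch below is only a software model of the behavior described above; it does not touch real CPU registers, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of writing the low-order 8 bits (e.g., al):
   the other 56 bits are unchanged. */
uint64_t write_low8(uint64_t reg, uint8_t value)
{
    return (reg & ~(uint64_t)0xFF) | value;
}

/* Model of writing the low-order 16 bits (e.g., ax):
   the other 48 bits are unchanged. */
uint64_t write_low16(uint64_t reg, uint16_t value)
{
    return (reg & ~(uint64_t)0xFFFF) | value;
}

/* Model of writing the low-order 32 bits (e.g., eax):
   the high-order 32 bits are set to zero. */
uint64_t write_low32(uint64_t reg, uint32_t value)
{
    (void)reg;                  /* previous contents do not survive */
    return (uint64_t)value;
}
```

Applying each function to the same starting value makes the difference visible: the 8-bit and 16-bit writes keep the high-order bits, and the 32-bit write clears them.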
6.2. CPU REGISTERS 119
Table 6.2: The x86-64 registers. Not all the registers shown here are discussed in this chapter.
Some are discussed in subsequent chapters that deal with the related topic.
bits 63-0   bits 31-0   bits 15-0   bits 15-8   bits 7-0
rax         eax         ax          ah          al
rbx         ebx         bx          bh          bl
rcx         ecx         cx          ch          cl
rdx         edx         dx          dh          dl
rsi         esi         si                      sil
rdi         edi         di                      dil
rbp         ebp         bp                      bpl
rsp         esp         sp                      spl
r8          r8d         r8w                     r8b
r9          r9d         r9w                     r9b
r10         r10d        r10w                    r10b
r11         r11d        r11w                    r11b
r12         r12d        r12w                    r12b
r13         r13d        r13w                    r13b
r14         r14d        r14w                    r14b
r15         r15d        r15w                    r15b
Table 6.3: Assembly language names for portions of the general-purpose CPU registers. Pro-
grams running in 32-bit mode can only use the registers above the line in this table.
64-bit mode allows the use of all the registers. The ah, bh, ch, and dh registers cannot
be used with any of the (8-bit) registers below the line.
Figure 6.2: Graphical representation of general purpose registers. The three shown here (rax, rsi, and r8) are
representative of the pattern of all the general purpose registers.
The 8-bit register portions ah, bh, ch, and dh are a holdover from the Intel® 8086/8088 ar-
chitecture. It had four 16-bit registers, ax, bx, cx, and dx. The low-order bytes were named al,
bl, cl, and dl and the high-order bytes named ah, bh, ch, and dh. Access to these registers has
been maintained in 32-bit mode for backward compatibility but is limited in 64-bit mode. Access
to the 8-bit low-order portions of the rsi, rdi, rsp, and rbp registers was added along with the
move to 64 bits in the x86-64 architecture but cannot be used in the same instruction with the
ah, bh, ch, and dh register portions.
When using less than the entire 64 bits in a register, it is generally bad practice to write code that assumes
the remaining portion is in any particular state. Such code is difficult to read and leads to errors
during its maintenance phase.
Although these are called “general purpose,” the descriptions in Table 6.4 show that some
of them have some special significance, depending upon how they are used. (Some of the de-
scriptions may not make sense to you at this point.) In this book, we will use the rax, rdx, rdi,
esi, and r8 – r15 registers for general-purpose storage. They will be used just like variables in
a high-level language. Usage of the rsp and rbp registers follows a very strict discipline. You
should not use either of them for your assembly language programs until you understand how
to use them.
The instruction pointer register, rip1 , always points to the next instruction to be executed.
As explained in Section 6.4 (page 123), every time an instruction is fetched, the rip register is
automatically incremented by the control unit to contain the address of the next instruction.
Thus, the rip register is never directly accessed by the programmer. On the other hand, every
instruction that is executed affects the contents of the rip register. Thus, the rip register is not
a general-purpose register, but it guides the flow of the entire program.
1 In many other environments, the equivalent register is called the program counter.
Most arithmetic and logical operations affect the condition codes in the rflags register. The
bits that are affected are shown in Figure 6.3.
Bit:   11  10   9   8   7   6   5   4   3   2   1   0
Flag:  OF   -   -   -  SF  ZF   -  AF   -  PF   -  CF
Figure 6.3: Condition codes portion of the rflags register. The high-order 32 bits (32 – 63) are
reserved for other use and are not shown here. Neither are bits 12 – 31, which are
for system flags (see [3]).
OF Overflow Flag
SF Sign Flag
ZF Zero Flag
AF Auxiliary carry or Adjust Flag
PF Parity Flag
CF Carry Flag
The OF, SF, ZF, and CF are described at appropriate places in this book. See [3] and [14] for
descriptions of the other flags.
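As a preview of how four of these condition codes relate to an addition, the C sketch below computes CF, ZF, SF, and OF for an 8-bit add using their standard definitions. It is a software model, not a reading of the actual rflags register; the Flags structure and function name are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int cf;     /* carry out of the high-order bit            */
    int zf;     /* result is zero                             */
    int sf;     /* high-order (sign) bit of the result        */
    int of;     /* signed overflow                            */
} Flags;

/* 8-bit addition that also reports the condition codes. */
uint8_t add8(uint8_t a, uint8_t b, Flags *f)
{
    assert(f != 0);
    unsigned wide = (unsigned)a + (unsigned)b;   /* keeps the ninth bit */
    uint8_t sum = (uint8_t)wide;

    f->cf = (int)((wide >> 8) & 1u);
    f->zf = (sum == 0);
    f->sf = (sum >> 7) & 1;
    /* Overflow: both operands have the same sign bit, and the sign bit
       of the result differs from it. */
    f->of = ((~(a ^ b) & (a ^ sum)) >> 7) & 1;
    return sum;
}
```

For example, adding 0x7f and 0x01 gives 0x80 with OF and SF set and CF clear, while adding 0xff and 0x01 gives 0x00 with CF and ZF set and OF clear.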
Two other registers are very important in a program. The rsp register is used as a stack
pointer, as will be discussed in Section 8.2 (page 158). The rbp register is typically used as a
base pointer; it will be discussed in Section 8.3 (page 164).
The “e” prefix on the 32-bit portion of each register name comes from the history of the x86
architecture. The introduction of the 80386 in 1985 brought an increase of register size from 16 bits to 32
bits. There were no new registers. The old ones were simply “extended.”
Figure 6.4: Subsystems of a computer. The CPU, Memory, and I/O subsystems communicate
with one another via the three buses: the Data Bus, the Address Bus, and the Control Bus. (Repeat of Figure 1.1.)
As an example of how data can be stored in memory, let us imagine that we have some data
in one of the CPU registers. Storing this data in memory is effected by setting the states of a
group of bits in memory to match those in the CPU register. The control unit can be programmed
to do this by
1. sending the memory address on the address bus,
2. sending a copy of the register bit states on the data bus, then
3. sending a “write” signal on the control bus.
For example, if the eight bits in memory at address 0x7fffd9a43cef are in the state:
0x7fffd9a43cef: b7
and the control unit is programmed to store the value e2 at location 0x7fffd9a43cef, the control
unit then
1. places 0x7fffd9a43cef on the address bus,
2. places the bit pattern e2 on the data bus, and
3. places a “write” signal on the control bus.
(Margin note: Store data in memory by writing it there.)
6.4. PROGRAM EXECUTION IN THE CPU 123
Then the bits at memory location 0x7fffd9a43cef will be changed to the state:
0x7fffd9a43cef: e2
Important. When the state of any bit in memory or in a register is changed any previous
states are lost forever. There is no way to “undo” this state change nor to determine how
the bit got in its current state.
Most modern CPUs use an instruction queue. Several instructions are waiting in the queue, ready
to be executed. Separate electronic circuitry keeps the instruction queue full while the regular
control unit is executing the instructions. But this is simply an implementation detail that allows
the control unit to run faster. The essence of how the control unit executes a program is represented
by the single instruction register model.
Since instructions are simply bit patterns, they can be stored in memory. The instruction
pointer register always has the memory address of (points to) the next instruction to be executed.
In order for the control unit to execute this instruction, it is copied into the instruction register.
The situation is as follows:
1. A sequence of instructions is stored in memory.
2. The memory address where the first instruction is located is copied to the instruction
pointer.
3. The CPU sends the address in the instruction pointer to memory on the address bus.
4. The CPU sends a “read” signal on the control bus.
5. Memory responds by sending a copy of the state of the bits at that memory location on the
data bus, which the CPU then copies into its instruction register.
6. The instruction pointer is automatically incremented to contain the address of the next
instruction in memory.
7. The CPU executes the instruction in the instruction register.
8. Go to step 3.
Steps 3, 4, and 5 are called an instruction fetch. Notice that steps 3 – 8 constitute a cycle, the
instruction execution cycle. It is shown graphically in Figure 6.5.
[Figure 6.5: The instruction execution cycle. Fetch the instruction pointed to by the Instruction Pointer; add the number of bytes in the instruction to the Instruction Pointer; execute the instruction; if it is the halt instruction, stop the CPU; otherwise repeat the cycle.]
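The instruction execution cycle can be illustrated with a toy machine written in C. Everything about this machine is invented for the sketch: the three one-byte opcodes, the single data register acc, and the structure and function names. It is not x86-64, but the loop follows the same fetch, increment, execute pattern as steps 3 through 8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for a toy machine (not x86-64). */
enum {
    OP_HALT    = 0x00,   /* stop the CPU                         */
    OP_INC     = 0x01,   /* acc = acc + 1                        */
    OP_ADD_IMM = 0x02    /* acc = acc + next byte (2-byte instr) */
};

typedef struct {
    size_t  ip;          /* instruction pointer  */
    uint8_t ir[2];       /* instruction register */
    int     acc;         /* one data register    */
} ToyCpu;

/* Number of bytes in the instruction that starts with this opcode. */
size_t instruction_length(uint8_t opcode)
{
    return opcode == OP_ADD_IMM ? 2 : 1;
}

/* Runs from address start until a halt instruction is executed.
   Returns the number of instructions executed, including the halt. */
int run(ToyCpu *cpu, const uint8_t *memory, size_t size, size_t start)
{
    int executed = 0;
    cpu->ip = start;                              /* step 2 */
    for (;;) {
        assert(cpu->ip < size);
        /* Steps 3-5: fetch the instruction into the instruction register. */
        size_t len = instruction_length(memory[cpu->ip]);
        assert(cpu->ip + len <= size);
        for (size_t i = 0; i < len; i++)
            cpu->ir[i] = memory[cpu->ip + i];
        /* Step 6: the instruction pointer now addresses the next instruction. */
        cpu->ip += len;
        /* Step 7: execute the instruction in the instruction register. */
        executed++;
        switch (cpu->ir[0]) {
        case OP_INC:     cpu->acc += 1;          break;
        case OP_ADD_IMM: cpu->acc += cpu->ir[1]; break;
        case OP_HALT:    return executed;
        default:         assert(!"unknown opcode"); return executed;
        }
        /* Step 8: go back and fetch again. */
    }
}
```

Note that the instruction pointer is advanced before the instruction is executed, so during execution it already holds the address of the next instruction.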
How do we get the instructions into memory? The instructions for a program are stored
in a file on a storage device, usually a disk. The computer system is controlled by an
operating system. When you indicate to the operating system that you wish to execute
a program, e.g., by double-clicking on its icon, the operating system locates a region of
memory large enough to hold the instructions in the program then copies them from the
file to memory. The contents in the file remain unchanged. 2
How do we create a file on the disk that contains the instructions? This is a multi-step
process using several programs that are provided for you. The programs and the files that
each create are:
• An editor is used to create source files.
The source file is written in a programming language, e.g., C++. This is very similar
to creating a file with a word processor. The main differences are that an editor is
much simpler than a word processor, and the contents of the source file are written in
the programming language instead of, say, English.
• A compiler/assembler is used to create object files.
The compiler translates the programming language in a source file into the bit patterns
that can be used by a CPU (machine language). The source file contents remain
unchanged.
• A linker is used to create executable files.
Most programs are made up of several object files. For example, a GNU/Linux in-
stallation includes many object files that contain the machine instructions to perform
common tasks. These are programs that have already been written and compiled.
Related tasks are commonly grouped together into a single file called a library.
Whenever possible, you should use the short programs in these libraries to perform
the computations your program needs rather than write them yourself. The linker program
will merge the machine code from these several object files into one file.
You may have used an integrated development environment (IDE), e.g., Microsoft® Visual
Studio® or Eclipse™, which combines all three of these programs into one package where each
of the intermediate steps is performed automatically. You use the editor program to create the
source file and then give the run command to the IDE. The IDE will compile the program in
your source files, link the resulting object files with the necessary libraries, load the resulting
executable file into memory, then start your program. In general, the intermediate object files
resulting from the compilation of each source file are automatically deleted from the disk.
In this book we will explicitly perform each of these steps separately so we can learn the role
of each program — editor, assembler, linker — used in preparing the application program.
2 This is a highly simplified description. The details depend upon the overall system.
6 */
7
8 #include <stdio.h>
9
10 int main(void)
11 {
12 register int wye;
13 int *ptr;
14 int ex;
15
16 ptr = &ex;
17 ex = 305441741;
18 wye = -1;
19 printf("Enter an integer: ");
20 scanf("%i", ptr);
21 wye += *ptr;
22 printf("The result is %i\n", wye);
23
24 return 0;
25 }
Listing 6.1: Simple program to illustrate the use of gdb to view CPU registers.
We introduced some gdb commands in Chapter 2. Here are some additional ones that will be
used in this section:
• n — execute current source code statement of a program that has been running; if it’s a
call to a function, the entire function is executed.
• s — execute current source code statement of a program that has been running; if it’s a
call to a function, step into the function.
• si — execute current (machine) instruction of a program that has been running; if it’s a
call to a function, step into the function.
• i r — info registers — displays the contents of the registers, except floating point and
vector.
Here is a screen shot of how I compiled the program then used gdb to control the execution
of the program and observe the register contents. My typing is boldface and the session is
annotated in italics. Note that you will probably see different addresses if you replicate this
example on your own (Exercise 6-1).
The “-g” option is required. It tells the compiler to include debugger information in
the executable program.
7
8 #include <stdio.h>
9
10 int main(void)
11 {
12 register int wye;
13 int *ptr;
14 int ex;
15
16 ptr = &ex;
(gdb)
17 ex = 305441741;
18 wye = -1;
19 printf("Enter an integer: ");
20 scanf("%i", ptr);
21 wye += *ptr;
22 printf("The result is %i\n", wye);
23
24 return 0;
25 }
(gdb)
The li command lists ten lines of source code. The display is centered around the
current line. Since I have not started execution of this program, the display is centered
around the beginning of main. The display ends with the (gdb) prompt. Pushing the
return key repeats the previous command, and li is smart enough to display the next
ten lines.
(gdb) br 19
Breakpoint 1 at 0x400569: file gdbExample1.c, line 19.
(gdb) run
Starting program: /home/bob/my_book_64/progs/chap06/gdbExample1
I set a breakpoint at line 19 then run the program. When line 19 is reached, the
program is paused before the statement is executed, and control returns to gdb.
(gdb) print ex
$1 = 305441741
(gdb) print &ex
$2 = (int *) 0x7fff504c473c
I use the print command to view the value assigned to the ex variable and learn its
memory address.
(gdb) help x
Examine memory: x/FMT ADDRESS.
ADDRESS is an expression for the memory address to examine.
FMT is a repeat count followed by a format letter and a size letter.
Format letters are o(octal), x(hex), d(decimal), u(unsigned decimal),
t(binary), f(float), a(address), i(instruction), c(char) and s(string).
Size letters are b(byte), h(halfword), w(word), g(giant, 8 bytes).
The specified number of objects of the specified size are printed
according to the format.
Defaults for format and size letters are those previously used.
The help command will provide very brief instructions on using a command. We want
to display values stored in specific memory locations in various formats, and the help
command provides a reminder of how to use the command.
I verify that the value assigned to the ex variable is stored at location 0x7fff504c473c.
Next, I examine all four bytes of the word, one byte at a time. In this display,
In other words, the byte-wise display appears to be backwards. This is due to the
values being stored in the little endian storage scheme as explained on page 19 in
Chapter 2.
I also examine all four bytes of the word, two bytes at a time. In this display,
This shows how gdb displays these four bytes as though they represent two 16-bit ints
stored in little endian format. (You can now see why I entered such a strange integer
in this demonstration run.)
The compiler has honored our request and allocated a register for the wye variable.
Registers are located in the CPU and do not have memory addresses, so gdb cannot
print the address. We will need to use the i r command to view the register contents.
(gdb) i r
rax 0x7fff504c473c 140734540564284
rbx 0xffffffff 4294967295
rcx 0x0 0
rdx 0x7fff504c4838 140734540564536
rsi 0x7fff504c4828 140734540564520
rdi 0x1 1
rbp 0x7fff504c4750 0x7fff504c4750
rsp 0x7fff504c4730 0x7fff504c4730
r8 0x7ff0482a22e0 140669979599584
r9 0x7ff0482b6160 140669979681120
r10 0x7fff504c4590 140734540563856
r11 0x7ff047f534c0 140669976130752
r12 0x400460 4195424
r13 0x7fff504c4820 140734540564512
r14 0x0 0
r15 0x0 0
rip 0x400569 0x400569 <main+29>
eflags 0x206 [ PF IF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
fctrl 0x37f 895
fstat 0x0 0
ftag 0xffff 65535
fiseg 0x0 0
fioff 0x0 0
foseg 0x0 0
---Type <return> to continue, or q <return> to quit---
fooff 0x0 0
fop 0x0 0
mxcsr 0x1f80 [ IM DM ZM OM UM PM ]
The i r command displays the current contents of the CPU registers. The first column
is the name of the register. The second shows the current bit pattern in the register,
in hexadecimal. Notice that leading zeros are not displayed. The third column shows
some of the register contents in 64-bit signed decimal. The registers that always hold
addresses are also shown in hexadecimal in the third column. The columns are often
not aligned due to the tabbing of the display.
We see that the value in the ebx general purpose register is the same as that stored in
the wye variable, 0xffffffff.3 (Recall that ints are 32 bits, even in 64-bit mode.) We
conclude that the compiler chose to allocate ebx as the wye variable.
3 If this is not clear, you need to review Section 3.3.
Notice the value in the rip register, 0x400569. Refer back to where I set the break-
point on source line 19. This shows that the program stopped at the correct memory
location.
It is only coincidental that the address of the ex variable is currently stored in the rax
register. If a general purpose register is not allocated as a variable within a function,
it is often used to store results of intermediate computations. You will learn how to use
registers this way in subsequent chapters of this book.
(gdb) br 21
Breakpoint 2 at 0x40058b: file gdbExample1.c, line 21.
(gdb) br 22
Breakpoint 3 at 0x400593: file gdbExample1.c, line 22.
These two breakpoints will allow us to examine the value stored in the wye variable
just before and after it is modified.
(gdb) cont
Continuing.
Enter an integer: 123
This verifies that the user’s input value is stored correctly and that the wye variable
has not yet been changed.
(gdb) cont
Continuing.
And this verifies that our (rather simple) algorithm works correctly.
(gdb) i r rbx rip
rbx 0x7a 122
rip 0x400593 0x400593 <main+71>
We can specify which registers to display with the i r command. This verifies that the
rbx register is being used as the wye variable.
And we see that the rip has incremented from 0x400569 to 0x400593. Don’t forget that
the rip register always points to the next instruction to be executed.
(gdb) cont
Continuing.
The result is 122
Finally, I continue to the end of the program. Notice that gdb is still running and I
have to quit the gdb program.
6.6 Exercises
6-1 (§6.2, §6.5) Enter the program in Listing 6.1 and trace through the program one line at a
time using gdb. Use the n command, not s or si. Keep a written record of the rip register
at the beginning of each line. Hint: use the i r command. How many bytes of machine
code are in each of the C statements in this program? Note that the addresses you see in
the rip register may differ from the example given in this chapter.
6-2 (§6.2, §6.4) As you trace through the program in Exercise 6-1, stop on line 21:
wye += *ptr;
We determined in the example above that the %rbx register is used for the variable wye.
Inspect the registers.
a) What is the address of the first instruction that will be executed when you enter the
n command?
b) How will %rbx change when this statement is executed?
6-3 (§6.5) Modify the program in Listing 6.1 so that a register is also requested for the ex
variable. Were you able to convince the compiler to do this for you? Did the compiler
produce any error or warning messages? Why do you think the compiler would not use a
register for this variable?
6-4 (§6.2, §6.5) Use the gdb debugger to observe the contents of memory in the program from
Exercise 2-31. Verify that your algorithm creates a null-terminated string without the
newline character.
6-5 (§6.2, §6.5) Write a program in C that allows you to determine the endianness of your
computer. Hint: use unsigned char* ptr.
6-6 (§6.2, §6.5) Modify the program in Exercise 6-5 so that you can demonstrate, using gdb,
that endianness is a property of the CPU. That is, even though a 32-bit int is stored little
endian in memory, it will be read into a register in the “proper” order. Hint: declare a
second int that is a register variable; examine memory one byte at a time.
Chapter 7
Programming in Assembly Language
While reading this chapter, you should also consult the info resources available in
most GNU/Linux installations for both the make and the as programs. Appendix B
provides a general tutorial for writing Makefiles, but you need to get the details from
info. info is especially important for learning about as’s assembler directives.
You should also reread the Development Environment section on page xviii.
Creating a program in assembly language is essentially the same as creating one in a high-
level compiled language like C, C++, Java, FORTRAN, etc. We will begin the chapter by looking
in detail at the steps involved in creating a C program. Then we will look at which of these steps
apply to assembly language programming.
1. A text editor is used to write the source code and save it in a file.
2. A compiler translates the source code into machine language that can be executed by the
CPU.
3. A linker is used to integrate all the functions in your program, including externally ac-
cessed libraries of functions, and to determine where each component will be loaded into
memory when the program is executed.
4. A loader is used to load the machine code version of the program into memory where the
CPU can execute it.
5. A debugger is used to help the programmer locate errors that may have crept into the
program. (Yes, none of us is perfect!)
You enter your source code in the text editor part, click on a “build” button to compile and link
your program, then click on a “run” button to load and execute the program. There is typically
a “debug” button that loads and executes the program under control of the debugger program
if you need to debug it. The individual steps of program preparation are obscured by the IDE
user interface. In this book we use the GNU programming environment in which each step is
performed explicitly.
Several excellent text editors exist for GNU/Linux, each with its own “personality.” My “fa-
vorite” changes from time to time. I recommend trying several that are available to you and
deciding which one you prefer. You should avoid using a word processor to create source files
7.2. PROGRAM ORGANIZATION 133
because it will add formatting to the text (unless you explicitly specify text-only). Text editors I
have used include:
• vi is supposed to be installed on all Linux (and Unix) systems. It provides a command line
user interface that is mode oriented. Text is manipulated through keyboard commands.
Several commands place vi in “text insert” mode. The ’esc’ key is used to return to com-
mand mode. Most installations include vim (Vi IMproved) which has additional features
helpful in editing program source code.
• emacs also has a command line user interface. Text is inserted directly. The ’ctrl’ and
“meta” keys are used to specify keyboard sequences for manipulating text.
GUI interfaces are available for both vi and emacs. Any of these, and many other, text editors
would be an excellent choice for the programming covered in this book. Don’t spend too much
time trying to pick the “best” one.
The GNU programming tools are executed from the command line instead of a graphical
user interface (GUI). (IDEs for Linux and Unix are typically GUI frontends that execute GNU
programming tools behind the scenes.) The GNU compiler, gcc, creates an executable program
by performing several distinct steps [22]. The description here assumes a single C source file,
filename.c.
1. Preprocessing. This resolves preprocessor directives such as #include (file inclusion), #define
(macro definition), and #if (conditional compilation) by invoking the program cpp. Compilation
can be stopped at the end of the preprocessing phase with the -E option, which
writes the resulting C source code to standard output.
2. Compilation itself. The source code that results from preprocessing is translated into as-
sembly language. Compilation can be stopped at the end of the compilation phase with the
-S option, which writes the assembly language source code to filename.s.
3. Assembly. The assembly language source code that results from compilation is translated
into machine code by invoking the as program. Compilation can be stopped at the end of
the assembly phase with the -c option, which writes the machine code to filename.o.
4. Linking. The machine code that results from assembly is linked with other machine code
from standard C libraries and other machine code modules, and addresses are resolved.
This is accomplished by invoking the ld program. The default is to write the executable
file, a.out. A different executable file name can be specified with the -o option.
I recommend that you create a separate directory for each program you write. Place all the source
files, plus the Makefile (see Appendix B) for the program in this directory. This will help you keep
your program files organized.
7 int main(void)
8 {
9 return 0;
10 }
This creates the file doNothingProg1.s (see Listing 7.2), which contains the assembly language
generated by the gcc compiler. The two compiler options used here have the following meanings:
-S Causes the compiler to create the .s file, which contains the assembly language equivalent
of the source code. The machine code (.o file) is not created.
-O0 Do not do any optimization. For instructional purposes, we want to see every step of the
assembly language. (This is upper-case “oh” followed by the numeral zero.)
1 .file "doNothingProg1.c"
2 .text
3 .globl main
4 .type main, @function
5 main:
6 .LFB2:
7 pushq %rbp
8 .LCFI0:
9 movq %rsp, %rbp
10 .LCFI1:
11 movl $0, %eax
12 leave
13 ret
14 .LFE2:
15 .size main, .-main
16 .section .eh_frame,"a",@progbits
17 .Lframe1:
18 .long .LECIE1-.LSCIE1
19 .LSCIE1:
20 .long 0x0
21 .byte 0x1
22 .string "zR"
23 .uleb128 0x1
24 .sleb128 -8
25 .byte 0x10
26 .uleb128 0x1
27 .byte 0x3
28 .byte 0xc
29 .uleb128 0x7
30 .uleb128 0x8
31 .byte 0x90
32 .uleb128 0x1
33 .align 8
34 .LECIE1:
35 .LSFDE1:
36 .long .LEFDE1-.LASFDE1
37 .LASFDE1:
38 .long .LASFDE1-.Lframe1
39 .long .LFB2
40 .long .LFE2-.LFB2
41 .uleb128 0x0
42 .byte 0x4
43 .long .LCFI0-.LFB2
44 .byte 0xe
45 .uleb128 0x10
46 .byte 0x86
47 .uleb128 0x2
48 .byte 0x4
49 .long .LCFI1-.LCFI0
50 .byte 0xd
51 .uleb128 0x6
52 .align 8
53 .LEFDE1:
54 .ident "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"
55 .section .note.GNU-stack,"",@progbits
Listing 7.2: A “null” program (gcc assembly language). Much of the code the compiler generates
(lines 16 – 53) is meant to improve the efficiency of the program or for debugging
and is not relevant to the concepts discussed in this book.
Unlike the relationship between assembly language and machine language, there is not a one-to-one
relationship between higher-level languages and assembly language. The assembly language gener-
ated by a compiler may differ across different releases of the compiler, and different optimization levels
will generally affect the code that is generated by the compiler. The code in Listing 7.2 was generated
by release 4.3.3 of gcc and the optimization level was -O0 (no optimization). If you attempt to replicate
this example, your results may vary.
This is not easy to read, even for an experienced assembly language programmer. So we will
start with the program in Listing 7.3, which was written in assembly language by a program-
mer (rather than by a compiler). Naturally, the programmer has added comments to improve
readability.
1 # doNothingProg2.s
2 # Minimum components of a C program, in assembly language.
3 # Bob Plantz - 6 June 2009
4
5 .text
6 .globl main
7 .type main, @function
8 main: pushq %rbp # save caller’s frame pointer
9 movq %rsp, %rbp # establish our frame pointer
10
After examining what the assembly language programmer did we will return to Listing 7.2 and
look at the assembly language generated by the compiler.
Assembly language provides a set of mnemonics that have a one-to-one correspondence to
the machine language. A mnemonic is a short, English-like group of characters that suggests
the action of the instruction. For example, “mov” is used to represent the instruction that copies
(“moves”) a value from one place to another. Thus, the machine instruction
4889E5
copies the entire 64-bit value in the rsp register to the rbp register. Even if you have never seen
assembly language before, the mnemonic representation of this instruction in Listing 7.3,
9 movq %rsp, %rbp # establish our frame pointer
probably makes much more sense to you than the machine code. (The ‘q’ suffix on “mov” means
a quadword (64 bits) is being moved.)
Strictly speaking, the mnemonics are completely arbitrary, as long as you have an assembler pro-
gram that will translate them into the desired machine instructions. However, most assembler
programs more or less use the mnemonics used in the manuals provided by CPU vendors.
The first thing to notice is that assembly language is line-oriented. In this listing there is only one
assembly language statement on each line, and none of the statements spans more than one line.
A statement can be continued onto subsequent lines, but this requires a special line-continuation
character. This differs from the “free form” nature of C/C++ where the line structure is irrel-
evant. In fact, good C/C++ programmers take advantage of this to improve the readability of
their code.
Next, notice that the pattern of each line falls into one of three categories:
• Lines 1 – 3 begin with the “#” character. The rest of the line is written in English and is
easily read. The “#” character in the first column designates a comment line. Just as with
a high-level language, comments are intended solely for the human reader and have no
effect on the program.
• Lines 4 and 10 have been left blank in order to improve readability. (Well, they will improve
readability once you learn how to read assembly language.)
• The remaining nine lines are organized into columns. They probably do not make much
sense to you at this point because they are written in assembly language, but if you look
carefully, each of the assembly language lines is organized into four possible fields:
label: operation operand(s) #comment
The assembler requires at least one space or tab character to separate the fields. When
writing assembly language, your program will be much easier to read if you use the tab
key to move from one field to the next.
1. The label field allows us to give a symbolic name to any line in the program. Since each
line corresponds to a memory location in the program, other parts of the program can then
refer to the memory location by name.
(a) A label consists of an identifier immediately followed by the “:” character. You, as the
programmer, must make up these identifiers. The rules for creating an identifier are
given below.
(b) Notice that most lines are not labeled.
2. The operation field provides the basic purpose of the line. There are two types of opera-
tions:
(a) assembly language mnemonic: The assembler translates these into actual machine
instructions, which are copied into memory when the program is to be executed. Each
machine instruction will occupy from one to fifteen bytes of memory.
(b) assembler directive (pseudo op): Each of these operations begins with the period
(“.”) character. They are used to direct the way in which the assembler translates the
file. They do not translate directly into machine instructions, although some do cause
memory to be allocated. (Assembler directives are called Pseudo Ops in info as. Read
about Pseudo Ops.)
3. The operand field specifies the arguments to be used by the operation. The arguments are
specified in several different ways:
Different operations require differing numbers of operands — zero, one, two, or three.
4. The comment field is just like a comment line, except it takes up only the remainder of
the line. Since assembly language is not as easy to read as higher-level languages, good
programmers will place a comment on almost every line.
The rules for creating an identifier are very similar to those for C/C++. Each identifier
consists of a sequence of alphanumeric characters and may include other printable characters
such as “.”, “_”, and “$”. The first character must not be a numeral. An identifier may be
any length, and all characters are significant. Case is also significant. For example, “myLabel”
and “MyLabel” are different. Compiler-generated labels begin with the “.” character, and many
system related names begin with the “_” character. It is a good idea to avoid beginning your
own labels with the “.” or the “_” character so that you do not inadvertently create one that is
already in use by the system. (Identifiers are called symbols in info as. Read about symbol
names.)
Integers can be used as labels, but they have a special meaning. They are used as local labels, which
are sometimes useful in advanced assembly language programming techniques. They will not be
used in this book.
The assembler program, as, will translate the file doNothingProg2.s (see Listing 7.3) into
machine code and provide the memory allocation information for the operating system to use
when the program is executed. We will first describe the contents of this file, then look at the
GNU commands to convert it into an executable program.
Now we turn attention to the specific file in Listing 7.3, doNothingProg2.s. On line 5 you
recognize
5 .text
as an assembler directive because it starts with a period character. It directs the assembler to
place whatever follows in the text section.
What does “text section” mean? When a source code file is translated into machine code, an
object file is produced. The object file organization follows the Executable and Linking Format
(ELF). ELF files can be seen from two different points of view. Programs that store information
in ELF files store it in sections. The ELF standard specifies many different types of sections,
each depending on the type of information stored in it.
The .text directive specifies that when the following assembly language statements are
translated into machine instructions, they should be stored in a text section in the object file. Text
sections are used to store program instructions in machine code format.
GNU/Linux divides memory into different segments for specific purposes when a program is
loaded from the disk. The four general categories are:
• text (also called code) is where program instructions and constant data are stored. It is
read-only memory. The operating system prevents a program from changing anything
stored in the text segment.
• data is where global variables and static local variables are stored. It is read-write memory
and remains in place for the duration of the program.
• stack is where automatic local variables and the data that links functions are stored. It is
read-write memory that is allocated and deallocated dynamically as the program executes.
• heap is the pool of memory available when a C program calls the malloc function (or C++
calls new). It is read-write memory that is allocated and deallocated by the program.
The operating system needs to view an ELF file as a set of segments. One of the functions
of the ld program is to group sections together into segments so that they can be loaded into
memory. Each segment contains one or more sections. This grouping is generally accomplished
by arrays of pointers to the file, not necessarily by physically moving the sections. That is, there
is still a section view of the ELF file remaining. So the information stored in an ELF file is
grouped into sections, but it may or may not also be grouped into segments.
(All ELF files have sections, but only some have segments.)
When the operating system loads the program into memory, it uses the segment view of the
ELF file. Thus the contents of all the text sections will be loaded into the text segment of the
program process.
This has been a very simplistic overview of ELF sections and segments. We will touch on the
subject again briefly in Section 8.1. Further details can be found by reading the man page for elf
and sources like [13] and [21]. The readelf program is also useful for learning about ELF files.
It is included in the binutils collection of the GNU binary tools, so it is installed along with as and
ld.
The assembler directive on line 6
6 .globl main
has one operand, the identifier “main.” As you know, all C/C++ programs start with the function
named “main.” In this book, we also start our assembly language programs with a main function
and execute them within the C/C++ runtime environment. The .globl directive makes the name
globally known, analogous to defining an identifier outside a function body in C/C++.1 That is,
code outside this file can refer to this name. When a program is executed, the operating system
does some preliminary set up of system resources. It then starts program execution by calling a
function named “main,” so the name must be global in scope.
One can write stand-alone assembly language programs. In GNU/Linux this is accomplished by
using the _start label on the first instruction in the program. The object (.o) files are then linked
using the ld command directly rather than through gcc. See Section 8.5.
The assembler directive on line 7
7 .type main, @function
has two operands, a name and a type. The name is entered into the symbol table (see Section
7.3). In addition to the machine code, the object file contains the symbol table along with infor-
mation about each symbol. The ELF format recognizes two types of symbols, data and function.
The .type directive is used here to specify that the symbol main is the name of a function.
None of these three directives gets translated into actual machine instructions, and none
of them occupies any memory in the finished program. Rather, they are used to describe the
characteristics of the statements that follow.
What follows next in Listing 7.3 are the actual assembly language instructions. They will
occupy memory when they are translated. The first instruction is on line 8:
8 main: pushq %rbp # save caller’s frame pointer
1 Function names are defined outside the function body (outside the {. . .} block) in C/C++. Hence, the names are
global, and a function can call functions defined in other files. Variables can also be declared outside functions. Functions
in other files can reference such variables using the extern storage class specifier.
2. The operation is a pushq instruction, which stands for “push quadword.” It “pushes” a
value onto the call stack. This will be explained in Section 8.2 (page 158). For now, this is
a technique for temporarily saving the value stored in the operand.
The “quadword” part of this instruction means that 64 bits are moved. As you will see in
more detail later, as requires that a single letter be appended to most instructions:
“b” ⇒ “byte” ⇒ operand is 8 bits
“w” ⇒ “word” ⇒ operand is 16 bits
“l” ⇒ “long” ⇒ operand is 32 bits
“q” ⇒ “quadword” ⇒ operand is 64 bits
to specify the size of the operand(s).
3. There is one operand, %rbp. The GNU assembler requires the “%” prefix on the operand to
indicate that this is the name of a register in the CPU. This instruction saves the 64-bit
value in the rbp register on the call stack.
The value in the rbp register is an address. In 64-bit mode addresses can be 64 bits long,
and we have to save the entire address.
4. Finally, we have added a comment to this line. The comment shows that the purpose of
this instruction is to save the value that the calling function was using as a frame pointer.
(The reasons for doing this will be explained in Chapter 8.)
3. There are two operands, %rsp and %rbp. Again, the “%” prefix to each operand means that
it is the name of a register in the CPU.
4. Finally, I have added a comment to this line. The comment shows that the purpose of this
instruction is to establish a new frame pointer in this function. (Again, the reasons for
doing this will be explained in Chapter 8.)
As the name of this “program” implies, it does not do anything, but it still must return to
the operating system. GNU/Linux expects the main function to return an integer to it, and the
return value is placed in the eax register. Zero means that the program executed with no errors.
This may not make a lot of sense to you at this point, but it should become clearer later in the
book. Returning the integer zero to the operating system is accomplished on line 11:
1. This line also has no label. After indenting, it begins with a movl instruction.
2. The first operand is prefixed with a “$” character, which indicates that the operand is to
be taken as a literal value. That is, the source operand is the integer zero. You recognize
that the second operand is the eax register in the CPU. This instruction places a copy of the
32-bit integer zero in the eax register.
Even though the CPU is in 64-bit mode, 64-bit integers are seldom needed. So the default
behavior of our environment is to use 32 bits for ints. 64-bit ints can be specified in C/C++
with either the long or the long long modifier. In assembly language the programmer
would use quadwords for 64-bit integers. (As pointed out on page 118 this instruction also zeros
the high-order 32 bits of the rax register. But you should not write code that depends upon
this behavior.)
3. The comment on this line shows that the purpose of this instruction is to return a zero to
the calling function (the operating system).
The first two instructions in this function,
8 main: pushq %rbp # save caller’s frame pointer
9 movq %rsp, %rbp # establish our frame pointer
form a prologue to the actual processing that is performed by the function. They changed some
values in registers and used the call stack. Before returning to the operating system, it is essen-
tial that an epilogue be executed to restore the values. The compiler uses the leave instruction
(see Listing 7.2) to accomplish this. The leave instruction is equivalent to the following two
instructions:
12 movq %rbp, %rsp # restore stack pointer
13 popq %rbp # restore caller’s frame pointer
1. No labels are used on these lines. The movq instruction ensures that the stack pointer
is moved back to the location where the rbp register was saved. Since the stack pointer
was not used in this function, this instruction is not necessary here. But your program
will crash if the stack pointer is not in the correct location when the next instruction is
executed, so it is a good idea to get into the habit of always using both these instructions
at the end of a function.
2. The popq instruction copies the 64-bit value on the top of the call stack into the operand
and moves the stack pointer accordingly. (You will learn about using the stack pointer in
Section 8.2.) The operand in this case is the rbp register.
3. The comment states that the reason for the popq instruction is to restore the frame pointer
value for the calling function (the operating system since this is main).
4. Although the leave instruction is slightly more efficient, we will use the movq and popq
instructions in this book to emphasize the two operations that must be performed.
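To see why both epilogue instructions matter, here is a minimal sketch in C of a toy machine that runs the prologue, a function body, and the epilogue. Everything in it is an assumption made for illustration: the struct machine type, the helper names, and the use of slot indices in place of real byte addresses. It is a model of the bookkeeping only, not of how the CPU is implemented.

```c
#include <stdint.h>

#define STACK_SLOTS 16u         /* toy stack: one 64-bit value per slot */

struct machine {
    uint64_t stack[STACK_SLOTS];
    uint64_t rsp;               /* index of the slot at the top of the stack */
    uint64_t rbp;               /* frame pointer */
};

/* pushq: move the stack pointer down one slot, then store the value. */
static void pushq(struct machine *m, uint64_t value)
{
    m->rsp -= 1;
    m->stack[m->rsp] = value;
}

/* popq: load the value on top of the stack, then move the pointer back up. */
static uint64_t popq(struct machine *m)
{
    uint64_t value = m->stack[m->rsp];
    m->rsp += 1;
    return value;
}

/* Runs the prologue, a body that pushes `locals` values, and the epilogue.
 * Returns 1 if rsp and rbp end up with the values they had on entry. */
int frame_round_trip(uint64_t caller_rbp, unsigned locals)
{
    struct machine m = { {0}, STACK_SLOTS, caller_rbp };
    uint64_t rsp_on_entry = m.rsp;

    if (locals > STACK_SLOTS - 1)
        return 0;               /* would overflow the toy stack */

    pushq(&m, m.rbp);           /* pushq %rbp        save caller's frame pointer */
    m.rbp = m.rsp;              /* movq  %rsp, %rbp  establish our frame pointer */

    for (unsigned i = 0; i < locals; i++)
        pushq(&m, i);           /* the body moves the stack pointer */

    m.rsp = m.rbp;              /* movq  %rbp, %rsp  restore stack pointer */
    m.rbp = popq(&m);           /* popq  %rbp        restore caller's frame pointer */

    return m.rsp == rsp_on_entry && m.rbp == caller_rbp;
}
```

With locals equal to zero the movq in the epilogue changes nothing, which matches the “do nothing” program. With any other value it is the step that puts the stack pointer back where the saved rbp lives, so that popq retrieves the right value.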
Finally, this function must return to the function that called it, which is back in the operating
system.
14 ret # back to caller
1. This line has no label. And the instruction does not specify any operands. This is the
instruction for returning program control back to the function that called this one. In
this particular case, since this is the main function, control is passed back to the operating
system.
2. Here is an example of an instruction that changes the value in the instruction pointer reg-
ister (rip) in order to alter the linear flow of the program. We will see later the mechanism
that is used to implement this.
3. The comment on this line briefly describes the reason for the instruction.
7.2. PROGRAM ORGANIZATION 141
As you can see from this example, even a function that does nothing requires several
instructions. The most commonly used assembly language instruction is mov. In the AT&T syntax
used by as, a size character (b, w, l, or q) is appended to the instruction, and the source
operand comes first:
        AT&T syntax:   movs source, destination     (s = b, w, l, q)
In the Intel syntax, the size of the data is determined by the operand, so the size character
(b, w, l, or q) is not appended to the instruction, and the order of the operands is reversed:
        Intel® syntax: mov destination, source
The mov instruction copies the bit pattern from the source operand to the destination operand.
The bit pattern of the source operand is not changed. If the destination operand is a register
and its size is less than 64 bits, the effect on the other bits in the register is shown in Table 7.1.
Table 7.1: Effect on other bits in a register when less than 64 bits are changed.
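As a rough model of these rules, the following C sketch treats a 64-bit register as a uint64_t and shows what a narrower write does to the remaining bits. It assumes the usual x86-64 behavior: a 32-bit write clears the high-order 32 bits, while 8-bit and 16-bit writes leave the other bits unchanged. The function names are hypothetical.

```c
#include <stdint.h>

/* movl-style write: low 32 bits replaced, high 32 bits cleared. */
uint64_t write_low32(uint64_t reg, uint32_t value)
{
    (void)reg;                      /* none of the old contents survive */
    return (uint64_t)value;
}

/* movw-style write: low 16 bits replaced, bits 63-16 unchanged. */
uint64_t write_low16(uint64_t reg, uint16_t value)
{
    return (reg & ~(uint64_t)0xFFFF) | value;
}

/* movb-style write to the low byte: bits 63-8 unchanged. */
uint64_t write_low8(uint64_t reg, uint8_t value)
{
    return (reg & ~(uint64_t)0xFF) | value;
}
```

For example, starting from 0x1122334455667788, a 16-bit write of 0xABCD leaves 0x112233445566ABCD, whereas a 32-bit write of zero leaves zero in the whole register.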
The mov instruction does not affect the rflags register. In particular, neither the CF nor
the OF flags are affected. No more than one of the two operands may be a memory location.
Thus, in order to move a value from one memory location to another, it must be moved from the
first memory location into a register, then from that register into the second memory location.
(Accessing data in memory will be covered in Sections 8.1 and 8.3.)
The other instructions used in this “do nothing” program — pushq, popq, and ret — use the
call stack. The call stack will be discussed in Section 8.2, which will then allow us to discuss
these instructions. For now, you should memorize how to use them as “boilerplate” for the
prologue and epilogue of each function.
If you have any experience with x86 assembly language, the syntax used by the GNU assembler,
as, will look a little strange to you. In principle, the syntax is arbitrary. A programmer could
invent any sort of assembly language and write a program that would translate it into the
appropriate machine code. But most cpu manufacturers publish a manual with a suggested
assembly language syntax for their cpu.
Most assemblers for the x86 cpus follow the syntax suggested by Intel®, but as uses the
AT&T syntax. It is not radically different from Intel’s. Some of the more striking differences
are:
                 AT&T                               Intel®
operand order:   source, destination                destination, source
register names:  prefixed with the “%” character,   just the name, e.g., eax
                 e.g., %eax
literal values:  prefixed with the “$” character,   just the value, e.g., 123
                 e.g., $123
operand size:    use the b, w, l, or q suffix on    determined by the register
                 the opcode to denote byte, word,   specification (more complicated
                 long, or quadruple word            if operand is stored in memory)
Pay attention to operand order.
The GNU assembler, as, does not require the size suffix on instructions in all cases. From the info
documentation for as:
It is recommended that you get in the habit of using the size suffix letters when you begin writing
your own assembly language. This will help you to avoid introducing obscure bugs in your code.
The assembler directives are typically not specified by the cpu manufacturer, so you will see
a much wider variety of syntax, depending on the particular assembler program. We will not
try to list any differences here.
The GNU assembler, as, also supports the Intel® syntax. The assembler directive .intel_syntax
says that the following assembly language is written in the Intel® syntax; .att_syntax says it is
written in AT&T syntax. Using Intel® syntax, the assembly language code in Listing 7.3 would
be written
        mov eax, 0
        mov rsp, rbp
        pop rbp
        ret
Keep in mind that gcc produces assembly language in AT&T syntax, so you will undoubtedly
find it easier to use that when you write your own code. The .intel_syntax directive might be
useful if somebody gives you an entire function written in Intel® syntax assembly language.
The syntax rules for our particular assembler, as, are described in an on-line manual that is
in the GNU info format. as supports some two dozen computer architectures, so it is a challenge
to wade through the info manual to find what you need. On the other hand, it provides the
most up to date information. And it is especially important for learning how to use assembler
directives because they are specific to the assembler.
Now would be a good time to start learning how to use info for as. As you encounter new
assembly language concepts in this book, also look them up in info for as. If you are unfamiliar
with info, at the GNU/Linux prompt, simply type
$ info info
identifies the name of the C source file. When you write in assembly language this information
clearly does not apply.
The five lines
5 main:
6 .LFB2:
7 pushq %rbp
8 .LCFI0:
9 movq %rsp, %rbp
set up the call stack for this function. The use of the call stack will be explained in more detail
in Section 8.2 on page 158 and in subsequent Sections.
The additional labels generated by the compiler, .LFB2 and .LCFI0, are used for entries in the
unwind table, which is briefly described below. Our programs will not include unwind tables, so
we will not need such labels.
Notice that nothing follows the two labels, main and .LFB2, on their lines. The assembler does not
generate any machine code for either of these two lines, so they do not take up any memory. The
next thing that comes in memory is the
7 pushq %rbp
instruction. Thus, both labels apply to the address where this instruction is located.
The instruction
12 leave
in the assembly language written by a programmer (Figure 7.3). We use the two individual
instructions because they explicitly show the operations that must be performed at the end of
each function. They undo the set up of the call stack that took place at the very beginning of the
function. (The goal of this book is to show what the computer is doing.)
Lines 16 – 53 make up what is called an unwind table. The -fasynchronous-unwind-tables
option causes the compiler to generate an unwind table in dwarf2 format for the function. In my
version of the compiler, the default is to generate the table in 64-bit mode and not generate it in
32-bit mode. This may vary depending on different versions of the compiler. We will not use the
table, so we will use the -fno-asynchronous-unwind-tables option to turn off the feature, as shown
in Listing 7.4. The GNU/Linux command is:
1 .file "doNothingProg1.c"
2 .text
3 .globl main
4 .type main, @function
5 main:
6 pushq %rbp
7 movq %rsp, %rbp
8 movl $0, %eax
9 leave
10 ret
11 .size main, .-main
12 .ident "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"
13 .section .note.GNU-stack,"",@progbits
Listing 7.4: A “null” program (gcc assembly language). We have used the
-fno-asynchronous-unwind-tables compiler option to remove the exception
handler frame.
Lines 15, 54, and 55 in Listing 7.2 are the same as lines 11 – 13 in Listing 7.4. They also use
directives that do not apply to the programs we will be writing in this book.
Finally, you may have noticed that the main label is on a line by itself in Listing 7.2 but not
in Listing 7.3. When there is only a label on a line, no machine instructions are generated, and
no memory is allocated. Thus, the label really applies to the next line. It is common to place
labels on their own line so that longer, easier to read labels can be used while still keeping the
operations visually lined up in a column. This technique is illustrated in Listing 7.5.
1 # doNothingProg3.s
2 # The minimum components of a C program, written in assembly
3 # language. Same as doNothingProg2.s, except with the main
4 # label on its own line.
5 # Bob Plantz - 7 June 2009
6
7 .text
8 .globl main
9 .type main, @function
10 main:
11 pushq %rbp # save caller’s frame pointer
12 movq %rsp, %rbp # establish our frame pointer
13
Listing 7.5: The “null” program rewritten to show a label placed on its own line.
4 1 .file "doNothingProg1.c"
5 9 .Ltext0:
6 10 .globl main
7 12 main:
8 13 .LFB0:
9 14 .file 1 "doNothingProg1.c"
10 1:doNothingProg1.c **** /*
11 2:doNothingProg1.c **** * doNothingProg1.c
12 3:doNothingProg1.c **** * The minimum components of a C program.
13 4:doNothingProg1.c **** * Bob Plantz - 6 June 2009
14 5:doNothingProg1.c **** */
15 6:doNothingProg1.c ****
16 7:doNothingProg1.c **** int main(void)
17 8:doNothingProg1.c **** {
18 15 .loc 1 8 0
19 16 .cfi_startproc
20 17 0000 55 pushq %rbp
21 18 .LCFI0:
22 19 .cfi_def_cfa_offset 16
23 20 0001 4889E5 movq %rsp, %rbp
24 21 .cfi_offset 6, -16
25 22 .LCFI1:
26 23 .cfi_def_cfa_register 6
27 9:doNothingProg1.c **** return 0;
28 24 .loc 1 9 0
29 25 0004 B8000000 movl $0, %eax
30 25 00
31 10:doNothingProg1.c **** }
32 26 .loc 1 10 0
33 27 0009 C9 leave
34 28 000a C3 ret
35 29 .cfi_endproc
36 30 .LFE0:
37 32 .Letext0:
40
41 DEFINED SYMBOLS
42 *ABS*:0000000000000000 doNothingProg1.c
43 /tmp/cczPwhLl.s:12 .text:0000000000000000 main
44
45 NO UNDEFINED SYMBOLS
Listing 7.6: Assembly language embedded in C source code listing. The line number in the C
source file is also indicated with the .loc assembler directive. Note that the C source
code line numbering begins with 0; this can vary with different versions of as.
The “-g” option tells the compiler to include symbols for debugging. “-Wa,” passes the imme-
diately following options to the assembly phase of the compilation process. Thus, the options
passed to the assembler are “-adhls”, which cause the assembler to generate a listing with the
following characteristics:
• -ad: omit debugging directives
As you can see above, the secondary letters can be combined with one “-a”. The “d” has the same
effect as the “-fno-asynchronous-unwind-tables” option. The listing is written to standard out,
which can be redirected to a file. We gave this file the “.lst” file extension because it cannot be
assembled.
1 .file "doNothingProg1.c"
2 .text
3 .globl main
4 .type main, @function
5 main:
6 leal 4(%esp), %ecx
7 andl $-16, %esp
8 pushl -4(%ecx)
9 pushl %ebp
10 movl %esp, %ebp
11 pushl %ecx
12 movl $0, %eax
13 popl %ecx
14 popl %ebp
15 leal -4(%ecx), %esp
16 ret
17 .size main, .-main
18 .ident "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"
19 .section .note.GNU-stack,"",@progbits
The first thing to notice is that all the instructions use the “l” suffix to indicate “longword”
because addresses are 32 bits. And only the 32-bit portion of the registers is used. That is, esp
instead of rsp, etc.
The prologue in the 32-bit main function,
6 leal 4(%esp), %ecx
7 andl $-16, %esp
8 pushl -4(%ecx)
9 pushl %ebp
10 movl %esp, %ebp
11 pushl %ecx
is much more complex than the 64-bit version. This has to do with the use of 32-bit addresses
and other performance issues that are beyond the scope of this book. Similarly, the epilogue,
13 popl %ecx
14 popl %ebp
15 leal -4(%ecx), %esp
7.3. ASSEMBLERS AND LINKERS 147
6 .text
7 .globl main
8 .type main, @function
9 main:
10 pushl %ebp # save caller’s frame pointer
11 movl %esp, %ebp # establish our frame pointer
12 movl $0, %eax # return 0 to caller
13 movl %ebp, %esp # restore stack pointer
14 popl %ebp # restore caller’s frame pointer
15 ret # back to caller
7.3.1 Assemblers
An assembler must perform the following tasks:
• Translate assembly language mnemonics into machine language.
• Translate symbolic names for addresses into numeric addresses.
Since the numeric value of an address may be required before an instruction can be trans-
lated into machine language, there is a problem with forward references to memory locations.
For example, a code sequence like:
1 # if (response == 'y')
2 cmpb $'y', response # was it 'y'?
3 jne noChange # no, there is no change
4
17 saveEnd:
18 jmp allDone # skip over false block
19
instruction on line 3. (Don’t forget that assembly language is line oriented; translation is done
one line at a time.) When this code sequence is executed, the immediately previous instruction
(cmpb $'y', response) compares the byte stored at location response with the character ‘y’.
If they are not equal, i.e., a ‘y’ is not stored at location response, the jne instruction causes
program flow to jump to location noChange. In order to accomplish this action, the translation
of this instruction (the machine code) must include a numerical value that specifies how far to
jump. That is, it must include the distance, in number of bytes, between the jne instruction
and the memory location labeled noChange on line 23. In order to compute this distance, the
assembler must determine the address that corresponds to the label noChange when it translates
this instruction, but the assembler has not even encountered the noChange label, much less
determined its corresponding address.
The simplest solution is to use a two-pass assembler:
1. The first pass builds a symbol table, which provides an address for each memory label.
2. The second pass performs the actual translation into machine language, consulting the
symbol table for numeric values of the symbols.
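The two passes can be sketched in C for a toy assembly language. This is only an illustration under loudly stated assumptions: every instruction is taken to be exactly 2 bytes long (INSTR_BYTES), a line ending in a colon is a label, and the only instruction that pass two resolves is a hypothetical “jmp label”. None of the names below come from as.

```c
#include <string.h>

#define MAX_SYMBOLS 16
#define NAME_LEN    32
#define INSTR_BYTES 2   /* assumption: every toy instruction occupies 2 bytes */

struct symbol {
    char name[NAME_LEN];
    int  address;
};

/* A line that ends in ':' is a label and generates no machine code. */
static int is_label(const char *line)
{
    size_t len = strlen(line);
    return len > 0 && line[len - 1] == ':';
}

/* Pass one: walk the source, tracking the location counter, and record
 * the address of each label. Returns the number of symbols found. */
static int pass_one(const char *lines[], int n, struct symbol table[])
{
    int location = 0, count = 0;

    for (int i = 0; i < n; i++) {
        if (!is_label(lines[i])) {
            location += INSTR_BYTES;
            continue;
        }
        size_t len = strlen(lines[i]) - 1;          /* drop the ':' */
        if (count == MAX_SYMBOLS || len >= NAME_LEN)
            continue;                               /* toy limits: skip */
        memcpy(table[count].name, lines[i], len);
        table[count].name[len] = '\0';
        table[count].address = location;
        count++;
    }
    return count;
}

/* Pass two, for a single "jmp label" line: look the label up in the
 * symbol table and compute the distance, in bytes, from the end of the
 * jump to the label. Returns 1 on success, 0 if the line is not a jump
 * or the label is unknown. */
int resolve_jump(const char *lines[], int n, int jump_line, int *distance)
{
    struct symbol table[MAX_SYMBOLS];
    int count = pass_one(lines, n, table);
    int location = 0;

    if (jump_line < 0 || jump_line >= n)
        return 0;
    if (strncmp(lines[jump_line], "jmp ", 4) != 0)
        return 0;

    for (int i = 0; i < jump_line; i++)             /* address of the jump */
        if (!is_label(lines[i]))
            location += INSTR_BYTES;

    for (int s = 0; s < count; s++) {
        if (strcmp(table[s].name, lines[jump_line] + 4) == 0) {
            *distance = table[s].address - (location + INSTR_BYTES);
            return 1;
        }
    }
    return 0;
}
```

Because pass one has already seen every label, the forward reference in a program such as { "jmp done", "nop", "nop", "done:", "ret" } is no longer a problem: the jump on the first line resolves to a distance of 4 bytes.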
Algorithm 7.1 is a highly simplified description of how the first pass of an assembler works.
The symbol table is carried from the first pass to the second. The second pass also consults
a table of operation codes, which provides the machine code corresponding to each instruction
mnemonic. A highly simplified description of the second pass is given in Algorithm 7.2.
Algorithm 7.2: Second pass of a two-pass assembler.
given: SymbolTable from Pass One
given: Op-CodeTable
Data: LocationCounter
LocationCounter ⇐ 0;
get first line of source code;
while more lines do
    if line is instruction then
        find machine code from Op-CodeTable;
        find symbol value from SymbolTable;
        assemble instruction into machine code;
    else
        carry out directive;
    write machine code to object file;
    determine number of bytes used;
    LocationCounter ⇐ LocationCounter + number of bytes;
    get next line of source code;
7.3.2 Linkers
Look again at the code sequence above. On line 14 there is the instruction:
call write
This call to the write function is a reference to a memory label outside the file being assembled.
Thus, the assembler has no way to determine the address of write for the symbol table during
the first pass. The only thing the assembler can do during the second pass is to leave enough
memory space for the address of write when it assembles this instruction. The actual address
will have to be filled in later in order to create the entire program. Filling in these references to
external memory locations is the job of the linker program.
The algorithm for linking functions together is very similar to that of the assembler. The
same forward reference problem exists. Again, the simplest solution is to use a two-pass linker
program.
The highly simplified algorithms in Algorithms 7.3 and 7.4 also provide for loading
the entire program into memory. The functions are linked together as they are loaded. In
practice, this is seldom done. For example, the GNU linker, ld, does not load the program into
memory. Instead, it creates another machine language file — the executable program. The
executable program file contains all the functions of the program with all the cross-function
memory references resolved. Thus ld is a link editor program.
Getting even more realistic, many of the functions used by a program are not even included
in the executable program file. They are loaded as required when the program is executing. The
link editor program must provide dynamic links for the executable program file.
However, you can get the general idea of linking separately assembled (or compiled) functions
together by studying the algorithms in Algorithms 7.3 and 7.4. In particular, notice that the
assembler (or compiler) must include other information in addition to machine code in the object
file. The additional information includes:
3. The location relative to the beginning of the function where the external memory reference
is made.
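The way a linker uses this information can be sketched in C. The sketch below is a deliberately simplified assumption, not a description of ld: the struct relocation and struct definition types and the patch_reference function are hypothetical, and the placeholder is filled with an absolute 32-bit address, whereas a real x86-64 call instruction stores a displacement relative to the next instruction.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Recorded by the assembler: which symbol is needed, and where in the
 * machine code the 4-byte placeholder for its address sits. */
struct relocation {
    const char *symbol;
    size_t      offset;
};

/* Known to the linker once every object file has been read. */
struct definition {
    const char *symbol;
    uint32_t    address;
};

/* Fill in one placeholder. Returns 1 on success, 0 if the symbol is
 * undefined or the placeholder would run past the end of the code. */
int patch_reference(uint8_t *code, size_t code_len, struct relocation rel,
                    const struct definition *defs, size_t n_defs)
{
    if (rel.offset > code_len || code_len - rel.offset < 4)
        return 0;

    for (size_t i = 0; i < n_defs; i++) {
        if (strcmp(defs[i].symbol, rel.symbol) == 0) {
            uint32_t a = defs[i].address;
            code[rel.offset + 0] = (uint8_t)(a & 0xFF);   /* little endian */
            code[rel.offset + 1] = (uint8_t)((a >> 8) & 0xFF);
            code[rel.offset + 2] = (uint8_t)((a >> 16) & 0xFF);
            code[rel.offset + 3] = (uint8_t)((a >> 24) & 0xFF);
            return 1;
        }
    }
    return 0;
}
```

The assembler's part of the job is to leave the four zero bytes and the relocation record; the linker's part is the table lookup and the patch.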
I create a directory named “CS252plantz01.0.” All the files that you create for each
program should be kept in a separate directory only for that program.
bob$ cd CS252plantz01.0/
bob$ ls
bob$ pwd
/home/bob/CS252/CS252plantz01.0
These two commands show that the new subdirectory is empty and where my current
working directory is located within the file hierarchy.
This is where I used emacs to enter the program from Listing 7.5.
bob$ ls
doNothingProg.s
bob$ as -gstabs -o doNothingProg.o doNothingProg.s
bob$ ls
doNothingProg.o doNothingProg.s
bob$ gcc -o doNothingProg doNothingProg.o
bob$ ls
doNothingProg doNothingProg.o doNothingProg.s
bob$ ./doNothingProg
bob$
This starts up the emacs program and creates a new file named “doNothingProg.s.” You
may use any text editor. I am now ready to use the emacs editor to enter my program.
emacs is an extremely powerful and versatile editor. We could easily spend the rest
of the book simply learning about emacs, but the following very small subset of emacs
commands will be enough to get you started. These are all keyboard commands, which
will allow you to use emacs from a remote system that does not support X-window.
• To enter text, simply type.
• Use the arrow keys to move around in existing text.
• The “Backspace” or the “Delete” key will delete the character immediately to the
left of the cursor.
• Typing ctrl-x then ctrl-s will save your current work, writing over the previous
contents in the file.
• Typing ctrl-x then ctrl-c will exit emacs giving you the option of first saving
unsaved changes.
• If you wish to learn more about emacs, typing ctrl-h then t will start the emacs tutorial.
bob$ ls
doNothingProg.s
On the first line, I invoke the assembler, as. The -gstabs option directs the assembler
to include debugging information with the output file. We will very definitely make
use of the debugger! The -o option is followed by the name of the output (object) file.
You should always use the same name as the source file, but with the .o extension. The
second command simply shows the new file that has been created in my directory.
bob$ gcc doNothingProg.o -o doNothingProg
bob$ ls
doNothingProg doNothingProg.o doNothingProg.s
Next I link the object file. Even though there is only one object file, this step is required
in order to bring in the GNU/Linux libraries needed to create an executable program.
As in as, the -o option is used to specify the name of a file. In the linking case, this
will be the name of the final product of our efforts.
Note: The linker program is actually ld. The problem with using it directly, for ex-
ample,
ld doNothingProg.o -o doNothingProg *** DOES NOT WORK ***
is that you must also explicitly specify all the libraries that are used. By using gcc for
the linking, the appropriate libraries are automatically included in the linking.
bob$ ./doNothingProg
bob$
7.5.1 Instructions
data movement:
opcode  source          destination  action           see page:
movs    $imm/%reg       %reg/mem     move             141
popw                    %reg/mem     pop from stack   163
pushw   $imm/%reg/mem                push onto stack  163
s = b, w, l, q; w = l, q

arithmetic/logic:
opcode  source          destination  action           see page:
cmps    $imm/%reg       %reg/mem     compare          209
incs                    %reg/mem     increment        220
s = b, w, l, q

program flow control:
opcode  location        action                see page:
call    label           call function         156
je      label           jump equal            211
jmp     label           jump                  213
jne     label           jump not equal        211
ret                     return from function  168
7.6 Exercises
The functions you are asked to write in these exercises are not complete programs. You can
check that you have written a valid function by writing a main function in C that calls the
function you have written in assembly language. Compile the main function with the -c option
so that you get the corresponding object (.o) file. Assemble your assembly language file. Make
sure that you specify the debugging options when compiling/assembling. Use the linking phase
of gcc to link the .o files together. Run your program under gdb and set a breakpoint in your
assembly language function. (Hint: you can specify the source file name in gdb commands.) Now
you can verify that your assembly language function is being called. If the function returns a
value, you can print that value in the main function using printf.
7-1 (§7.2) Write the C function:
/* f.c */
int f(void) {
return 0;
}
in assembly language. Make sure that it assembles with no errors. Use the -S option to
compile f.c and compare gcc’s assembly language with yours.
7-2 (§7.2) Write the C function:
/* g.c */
void g(void) {
}
in assembly language. Make sure that it assembles with no errors. Use the -S option to
compile g.c and compare gcc’s assembly language with yours.
7-3 (§7.2) Write the C function:
/* h.c */
int h(void) {
return 123;
}
in assembly language. Make sure that it assembles with no errors. Use the -S option to
compile h.c and compare gcc’s assembly language with yours.
7-4 (§7.2) Write three assembly language functions that do nothing but return an integer.
They should each return different, non-zero, integers. Write a C main function to test your
assembly language functions. The main function should capture each of the return values
and display them using printf.
7-5 (§7.2) Write three assembly language functions that do nothing but return a character.
They should each return different characters. Write a C main function to test your assembly
language functions. The main function should capture each of the return values and display
them using printf.
7-6 (§7.2, §6.5) Write an assembly language function that returns four characters. The return
value is always in the eax register in our environment, so you can store four characters
in it. The easiest way to do this is to determine the hexadecimal value for each character,
then combine them so you can store one 32-bit hexadecimal value in eax.
Write a C main function to test your assembly language function. The main function should
capture the return values and display them using the write system call.
Explain the order in which they are displayed.
Chapter 8
1. Read data from an input device, such as the keyboard, a disk file, the internet, etc., into
main memory.
5. Write the results to an output device, such as the screen, a disk file, audio speakers, etc.
In this chapter you will learn how to call functions that can read input from the keyboard,
allocate memory for storing data, and write output to the screen.
1. STDOUT_FILENO is the file descriptor of standard out, normally the screen. This symbolic
name is defined in the unistd.h header file.
2. Although the C syntax allows a programmer to place the text string here, only its address
is passed to write, not the entire string.
3. The programmer has counted the number of characters in the text string to write to
STDOUT_FILENO.
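The three points above can be put together in a short C sketch. The function name say_hello is hypothetical, and taking the file descriptor as a parameter is our own choice so that the function can be pointed at something other than the screen. Note that sizeof lets the compiler do the counting that the programmer would otherwise do by hand.

```c
#include <unistd.h>

/* Writes the greeting to the given file descriptor and returns whatever
 * write() returns: the number of bytes written, or -1 on error. */
ssize_t say_hello(int fd)
{
    static const char message[] = "Hello world.\n";

    /* Only the address of message is passed, not the characters.
     * sizeof counts the terminating NUL, so subtract one to get 13. */
    return write(fd, message, sizeof message - 1);
}
```

Calling say_hello(STDOUT_FILENO) sends the 13 characters of the text string to standard out.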
1 /*
2 * helloWorld2.c
3 *
4 * "hello world" program using the write() system call.
5 * Bob Plantz - 8 June 2009
6 */
7 #include <unistd.h>
8
8.1. CALLING WRITE IN 64-BIT MODE 155
9 int main(void)
10 {
11
14 return 0;
15 }
Listing 8.1: “Hello world” program using the write system call function (C).
This program uses only constant data — the text string “Hello world.” Constant data used by a
program is part of the program itself and is not changed by the program.
Looking at the compiler-generated assembly language in Listing 8.2, the constant data ap-
pears on line 4, as indicated by the comment added on that line. Comments have also been
added on lines 11 – 14 to explain the argument set up for the call to write.
1 .file "helloWorld2.c"
2 .section .rodata
3 .LC0:
4 .string "Hello world.\n" # constant data
5 .text
6 .globl main
7 .type main, @function
8 main:
9 pushq %rbp
10 movq %rsp, %rbp
11 movl $13, %edx # third argument
12 movl $.LC0, %esi # second argument
13 movl $1, %edi # first argument
14 call write
15 movl $0, %eax
16 leave
17 ret
18 .size main, .-main
19 .ident "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"
20 .section .note.GNU-stack,"",@progbits
Listing 8.2: “Hello world” program using the write system call function (gcc assembly lan-
guage).
Data can only be located in one of two places in a computer:
• in memory, or
• in a CPU register.
(We are ignoring the case of reading from an input device or writing to an output device here.)
Recall from the discussion of memory segments on page 137 that the Linux kernel uses different
memory segments for the various parts of a program. The directive on line 2,
2 .section .rodata
uses the .section assembler directive to direct the assembler to store the data that follows in a
“read-only data” section in the object file. Even though it begins with a ‘.’ character .rodata is
not an assembler directive but the name of a section in an ELF file.
Your first thought is probably that the .rodata section should be loaded into a data segment
in memory, but recall that data memory segments are read/write. Thus .rodata sections are
mapped into a text segment, which is a read-only memory segment.
The .string directive on line 4,
3 .LC0:
4 .string "Hello world.\n" # constant data
156 CHAPTER 8. PROGRAM DATA – INPUT, STORE, OUTPUT
allocates enough bytes in memory to hold each of the characters in the text string, plus one for
the NUL character at the end. The first byte contains the ASCII code for the character ’H’,
the second the ASCII code for ’e’, etc. Notice that the last character in this string is ’\n’,
the newline character; it occupies only one byte of memory. So fourteen bytes of memory are
allocated in the .rodata section in this program, and each byte is set to the corresponding ASCII
code for each character in the text string. The label on line 3 provides a symbolic name for the
beginning address of the text string so that the program can refer to this memory location.
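The same byte count can be checked from C. In this sketch (the names greeting, bytes_allocated, and bytes_to_write are hypothetical) the array plays the role of the memory allocated by the .string directive: sizeof reports every byte including the NUL, while strlen stops just before it.

```c
#include <stddef.h>
#include <string.h>

/* The array holds the characters of the text string plus the NUL. */
static const char greeting[] = "Hello world.\n";

/* Bytes of memory occupied, including the terminating NUL character. */
size_t bytes_allocated(void)
{
    return sizeof greeting;
}

/* Bytes that would be passed to write: the NUL is not displayed. */
size_t bytes_to_write(void)
{
    return strlen(greeting);
}
```

Here bytes_allocated() is 14 and bytes_to_write() is 13, which is why the third argument to write is 13 even though fourteen bytes are stored.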
The most common directives for allocating memory for data are shown in Table 8.1. If these
are used in the .rodata section, the values can only be used as constants in the program.
Table 8.1: Common assembler directives for allocating memory. The label is optional.
The assembly language instruction used to call a function is
call functionName
where functionName is the name of the function being called. The call instruction does two
things:
1. The address in the rip register is pushed onto the call stack. (The call stack is described in
Section 8.2.) Recall that the rip register is incremented immediately after the instruction
is fetched. Thus, when the call instruction is executed, the value that gets pushed onto
the stack is the address of the instruction immediately following the call instruction. That
is, the return address gets pushed onto the stack in this first step.
2. The address that functionName resolves to is placed in the rip register. This is the ad-
dress of the function that is being called, so the next instruction to be fetched is the first
instruction in the called function.
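These two steps, together with the matching ret, can be modeled with a toy CPU in C. The sketch rests on assumptions that are ours, not the hardware's: the struct cpu type and the function names are hypothetical, the stack is indexed by slot instead of by byte address, and the call instruction is taken to be 5 bytes long (CALL_BYTES).

```c
#include <stdint.h>

#define CALL_BYTES 5u    /* assumption: the call instruction is 5 bytes long */

struct cpu {
    uint64_t rip;        /* instruction pointer */
    uint64_t rsp;        /* index of the top-of-stack slot */
    uint64_t stack[8];   /* toy call stack */
};

/* Step 1: push the return address. Step 2: load rip with the target. */
static void do_call(struct cpu *c, uint64_t target)
{
    c->rsp -= 1;
    c->stack[c->rsp] = c->rip;   /* rip already points past the call */
    c->rip = target;
}

/* ret pops the saved return address back into rip. */
static void do_ret(struct cpu *c)
{
    c->rip = c->stack[c->rsp];
    c->rsp += 1;
}

/* Fetch a call located at call_addr, execute it, then return.
 * The result is the address where execution resumes in the caller. */
uint64_t call_then_ret(uint64_t call_addr, uint64_t target)
{
    struct cpu c = { call_addr, 8, {0} };

    c.rip += CALL_BYTES;         /* the fetch moves rip past the call */
    do_call(&c, target);
    /* ... the called function would run here, starting at target ... */
    do_ret(&c);
    return c.rip;
}
```

A call located at address 0x400010 therefore resumes at 0x400015, the instruction immediately following the call, regardless of where the called function is located.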
Before the call is made, any arguments to a function must be stored in their proper locations,
as specified in the ABI [25]. Up to six arguments are passed in the general purpose registers.
Reading the argument list from left to right in the C code, the order of using the registers is
given in Table 8.2. If there are more than six arguments, the additional ones are pushed onto
the call stack, but in right-to-left order. This will be described in Section 11.2.
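The register assignment can be written as a small lookup in C. This is only a sketch: the function name argument_location is hypothetical, and it assumes every argument is an integer or an address, which is the case for all three arguments to write.

```c
#include <stddef.h>

/* Where the argument in the given position is passed. Position 1 is the
 * first (leftmost) argument in the C argument list. Returns NULL for a
 * position less than 1. */
const char *argument_location(int position)
{
    static const char *const registers[] = {
        "rdi", "rsi", "rdx", "rcx", "r8", "r9"
    };

    if (position < 1)
        return NULL;
    if (position <= 6)
        return registers[position - 1];
    return "stack";              /* seventh and later arguments */
}
```

For the call to write, the file descriptor is argument 1 (rdi), the address of the text string is argument 2 (rsi), and the byte count is argument 3 (rdx). Because these particular values fit in 32 bits, the compiler uses the 32-bit portions of the registers: edi, esi, and edx.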
Each of the three arguments to write in this program — the file descriptor, the address of
the text string, and the number of bytes in the text string — is also a constant whose value is
known when the program is first loaded into memory and is not changed by the program. The
locations of these constants on lines 11 – 13,
Argument Register
first rdi
second rsi
third rdx
fourth rcx
fifth r8
sixth r9
are not as obvious. The location of the data that an instruction operates on must be specified
in the instruction and its operands. The manner in which the instruction uses an operand to
locate the data is called the addressing mode. Assembly language includes a syntax that the
programmer uses to specify the addressing mode for each operand. When the assembler trans-
lates the assembly language into machine code it sets the bit pattern in the instruction to the
corresponding addressing mode for each operand. Then when the CPU decodes the instruction
during program execution it knows where to locate the data represented by that operand.
The simplest addressing mode is register direct. The syntax is to simply use the name of a
register, and the data is located in the register itself.
use the register direct addressing mode for their operands. The pushq instruction has only one
operand, and the movq has two.
Each of the instructions on lines 11 – 13 uses the register direct addressing mode for the
destination, but the source operand is the data itself. So all three instructions employ the
immediate data addressing mode for the source.
Immediate data: The data value is located in memory immediately after the instruction. This
addressing mode can only be used for a source operand.
Although the register direct addressing mode can be used to specify either a source or destina-
tion operand, or both, the immediate data addressing mode is valid only for a source operand.
Let us consider the mechanism by which the control unit accesses the data in the immediate
data addressing mode. First, we should say a few words about how a control unit executes an
instruction. Although a programmer thinks of each instruction as being executed atomically, it
is actually done in discrete steps by the control unit. In addition to the registers used by a pro-
grammer, the CPU contains many registers that cannot be used directly. The control unit uses
these registers as “scratch paper” for temporary storage of intermediate values as it progresses
through the steps of executing an instruction.
158 CHAPTER 8. PROGRAM DATA – INPUT, STORE, OUTPUT
Now, recall that when the control unit fetches an instruction from memory, it automatically
increments the instruction pointer (rip) to the next memory location immediately following the
instruction it just fetched. Usually, the instruction pointer would now be pointing to the next
instruction in the program. But in the case of the immediate data addressing mode, the “$”
symbol tells the assembler to store the operand at this location.
As the control unit decodes the just fetched instruction, it detects that the immediate data
addressing mode has been used for the source operand. Since the instruction pointer is currently
pointing to the data, it is a simple matter for the control unit to fetch it. Of course, when it does
this fetch, the control unit increments the instruction pointer by the size of the data it just
fetched.
Now the control unit has the source data, so it can continue executing the instruction. And
when it has completed the current instruction, the instruction pointer is already pointing to the
next instruction in the program.
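The fetch steps just described can be modeled with a few lines of C. This is only a toy sketch, not how any real control unit is built: the array code[] stands in for memory, the index ip stands in for rip, and the names fetch_mov_imm32 and struct decoded are made up for this illustration. It handles only the one-byte opcodes 0xB8 through 0xBF, which encode "movl $imm32, %reg".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Result of decoding one "movl $imm32, %reg" instruction. */
struct decoded {
    unsigned reg;      /* register number encoded in the opcode (0 - 7) */
    uint32_t imm;      /* the immediate data that followed the opcode */
    size_t   next_ip;  /* where the instruction pointer ends up */
};

/*
 * Toy model of the steps described in the text. Only opcodes
 * 0xB8 - 0xBF (movl $imm32, %reg) are understood.
 */
struct decoded fetch_mov_imm32(const uint8_t *code, size_t ip)
{
    struct decoded d;
    uint8_t opcode = code[ip];
    ip += 1;                       /* ip now points to the immediate data */
    assert(opcode >= 0xB8 && opcode <= 0xBF);
    d.reg = opcode - 0xB8;

    /* Fetch the four data bytes, least significant byte first. */
    d.imm = (uint32_t)code[ip]
          | (uint32_t)code[ip + 1] << 8
          | (uint32_t)code[ip + 2] << 16
          | (uint32_t)code[ip + 3] << 24;
    ip += 4;                       /* advance by the size of the data fetched */

    d.next_ip = ip;                /* already pointing to the next instruction */
    return d;
}
```

For example, the five bytes BA 0D 00 00 00 are the machine code for movl $13, %edx (edx is register number 2). Decoding them from index 0 yields the immediate value 13 and leaves the model's instruction pointer at index 5, where the next instruction begins.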
The constants in the instructions on lines 11 and 13 are obvious. (The symbolic name
“STDOUT_FILENO” is defined in unistd.h as 1.) The constant on line 12 is the label .LC0, which
resolves to the address of this memory location. As explained above, this address will be in the
.rodata section when the program is loaded into memory. The address is not known within the
.text segment when the file is first compiled. The compiler leaves space for it immediately after
the instruction (immediate addressing mode). Then when the address is determined during the
linking phase, it is plugged in to the space left for it. The net result is that the address becomes
immediate data when the program is executed.
So the following code sequence:
11 movl $13, %edx # third argument
12 movl $.LC0, %esi # second argument
13 movl $1, %edi # first argument
14 call write
• The number of bytes actually written to the screen is returned in the eax register. So if the
current function is using eax, the value will be changed by the call to write.
• The write function is a C wrapper that sets up the registers for the syscall instruction.
Unfortunately, there is no guarantee that it restores the values that were in the registers
when it was called.
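For comparison, here is a minimal C sketch of the same call. The message text and the function name write_message are assumptions made for this illustration; any 13-byte string would do. The comments show which register carries each argument according to Table 8.2, and the return value is the number that comes back in eax.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical 13-byte message, chosen only for this example. */
const char message[] = "Hello world.\n";

/*
 * Returns whatever write() returns: the number of bytes actually
 * written, or -1 if an error occurred.
 */
ssize_t write_message(int fd)
{
    assert(fd >= 0);
    /*           first (edi)  second (esi)  third (edx)        */
    return write(fd,          message,      sizeof message - 1);
}
```

Calling write_message(STDOUT_FILENO) passes 1 as the first argument, the address of the string as the second, and 13 as the third, which is the same work that lines 11 – 14 do in assembly language.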
• push data-item causes the data-item to be placed on the top of the stack and moves the
stack pointer to point to this latest item.
• pop location causes the data item on the top of the stack to be removed and placed at
location and moves the stack pointer to point to the next item left on the stack.
8.2. INTRODUCTION TO THE CALL STACK 159
Notice that a stack is a “last in, first out” (LIFO) data structure. That is, the last thing to be
pushed onto the stack is the first thing to be popped off.
To illustrate the stack concept let us use our dinner plate example. Say we have three dif-
ferently colored dinner plates, a red one on the dining table, a green one on the kitchen counter,
and a blue one on the bedside table. Now we will stack them on the shelf in the following way:
1. push dining-table-plate
2. push kitchen-counter-plate
3. push bedside-table-plate
At this point, our stack looks like:
blue plate
green plate
red plate
8 #include <stdio.h>
9
10 int theStack[500];
11 int *stackPointer = &theStack[500];
12
13 /*
14 * precondition:
15 * stackPointer points to data element at top of stack
16 * postcondition:
17 * address in stackPointer is decremented by four
18 * dataValue is stored at top of stack
19 */
20 void push(int dataValue)
21 {
22 stackPointer--;
23 *stackPointer = dataValue;
24 }
25
26 /*
27 * precondition:
28 * stackPointer points to data element at top of stack
29 * postcondition:
30 * data element at top of stack is copied to *dataLocation
31 * address in stackPointer is incremented by four
32 */
33 void pop(int *dataLocation)
34 {
35 *dataLocation = *stackPointer;
36 stackPointer++;
37 }
38
39 int main(void)
40 {
41 int x = 12;
42 int y = 34;
43 int z = 56;
44 printf("Start with the stack pointer at %p",
45 (void *)stackPointer);
46 printf(", and x = %i, y = %i, and z = %i\n", x, y, z);
47
48 push(x);
49 push(y);
50 push(z);
51 x = 100;
52 y = 200;
53 z = 300;
54 printf("Now the stack pointer is at %p",
55 (void *)stackPointer);
56 printf(", and x = %i, y = %i, and z = %i\n", x, y, z);
57 pop(&z);
58 pop(&y);
59 pop(&x);
60
65 return 0;
66 }
                        theStack[0]    ????
                        theStack[1]    ????
                        theStack[2]    ????
                            ...
                        theStack[496]  ????
                        theStack[497]  ????
                        theStack[498]  ????
                        theStack[499]  ????
    stackPointer -->                   ????
Figure 8.1: The stack in Listing 8.3 when it is first initialized. “????” means that the value in
the array element is undefined.
the stack appears as shown in Figure 8.2. Here you can see that since the push operation pre-
decrements the stack pointer, the first data item to be placed on the stack is stored in a valid
portion of the array.
                        theStack[0]    ????
                        theStack[1]    ????
                        theStack[2]    ????
                            ...
                        theStack[496]  ????
                        theStack[497]  ????
                        theStack[498]  ????
    stackPointer -->    theStack[499]  12
                                       ????
Figure 8.2: The stack after the first data item has been pushed.
After all three data items — x, y, and z — are pushed onto the stack, it appears as shown
in Figure 8.3. The stack pointer always points to the data item that is at the top of the stack.
Notice that this stack is “growing” toward lower numbered elements in the array.
Most stacks grow toward lower addresses. We tend to draw them “upside down.”
                        theStack[0]    ????
                        theStack[1]    ????
                        theStack[2]    ????
                            ...
                        theStack[496]  ????
    stackPointer -->    theStack[497]  56
                        theStack[498]  34
                        theStack[499]  12
                                       ????
Figure 8.3: The stack after all three data items have been pushed.
After changing the values in the variables, the program in Listing 8.3 restores the original
values by popping from the stack in reverse order. The state of the stack after all three pops
is shown in Figure 8.4. Even though we know that the values are still stored in the array, the
permissible stack operations — push and pop — will not allow us to access these values. Thus,
from a programming point of view, the values are gone.
                        theStack[0]    ????
                        theStack[1]    ????
                        theStack[2]    ????
                            ...
                        theStack[496]  ????
                        theStack[497]  56
                        theStack[498]  34
                        theStack[499]  12
    stackPointer -->                   ????
Figure 8.4: The stack after all three data items have been popped off. Even though the values
are still stored in the array, it is considered a programming error to access them.
The stack must be considered as “empty” when it is in this state.
Our very simple stack in this program does not protect against stack overflow or stack un-
derflow. Most software stack implementations also include operations to check for an empty
stack and for a full stack. And many implementations include an operation for looking at, but
not removing, the top element. But these are not the main features of a stack data structure, so
we will not be concerned with them here.
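For readers who are curious, the checks mentioned above might look something like the following sketch. It is a variation on Listing 8.3, not part of it: the functions isEmpty, isFull, and peek and the tiny STACK_SIZE are assumptions made here for illustration, and push and pop now return a success flag instead of void.

```c
#include <assert.h>
#include <stdbool.h>

#define STACK_SIZE 4    /* kept tiny so that a full stack is easy to reach */

static int theStack[STACK_SIZE];
static int *stackPointer = &theStack[STACK_SIZE];  /* empty: one past the end */

bool isEmpty(void) { return stackPointer == &theStack[STACK_SIZE]; }
bool isFull(void)  { return stackPointer == &theStack[0]; }

/* Returns false, leaving the stack unchanged, if it would overflow. */
bool push(int dataValue)
{
    if (isFull())
        return false;
    stackPointer--;                 /* pre-decrement, as in Listing 8.3 */
    *stackPointer = dataValue;
    return true;
}

/* Returns false, leaving *dataLocation unchanged, if it would underflow. */
bool pop(int *dataLocation)
{
    assert(dataLocation != 0);
    if (isEmpty())
        return false;
    *dataLocation = *stackPointer;
    stackPointer++;                 /* post-increment, as in Listing 8.3 */
    return true;
}

/* Copies the top element to *dataLocation without removing it. */
bool peek(int *dataLocation)
{
    assert(dataLocation != 0);
    if (isEmpty())
        return false;
    *dataLocation = *stackPointer;
    return true;
}
```

The empty test works because an empty stack leaves the pointer just beyond the end of the array, exactly where Listing 8.3 initializes it. The full test works because the stack grows toward element zero.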
In GNU/Linux, as with most operating systems, the call stack has already been set up for us.
We do not need to worry about allocating the memory or initializing a stack pointer. When the
operating system transfers control to our program, the stack is ready for us to use.
The x86-64 architecture uses the rsp register for the call stack pointer. Although you could
create your own stack and stack pointer, several instructions use the rsp register implicitly. And
all these instructions cause the stack to grow from high memory addresses to low (see Exercise
8-2). Although this may seem a bit odd at first, there are some good reasons for doing it this
way.
In particular, think about how you might organize things in memory. Recall that the instruc-
tion pointer (the rip register) is automatically incremented by the control unit as your program
is executed. Programs come in vastly different sizes, so it makes sense to store the program in-
structions at low memory addresses. This allows maximum flexibility with respect to program
size.
The stack is a dynamic structure. You do not know ahead of time how much stack space will
be required by any given program as it executes. It is impossible to know how much space to
allocate for the stack. So you would like to allocate as much space as possible, and to keep it as
far away from the programs as possible. The solution is to start the stack at the highest address
and have it grow toward lower addresses.
This is a highly simplified rationalization for implementing stacks such that they grow
“downward” in memory. The organization of various program elements in memory is much
more complex than the simple description given here. But this may help you to understand that
there are some good reasons for what may seem to be a rather odd implementation.
The assembly language push instruction is:
    pushq source
It causes the following actions:
1. The value in the rsp register is decremented by eight. That is, eight is subtracted from the
stack pointer.
2. The eight bytes of the source operand are copied into memory at the new location pointed
to by the (now decremented) stack pointer. The state of the operand is not changed.
A push changes rsp before putting the value on the stack.
The assembly language pop instruction is:
    popq destination
It causes the following actions:
1. The eight bytes in the memory location pointed to by the stack pointer are copied to the
destination operand. The previous state of the operand is replaced by the value from
memory.
2. The value in the rsp register is incremented by eight. That is, eight is added to the stack
pointer.
A pop changes rsp after getting the value from the stack.
Intel® Syntax:
    push source
    pop destination
The size of the operand, eight bytes, is determined by the operating mode of the CPU. When executing
in 64-bit mode, pushes and pops operate on 64-bit values by default. Unlike the mov instruction, you
cannot push or pop 8- or 32-bit values, and 16-bit pushes and pops, although possible, are rarely used.
As long as every push and pop is eight bytes, the address in the stack pointer (rsp register) remains
an integral multiple of eight.
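The two steps of each instruction can be mimicked in C. The following is a simplified model, not a description of the hardware: the 64-byte memory array and the variable named rsp are stand-ins invented for this sketch, and the bounds checks exist only to keep the toy honest.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEM_SIZE 64                 /* toy memory, in bytes */

uint8_t  memory[MEM_SIZE];
uint64_t rsp = MEM_SIZE;            /* the stack starts at the highest address */

/* Model of: pushq source */
void pushq(uint64_t source)
{
    assert(rsp >= 8);                       /* toy bounds check */
    rsp -= 8;                               /* 1. subtract eight from rsp */
    memcpy(&memory[rsp], &source, 8);       /* 2. copy eight bytes to the new top */
}

/* Model of: popq destination */
void popq(uint64_t *destination)
{
    assert(rsp + 8 <= MEM_SIZE);            /* toy bounds check */
    memcpy(destination, &memory[rsp], 8);   /* 1. copy eight bytes from the top */
    rsp += 8;                               /* 2. add eight to rsp */
}
```

Pushing two values and then popping twice returns them in the reverse order and leaves rsp where it started. Because rsp only ever changes by eight in this model, it stays a multiple of eight throughout.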
A good example of using a stack is saving registers within a function. Recall that there is
only one set of registers in the CPU. When one function calls another, the called function has no
way of knowing which registers are being used by the calling function. The ABI [25] specifies
that the values in registers rbx, rbp, rsp, and r12 – r15 be preserved by the called function (see
Table 6.4 on page 121).
The program in Listing 8.4 shows how to save and restore the values in these registers.
Notice that since a stack is a LIFO structure, it is necessary to pop the values off the top of the
stack in the reverse order from how they were pushed on.
1 # saveRegisters1.s
2 # The rbx and r12 - r15 registers must be preserved by called function.
3 # Sets a bit pattern in these registers, but restores original values
7 .text
8 .globl main
9 .type main, @function
10 main:
11 pushq %rbp # save caller’s frame pointer
12 movq %rsp, %rbp # establish our frame pointer
13
Listing 8.4: Save and restore the contents of the rbx and r12 – r15 registers. See Table 6.4, page
121, for the registers that should be saved/restored in a function if they are used in
the function.
The problem with this technique is maintaining the address in the stack pointer at a 16-byte
boundary. Another way to save/restore the registers will be given in Section 11.2.
could be used to store zero in a variable. But this technique would obviously be very tedious,
and any changes made to your code would almost certainly lead to a great deal of debugging.
For example, can you figure out the reason I had to do a pop before pushing the value onto the
stack? (Recall that the four bytes have already been reserved on the stack.)
At first, it may seem tempting to use the stack pointer, rsp, as the reference pointer. But this
creates complications if we wish to use the stack within the function.
A better technique would be to maintain another pointer to the local variable area on the
stack. If we do not change this pointer throughout the function, we can always use the base
register plus offset addressing mode to directly access any of the local variables. The syntax is:
    offset(register_name)
Intel® Syntax:
    [register_name + offset]
base register plus offset: The data value is located in memory. The address of the memory
location is the sum of a value in a register plus an offset value, which can be an 8-, 16- or
32-bit signed integer.
syntax: place parentheses around the register name with the offset value imme-
diately before the left parenthesis.
examples: -8(%rbp); (%rsi); 12(%rax)
Intel® Syntax: [rbp - 8]; [rsi]; [rax + 12]
The appropriate register for implementing this is the frame pointer, rbp.
When a function is called, the calling function begins the process of creating an area on
the stack, called the stack frame. Any arguments that need to be passed on the call stack are
first pushed onto it, as described in Section 11.2. Then the call instruction pushes the return
address onto the call stack (page 156).
The first thing that the called function must do is to complete the creation of the stack frame.
The function prologue, first introduced in Section 7.2 (page 133), performs the following actions
at the very beginning of each function:
1. Push the calling function’s frame pointer onto the call stack.
2. Copy the current value in the stack pointer to the frame pointer.
3. Subtract a value from the stack pointer to allow for the local variables.
Once the function prologue has completed the stack frame, we observe that:
• The local variables are located in an area of the call stack – between the addresses in the
rsp and rbp registers.
• The rbp register is a pointer to the bottom (the numerically highest address) of the local
variable area.
• The remaining area of the stack can be accessed using the stack pointer (rsp) as always.
Notice that each local variable is located at some fixed offset from the base register, rbp. In fact,
it’s a negative offset.
Listing 8.5 is the compiler-generated assembly language for the program in Listing 2.4 (page
23). Comments have been added to explain the parts of the code being discussed here.
1 .file "echoChar1.c"
2 .section .rodata
3 .LC0:
Listing 8.5: Echoing characters entered from the keyboard (gcc assembly language). Comments
added. Refer to Listing 2.4 for the original C version.
The function begins by pushing a copy of the caller’s frame pointer (in the rbp register) onto
the call stack, thus saving it. Next it sets the frame pointer for this function at the current top
of the stack. These two actions establish a reference point to the stack frame for this function.
Next the program allocates sixteen bytes on the stack for the local variable, thus growing the
stack frame by sixteen bytes. It may seem wasteful to set aside so much memory since the only
variable in this program requires only one byte of memory, but the ABI [25] specifies that the
stack pointer (rsp) should be on a sixteen-byte address boundary before calling another function.
The easiest way to comply with this specification is to allocate memory for local variables in
multiples of sixteen.
Figure 8.5 shows the state of the stack just after the prologue has been executed. The return
address to the calling function is safely stored on the stack, followed by the caller’s frame pointer
value. The stack pointer (rsp) has been moved up the stack to allow memory for the local
variable. If this function needs to push data onto the stack, such activity will not interfere with
the local variable, the caller’s frame pointer value, nor the return address. The frame pointer
(rbp) provides a reference point for accessing the local variable.
IMPORTANT: The space for the local variables must be allocated immediately after establishing the
frame pointer. Any other use of the stack within the function, e.g., saving registers, must be done
after allocating space for local variables.
8.3. LOCAL VARIABLES ON THE CALL STACK 167
                      Memory available for use
                      as a stack by this function
    rsp -->   -16     1 byte for aLetter
                      Unused memory (15 bytes)
               -8
    rbp -->    +0     Caller’s rbp
               +8     Return address
Figure 8.5: Local variables in the program from Listing 8.5 are allocated on the stack. Numbers
on the left are offsets from the address in the frame pointer (rbp register).
Most of the code in the body of the function is already familiar to you, but the instruction
that loads the address of the local variable, aLetter, into the rsi register:
18 leaq -16(%rbp), %rsi # address of aLetter var.
is new. It uses the base register plus offset addressing mode for the source.
We can see from the instruction on line 18 that the aLetter variable is located sixteen
bytes below the address in the rbp register.
As with the write function, the second argument to the read function must be the address
of a variable. However, the address of aLetter cannot be known when the program is compiled
and linked because it is the address of a variable that exists in the stack frame. There is no way
for the compiler or linker to know where this function’s stack frame will be in memory when it
is called. The address of the variable must be computed at run time.
Each instruction that accesses a stack frame variable must compute the variable’s address,
which is called the effective address. The instruction for computing addresses is load effective
address — leal for 32-bit and leaq for 64-bit addresses. The syntax of the lea instruction is
    leaw source, %register
where w = l for 32-bit, q for 64-bit.
Use lea to get a memory address; use mov to access what is stored at the address.
Intel® Syntax:
    lea register, source
The source operand must be a memory location. The lea instruction computes the effec-
tive address of the source operand and stores that address in the destination register. So the
instruction
leaq -16(%rbp), %rsi
takes the value in rbp (the base address of this function’s stack frame), adds -16 to it, and stores
this sum in rsi. Now rsi contains the address of the variable aLetter.
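The difference between computing an address and accessing what is stored there can be sketched in C with pointers. In this illustration the array frame, the pointer rbp, and the functions lea and mov_load are all invented stand-ins; they imitate the instructions but say nothing about how the CPU implements them.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stack frame with 16 bytes of local variable area below rbp. */
unsigned char frame[32];
unsigned char *const rbp = &frame[16];   /* stands in for the frame pointer */

/* Like leaq offset(%rbp), dest: compute the address; memory is not read. */
unsigned char *lea(unsigned char *base, int32_t offset)
{
    return base + offset;
}

/* Like movb offset(%rbp), dest: compute the same address, then read the byte. */
unsigned char mov_load(unsigned char *base, int32_t offset)
{
    return *(base + offset);
}

void lea_demo(void)
{
    unsigned char *rsi = lea(rbp, -16);  /* leaq -16(%rbp), %rsi */
    assert(rsi == &frame[0]);            /* rsi holds an address ... */
    *rsi = 'x';                          /* ... where a function such as read can store data */
    assert(mov_load(rbp, -16) == 'x');   /* a mov gets the data itself */
}
```

The address produced by lea is what a function such as read needs, because it must know where in memory to put the character.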
So the following code sequence:
18 leaq -16(%rbp), %rsi # address of aLetter var.
19 movl $1, %edx # 1 character
20 movl $0, %edi # STDIN_FILENO
21 call read
• The characters read from the keyboard must be stored in memory. You cannot pass the
name of a CPU register to the read function.
• The number of bytes actually read from the keyboard is returned in the eax register. So if
the current function is using eax, the value will be changed by the call to read.
• The read function is a C wrapper that sets up the registers for the syscall instruction.
Unfortunately, there is no guarantee that it restores the values that were in the registers
when it was called.
IMPORTANT: Since neither the write nor the read system call functions are guaranteed to restore
the values in the registers, your program must save any required register values before calling
either of these functions.
There is also a new instruction on line 31:
31 leave # undo stack frame
Just before this function exits, the portion of the stack frame allocated by this function must be
released and the value in the rbp register restored. The leave instruction performs the actions:
movq %rbp, %rsp
popq %rbp
which effectively
After the epilogue has been executed, the stack is in the state shown in Figure 8.6. The
stack pointer (rsp) points to the address that will return program flow back to the instruction
immediately after the call instruction that called this function. Although the data that was
stored in the memory which is now above the stack pointer is still there, it is a violation of stack
protocol to access it.
Figure 8.6: Local variable stack area in the program from Listing 8.5. Although the values in the
gray area may remain they are invalid; using them at this point is a programming
error.
One more step remains in completing execution of this function — returning to the calling
function. Since the return address is at the top of the call stack, this is a simple matter of
popping the address from the top of the stack into the rip register. This requires a special
instruction,
ret
Automatic variables are created when the function is first entered. They are deleted upon exit
from the function, so any value stored in them during execution of the function is lost.
Static variables are created when the program is first started. Any values stored in them
persist throughout the lifetime of the program.
Most local variables in a function are automatic variables. General purpose registers are
used for local variables whenever possible. Since there is only one set of general purpose regis-
ters, a function that is using one for a variable must be careful to save the value in the register
before calling another function. Register usage is specified by the ABI [25] as shown in Table
6.4 on page 121. But you should not write code that depends upon everyone else following these
recommendations, and there are only a small number of registers available for use as variables.
In C/C++, most of the automatic variables are typically allocated on the call stack. As you have
seen in the discussion above, they are created (automatically) in the prologue when the function
first starts and are deleted in the epilogue just as it ends. Static variables must be stored in the
data segment.
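The difference in lifetime is easy to observe from C. The two functions below are a small illustration written for this purpose; their names are arbitrary.

```c
#include <assert.h>

/* Counts its own calls with a static variable. */
int countWithStatic(void)
{
    static int calls = 0;   /* created once, when the program is started */
    calls++;
    return calls;           /* the value persists from call to call */
}

/* Tries to do the same with an automatic variable. */
int countWithAutomatic(void)
{
    int calls = 0;          /* created anew on every entry to the function */
    calls++;
    return calls;           /* always 1; the old value was lost on exit */
}

void storageDemo(void)
{
    int before = countWithStatic();
    assert(countWithStatic() == before + 1);   /* remembered */
    assert(countWithAutomatic() == 1);
    assert(countWithAutomatic() == 1);         /* forgotten */
}
```

A compiler would typically place the static variable in the data segment, while the automatic one lives in a register or in the stack frame that exists only while countWithAutomatic is executing.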
We are now in a position to write the echoChar program in assembly language. The program
is shown in Listing 8.6.
1 # echoChar2.s
2 # Prompts user to enter a character, then echoes the response
3 # Bob Plantz - 8 June 2009
4
5 # Useful constants
6 .equ STDIN,0
7 .equ STDOUT,1
8 # Stack frame
9 .equ aLetter,-16
10 .equ localSize,-16
11 # Read only data
12 .section .rodata
13 prompt:
14 .string "Enter one character: "
15 .equ promptSz,.-prompt-1
16 msg:
17 .string "You entered: "
18 .equ msgSz,.-msg-1
19 # Code
20 .text # switch to text section
21 .globl main
22 .type main, @function
23 main:
24 pushq %rbp # save caller’s frame pointer
25 movq %rsp, %rbp # establish our frame pointer
26 addq $localSize, %rsp # for local variable
27
Listing 8.6: Echoing characters entered from the keyboard (programmer assembly language).
This program introduces another assembler directive (lines 6,7,9,10,15,18):
    .equ name, expression
The .equ directive evaluates the expression and sets the name equivalent to it. Note that the
expression is evaluated during assembly, not during program execution. In essence, the name
and its value are placed on the symbol table during the first pass of the assembler. During the
second pass, wherever the programmer has used “name” the assembler substitutes the number
that the expression evaluated to during the first pass.
You see an example on line 9 of Listing 8.6:
9 .equ aLetter,-16
In this case the expression is simply -16. Then when the symbol is used on line 34:
34 leaq aLetter(%rbp), %rsi # place to store character
the assembler substitutes -16 during the second pass, and it is exactly the same as if the pro-
grammer had written:
leaq -16(%rbp), %rsi # place to store character
Of course, using .equ to provide a symbolic name makes the code much easier to read.
An example of a more complex expression is shown on lines 13 – 15:
13 prompt:
14 .string "Enter one character: "
15 .equ promptSz,.-prompt-1
The “.” means “this address”. Recall that the .string directive allocates one byte for each char-
acter in the text string, plus one for the NUL character. So it has allocated 22 bytes here. The
expression computes the difference between the beginning and the end of the memory allocated
by .string, minus 1. Thus, promptSz is entered on the symbol table as being equivalent to 21.
And on line 28 the programmer can use this symbolic name,
28 movl $promptSz, %edx # prompt size
which is much easier than counting each of the characters by hand and writing:
movl $21, %edx # prompt size
More importantly, the programmer can change the text string and the assembler will compute
the new length and change the number in the instruction automatically. This is obviously much
less prone to error.
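C programmers use a similar idiom. The sketch below is an analogy only, not a translation of Listing 8.6: sizeof plays the role of the “.-prompt” expression, and the names promptSz and promptLength are chosen here to match the assembly language.

```c
#include <assert.h>

/* Like .string, the array gets one byte per character plus one for the NUL. */
const char prompt[] = "Enter one character: ";

/* Evaluated by the compiler, not at run time. No memory is allocated for it. */
enum { promptSz = sizeof prompt - 1 };

/* Checked during compilation, just as .equ is evaluated during assembly. */
_Static_assert(promptSz == 21, "prompt length changed");

int promptLength(void)
{
    assert(prompt[promptSz] == '\0');   /* the NUL sits just past the text */
    return promptSz;
}
```

If the text of the prompt is edited, the compiler recomputes promptSz, and the _Static_assert line (which exists only to demonstrate the value) would need to be updated or removed.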
Be careful not to mistake the .equ directive as creating a variable. It does not allocate any memory.
It simply gives a symbolic name to a number you wish to use in your program, thus making your
code easier to read.
A comment about programming style when using the .equ directive is appropriate here. No-
tice that the programmer has used it to give the same numerical value to two different symbols:
9 .equ aLetter,-16
10 .equ localSize,-16
Each symbol is used differently in the code. It would be confusing to a reader if only one symbol
were used in both places.
7 #include <stdio.h>
8
9 int main(void)
10 {
11 int anInt;
12
17 return 0;
18 }
Listing 8.7: Calling printf and scanf to write and read formatted I/O (C).
The assembly language generated by the gcc compiler is shown in Listing 8.8. Comments have
been added to explain the printf and scanf calls.
1 .file "echoInt1.c"
2 .section .rodata
3 .LC0:
4 .string "Enter an integer number: "
5 .LC1:
6 .string "%i"
7 .LC2:
8 .string "You entered: %i\n"
9 .text
10 .globl main
11 .type main, @function
12 main:
13 pushq %rbp
14 movq %rsp, %rbp
15 subq $16, %rsp
16 movl $.LC0, %edi # address of message
17 movl $0, %eax # no floats
18 call printf
19 leaq -4(%rbp), %rsi # address of anInt
20 movl $.LC1, %edi # address of format string
21 movl $0, %eax # no floats
22 call scanf
23 movl -4(%rbp), %esi # copy of anInt value
24 movl $.LC2, %edi # address of format string
25 movl $0, %eax # no floats
26 call printf
27 movl $0, %eax
28 leave
29 ret
Listing 8.8: Calling printf and scanf to write and read formatted I/O (gcc assembly language).
The first call to printf passes only one argument. However, on line 17 in Listing 8.8, 0 is
passed in eax:
16 movl $.LC0, %edi # address of message
17 movl $0, %eax # no floats
18 call printf
The eax register is not listed as being used for passing arguments (see Section 8.1).
Both printf and scanf can take a variable number of arguments. The ABI [25] specifies
that the total number of arguments passed in SSE registers must be passed in rax. As you will
learn in Section 14.5, the SSE registers are used for passing floats in 64-bit mode. Since no float
arguments are being passed in this call, rax must be set to 0. Recall that setting eax to 0 also
sets the high-order bits of rax to 0 (Table 7.1, page 141).
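The effect of writing to part of rax can be modeled with ordinary integer arithmetic in C. The functions write_eax and write_al below are illustrative names, not real instructions; each takes the old 64-bit register contents and returns the new contents.

```c
#include <assert.h>
#include <stdint.h>

/* Model of movl value, %eax: a 32-bit write clears bits 63 - 32 of rax. */
uint64_t write_eax(uint64_t rax, uint32_t value)
{
    (void)rax;                    /* none of the old contents survive */
    return (uint64_t)value;
}

/* Model of movb value, %al: an 8-bit write leaves bits 63 - 8 unchanged. */
uint64_t write_al(uint64_t rax, uint8_t value)
{
    return (rax & ~(uint64_t)0xFF) | value;
}

void zeroExtendDemo(void)
{
    uint64_t rax = 0x1122334455667788;
    assert(write_eax(rax, 0) == 0);                   /* all of rax is zero */
    assert(write_al(rax, 0) == 0x1122334455667700);   /* only the low byte is */
}
```

This is why the compiler can use the shorter movl $0, %eax and still be certain that all 64 bits of rax are zero before the call.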
The call to scanf on line 14 in the C version passes two arguments:
scanf("%i", &anInt);
Again, we see that the eax register must be set to 0 because there are no float arguments.
The program written in assembly language (Listing 8.9) is easier to read because the pro-
grammer has used symbolic names for the constants and the stack variable.
1 # echoInt2.s
2 # Prompts user to enter an integer, then echoes the response
3 # Bob Plantz -- 11 June 2009
4
5 # Stack frame
6 .equ anInt,-4
7 .equ localSize,-16
8 # Read only data
9 .section .rodata
10 prompt:
11 .string "Enter an integer number: "
12 scanFormat:
13 .string "%i"
14 printFormat:
15 .string "You entered: %i\n"
16 # Code
17 .text # switch to text section
18 .globl main
19 .type main, @function
20 main:
21 pushq %rbp # save caller’s frame pointer
22 movq %rsp, %rbp # establish our frame pointer
23 addq $localSize, %rsp # for local variable
24
28
Listing 8.9: Calling printf and scanf to write and read formatted I/O (programmer assembly
language).
1. Each variable should be aligned on an address that is a multiple of its size.
2. The address in the stack pointer (rsp) should be a multiple of 16 immediately before an-
other function is called.
These rules are best illustrated by considering the program in Listing 8.10.
1 /*
2 * varAlign1.c
3 * Allocates some local variables to illustrate their
4 * alignment on the call stack.
5 * Bob Plantz - 11 June 2009
6 */
7
8 #include <stdio.h>
9
10 int main(void)
11 {
12 char alpha, beta, gamma;
13 char *letterPtr;
14 int number;
15 int *numPtr;
16
17 alpha = 'A';
18 beta = 'B';
19 gamma = 'C';
20 number = 123;
21 letterPtr = &alpha;
22 numPtr = &number;
23
26
27 return 0;
28 }
The assembly language generated by the compiler is shown in Listing 8.11 with comments added
for explanation.
1 .file "varAlign1.c"
2 .section .rodata
3 .LC0:
4 .string "%c %c %c %i\n"
5 .text
6 .globl main
7 .type main, @function
8 main:
9 pushq %rbp
10 movq %rsp, %rbp
11 subq $32, %rsp # 2 * 16
12 movb $65, -1(%rbp) # alpha = 'A';
13 movb $66, -2(%rbp) # beta = 'B';
14 movb $67, -3(%rbp) # gamma = 'C';
15 movl $123, -8(%rbp) # number = 123;
16 leaq -1(%rbp), %rax
17 movq %rax, -16(%rbp) # letterPtr = &alpha;
18 leaq -8(%rbp), %rax
19 movq %rax, -24(%rbp) # numPtr = &number;
20 movq -24(%rbp), %rax
21 movl (%rax), %edx
22 movsbl -3(%rbp),%ecx
23 movsbl -2(%rbp),%edi
24 movq -16(%rbp), %rax
25 movzbl (%rax), %eax
26 movsbl %al,%esi
27 movl %edx, %r8d
28 movl %edi, %edx
29 movl $.LC0, %edi
30 movl $0, %eax
31 call printf
32 movl $0, %eax
33 leave
34 ret
35 .size main, .-main
36 .ident "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"
37 .section .note.GNU-stack,"",@progbits
The char variables take one byte, so they can be aligned on each byte:
12 movb $65, -1(%rbp) # alpha = 'A';
13 movb $66, -2(%rbp) # beta = 'B';
14 movb $67, -3(%rbp) # gamma = 'C';
The next available byte is at -4, and the int requires four bytes, which would put it at -7 through -4.
However, it cannot start at -7 because it must be aligned on a byte address that is a multiple of four.
So it is placed at -8:
15 movl $123, -8(%rbp) # number = 123;
8.4. DESIGNING THE LOCAL VARIABLE PORTION OF THE CALL STACK 175
The two pointer variables each require eight bytes. So placing letterPtr at -16 and numPtr
at -24 allows enough memory for each and places each on an address that is a multiple of eight.
16 leaq -1(%rbp), %rax
17 movq %rax, -16(%rbp) # letterPtr = &alpha;
18 leaq -8(%rbp), %rax
19 movq %rax, -24(%rbp) # numPtr = &number;
Placing each variable such that the alignment rules are met requires 24 bytes on the stack
for local variables. However, the ABI also states that the stack pointer must be on a 16-byte
address boundary. So we need to allocate 32 bytes for the local variables:
11 subq $32, %rsp # 2 * 16
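The placement just worked through by hand can be expressed as a short C function. This is one simple scheme that happens to reproduce the layout in Listing 8.11; it is not claimed to be the algorithm gcc uses. The function names are invented for this sketch, and each variable's required alignment is assumed to equal its size.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of m (m must be greater than zero). */
static size_t roundUp(size_t n, size_t m)
{
    return (n + m - 1) / m * m;
}

/*
 * Assigns an rbp-relative offset to each variable, in the order given.
 * sizes[i] is the size in bytes of variable i. Returns the number of
 * bytes to subtract from rsp, which is always a multiple of 16.
 */
size_t layoutLocals(const size_t *sizes, size_t count, long *offsets)
{
    size_t used = 0;                    /* bytes below rbp claimed so far */
    for (size_t i = 0; i < count; i++) {
        assert(sizes[i] > 0);
        /* Make room for the variable, then back up to an aligned address. */
        used = roundUp(used + sizes[i], sizes[i]);
        offsets[i] = -(long)used;       /* the variable starts at rbp - used */
    }
    return roundUp(used, 16);
}
```

Given the sizes 1, 1, 1, 4, 8, 8 for alpha, beta, gamma, number, letterPtr, and numPtr, the function produces the offsets -1, -2, -3, -8, -16, and -24 and returns 32, in agreement with the discussion above.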
Listing 8.12 shows how an assembly language programmer uses symbolic names to write
code that is easier to read.
1 # varAlign2.s
2 # Allocates some local variables to illustrate their
3 # alignment on the call stack.
4 # Bob Plantz - 11 June 2009
5 # Stack frame
6 .equ numPtr,-24
7 .equ letterPtr,-16
8 .equ number,-8
9 .equ gamma,-3
10 .equ beta,-2
11 .equ alpha,-1
12 .equ localSize,-32
13 # Read only data
14 .section .rodata
15 format:
16 .string "%c %c %c %i\n"
17 # Code
18 .text
19 .globl main
20 .type main, @function
21 main:
22 pushq %rbp # save caller’s frame pointer
23 movq %rsp, %rbp # establish our frame pointer
24 addq $localSize, %rsp # for local vars
25
Notice the assembly language syntax for single character constants on lines 26 – 28:
26 movb $'A', alpha(%rbp) # initialize variables
27 movb $'B', beta(%rbp)
28 movb $'C', gamma(%rbp)
The GNU assembly language info documentation specifies that only the first single quote, 'A, is
required. But the C syntax, 'A', also works, so we have used that because it is generally easier
to read.
We can summarize the proper sequence of instructions for establishing a local variable envi-
ronment in a function:
1. Push the calling function’s frame pointer onto the stack.
2. Copy the value in the stack pointer register (rsp) into the frame pointer register (rbp) to
establish the frame pointer for the current function.
3. Allocate space for the local variables by moving the stack pointer to a lower address.
These three operations MUST be performed EXACTLY in this order at the BEGINNING of
each function.
Just before ending this function, these three steps need to be undone. Since the frame pointer
is pointing to where the top of the stack was before we allocated memory for local variables, the
local variable memory can be deleted by simply copying the value in the frame pointer to the
stack pointer. Now the calling function’s frame pointer value is at the top of the stack. The
ending sequence is:
1. Copy the value in the frame pointer register (rbp) to the stack pointer register (rsp).
2. Pop the value at the top of the stack into the frame pointer register (rbp).
These two operations MUST be performed EXACTLY in this order at the END of each function.
Listing 8.13 shows the general format that must be followed when writing a function. If you
follow this format and do everything in the order that is given for all your functions, you will
have many fewer problems getting them to work properly. If you do not, I guarantee that you
will have many problems.
1 # general.s
2 .text
3 .globl general
4 .type general, @function
5 general:
6 pushq %rbp # save calling function’s frame pointer
7 movq %rsp, %rbp # establish our frame pointer
8
When performing I/O you invoke the Linux operations yourself. The technique involves mov-
ing the arguments to specific registers, placing a special code in the eax register, and then using
the syscall instruction to call a function in the operating system. (The way this works is de-
scribed in Section 15.6 on page 345.) The operating system will perform the action specified
by the code in the eax register, using the arguments passed in the other registers. The values
required for reading from and writing to files are given in Table 8.3.
Table 8.3: Register set up for using syscall instruction to read, write, or exit.
In Listing 8.14 we have rewritten the program of Listing 8.6 without using the C environ-
ment.
1 # echoChar3.s
2 # Prompts user to enter a character, then echoes the response
3 # Does not use C libraries
4 # Bob Plantz -- 11 June 2009
5
6 # Useful constants
7 .equ STDIN,0
8 .equ STDOUT,1
9 .equ READ,0
10 .equ WRITE,1
11 .equ EXIT,60
12 # Stack frame
13 .equ aLetter,-16
14 .equ localSize,-16
15 # Read only data
16 .section .rodata # the read-only data section
17 prompt:
18 .string "Enter one character: "
19 .equ promptSz,.-prompt-1
20 msg:
21 .string "You entered: "
22 .equ msgSz,.-msg-1
23 # Code
24 .text # switch to text section
25 .globl __start
26
27 __start:
28 pushq %rbp # save caller’s frame pointer
29 movq %rsp, %rbp # establish our frame pointer
30 addq $localSize, %rsp # for local variable
31
Comparing this program with the one in Listing 8.6, the arguments are the same and are
passed in the same registers. The only difference when using the syscall instruction is that
you have to provide a code for the operation to be performed in the eax register. The complete list
of system operations that can be performed is in the system file /usr/include/asm-x86_64/unistd.h.
(The path on your system may be different.)
To determine the arguments that must be passed to each system operation read section 2 of
8.6. CALLING FUNCTIONS, 32-BIT MODE 179
the man page for that operation. For example, the arguments for the write system call can be
seen by using
Then follow the rules in Section 8.1 for placing the arguments in the proper registers.
Listing 8.15: Displaying four characters on the screen using the write system call function in
assembly language.
After all three arguments have been pushed onto the call stack, it looks like:
address contents
            ????
(esp)       1
(esp)+4     $Chars
(esp)+8     4
where the notation (esp) + n means “the address in the esp register plus n.” The stack pointer,
the esp register, points to the last item pushed onto the call stack. The other two arguments
are stored on the stack below the top item. Don’t forget that “below” on the call stack is at
numerically higher addresses because the stack grows toward lower addresses.
When the call instruction is executed, the return address is pushed onto the call stack as
shown here:
address contents
            ????
(esp)       return
(esp)+4     1
(esp)+8     $Chars
(esp)+12    4
where “return” is the address where the called function is supposed to return to at the end of
its execution. So the arguments are readily available inside the called function; you will learn
how to access them in Chapter 8. And as long as the called function does not change the return
address, and restores the stack pointer to the position it was in when the function was called, it
can easily return to the calling function.
Now, let’s look at what happens to the stack memory area in the assembly language pro-
gram in Listing 8.15. Assume that the value in the esp register when the main function is
called is 0xbffffc5c and that the value in the ebp register is 0xbffffc6a. Immediately after the
subl $8, %esp instruction is executed, the stack looks like:
address contents
bffffc50: ????????
bffffc54: ????????
bffffc58: bffffc6a
bffffc5c: important information
the value in the esp register is 0xbffffc50, and the value in the ebp register is 0xbffffc58. The
“?” indicates that the states of the bits in the indicated memory locations are irrelevant to us.
That is, the memory between locations 0xbffffc50 and 0xbffffc57 is “garbage.”
We have to assume that the values in bytes number 0xbffffc5c, 5d, 5e, and 5f were placed
there by the function that called this function and have some meaning to that function. So we
have to be careful to preserve the value there.
Since the esp register contains 0xbffffc50, we can continue using the stack — pushing and
popping — without disturbing the eight bytes between locations 0xbffffc50 and 0xbffffc57.
These eight bytes are the ones we will use for storing the local variables. And if we take care not
to change the value in the ebp register throughout the function, we can easily access the local
variables.
8.7.1 Instructions
data movement:
opcode   source           destination   action                   see page:
movs     $imm/%reg        %reg/mem      move                     141
movsss   $imm/%reg        %reg/mem      move, sign extend        216
movzss   $imm/%reg        %reg/mem      move, zero extend        217
popw                      %reg/mem      pop from stack           163
pushw    $imm/%reg/mem                  push onto stack          163
s = b, w, l, q; w = l, q
arithmetic/logic:
opcode   source           destination   action                   see page:
cmps     $imm/%reg        %reg/mem      compare                  209
incs                      %reg/mem      increment                220
leaw     mem              %reg          load effective address   167
subs     $imm/%reg        %reg/mem      subtract                 190
s = b, w, l, q; w = l, q
8.8 Exercises
8-1 (§8.1) Enter the C program in Listing 8.1 and get it to work correctly. Run the program
under gdb, setting a break point at the call to write. When the program breaks, use the
si (Step one instruction exactly) command to execute the instructions that load registers
with the arguments. As you do this, keep track of the contents in the appropriate argument
registers and the rip register. What is the address where the text string is stored? If you
single step into the write function, use the cont command to continue through it.
8-2 (§8.2) Modify the program in Listing 8.3 so that the stack grows from lower numbered
array elements to higher numbered ones.
8-3 (§8.2) Enter the assembly language program in Listing 8.4 and show that the rbp and
rsp registers are also saved and restored by this function.
8-4 (§8.3) Enter the C program in Listing 2.4 (page 23) and compile it with the debugging
option, -g. Run the program under gdb, setting a break point at each of the calls to
write and read. Each time the program breaks, use the si (Step one instruction exactly)
command to execute the instructions that load registers with the arguments. As you do
this, keep track of the contents in the appropriate argument registers and the rip regis-
ter. What are the addresses where the text strings are stored? What is the address of the
aLetter variable? If you single step into either the write or read functions, use the cont
command to continue through it.
8-5 (§8.3) Modify the assembly language program in Listing 8.6 such that it also reads the
newline character when the user enters a single character. Run the program with gdb. Set
a breakpoint at the first instruction, then run the program. When it breaks, write down
the values in the rsp and rbp registers. Write down the changes in these two registers as
you single step (si command) through the first three instructions.
Set breakpoints at the instruction that calls the read function and at the next instruction
immediately after that one. Examine the values in the argument-passing registers.
From the addresses you wrote down above, determine where the two characters (user’s
character plus newline) that are read from the keyboard will be stored, and examine that
area of memory.
Use the cont command to continue execution through the read function. Enter a character.
When the program breaks back into gdb, examine the area of memory again to make sure
the two characters got stored there.
8-6 (§8.3) Write a program in assembly language that prompts the user to enter an integer,
then displays its hexadecimal equivalent.
8-7 (§8.3) Write a program in assembly language that “declares” four char variables and four
int variables, and initializes all eight variables with appropriate values. Then call printf
to display the values of all eight variables with only one call.
Chapter 9
Computer Operations
We are now ready to look more closely at the instructions that control the CPU. This will only
be an introduction to the topic. We will examine the most common operations — assignment,
addition, and subtraction. Additional operations will be described in subsequent chapters.
Each assembly language instruction must be translated into its corresponding machine code,
including the locations of any data it manipulates. It is the bit pattern of the machine code that
directs the activities of the control unit.
The goal here is to show you that a computer performs its operations based on bit patterns.
As you read through this material, keep in mind that even though this material is quite te-
dious, the operations are very simple. Fortunately, instruction execution is very fast, so lots of
meaningful work can be done by the computer.
will assign the integer 123 to the variable x. If x is later used in an expression, the value
assigned to x will be used in evaluating the expression. For example, the expression
2 * x;
causes memory to be allocated and the location of that memory to be given the name “x.” That
is, other parts of the program can refer to the memory location where the value of x is stored by
using the name “x.” The type name in the declaration, int, tells the compiler how many bytes
to allocate and the code used to represent the data stored at this location. The int type uses the
two’s complement code. The assignment statement,
x = 123;
sets the bit pattern in the location named x to 0x0000007b, the two’s complement code for the
integer 123. The assignment statement
x = -123;
sets the bit pattern in the location named x to 0xffffff85, the two’s complement code for the
integer -123.
Let us consider the simplest case where
• the allocated memory is within the CPU (i.e., a register).
• the bit pattern has no “real world” meaning.
That is, we will consider a program that simply sets a bit pattern in a CPU register. A C program
to do this is shown in Listing 9.1.
1 /*
2 * assignment1.c
3 * Assign a 32-bit pattern to a register
4 *
5 * Bob Plantz - 11 June 2009
6 */
7
8 #include <stdio.h>
9
10 int main(void)
11 {
12 register int x;
13
14 x = 0xabcd1234;
15
18 return 0;
19 }
The register modifier “advises” the compiler to use a CPU register for the integer variable
named “x.” And the notation 0xabcd1234 means that abcd1234 is written in hexadecimal. (Recall
that hexadecimal is used as a compact notation for representing bit patterns.) When the C
program in Listing 9.1 is compiled into its assembly language equivalent with no optimization:
bob$ gcc -S -O0 -fno-asynchronous-unwind-tables assignment1.c
the gcc compiler generates the assembly language program shown in Listing 9.2, with a com-
ment added to show where the assignment operation takes place.
1 .file "assignment1.c"
2 .section .rodata
3 .LC0:
4 .string "x = %i\n"
5 .text
6 .globl main
7 .type main, @function
8 main:
9 pushq %rbp
10 movq %rsp, %rbp
11 movl $-1412623820, %esi # x = 0xabcd1234;
12 movl $.LC0, %edi
13 movl $0, %eax
9.1. THE ASSIGNMENT OPERATOR 185
14 call printf
15 movl $0, %eax
16 leave
17 ret
18 .size main, .-main
19 .ident "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"
20 .section .note.GNU-stack,"",@progbits
Listing 9.2: Assignment to a register variable (gcc assembly language). Comment added to show
the assignment operation.
The C assignment operation is implemented with the mov instruction. For example, in Listing
9.1,
14 x = 0xabcd1234;
is implemented with
11 movl $-1412623820, %esi # x = 0xabcd1234;
on line 11 in Listing 9.2. We can see that the compiler chose to use the esi register as the x
variable.
The instructions on lines 12 – 14 implement the call to the printf function. One reason
for the call to the printf function is to prevent the compiler from eliminating the assignment
statement during its optimization of this function. Yes, even with the -O0 option the compiler
does some optimization.
Compare this to Listing 7.4 on page 144. Notice that the prologue
main:
pushq %rbp
movq %rsp, %rbp
and epilogue
leave
ret
Intel® Syntax    mov esi, -1412623820
we see the three other differences noted in Section 7.2.2 (page 141):
• the operand order is opposite,
• the AT&T syntax requires a “%” prefix to the name of a register, and
• the AT&T syntax requires a “$” prefix to the immediate data.
These differences are specific to the assembler program being used and are not relevant to the
behavior of the CPU. The assembler program will translate the assembly language instruction
into the correct machine language code.
You may wonder why the gcc compiler assigns the constant -1412623820 to the variable,
while the C version of the program assigns 0xabcd1234. The answer is that they are the same
values. The first is expressed in decimal and the second in hexadecimal. We discussed the
equivalence of decimal and hexadecimal in Section 2.2 (page 8), and we discussed signed decimal
integers in Section 3.3 (page 34).
In Listing 9.3 we show the essential assembly language required to implement the C program
from Listing 9.1.
1 # assignment2.s
2 # Assigns a 32-bit pattern to the esi register.
3 # Bob Plantz - 11 June 2009
4
5 .text
6 .globl main
7 .type main, @function
8 main:
9 pushq %rbp # save caller’s base pointer
10 movq %rsp, %rbp # establish our base pointer
11
is the only assembly language statement that was added to the program. From this comparison,
you can see that this assembly language statement implements the two C statements:
register int x;
x = 0xabcd1234;
Like the compiler (Listing 9.2), we are using the esi register as our variable. We can use the
registers in Table 6.4 (page 121) as variables, except the stack pointer, %rsp, which has special
uses. The “%” prefix tells the assembler that these are names of registers in the CPU, not
labels on memory locations.
Let us look more closely at the program in Listing 9.3. I used an editor to enter the code then
assembled and linked it. Since it does not produce a display on the screen, I used gdb to observe
the changes in the registers. My typing is boldface.
$ gdb assignment2
(gdb) li
1 # assignment2.s
2 # Assigns a 32-bit pattern to the esi register.
3 # Bob Plantz - 11 Jun 2009
4
5 .text
6 .globl main
7 .type main, @function
8 main:
9 pushq %rbp # save caller’s frame pointer
10 movq %rsp, %rbp # establish our frame pointer
I use the li command to list part of the program. This allows us to see where I should
set the first breakpoint.
(gdb) br 9
(gdb) run
I run the program, it breaks at the first breakpoint, and I can display the registers.
The i r rax rsi rsp rbp rip (info registers) command displays the contents of the
registers that are used in this program. Note that the value in the rip register (the
instruction pointer) is 0x4004ac. If you replicate this example (a good thing to do) you
will probably get different values in your registers.
(gdb) si
Next I use the single instruction (si) command to execute one instruction.
I display the new state of the registers. Notice that the rip register has changed from
0x4004ac to 0x4004ad. This tells us that the instruction that was just executed, pushq
%rbp, is 0x4004ad - 0x4004ac = 1 byte long. The numbers in the right-hand column
show the decimal equivalent of the bit patterns for some of the registers. The instruction
that is about to be executed will copy the value in the rsp register to the rbp register
and the next one will set the thirty-two bits of the esi register to 0xabcd1234.
(gdb) si
main () at assignment2.s:12
12 movl $0xabcd1234, %esi # set a bit pattern in esi
(gdb) si
The i r command shows us that the rbp register has been changed to equal the rsp
register and the esi register has been set to the bit pattern 0xabcd1234. The rsi register
actually contains the bit pattern 0x00000000abcd1234; gdb does not display leading
zeros. The rip register has changed from 0x4004ad to 0x4004b5. This tells us that the
total number of bytes in the two instructions that were just executed, movq %rsp, %rbp
and movl $0xabcd1234, %esi, is 0x4004b5 - 0x4004ad = 8 bytes. (Don’t forget that
these are in hex.)
(gdb) si
rax 0x0 0
rsi 0xabcd1234 2882343476
rsp 0x7fff22d95020 0x7fff22d95020
rbp 0x7fff22d95020 0x7fff22d95020
rip 0x4004ba 0x4004ba <main+14>
Executing another single instruction shows that the movl $0, %eax instruction does,
indeed, store all zeros in the eax register. The program is now poised at the instruction
that will begin undoing the stack frame in preparation for the return to the calling
function.
(gdb) si
main () at assignment2.s:16
16 popq %rbp # restore caller’s frame pointer
(gdb) si
main () at assignment2.s:16
17 ret # back to caller
rax 0x0 0
rsi 0xabcd1234 2882343476
rsp 0x7fff22d95028 0x7fff22d95028
rbp 0x0 0x0
rip 0x4004be 0x4004be <main+18>
Executing two more instructions and displaying the registers shows that the frame
pointer register, rbp, has been restored to its original value and the return value (in
eax) is correct.
(gdb) cont
Continuing.
Finally, I use the continue command (cont) to run the program out to its end. Note:
If you use the si command to single step beyond the ret instruction at the end of the
main function, gdb will dutifully take you through the system libraries. At best, this is
a waste of time.
(gdb) q
$
The add instruction adds the source operand to the destination operand using the rules of binary
addition, leaving the result in the destination operand. As with the mov instruction, no more
than one operand can be a memory location. (You need to use at least one register to add or
subtract.) The source operand is not changed. In C/C++ the operation could be expressed as:
destination += source
For example, the instruction
addq %rax, %rdx
adds the 64-bit value in the rax register to the 64-bit value in the rdx register, leaving the rax
register intact. The instruction
addw %dx, %r10w
adds the 16-bit value in the dx register to the 16-bit value in the r10w register.
In the Intel syntax, the size of the data is determined by the operand, so the size character
(b, w, l, or q) is not appended to the instruction. (And the order of the operands is reversed.)
Intel® Syntax    add destination, source
We saw in Chapter 3 that addition may cause carry or overflow. Carry and overflow are
recorded in the 64-bit rflags register. The CF is bit number zero, and the OF is bit number
eleven (numbering from right to left). Whenever an add instruction is executed both bits are set
as shown in Algorithm 9.1.
Algorithm 9.1: Carry Flag and Overflow Flag after add.
if there is no carry then
    CF ⇐ 0;
else
    CF ⇐ 1;
if there is no overflow then
    OF ⇐ 0;
else
    OF ⇐ 1;
If the values being added represent unsigned ints, CF indicates whether the result fits within
the operand size or not. If the values represent signed ints, OF indicates whether the result fits
within the operand size or not. If the size of the operands is less than 64 bits and the operation
produces a carry and/or an overflow, this is not propagated up through the next bits in the des-
tination operand. The carry and overflow conditions are simply recorded in the corresponding
bits in the rflags register.
For example, if we consider the initial conditions
register contents
rax: ffff eeee dddd cccc
r8: 2222 4444 6666 8888
CF: ?
OF: ?
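the instruction
addl %eax, %r8d
would produce
register contents
rax: ffff eeee dddd cccc
r8: 0000 0000 4444 5554
CF: 1
OF: 0
(In 64-bit mode, an instruction that writes a 32-bit register also sets the high-order 32 bits
of the corresponding 64-bit register to zero.) Starting again from the same initial conditions,
the 8-bit instruction
addb %al, %r8b
would produce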
register contents
rax: ffff eeee dddd cccc
r8: 2222 4444 6666 8854
CF: 1
OF: 1
The sub instruction subtracts the source operand from the destination operand using the rules
of binary subtraction, leaving the result in the destination operand. As with the mov instruction,
no more than one operand can be a memory location. The source operand is not changed. In
C/C++ the operation could be expressed as:
destination -= source
For example, the instruction
subl %eax, %edx
subtracts the 32-bit value in the eax register from the 32-bit value in the edx register. The
instruction
subb %dh, %ah
9.2. ADDITION AND SUBTRACTION OPERATORS 191
subtracts the 8-bit value in the dh register from the 8-bit value in the ah register.
In the Intel syntax, the size of the data is determined by the operand, so the size character
(b, w, l, or q) is not appended to the instruction. (And the order of the operands is reversed.)
Intel® Syntax    sub destination, source
Subtraction also affects the CF and the OF. Whenever a sub instruction is executed both bits
are set as shown in Algorithm 9.2.
Algorithm 9.2: Carry Flag and Overflow Flag after subtraction.
if there is no borrow then
    CF ⇐ 0;
else
    CF ⇐ 1;
if there is no overflow then
    OF ⇐ 0;
else
    OF ⇐ 1;
Just as with addition, if the values being subtracted represent unsigned ints, CF indicates
whether there was a borrow from beyond the operand size or not. If the values represent signed
ints, OF indicates whether the result fits within the operand size or not. If the size of the
operands is less than 64 bits and the operation produces a borrow and/or an overflow, this is
not propagated up through the next bits in the destination operand. The borrow and overflow
conditions are simply recorded in the CF and OF bits in the rflags register.
For example, if we consider the initial conditions
register contents
rax: ffff eeee dddd cccc
r8: 2222 4444 6666 8888
CF: ?
OF: ?
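the instruction
subl %eax, %r8d
would produce
register contents
rax: ffff eeee dddd cccc
r8: 0000 0000 8888 bbbc
CF: 1
OF: 1
(In 64-bit mode, an instruction that writes a 32-bit register also sets the high-order 32 bits
of the corresponding 64-bit register to zero.) Starting again from the same initial conditions,
the 8-bit instruction
subb %al, %r8b
would produce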
register contents
rax: ffff eeee dddd cccc
r8: 2222 4444 6666 88bc
CF: 1
OF: 0
A simple program given in Listing 9.4 illustrates both addition and subtraction in C.
1 /*
2 * addAndSubtract1.c
3 * Reads two integers from user, then
4 * performs addition and subtraction
5 * Bob Plantz - 11 June 2009
6 */
7
8 #include <stdio.h>
9
10 int main(void)
11 {
12 int w, x, y, z;
13
20 return 0;
21 }
Worse, there is no message even warning that these are incorrect results. You know (see Section
3.4, page 39) that the results have overflowed. C does not check for overflow, so you would have
to write code that explicitly checks for it.
The assembly language generated by gcc is shown in Listing 9.5 with comments added.
1 .file "addAndSubtract1.c"
2 .section .rodata
3 .LC0:
4 .string "Enter two integers: "
5 .LC1:
6 .string "%i %i"
7 .LC2:
8 .string "sum = %i, difference = %i\n"
9 .text
10 .globl main
11 .type main, @function
12 main:
13 pushq %rbp
14 movq %rsp, %rbp
15 subq $16, %rsp
16 movl $.LC0, %edi
17 movl $0, %eax
18 call printf
19 leaq -8(%rbp), %rdx # load address of x
20 leaq -4(%rbp), %rsi # load address of w
must be broken down into distinct steps at the assembly language level:
24 movl -4(%rbp), %edx # load w
25 movl -8(%rbp), %eax # load x
26 leal (%rdx,%rax), %eax # eax <- w + x
27 movl %eax, -12(%rbp) # y = w + x;
It probably seems very odd that there is no add instruction in this code sequence. The compiler
has used the leal instruction with the indexed addressing mode, which will be discussed in more
detail in Section 13.1 when we discuss arrays. Basically, it is intended to compute an address by
adding the values in the two registers that are in the parentheses. In this example, it adds the
two values in rdx and rax. This sum is intended to be used as an address, so the leal instruction
is used to load the sum into eax.
An important difference between leal and addl is that leal does not affect the condition
codes in the eflags register. It might seem that this would “disqualify” this construct from
being used to add two integers, but C does not check for carry or overflow. So this meets the
specifications of the C language.
Similarly, the C statement:
17 z = w - x;
It is easy to see that the compiler did not generate the most efficient code. (This was compiled
with no optimization.)
We have seen that the computations performed by both these C statements can produce
overflow. Table 9.1 shows how the variables (and CF and OF) change as we walk through the code
in the program of Listing 9.4. There are two runs of the program using the input values above.
statement     w            x            y            z            CF   OF
scanf();      0x3b9aca00   0x77359400   ????????     ????????     ?    ?
y = w + x;    0x3b9aca00   0x77359400   0xb2d05e00   ????????     0    0
z = w - x;    0x3b9aca00   0x77359400   0xb2d05e00   0xc4653600   1    0
Table 9.1: Walking through the code in Listing 9.4. There are two runs of the program here.
Listing 9.6 shows an assembly language program that performs the same operations as the
C program in Listing 9.4 but uses the jno (jump if no overflow) instruction to check for overflow.
These checks are easy in assembly language. They add very little to the execution time of the
program, because most of the time there is no overflow, and then each check costs only the
execution of one conditional jump instruction.
1 # addAndSubtract2.s
2 # Gets two integers from user, then
3 # performs addition and subtraction
4 # Bob Plantz - 11 June 2009
5 # Stack frame
6 .equ w,-8
7 .equ x,-4
8 .equ localSize,-16
9 # Read only data
10 .section .rodata
11 prompt:
12 .string "Enter two integers: "
13 getData:
14 .string "%i %i"
15 display:
16 .string "sum = %i, difference = %i\n"
17 warning:
18 .string "Overflow has occurred.\n"
19 # Code
20 .text
21 .globl main
22 .type main, @function
23 main:
24 pushq %rbp # save caller’s base pointer
25 movq %rsp, %rbp # establish our base pointer
26 addq $localSize, %rsp # for local vars
27
We will ignore the problem of getting data into the computer for this example, but we will
certainly want to be able to move data from location to location in our computer. So we will have
five operations:
move
add
subtract
multiply
divide
Our design will need to allow three bits for encoding each of these operations. For example, we
could use the following code:
move 000
add 001
subtract 010
multiply 100
divide 111
Recall that N bits can be used to encode $2^N$ different values. We want 1 MB of memory. From
$2^{10} = 1024 = 1\mathrm{K}$, and $1\mathrm{M} = 1\mathrm{K} \times 1\mathrm{K} = 2^{10} \times 2^{10} = 2^{20}$, we see that we need to allow 20 bits for
memory addressing.
Thus, if we want our computer to be able to add a value stored in one memory location to the
value at another we need 3 + 20 + 20 = 43 bits to encode the instruction. Question: how many
bits would be required if we wanted a design that would allow us to add two values stored in
memory and store the sum at a third location?
Our silly design falls far short of practicality. The instructions themselves take too much
memory, and we have allowed for only a very limited number of operations on the data. This
was a more serious problem in the early days of computer design because memory was very
expensive. The result was that computer designers came up with some clever ways to encode
the necessary information into very few bits.
The design of the x86 processors is a very good example of this cleverness. Intel has paid
particular attention to backwards compatibility as their designs have evolved. Thus, we see
the remnants of the earlier designs — when memory was very expensive — in the latest Intel
processors. The more common instructions generally take fewer bytes of memory. As newer,
more complex features have been added, they generally take more bytes.
Computer design took a different turn in the 1980s. Memory had become much cheaper and
CPUs had become much faster. This led to designs where all the instructions are the same size
— 32 bits being very common these days.
We now turn our attention to the machine code that is produced by the assembler. Recall
that it is the machine code that is actually executed by the control unit in the CPU. That is, the
computer is controlled by bit patterns that are loaded into the instruction register in the CPU.
Programmers seldom need to know what the machine code is for any given assembly lan-
guage instruction. The actual instruction depends upon the operation to be performed, the
location(s) of the data to operate on, and the size of the data. Even when writing in assembly
language, the programmer uses mnemonic names to specify each of these, and the assembler
program translates them into the proper machine code instruction. So you do not need to memo-
rize machine code. However, learning how assembly language instructions translate to machine
code is important for learning how a computer actually works. And knowing how to “hand
assemble” an instruction using a manual can help you find obscure bugs.
5 .text
6 .globl main
7 .type main, @function
8 main:
9 pushq %rbp # save caller’s base pointer
10 movq %rsp, %rbp # establish our base pointer
11
Listing 9.7: Some instructions for us to assemble. (This is not a program, just some instruc-
tions.)
The command to assemble the source file in Listing 9.7 and create a listing file is
The -al option sends the listing file to the standard output file, which defaults to the screen.
You can capture this output by redirecting the standard output to a disk file. A good extension
for the file name is “.lst.” The complete command is
as --gstabs -al -o someMachineCode.o someMachineCode.s \
    > someMachineCode.lst
Notice the line continuation character, ’\’.
Since the first instruction occupies one byte of memory, the second instruction will start in byte
number 0001 (the second byte from the beginning). From the assembly listing file (Figure 9.1)
we see that the machine code for
movq %rsp, %rbp
1 # someMachineCode.s
2 # Some instructions to illustrate machine code.
3 # Bob Plantz - 11 June 2009
4
5 .text
6 .globl main
7 .type main, @function
8 main:
9 0000 55 pushq %rbp # save caller’s base pointer
10 0001 4889E5 movq %rsp, %rbp # establish our base pointer
11
12 0004 49BAEFCD movq $0x1234567890abcdef, %r10 # 64-bit immediate
12 AB907856
12 3412
13 000e 41BB7856 movl $0x12345678, %r11d # 32-bit immediate
13 3412
14 0014 6641BC34 movw $0x1234, %r12w # 16-bit immediate
14 12
15 0019 41B512 movb $0x12, %r13b # 8-bit immediate
16
17 001c 4989C2 movq %rax, %r10 # 64-bit operands
18 001f 4189CB movl %ecx, %r11d # 32-bit operands
19 0022 664189D4 movw %dx, %r12w # 16-bit operands
20 0026 4188DD movb %bl, %r13b # 8-bit operands
21
22 0029 4C01D0 addq %r10, %rax # add 64-bit operands
23
24 002c 8807 movb %al, (%rdi) # register indirect
25 002e 4C896618 movq %r12, 24(%rsi) # register indirect with offset
26
27 0032 B8000000 movl $0, %eax # return 0 to caller
27 00
28 0037 4889EC movq %rbp, %rsp # restore stack pointer
29 003a 5D popq %rbp # restore caller’s base pointer
30 003b C3 ret # back to caller
Figure 9.1: Assembler listing file for the function shown in Listing 9.7.
This instruction occupies three bytes. Thus, the third instruction in this function begins at the
fifth byte — relative location 0004. Continuing to line 30, the last instruction in the program
ret
is a one-byte instruction. It is the sixtieth byte in the function and is located at relative location
003b with the bit pattern c3₁₆ (Figure 9.1).
So you can use the -al option for the as assembler to produce an assembler listing, which will
show you exactly what the bit patterns are for each instruction and which bytes, relative to the
beginning of the function, are set to these patterns.
9.3. INTRODUCTION TO MACHINE CODE 199
• Opcode — This is the first byte in the instruction and specifies the basic operation per-
formed by executing the instruction. It can also include operand location.
• ModRM — The mode/register/memory byte specifies operand locations and how they are
accessed.
• SIB — The scale/index/base byte specifies operand locations and how they are accessed.
• Data — These bytes are used to encode constants, either those that are part of the program,
or those that are relative address offsets to operand locations in memory.
• Prefix — If placed before the opcode, these modify the behavior of the instruction, typically
the size of the operands.
-prefix--opcode--modrm----sib-----data--
Figure 9.2: General format of instructions. There can be more than one prefix byte. The number
of data bytes depends on the size of the data.
The reason for this error is explained in Section 6.2 (page 118). Accessing the %dil register
requires that the assembler insert a REX prefix, but the %ah register cannot be accessed by an
instruction that has a REX prefix.
REX prefixes are a byproduct of maintaining backward compatibility. The x86-32 architec-
ture has only 8 general purpose registers, so it is sufficient to have only three bits in an instruc-
tion to specify any register. There are 16 general purpose registers in the x86-64 architecture,
so four bits are required to specify a register. Some instructions involve up to three registers,
thus there must be a place for three more bits to specify all the registers. Rather than change
the register-specifying patterns in the Opcode, ModRM, and SIB bytes, the CPU designers de-
cided to use the REX.R, REX.X, and REX.B bits in the REX prefix byte as the high-order bits for
specifying registers. This provides the necessary three bits for register specification. A fourth
bit in the REX prefix, the REX.W bit, is set to 1 when the operand is 64 bits. For all other operand
sizes — 8, 16, or 32 bits — REX.W is set to 0. The format of the REX prefix byte is shown in
Figure 9.3.
0100WRXB
Figure 9.3: REX prefix byte. The four lettered bits are named REX.W, REX.R, REX.X, and REX.B.
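The layout in Figure 9.3 can be sketched in C. This is only an illustration of how the four bits are packed; the function name rex_prefix is hypothetical and is not part of any assembler.

```c
#include <stdint.h>

/* Build a REX prefix byte, 0100WRXB, from its four one-bit fields.
 * rex_prefix is a hypothetical helper name used only for illustration. */
uint8_t rex_prefix(unsigned w, unsigned r, unsigned x, unsigned b)
{
    return (uint8_t)(0x40u
                     | ((w & 1u) << 3)   /* REX.W: 1 for 64-bit operands      */
                     | ((r & 1u) << 2)   /* REX.R: high bit of a register field */
                     | ((x & 1u) << 1)   /* REX.X: high bit of an index field   */
                     |  (b & 1u));       /* REX.B: high bit of a base/dst field */
}
```

For example, rex_prefix(1, 0, 0, 1) gives 0x49, the prefix on line 12 of Figure 9.1, and rex_prefix(1, 1, 0, 0) gives 0x4c, the prefix of the addq instruction on line 22.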
mmrrrbbb
Figure 9.4: ModRM byte. The mode is specified by the mm bits, register by the rrr bits, and
address base register by the bbb bits.
mm meaning
00 memory operand; address in register specified by bbb
01 memory operand; address in register specified by bbb plus 8-bit offset
10 memory operand; address in register specified by bbb plus 32-bit offset
11 register operand; register specified by bbb
Table 9.2: The mm field in the ModRM byte. Shows how to interpret the bbb register field.
bbb and rrr. If mm = 00 the bbb register contains the memory address of one of the operands.
The bbb register contains a base address for the other two memory modes: 01 means that an 8-bit
offset, and 10 a 32-bit offset, is added to the base address to obtain the memory address. The
offset is stored as part of the instruction.
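Splitting a ModRM byte into its three fields is a matter of shifting and masking. The sketch below assumes only the mmrrrbbb layout of Figure 9.4; the struct and function names are hypothetical.

```c
#include <stdint.h>

/* The three fields of a ModRM byte, mmrrrbbb (Figure 9.4).
 * The struct and function names are hypothetical. */
struct modrm_fields {
    unsigned mm;   /* mode, 2 bits                */
    unsigned rrr;  /* register field, 3 bits      */
    unsigned bbb;  /* base register field, 3 bits */
};

struct modrm_fields split_modrm(uint8_t byte)
{
    struct modrm_fields f;
    f.mm  = (byte >> 6) & 0x3u;  /* top two bits    */
    f.rrr = (byte >> 3) & 0x7u;  /* middle three    */
    f.bbb =  byte       & 0x7u;  /* low three bits  */
    return f;
}
```

Applied to 0xe5, the last byte of movq %rsp, %rbp in Figure 9.1, this gives mm = 11, rrr = 100, and bbb = 101. Applied to 0x66, the ModRM byte of movq %r12, 24(%rsi), it gives mm = 01, rrr = 100, and bbb = 110.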
The meaning of the register fields is shown in Table 9.3. For 64-bit mode, the REX bit column
is explained in Section 9.3.3.
ssiiibbb
Figure 9.5: SIB byte. The ss bits specify a scale factor, the iii bits the index register, and the
bbb bits the address base register.
indexed addressing mode (see Section 13.1, page 291). The memory address is given by multi-
plying the value in the index register by the scale factor and adding this to the address in the
base register. There can also be an offset, which is added to this sum.
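The address computation just described can be written out as a small C function. This is a sketch, not CPU documentation: the function name is hypothetical, and it relies on the architectural fact that the two ss bits select a scale factor of 1, 2, 4, or 8.

```c
#include <stdint.h>

/* Effective address for indexed addressing:
 *     base + index * scale + offset
 * ss is the 2-bit scale field of the SIB byte; the scale factor is
 * 1, 2, 4, or 8, that is, 1 << ss. The function name is hypothetical. */
uint64_t indexed_address(uint64_t base, uint64_t index,
                         unsigned ss, int32_t offset)
{
    uint64_t scale = (uint64_t)1 << (ss & 0x3u);
    /* The offset is signed, so it is sign-extended before the addition;
     * unsigned arithmetic wraps, which matches address arithmetic. */
    return base + index * scale + (uint64_t)(int64_t)offset;
}
```

With base = 0x1000, index = 3, ss = 10 (scale factor 4), and an offset of 8, the result is 0x1014.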
Notes:
1. A 3-bit register field can be in an opcode, ModRM, or SIB byte, depending upon the instruction.
2. The REX bit is the REX.R, REX.X, or REX.B bit in the REX prefix (Section 9.3.3), depending on the location of
the register field.
3. If a REX prefix is required, the REX.W bit is set to 1 for 64-bit operands.
4. The ah, bh, ch, and dh registers cannot be used in an instruction that requires a REX prefix; the spl, bpl, sil, and
dil registers require a REX prefix.
Table 9.3: Machine code of general purpose registers. The register name specified by the pro-
grammer determines other bit patterns in the instruction in addition to those shown
here.
This instruction copies all eight bytes from the rsp register to the rbp register. It starts with a
REX Prefix, followed by two bytes for the instruction itself. The general format of the instruction
for moving data from one register to another is shown in Figure 9.6. The REX Prefix is followed
1000100w11srcdst
Figure 9.6: Machine code for the mov from a register to a register instruction. The source register
is coded in the src bits and the destination in the dst bits. See Table 9.3 for the bit
patterns in each of these fields.
REX Prefix is 1, the operand size is 64 bits. Thus, the instruction makes a copy of all 64 bits in
the rsp register into the rbp register.
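As a check on the hand assembly, the 64-bit register-to-register mov can be sketched in C. The function name is hypothetical, and the register numbers are the standard x86-64 encodings (rax = 0, rcx = 1, rdx = 2, rbx = 3, rsp = 4, rbp = 5, rsi = 6, rdi = 7, r8 through r15 = 8 through 15). Only the 64-bit form is handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode "movq %src, %dst" for register numbers 0..15.
 * Writes three bytes to out and returns the byte count.
 * Hypothetical helper; covers only the 64-bit register-to-register form. */
size_t encode_movq_reg_reg(unsigned src, unsigned dst, uint8_t out[3])
{
    out[0] = (uint8_t)(0x48u                        /* 0100 with W = 1       */
                       | (((src >> 3) & 1u) << 2)   /* REX.R: high bit of src */
                       |  ((dst >> 3) & 1u));       /* REX.B: high bit of dst */
    out[1] = 0x89;                                  /* 1000100w with w = 1    */
    out[2] = (uint8_t)(0xC0u                        /* mm = 11: register mode */
                       | ((src & 7u) << 3)          /* low three bits of src  */
                       |  (dst & 7u));              /* low three bits of dst  */
    return 3;
}
```

Calling encode_movq_reg_reg(4, 5, out) produces 48 89 e5, the bytes shown for movq %rsp, %rbp on line 10 of Figure 9.1.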
The second mov format covered here is moving immediate data to a register. Examples are
given on lines 12 – 15 of Figure 9.1. The first operand (the source) is a literal: the value itself
is stated. This value will be stored immediately after the instruction. Of course, the instruction
must encode the fact that this operand is located at the address immediately following the
instruction — the immediate data addressing mode. The destination operand is a register —
the register direct addressing mode. The general format for the move immediate data to a
register instruction is shown in Figure 9.7 in binary.
1011wdst--data----data----data----data--
Figure 9.7: Machine code for the mov immediate data to a register instruction. The number of
data bytes depends on the size of the data.
Consider the instruction
12 0004 49BAEFCD movq $0x1234567890abcdef, %r10
12 AB907856
12 3412
The assembler determines that this is a mov instruction and the source operand is
immediate data (due to the “$” character), so the first four bits of the opcode are 1011 (see Figure
9.7). Since the operand is not 8 bits, the “w” bit is 1. Next, the assembler figures out that the
destination register is the r10 register. Looking this up on Table 9.3 (which is built into the
assembler) shows that the remaining three bits are 010. Thus, the assembler generates the first
byte of the instruction:
10111010₂ = ba₁₆
Since the operand size is 64 bits, the data value, 0x1234567890abcdef, is stored immediately
(immediate addressing mode) after the instruction. Notice that the bytes seem to be stored
backwards. That is, it looks like the assembler stored the 64-bit value 0xefcdab9078563412!
Recall that the x86-64 architecture uses the little endian order for storing data in memory, so
when the movl instruction copies four bytes from memory into a register, the byte at the lowest
memory address is loaded into the least significant byte of the register, the byte at the next
memory address is loaded into the next higher order byte of the register, etc. The assembler
takes this into account for us and stores the immediate data in memory in little endian format.
The endian issue is irrelevant if you are always consistent with the size of the data item.
However, if your algorithm changes data size, you need to be very aware of the endianness of the
processor. For example, if you use a movl to store four bytes in memory, then four movbs to read
them back into registers, you need to be aware of how they are physically stored in memory.
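The effect of storing four bytes and then reading them back one at a time can be imitated in C. The sketch below uses shifts so that the result does not depend on the machine that runs the C code; the function name is hypothetical.

```c
#include <stdint.h>

/* Store a 32-bit value in little endian order, as movl does on x86-64:
 * the least significant byte goes to the lowest address. */
void store_le32(uint8_t mem[4], uint32_t value)
{
    mem[0] = (uint8_t)( value        & 0xffu);  /* lowest address  */
    mem[1] = (uint8_t)((value >> 8)  & 0xffu);
    mem[2] = (uint8_t)((value >> 16) & 0xffu);
    mem[3] = (uint8_t)((value >> 24) & 0xffu);  /* highest address */
}
```

After store_le32(mem, 0x12345678), reading the bytes one at a time in address order gives 0x78, 0x56, 0x34, and 0x12.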
Finally, since this instruction operates on a 64-bit value, the instruction requires a REX
Prefix. Referring to Figure 9.3 we see that the REX.W bit is 1, indicating the 64-bit size of the
operands. And the REX.B bit is 1, which is used with the dst field to give the 4-bit number of
the r10 register, 1010₂.
BE CAREFUL! Notice that the instruction is ten bytes long (Figure 9.1), but the operand size is
eight bytes. Do not confuse the size of the instruction with the size of the operand(s).
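The whole ten-byte instruction can be reproduced with a short C sketch that follows the same steps: REX prefix, opcode byte in the format of Figure 9.7, then the immediate data in little endian order. The function name is hypothetical and only the 64-bit immediate form is handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode "movq $imm, %reg" with a 64-bit immediate, for register
 * numbers 0..15 (r10 is 10). Writes ten bytes to out and returns the
 * byte count. Hypothetical helper for illustration. */
size_t encode_movq_imm64(uint64_t imm, unsigned reg, uint8_t out[10])
{
    out[0] = (uint8_t)(0x48u | ((reg >> 3) & 1u));  /* REX: W = 1, B = high bit of reg */
    out[1] = (uint8_t)(0xB8u | (reg & 7u));         /* 1011 w dst, with w = 1          */
    for (size_t i = 0; i < 8; i++)                  /* immediate data, little endian   */
        out[2 + i] = (uint8_t)((imm >> (8 * i)) & 0xffu);
    return 10;
}
```

Calling encode_movq_imm64(0x1234567890abcdef, 10, out) yields 49 ba ef cd ab 90 78 56 34 12, matching the listing in Figure 9.1.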
the same size as the register to which it is added, except when adding to the rax register. Then
the immediate data is 32 bits and is sign-extended to 64 bits before adding it to the value in the
rax register. Note that this instruction is not used for the ah portion of the a register. For adding
an immediate value to the ah register or any of the other registers, the assembler
program must use the instruction shown in Figure 9.9.
0000010w--data----data----data----data--
Figure 9.8: Machine code for the add immediate data to the A register (except ah) instruction.
The number of data bytes depends on the size of the data.
1000000w11000dst--data----data----data----data--
Figure 9.9: Machine code for the add immediate data to register (not al, ax, nor eax registers)
instruction. The number of data bytes depends on the size of the data.
Notice that the instruction for adding to the a register (except the ah portion) is one byte
shorter than when adding to the other registers (compare Figures 9.8 and 9.9). There is an
historical reason for this. Early CPU designs had only one general purpose register. It was
used as the “accumulator” for performing arithmetic. (Perhaps naming it the “a” register makes
a little more sense.) As more general purpose registers were added to the designs, assembly
language programmers tended to continue using the “accumulator” register more frequently
than the others. And compiler writers continued this same pattern of register usage. Hence, the
“a” register is used much more for addition in a program than the other registers, and making it
a shorter instruction reduces memory usage and increases execution speed. The differences are
generally irrelevant these days, but the x86 architecture has evolved in such a way to maintain
backward compatibility.
The add instruction shown in Figure 9.10 is used when the data value is small enough to fit
into one byte, but it is being added to a two-, four-, or eight-byte register. The value is sign-
extended to a full 16-bit, 32-bit, or 64-bit value, respectively, inside the CPU before it is added to
the register. Sign-extension consists of copying the high-order bit into each bit to the left until
the full width is reached. For example, sign-extending 0x7f to 32 bits would give 0x0000007f;
sign-extending 0x80 to 32 bits would give 0xffffff80. Notice that sign-extension preserves the
signed decimal value of the bit pattern. (Review Section 3.3.)
Adding small
32-bit values.
1000001111000dst--data--
Figure 9.10: Machine code for the add immediate data to a register instruction. Used when the
data will fit into one byte, but the register is two, four, or eight bytes. Value is
sign-extended.
Even though the value can be coded in only eight bits, the full 32 bits of the register may be
affected by the addition. That is, the machine code is 83c105 (the data is coded in only one byte),
but the CPU adds 0x00000005 to the ecx register. (Recall that this may produce different results
than simply adding 0x05 to the cl portion of the ecx register.)
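The difference between the two additions can be seen in a C model. This is a sketch under the assumption that a 32-bit register is modeled by a uint32_t; both function names are hypothetical.

```c
#include <stdint.h>

/* Model of addl with a one-byte immediate: the byte is sign-extended
 * to 32 bits and added to the whole register. */
uint32_t add_imm8_to_reg32(uint32_t reg, uint8_t imm8)
{
    /* Sign-extend: copy bit 7 into each of the upper 24 bits. */
    uint32_t extended = (imm8 & 0x80u) ? (0xffffff00u | imm8) : imm8;
    return reg + extended;
}

/* Model of addb to the low byte only (the cl portion of ecx): a carry
 * out of bit 7 does not reach the upper 24 bits. */
uint32_t add_imm8_to_low_byte(uint32_t reg, uint8_t imm8)
{
    uint8_t low = (uint8_t)((reg & 0xffu) + imm8);
    return (reg & 0xffffff00u) | low;
}
```

Starting with 0x000000ff in the register and adding 0x05, the first function gives 0x00000104 while the second gives 0x00000004.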
The format for adding a value in a register to a value in a register is shown in Figure 9.11.
Again, the size of the data and the registers are specified by the w, src, and dst bits, with the
register bit patterns given in Table 9.3; “src” means “source” and “dst” means “destination.”
0000000w11srcdst
Figure 9.11: Machine code for the add register to register instruction.
This instruction adds the 32 bits from the ecx register to the 32 bits in the edx register, leaving
the result in the edx register. From Table 9.3, w = 1, src = 001, and dst = 010. Thus the
instruction is
00000001 11001010₂ = 01ca₁₆
9.4.1 Instructions
data movement:
opcode   source          destination   action                   see page:
movs     $imm/%reg       %reg/mem      move                     141
movsss   $imm/%reg       %reg/mem      move, sign extend        216
movzss   $imm/%reg       %reg/mem      move, zero extend        217
popw                     %reg/mem      pop from stack           163
pushw    $imm/%reg/mem                 push onto stack          163
s = b, w, l, q; w = l, q

arithmetic/logic:
opcode   source          destination   action                   see page:
adds     $imm/%reg       %reg/mem      add                      189
adds     mem             %reg          add                      189
cmps     $imm/%reg       %reg/mem      compare                  209
incs                     %reg/mem      increment                220
leaw     mem             %reg          load effective address   167
subs     $imm/%reg       %reg/mem      subtract                 190
subs     mem             %reg          subtract                 190
s = b, w, l, q; w = l, q
9.5 Exercises
9-1 (§9.1) Enter the assembly language program in Listing 9.3. Use gdb to single step through
the program as shown in the book. Before executing each instruction, predict how the rax,
rbp, and rsp registers will change. Also record the values in the rip and eflags registers
as you single step through the program. How many bytes are there in each instruction?
9-2 (§9.2) Enter the C program in Listing 9.4. Using gdb, verify that the program works cor-
rectly, as shown in Table 9.1.
9-3 (§9.2) Enter the assembly language program in Listing 9.6 and run it. Notice that it gives
different results than the C version if there is overflow. Why is this? Modify the program
so that it gives the same results as the C version but still gives an overflow warning.
9-4 (§9.3) Assemble each of the mov instructions in Listing 9.7 by hand. Check your answers
with the assembly listing.
9-5 (§9.3) Assemble each of the add instructions in Listing 9.7 by hand. Check your answers
with the assembly listing.
9-6 (§9.3) Assemble each of the following instructions by hand (on paper).
Check your work by entering the code into a source file of the form
.text
.globl main
.type main, @function
main:
pushq %rbp
movq %rsp, %rbp
# Your code sequence goes here.
movl $0, %eax
popq %rbp
ret
9-7 (§9.3) Assemble each of the following instructions by hand (on paper).
Check your work by entering the code into a source file of the form
.text
.globl main
.type main, @function
main:
pushq %rbp
movq %rsp, %rbp
# Your code sequence goes here.
movl $0, %eax
popq %rbp
ret
instruction, where “64-bit_register” is any of the general purpose registers. What is the
general format of the instruction? Show your answer as a drawing similar to Figure 9.7.
Which ones use a REX prefix? Hint: assemble with the -al option.
9-9 (§9.3) Design an experiment that will allow you to determine what the machine code is for
the
popq 64-bit_register
instruction, where “64-bit_register” is any of the general purpose registers. What is the
general format of the instruction? Show your answer as a drawing similar to Figure 9.7.
Which ones use a REX prefix? Hint: assemble with the -al option.
9-10 (§9.3) Disassemble each of the machine instruction sequences by hand (on paper). (Find
the corresponding assembly language instruction for each machine code instruction.) No-
tice that this is a much more difficult problem, because it is difficult to tell where one
instruction ends and the next one begins. We have placed one machine instruction on each
line to help you. Enter each of your assembly language programs into a source file and use
the assembler to check your work.
a) b0ab
   b4cd
   41b0ef
   41b701
b) 40b723
   40b634
   b256
   b678
c) b83412cdab
   bbabcd1234
   41b900000000
   41be7b000000
d) 66b8cdab
   66bbbacd
   66b93412
   66ba2143
e) 88c4
   8808
   88480a
   8a08
   8a480a
f) 89c3
   6689d8
   4889ca
   4589c6
g) 04ab
   80c4cd
   80c3ef
   80c701
h) 80c123
   80c534
   80c256
   80c678
i) 053412cdab
   81c3abcd1234
   81c1d4c3b2a1
   81c2a1b2c3d4
j) 5ab00000000
   83c301
   83c100
   81c2ff000000
k) 6605cdab
   6681c3bace
   6681c13412
   6681c22143
l) 6605ab00
   6683c301
   6683c100
   6681c2ff00
m) 00c4
   4100c2
   00ca
   4500c1
n) 01c3
   6600d8
   4801ca
   4501c6
Chapter 10
The assembly language we have studied thus far is executed in sequence. In this chapter we
will learn how to organize assembly language instructions to implement the other two required
program flow constructs — repetition and binary decision.
Text string manipulations provide many examples of using program flow constructs, so we
will use them to illustrate many of the concepts. Almost any program displays many text string
messages on the screen, which are simply arrays of characters.
10.1 Repetition
The algorithms we choose when programming interact closely with the data storage structure.
As you probably know, a string of characters is stored in an array. Each element of the array is
of type char, and in C the end of the data is signified with a sentinel value, the NUL character
(see Table 2.3 on page 20).
The other technique for specifying the length of the string is to store the number of characters in the
string together with the string. This is implemented in Pascal by storing the number of characters
in the first byte of the array, and the actual characters are stored immediately following.
Array processing is usually a repetitive task. The processing of a character string is a good
example of repetition. Consider the C program in Listing 10.1.
1 /*
2 * helloWorld1.c
3 * "hello world" program using the write() system call
4 * one character at a time.
5 * Bob Plantz - 12 June 2009
6 */
7 #include <unistd.h>
8
9 int main(void)
10 {
11 char *aString = "Hello World.\n";
12
19 return 0;
20 }
4. At the end of the {. . . } block program flow jumps back up to the evaluation of the boolean
expression.
statement. Notice that this variable must be changed inside the {. . . } block. Otherwise, the
boolean expression will always evaluate to true, giving an “infinite” loop.
It is important that you identify the variable that the while construct uses to control program
flow — the Loop Control Variable (LCV). Make sure that the value of the LCV is changed within
the {. . . } block. Note that there may be more than one LCV.
The way that the while construct controls program flow can be seen in the flow chart in Fig-
ure 10.1. This flow chart shows that we need the following assembly language tools to construct
a while loop:
• Instruction(s) to evaluate boolean expressions.
• An instruction that conditionally transfers control (jumps) to another location in the pro-
gram. This is represented by the large diamond, which shows two possible paths.
[Flow chart: Initialize Loop Control Variable → Evaluate Boolean expression. If true: Execute Body
of while loop, then return to Evaluate Boolean expression. If false: Next instruction after while
loop construct.]
Figure 10.1: Flow chart of a while loop. The large diamond represents a binary decision that
leads to two possible paths, “true” or “false.” Notice the path that leads back to the
top of the while loop after the body has been executed.
Intel®
Syntax cmp destination, source
The cmp operation consists of subtracting the source operand from the destination operand
and setting the condition code bits in the rflags register accordingly. Neither of the operand
values is changed. The subtraction is done internally simply to get the result and set the OF, SF,
ZF, AF, PF, CF condition codes according to the result.
The other instruction is test. The syntax is
Intel®
Syntax test destination, source
The test operation consists of performing a bit-wise and between the two operands and
setting the condition codes in the rflags register accordingly. Neither of the operand values is
changed. The and operation is done internally simply to get the result and set the SF, ZF, and PF
condition codes according to the result. The OF and CF are set to 0, and the AF value is undefined.
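The way these two instructions set the condition codes can be modeled in C for 8-bit operands. This is a sketch only: the struct and function names are hypothetical, and the AF and PF codes are left out to keep it short.

```c
#include <stdbool.h>
#include <stdint.h>

/* A subset of the condition codes (AF and PF are omitted).
 * The struct and function names are hypothetical. */
struct flags {
    bool cf;  /* carry (borrow out of the subtraction) */
    bool zf;  /* result is zero                        */
    bool sf;  /* high-order bit of the result          */
    bool of;  /* signed overflow                       */
};

/* 8-bit cmp: compute destination - source and keep only the flags. */
struct flags cmp8(uint8_t destination, uint8_t source)
{
    uint8_t result = (uint8_t)(destination - source);
    struct flags f;
    f.cf = destination < source;   /* unsigned borrow */
    f.zf = (result == 0);
    f.sf = (result & 0x80u) != 0;
    /* Overflow: the operands have different signs and the sign of the
     * result differs from the sign of the destination. */
    f.of = (((destination ^ source) & (destination ^ result)) & 0x80u) != 0;
    return f;
}

/* 8-bit test: compute destination AND source and keep only the flags. */
struct flags test8(uint8_t destination, uint8_t source)
{
    uint8_t result = destination & source;
    struct flags f;
    f.cf = false;                  /* always set to 0 */
    f.zf = (result == 0);
    f.sf = (result & 0x80u) != 0;
    f.of = false;                  /* always set to 0 */
    return f;
}
```

Neither function changes its operands, just as neither instruction does; only the returned flag values carry information forward.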
jcc label
where cc is a 1 – 4 letter sequence specifying the condition codes, and label is a memory location.
Program flow is transferred to label if cc is true. Otherwise, the instruction immediately follow-
ing the conditional jump is executed. The conditional jump instructions are listed in Table 10.1.
A good way to appreciate the meaning of the cc sequences in this table is to consider a very
common application of a conditional jump:
cmpb %al, %bl
jae somePlace
movb $0x12, %ah
If the value in the bl register is numerically above the value in the al register, or if they are
equal, then program control transfers to the address labeled “somePlace.” Otherwise, program
control continues with the movb instruction.
212 CHAPTER 10. PROGRAM FLOW CONSTRUCTS
The differences between “greater” versus “above”, and “less” versus “below”, are a little sub-
tle. “Above” and “below” refer to a sequence of unsigned numbers. For example, characters
would probably be considered to be unsigned in most applications. “Greater” and “less” refer to
signed values. Integers are commonly considered to be signed.
Table 10.2 lists four conditional jumps that are commonly used when processing unsigned
values. And Table 10.3 lists four commonly used with signed values.
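The distinction can be made concrete by comparing the same two bit patterns both ways. The C sketch below is an illustration with hypothetical function names; the parameters are named after the registers in the cmpb %al, %bl example, where bl is the destination.

```c
#include <stdbool.h>
#include <stdint.h>

/* "Above or equal": compare the bit patterns as unsigned numbers,
 * as jae does after cmpb %al, %bl. */
bool above_or_equal(uint8_t bl, uint8_t al)
{
    return bl >= al;
}

/* "Greater or equal": compare the same bit patterns as signed
 * two's complement numbers, as jge does. */
bool greater_or_equal(uint8_t bl, uint8_t al)
{
    /* Interpret each byte as a signed value in the range -128..127. */
    int signed_bl = (bl & 0x80u) ? (int)bl - 256 : (int)bl;
    int signed_al = (al & 0x80u) ? (int)al - 256 : (int)al;
    return signed_bl >= signed_al;
}
```

With bl = 0x80 and al = 0x01, above_or_equal is true (128 versus 1) but greater_or_equal is false (-128 versus 1), so jae would jump where jge would not.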
Since most instructions affect the settings of the condition codes in the rflags register, each
conditional jump must be used immediately after the instruction that determines the conditions that
the programmer intends to cause the jump.
HINT: It is easy to forget how the order of the source and destination controls the conditional jump
in this construct. Here is a place where the debugger can save you time. Simply put a breakpoint at
the conditional jump instruction. When the program stops there, look at the values in the source and
destination. Then use the si debugger command to execute one instruction and see where it goes.
The jump instructions bring up another addressing mode: rip-relative.¹
rip-relative: The target is a memory address determined by adding an offset to the current
address in the rip register.
The offset, which can be positive or negative, is stored immediately following the opcode for
the instruction in two’s complement format. Thus, the offset becomes a part of the instruction,
similar to the immediate data addressing mode. Just like the immediate addressing mode, the
offset is stored in little endian order in memory.
The following steps occur during program execution of a jcc instruction (recall Figure 6.5):
1. The jump instruction, including the offset value, is fetched.
2. As always, the rip register is incremented by the number of bytes in the jump instruction,
including the offset value that is stored as part of the jump instruction.
3. If the conditions to cause a jump are true, the offset is added to the rip register.

¹ In an environment where the instruction pointer is called the “program counter” this would be called “pc-relative.”
When a conditional jump instruction is assembled, the assembler computes the number of
bytes from the jump instruction to the specified label. The assembler then subtracts the number
of bytes in the jump instruction from the distance to the label to yield the offset. This computed
offset is stored as part of the jump instruction. Each jump instruction has several forms, de-
pending on the number of bytes that must be used to store the offset. Note that the offset is
stored in two’s complement format to allow for negative jumps.
For example, if the offset will fit into eight bits the opcode for the je instruction is 74₁₆, and it
is 0f84₁₆ if more than eight bits are required to store the offset (in which case the offset is stored
as a 32-bit value). The machine code is shown in Table 10.4 for four different target
address offsets. Notice that the 32-bit offsets are stored in little endian order in memory.
Table 10.4: Machine code for the je instruction. Four different distances to the jump target
address. Notice that the 32-bit offsets are stored in little endian order.
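The offset computation and the two forms of the je instruction can be sketched in C. The function names are hypothetical, the addresses are plain integers chosen for illustration, and the sketch simply prefers the short form whenever the offset fits in eight bits; a real assembler may choose between the forms differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode "je target" for a jump instruction located at jump_addr.
 * Assumes the target is within range of a 32-bit offset.
 * Returns the number of bytes written to out. */
size_t encode_je(int64_t jump_addr, int64_t target, uint8_t out[6])
{
    /* Distance to the label minus the length of the two-byte form. */
    int64_t offset = target - jump_addr - 2;
    if (offset >= -128 && offset <= 127) {
        out[0] = 0x74;
        out[1] = (uint8_t)((uint64_t)offset & 0xffu);  /* two's complement */
        return 2;
    }
    /* Six-byte form: 0f 84 followed by a 32-bit offset. */
    offset = target - jump_addr - 6;
    uint64_t bits = (uint64_t)offset;                  /* two's complement */
    out[0] = 0x0f;
    out[1] = 0x84;
    for (size_t i = 0; i < 4; i++)                     /* little endian    */
        out[2 + i] = (uint8_t)((bits >> (8 * i)) & 0xffu);
    return 6;
}

/* What happens at execution time: rip is first moved past the jump
 * instruction, then the offset is added if the jump is taken. */
int64_t rip_after_jump(int64_t jump_addr, size_t length,
                       int64_t offset, int taken)
{
    int64_t rip = jump_addr + (int64_t)length;
    if (taken)
        rip += offset;
    return rip;
}
```

For a je at address 0x10 with its label at 0x20, the offset is 0x20 - 0x10 - 2 = 0x0e and the machine code is 74 0e. Adding that offset to the incremented rip, 0x12, gives back the label address 0x20.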
jmp label
jmp *register
jmp *memory
BE CAREFUL: The unconditional jump uses “*” for indirection, while all other instructions use
“(register).” It might be tempting to use something like “*(%rax).” Although the (. . . ) are not an
error here, they are superfluous. They have essentially the same effect as something like (x) in an
algebraic expression.
The three ways to use an unconditional jump are shown in Listing 10.2.
1 # jumps.s
2 # demonstrates unconditional jumps
3 # Bob Plantz - 12 June 2009
4 # global variable
5 .data
6 pointer:
7 .quad 0
8 format:
9 .string "The jump pattern is %x.\n"
10 # code
11 .text
12 .globl main
13 .type main, @function
14 main:
15 pushq %rbp # save frame pointer
16 movq %rsp, %rbp # set new frame pointer
17
On lines 22 – 23 an address is loaded into a register, then the jump is made indirectly via the
register to that address.
22 leaq here2, %rax
23 jmp *%rax
Lines 26 – 28 show how an address can be stored in memory, then the memory used indirectly
for the jump.
26 leaq here3, %rax
27 movq %rax, pointer
28 jmp *pointer
Of course, the indirect techniques are not required in this simple example, but they might be
needed for some programs.
Listing 10.3: Displaying a string one character at a time (gcc assembly language). Comments
added.
Notice that after initializing the loop control variable it jumps to the condition test,
12 movq $.LC0, -8(%rbp) # pointer to string
13 jmp .L2 # go to bottom of loop
Let us rearrange the instructions so that this is a true while loop — the condition test is at
the top of the loop. The exit condition has been changed from jne to je for correctness. The
original is on the left, the rearranged on the right:
Both versions have exactly the same number of instructions. However, the unconditional
jump instruction, jmp, is executed every time through the “true” while loop, but is executed only
once in the compiler’s version. Thus, the compiler’s version is more efficient. The savings is
probably insignificant in the vast majority of applications. However, if a loop is nested within
another loop or two, the difference could be important.
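The two arrangements can be modeled in C with goto statements, counting how often the unconditional jump executes when the loop body runs n times. This is a sketch of the control flow only; the function names and the counter are hypothetical and do not come from the compiler output.

```c
/* Test at the top of the loop: the "true" while loop. */
unsigned jumps_test_at_top(unsigned n)
{
    unsigned jmp_count = 0;
    unsigned i = 0;
top:
    if (i == n) goto done;   /* je done                */
    i++;                     /* loop body              */
    jmp_count++;             /* jmp top, every pass    */
    goto top;
done:
    return jmp_count;
}

/* Test at the bottom, entered by one jump: the compiler's arrangement. */
unsigned jumps_test_at_bottom(unsigned n)
{
    unsigned jmp_count = 0;
    unsigned i = 0;
    jmp_count++;             /* jmp check, executed once */
    goto check;
body:
    i++;                     /* loop body              */
check:
    if (i != n) goto body;   /* jne body               */
    return jmp_count;
}
```

For a loop that runs ten times, the first arrangement executes the unconditional jump ten times and the second executes it once.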
We also see another version of the mov instruction on line 22:
22 movzbl (%rax), %eax
This instruction converts the data size from 8-bit to 32-bit, placing zeros in the high-order 24
bits, as it copies the byte from memory to the eax register. The memory address of the copied
byte is in the rax register. (Yes, this instruction writes over the address in the register as it
executes.)
The x86-64 architecture includes instructions for extending the size of a value by adding
more bits to the left. There are two ways to do this:
• Sign extend — copy the sign bit to each of the new high-order bits. For example, when
sign extending an 8-bit value to 16 bits, 85 would become ff85, but 75 would become 0075.
• Zero extend — make each of the new high-order bits zero. When zero extending 85 to
sixteen bits, it becomes 0085.
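Both kinds of extension can be written as short C functions for the 8-bit to 16-bit case. The function names are hypothetical; the comments name the AT&T instructions that perform the corresponding operation.

```c
#include <stdint.h>

/* Zero extend: the new high-order bits are all zero (like movzbw). */
uint16_t zero_extend_8_to_16(uint8_t value)
{
    return (uint16_t)value;
}

/* Sign extend: copy bit 7 into each new high-order bit (like movsbw). */
uint16_t sign_extend_8_to_16(uint8_t value)
{
    return (value & 0x80u) ? (uint16_t)(0xff00u | value)
                           : (uint16_t)value;
}
```

These reproduce the examples above: sign extending 0x85 gives 0xff85, sign extending 0x75 gives 0x0075, and zero extending 0x85 gives 0x0085.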
where s denotes the size of the source operand and d the size of the destination operand. (Use
the s column for d.)
s   meaning    number of bits
b   byte       8
w   word       16
l   longword   32
q   quadword   64
It can be used to move an 8-bit value from memory or a register into a 16-, 32-, or 64-bit register;
move a 16-bit value from memory or a register into a 32-bit register; or move a 32-bit value from
memory or a register into a 64-bit register. The “s” causes the rest of the high-order bits in
the destination register to be a copy of the sign bit in the source value. It does not affect the
condition codes in the rflags register.
In the Intel syntax the instruction is movsx. The size of the data is determined by the
operands, so the size characters (b, w, l, or q) are not appended to the instruction, and the
order of the operands is reversed.
Intel®
Syntax movsx destination, source
In some cases the Intel syntax is ambiguous. Intel-syntax assemblers use keywords to specify the data
size in such cases. For example, the nasm assembler uses
movsx destination, BYTE [source]
to move one byte and sign extend, and uses
movsx destination, WORD [source]
to move two bytes and sign extend.
where s denotes the size of the source operand and d the size of the destination operand. (Use
the s column for d.)
s   meaning    number of bits
b   byte       8
w   word       16
l   longword   32
q   quadword   64
It can be used to move an 8-bit value from memory or a register into a 16-, 32-, or 64-bit register;
or move a 16-bit value from memory or a register into a 32-bit register. The “z” causes the rest
of the high-order bits in the destination register to be set to zero. It does not affect the condition
codes in the rflags register. Recall that moving a 32-bit value from memory or a register into a
64-bit register sets the high-order 32 bits to zero, so there is no movzlq instruction.
In the Intel syntax the instruction is movzx. The size of the data is determined by the operands,
so the size characters (b, w, l, or q) are not appended to the instruction, and the order of the
operands is reversed.
Intel®
Syntax movzx destination, source
There is also a set of instructions that double the size of data in portions of the rax register,
sign extending as they do so. The instructions are:
AT&T syntax   Intel® syntax   start         result
cbtw          cbw             byte in al    word in ax
cwtl          cwde            word in ax    long in eax
cwtd          cwd             word in ax    long in dx:ax
cltd          cdq             long in eax   quad in edx:eax
cltq          cdqe            long in eax   quad in rax
cqto          cqo             quad in rax   octuple in rdx:rax
where the notation “long in dx:ax” means a 32-bit value with the high-order 16 bits in dx and
the low-order 16 bits in ax. Notice that these instructions do not explicitly specify any operands,
but they change the rax and possibly the rdx registers. They do not affect the condition codes in
the rflags register.
Returning to while loops, the general structure of a count-controlled while loop is shown in
Listing 10.4.
1 # generalWhile.s
2 # general structure of a while loop (not a program)
3 #
4 # count = 10;
5 # while (count > 0)
6 # {
7 # // loop body
8 # count--;
9 # }
10 #
11 # Bob Plantz - 10 June 2009
12
Loops, of course, take the most execution time in a program. However, in almost all cases code read-
ability is more important than efficiency. You should determine that a loop is an efficiency bottleneck
before sacrificing its structure for efficiency. And then you should generously comment what you have
done.
Our assembly language version of a “Hello world” program in Listing 10.5 uses a sentinel-
controlled while loop.
1 # helloWorld3.s
2 # "hello world" program using the write() system call
3 # one character at a time.
4 # Bob Plantz - 12 June 2009
5
6 # Useful constants
7 .equ STDOUT,1
8 # Stack frame
9 .equ aString,-8
10 .equ localSize,-16
11 # Read only data
12 .section .rodata
13 theString:
14 .string "Hello world.\n"
15 # Code
16 .text
17 .globl main
18 .type main, @function
19 main:
20 pushq %rbp # save base pointer
21 movq %rsp, %rbp # set new base pointer
22 addq $localSize, %rsp # for local var.
23
26 whileLoop:
27 movl aString(%rbp), %esi # current char in string
28 cmpb $0, (%esi) # null character?
29 je allDone # yes, all done
30
Listing 10.5: Displaying a string one character at a time (programmer assembly language).
We had to move the pointer value into a register in order to dereference the pointer. These two
instructions implement the C expression:
(*aString != ’\0’)
In particular, you have to move the address into a register, then dereference it with the
“(register)” syntax.
Be careful not to confuse this with the indirection operator, “*”, used with the jmp instruction that you
saw in Section 10.1.3, especially since the assembly language indirection operator is the same as the
dereference operator in C/C++.
There are two common errors when using the assembly language syntax.
• The assembly language dereference operator does not work on variable names. For example,
you cannot use
cmpb $0, (ptr(%rbp)) # *** DOES NOT WORK ***
nor
cmpb $0, ($theString) # *** DOES NOT WORK ***
to dereference the theString location. Unfortunately, the assembler may not consider
either of these to be a syntax error, just an unnecessary set of parentheses. Therefore, you
probably will not get an assembler error message, just incorrect program behavior.
• Another common error is to forget to dereference the register once you get the address
stored in it:
cmpb $0, %esi # *** DOES NOT WORK ***
This would compare a byte in the esi register itself with the value zero. Since there
are four bytes in the esi register, this code will generate an assembler warning message
because it does not specify which byte.
(Margin note: Read the warning messages when you assemble and link your programs.)
BE CAREFUL: The C/C++ syntax for the NUL character, ’\0’, is not recognized by the gnu assembler,
as. From Table 2.3 we see that the bit pattern for the NUL character is 0x00, and this value must be
used in the gnu assembly language.
We also need to add one to the pointer variable so as to move it to the next character in the
string. Adding one is a common operation, so there is an instruction that simply adds one,
incs source
The inc instruction adds one to the source operand. The operand can be a register or a memory
location.
On line 34 of the program in Listing 10.5, incl is used to add one to the address stored in
memory at offset aString (defined as -8 in this listing) relative to the frame pointer:
incl aString(%rbp) # aString++;
(Margin note: Increment the entire 32- or 64-bit address, not just one byte.)
BE CAREFUL: It is easy to think that the instruction ought to be incb since each character is only
one byte. The address in this program is 32 bits, so we have to use incl. And, of course, when we use a
64-bit address, we need to use incq. Don’t forget that the value we are adding one to is an address, not
the value stored at that address.
Subtracting one from a counter is also a common operation. The dec instruction subtracts
one from an operand and sets the rflags register accordingly. The operand can be a register or
a memory location.
decs source
A decl instruction is used on line 27 in Listing 10.6 to both subtract one from the counter
variable and to set the condition codes in the rflags register for the jg instruction.
1 # printStars.s
2 # prints 10 * characters on a line
3 # Bob Plantz - 12 June 2009
4
5 # Useful constants
6 .equ STDOUT,1
7 # Stack frame
8 .equ theChar,-1
9 .equ counter,-16
10 .equ localSize,-16
11 # Code
12 .text
13 .globl main
8 #include <unistd.h>
9
10 int main(void)
11 {
12 char *ptr;
13 char response;
14
16
25 if (response == 'y')
26 {
27 ptr = "Changes saved.\n";
28 while (*ptr != '\0')
29 {
30 write(STDOUT_FILENO, ptr, 1);
31 ptr++;
32 }
33 }
34 else
35 {
36 ptr = "Changes discarded.\n";
37 while (*ptr != '\0')
38 {
39 write(STDOUT_FILENO, ptr, 1);
40 ptr++;
41 }
42 }
43 return 0;
44 }
Let’s look at the flow of the program that the if-else controls.
1. The boolean expression (response == ’y’) is evaluated.
2. If the evaluation is true, the first block, the one that displays “Changes saved.”, is executed.
3. If the evaluation is false, the second block, the one that displays “Changes discarded.”, is
executed.
4. In both cases the next statement to be executed is the return 0;
The program control flow of the if-else construct is illustrated in Figure 10.2.
[Flow chart. Its exit box reads “Next instruction after if-then construct.”]
Figure 10.2: Flow chart of if-else construct. The large diamond represents a binary decision
that leads to two possible paths, “true” or “false.” Notice that either the “then” block
or the “else” block is executed, but not both. Each leads to the end of the if-else
construct.
We already know all the assembly language instructions needed to implement the if-else
in Listing 10.7. The important thing to note is that there must be an unconditional jump at the
end of the “then” block to transfer program flow around the “else” block. The assembly language
generated for this program is shown in Listing 10.8.
1 .file "yesNo1.c"
2 .section .rodata
3 .LC0:
4 .string "Save changes? "
5 .LC1:
6 .string "Changes saved.\n"
7 .LC2:
8 .string "Changes discarded.\n"
9 .text
10 .globl main
11 .type main, @function
12 main:
13 pushq %rbp
14 movq %rsp, %rbp
15 subq $16, %rsp
16 movq $.LC0, -16(%rbp)
17 jmp .L2
18 .L3:
19 movq -16(%rbp), %rsi
20 movl $1, %edx
21 movl $1, %edi
22 call write
23 addq $1, -16(%rbp)
24 .L2:
25 movq -16(%rbp), %rax
26 movzbl (%rax), %eax
27 testb %al, %al
28 jne .L3
29 leaq -1(%rbp), %rsi # place to store user response
30 movl $1, %edx
31 movl $0, %edi
32 call read
33 movzbl -1(%rbp), %eax # get user response
34 cmpb $121, %al # response == ’y’ ?
35 jne .L4 # no, go to else part
36 movq $.LC1, -16(%rbp) # yes, write "Changes saved.\n"
37 jmp .L5
38 .L6:
39 movq -16(%rbp), %rsi
40 movl $1, %edx
41 movl $1, %edi
42 call write
43 addq $1, -16(%rbp)
44 .L5:
45 movq -16(%rbp), %rax
46 movzbl (%rax), %eax
47 testb %al, %al
48 jne .L6
49 jmp .L7 # jump around else part
50 .L4: # else part,
51 movq $.LC2, -16(%rbp) # write "Changes discarded.\n"
52 jmp .L8
53 .L9:
54 movq -16(%rbp), %rsi
55 movl $1, %edx
56 movl $1, %edi
57 call write
58 addq $1, -16(%rbp)
59 .L8:
60 movq -16(%rbp), %rax
61 movzbl (%rax), %eax
62 testb %al, %al
63 jne .L9
64 .L7: # after if-else statement
65 movl $0, %eax
66 leave
67 ret
68 .size main, .-main
69 .ident "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"
70 .section .note.GNU-stack,"",@progbits
Listing 10.8: Get yes/no response from user (gcc assembly language).
11 # }
12 #
13 # Bob Plantz - 10 June 2009
14
Listing 10.9: General structure of an if-else construct. Don’t forget the “jmp” at the end of the
“then” block (line 20).
This is not a complete program or even a function. It simply shows the key elements of an
if-else construct.
Our assembly language version of the yes/no program in Listing 10.10 follows this general
pattern. It, of course, uses more meaningful labels than what the compiler generated.
1 # yesNo2.s
2 # Prompts user to enter a y/n response.
3 # Bob Plantz - 12 June 2009
4
5 # Useful constants
6 .equ STDIN,0
7 .equ STDOUT,1
8 # Stack frame
9 .equ response,-1
10 .equ ptr,-16
11 .equ localSize,-16
12 # Read only data
13 .section .rodata
14 queryMsg:
15 .string "Save changes? "
16 saveMsg:
17 .string "Changes saved.\n"
18 discardMsg:
19 .string "Changes discarded.\n"
20 # Code
21 .text
22 .globl main
23 .type main, @function
24 main:
25 pushq %rbp # save base pointer
26 movq %rsp, %rbp # establish our base pointer
27 addq $localSize, %rsp # for local vars.
28 pushq %rbx # save for caller
29
44 getResp:
45 movl $1, %edx # read one byte
46 leaq response(%rbp), %rsi # into this location
47 movl $STDIN, %edi # from keyboard
48 call read
49 # if (response == 'y')
50 cmpb $'y', response(%rbp) # was it 'y'?
51 jne noChange # no, there is no change
52
68 saveEnd:
69 jmp allDone # go to end of if-else
70
87 allDone:
88 movl $0, %eax # return 0;
89 popq %rbx # restore reg.
Listing 10.10: Get yes/no response from user (programmer assembly language).
jumps to the end of the “then” block of the if-else statement, which then jumps to the end of
the entire if-else statement:
68 saveEnd:
69 jmp allDone # go to end of if-else
on line 59. But this very slight efficiency gain comes at the expense of good software engineering.
In general, there could be more processing to do after the while loop in the “then” block of the
if-else statement. The real danger here is that additional processing will be added during the
program’s maintenance phase and the programmer will forget to change the structure. Good,
easy to read structure is almost always better than execution efficiency.
Another common programming problem is to check to see if a variable is within a certain
range. This requires a compound boolean expression, as shown in the C program in Listing
10.11.
1 /*
2 * range1.c
3 * Checks to see if a character entered by user is a numeral.
4 * Bob Plantz - 12 June 2009
5 */
6
7 #include <unistd.h>
8
9 int main()
10 {
11 char response; // For user’s response
12 char* ptr; // For text messages
13
33 {
34 ptr = "You entered some other character.\n";
35 while (*ptr != '\0')
36 {
37 write(STDOUT_FILENO, ptr, 1);
38 ptr++;
39 }
40 }
41 return 0;
42 }
42 .L6:
43 movq -16(%rbp), %rsi
44 movl $1, %edx
45 movl $1, %edi
46 call write
47 addq $1, -16(%rbp)
48 .L5:
49 movq -16(%rbp), %rax
50 movzbl (%rax), %eax
51 testb %al, %al
52 jne .L6
53 jmp .L7 # skip over "else" part
54 .L4: # "else" part
55 movq $.LC2, -16(%rbp)
56 jmp .L8
57 .L9:
58 movq -16(%rbp), %rsi
59 movl $1, %edx
60 movl $1, %edi
61 call write
62 addq $1, -16(%rbp)
63 .L8:
64 movq -16(%rbp), %rax
65 movzbl (%rax), %eax
66 testb %al, %al
67 jne .L9
68 .L7: # end of if-else construct
69 movl $0, %eax
70 leave
71 ret
72 .size main, .-main
73 .ident "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"
74 .section .note.GNU-stack,"",@progbits
Listing 10.12: Compound boolean expression in an if-else construct (gcc assembly language).
In particular, notice that the decision regarding whether the character entered by the user is a
numeral or not is made on the lines:
34 movzbl -9(%rbp), %eax # load numeral character
35 cmpb $57, %al # is numeral > '9'?
36 jg .L4 # yes, go to else part
37 movzbl -9(%rbp), %eax # load numeral character
38 cmpb $47, %al # is numeral <= '/'?
39 jle .L4 # yes, go to else part
40 movq $.LC1, -8(%rbp) # "then" part
Consulting Table 2.3 on page 20 we see that the program first compares the character entered
by the user with the ASCII code for the numeral “9” (57₁₀ = 39₁₆). If the character is numerically
greater, the program jumps to .L4, which is the beginning of the “else” part. Then the character
is compared to the ASCII code for the character “/”, which is numerically one less than the ASCII
code for the numeral “0” (48₁₀ = 30₁₆). If the character is numerically less than or equal to the
code for “/”, the program also jumps to .L4.
If neither of these conditions causes a jump to the “else” part, the program simply continues
on to execute the “then” part. At the end of the “then” part, the program skips over the “else”
part to the end of the program:
53 jmp .L7 # skip over "else" part
54 .L4: # "else" part
we see that the test for ’0’ is never made if (response <= ’9’) is false.
This is called short-circuit evaluation in C/C++. When boolean tests are connected with the &&
and || operators, the tests are performed one at a time, from left to right. If the overall result
of the expression (true or false) is known before all the tests are made, the remaining tests are
not executed. This is one of the most important reasons for not writing boolean expressions that
include side effects; the operation that produces a needed side effect may never get executed.
This code segment assigns an address to the ptr variable. If the condition, response == ’y’, is
true, then the address in the ptr variable is written over with another address. This could be
written in assembly language (see Listing 10.10) as:
movl $discardMsg, %esi
# if (response == 'y')
cmpb $'y', response(%rbp) # was it 'y'?
jne noChange # no, there is no change
movl $saveMsg, %esi # yes, get other message
noChange:
movl %esi, ptr(%rbp) # point to message
msgLoop:
movl ptr(%rbp), %esi # current char in string
cmpb $0, (%esi) # null character?
je allDone # yes, leave while loop
The x86-64 architecture provides a conditional move instruction, cmovcc, for simple if constructs
like this. The general format is
cmovcc source, destination
where cc is a 1 – 4 letter sequence specifying the settings of the condition codes. Similar to the
conditional jump instructions, the conditional data move takes place if the status flag settings
are true, and does not if they are false.
Possible letter sequences are the same as for the conditional jump instructions listed in Table
10.1 on page 211. The source operand can be either a register or a memory location, and the
destination must be a register. Unlike other data movement instructions, the cmovcc instruction
does not use the operand size suffix; the size is implicitly specified by the size of the destination
register.
The conditional move instruction would allow the above assembly language to be written
with a cmove instruction, where the “e” means “equal” (see Table 10.1).
Although this actually increases the average number of instructions executed, it allows the CPU
to make more efficient use of the pipeline. So a conditional move may provide faster program
execution by eliminating possible pipeline inefficiencies caused by a conditional jump. See for
example [28], [31], and [34].
10.3.1 Instructions
data movement:
opcode source destination action see page:
cmovcc %reg/mem %reg conditional move 230
movs $imm/%reg %reg/mem move 141
movsss $imm/%reg %reg/mem move, sign extend 216
movzss $imm/%reg %reg/mem move, zero extend 217
popw %reg/mem pop from stack 163
pushw $imm/%reg/mem push onto stack 163
s = b, w, l, q; w = l, q; cc = condition codes
arithmetic/logic:
opcode source destination action see page:
adds $imm/%reg %reg/mem add 189
adds mem %reg add 189
cmps $imm/%reg %reg/mem compare 209
cmps mem %reg compare 209
decs %reg/mem decrement 220
incs %reg/mem increment 220
leaw mem %reg load effective address 167
subs $imm/%reg %reg/mem subtract 190
subs mem %reg subtract 190
tests $imm/%reg %reg/mem test bits 210
tests mem %reg test bits 210
s = b, w, l, q; w = l, q
10.4 Exercises
10-1 (§10.1) Verify on paper that the machine instructions in Table 10.4 actually cause a jump
of the number of bytes shown (in decimal) when the jump is taken.
10-2 (§10.1) Enter the program in Listing 10.2 and verify that the jump to here1 uses the rip-
relative addressing mode, and the other two jumps use the direct address. Hint: Produce
a listing file for the program and use gdb to examine register and memory contents.
10-3 (§10.1) Enter the program in Listing 10.5, changing the while loop to use eax as a pointer:
movl $theString, %eax
whileLoop:
cmpb $0, (%eax) # null character?
je allDone # yes, all done
This would seem to be more efficient than reading the pointer from memory each time
through the loop. Use gdb to debug the program. Set a break point at the call instruction
and another break point at the incl instruction. Inspect the registers each time the pro-
gram breaks into gdb. What is happening to the value in eax? Hint: Read what the “man
2 write” shell command has to say about the write system call function. This exercise
points out the necessity of understanding what happens to registers when calling another
function. In general, it is safer to use local variables in the stack frame.
10-4 (§10.1) Assume that you do not know how many numerals there are, only that the first
one is ’0’ and the last one is ’9’ (the character “0” and character “9”). Write a program
in assembly language that displays all the numerals, 0 – 9, on the screen, one character at
a time. Use only one byte in the .data segment for storing a character; do not allocate a
separate byte for each numeral.
10-5 (§10.1) Assume that you do not know how many upper case letters there are, only that
the first one is ’A’ and the last one is ’Z’. Write a program in assembly language that
displays all the upper case letters, A – Z, on the screen, one character at a time. Use only
one byte in the .data segment for storing a character; do not allocate a separate byte for
each letter.
10-6 (§10.1) Assume that you do not know how many lower case letters there are, only that
the first one is ’a’ and the last one is ’z’. Write a program in assembly language that
displays all the lower case letters, a – z, on the screen, one character at a time. Use only
one byte in the .data segment for storing a character; do not allocate a separate byte for
each letter.
10-7 (§10.1) Enter the following C program and use the “-S” option to generate the assembly
language:
1 /*
2 * forLoop.c
3 * For loop multiplication.
4 *
5 * Bob Plantz - 21 June 2009
6 */
7
8 #include<stdio.h>
9
10 int main ()
11 {
12 int x, y, z;
13 int i;
14
Identify the loop that performs the actual multiplication. Write an equivalent C program
that uses a while loop instead of the for loop, and also generate the assembly language for
it. Do the loops differ? If so, how?
10-8 (§10.2) Enter the C program in Listing 10.7 and get it to work. Do you see any odd behavior
when the program terminates? Can you fix it? Hint: When the program prompts the user,
how many keys did you press? What was the second key press?
10-9 (§10.2) Enter the program in Listing 10.10 and get it to work.
10-10 (§10.2) Write a program in assembly language that displays all the printable characters
that are neither numerals nor letters on the screen, one character at a time. Don’t forget
that the space character, ’ ’, is printable. Do not display the DEL character. Use only one
byte for storing a character; do not allocate a separate byte for each character.
Use only one while loop in this program. You will need an if-else construct with a com-
pound boolean conditional statement.
Good software engineering practice generally includes breaking problems down into functionally
distinct subproblems. This leads to software solutions with many functions, each of which solves
a subproblem. This “divide and conquer” approach has some distinct advantages:
• Several people can be working on different parts of the overall problem simultaneously.
The main disadvantage of breaking a problem down like this is coordinating the many sub-
solutions so that they work together correctly to provide a correct overall solution. In software,
this translates to making sure that the interface between a calling function and a called func-
tion works correctly. In order to ensure correct operation of the interface, it must be specified in
a very explicit way.
In Chapter 8 you learned how to pass arguments into a function and call it. In this chapter
you will learn how to use these arguments inside the called function.
1. Input. The data comes from another part of the program and is used by the function, but
is not modified by it.
2. Output. The function provides new data to another part of the program.
3. Update. The function modifies a data item that is held by another part of the program.
The new value is based on the value before the function was called.
All three interactions can be performed if the called function also knows the location of the
data item. This can be done by the calling function passing the address to the called function or
by making the address globally known to both functions. Updates require that the address be
known by the called function.
Outputs can also be implemented by placing the new data item in a location that is accessible
to both the called and the calling function. In C/C++ this is done by placing the return value
from a function in the eax register. And inputs can be implemented by passing a copy of the data
item to the called function. In both of these cases the called function does not know the location
of the original data item, and thus does not have access to it.
In addition to global data, C syntax allows three ways for functions to exchange data:
• Pass by value — an input value is passed by making a copy of it available to the function.
• Return value — an output value can be returned to the calling function.
• Pass by pointer — an output value can be stored for the calling function by passing the
address where the output value should be stored to the called function. This can also be
used to update a data item.
The last method, pass by pointer, can also be used to pass large inputs, or to pass inputs that
should be changed — also called updates. It is also the method by which C++ implements pass
by reference.
When one function calls another, the information that is required to provide the interface
between the two is called an activation record. Since both the registers and the call stack are
common to all the functions within a program, both the calling function and the called function
have access to them. So arguments can be passed either in registers or on the call stack. Of
course, the called function must know exactly where each of the arguments is located when
program flow transfers to it.
In principle, the locations of arguments need only be consistent within a program. As long
as all the programmers working on the program observe the same rules, everything should
work. However, designing a good set of rules for any real-world project is a very time-consuming
process. Fortunately, the ABI [25] for the x86-64 architecture specifies a good set of rules. These
rules are very tedious because they are meant to cover all possible situations. In this book we
will consider only the simpler rules in order to get an overall picture of how this works.
In 64-bit mode six of the general purpose registers and a portion of the call stack are used
for the activation record. The area of the stack used for the activation record is called a stack
frame. Within any function, the stack frame contains the following information:
• Arguments (in excess of six) passed from the calling function.
• The return address back to the calling function.
• The calling function’s frame pointer.
• Local variables for the current function.
and often includes:
• Copies of arguments passed in registers.
• Copies of values in the registers that must be preserved by a function — rbx, r12 – r15.
Some general memory usage rules (64-bit mode) are:
• Each argument is passed within an 8-byte unit. For example, passing three char values
requires three registers. This 8-byte rule also applies to arguments passed on the stack.
• Local variables can be allocated to take up only the amount of memory they require. For
example, three char values can be accommodated in a three-byte memory area.
• The address in the frame pointer (rbp register) must always be a multiple of sixteen. It
should never be changed within a function, except during the prologue and epilogue.
• The address in the stack pointer (rsp register) must always be a multiple of sixteen before
transferring program flow to another function.
We can see how this works by studying the program in Listing 11.1.
1 /*
2 * addProg.c
3 * Adds two integers
4 * Bob Plantz - 13 June 2009
5 */
6
7 #include <stdio.h>
8 #include "sumInts1.h"
9
10 int main(void)
11 {
12 int x, y, z;
13 int overflow;
14
22 return 0;
23 }
1 /*
2 * sumInts1.h
3 * Adds two integers and outputs their sum.
4 * Bob Plantz - 4 June 2008
5 */
6
7 #ifndef SUMINTS1_H
8 #define SUMINTS1_H
9 int sumInts(int, int, int *);
10 #endif
1 /*
2 * sumInts1.c
3 * Adds two integers and outputs their sum.
4 * Returns 0 if no overflow, else returns 1.
5 * Bob Plantz - 13 June 2009
6 */
7
8 #include "sumInts1.h"
9
14 *sum = a + b;
15
Listing 11.1: Passing arguments to a function (C). (There are three files here.)
The compiler-generated assembly language for the sumInts function is shown in Listing 11.2
with comments added.
1 .file "sumInts1.c"
2 .text
3 .globl sumInts
4 .type sumInts, @function
5 sumInts:
6 pushq %rbp
7 movq %rsp, %rbp
8 movl %edi, -20(%rbp) # save a
9 movl %esi, -24(%rbp) # save b
10 movq %rdx, -32(%rbp) # save pointer to sum
11 movl $0, -4(%rbp) # overflow = 0;
12 movl -24(%rbp), %edx # load b
13 movl -20(%rbp), %eax # load a
14 leal (%rax,%rdx), %edx # b += a
15 movq -32(%rbp), %rax # load address of sum
16 movl %edx, (%rax) # *sum = b
17 cmpl $0, -20(%rbp)
18 jle .L2
19 cmpl $0, -24(%rbp)
20 jle .L2
21 movq -32(%rbp), %rax
22 movl (%rax), %eax
23 testl %eax, %eax
24 js .L3
25 .L2:
26 cmpl $0, -20(%rbp)
27 jns .L4
28 cmpl $0, -24(%rbp)
29 jns .L4
30 movq -32(%rbp), %rax
31 movl (%rax), %eax
32 testl %eax, %eax
33 jle .L4
34 .L3:
35 movl $1, -4(%rbp)
36 .L4:
37 movl -4(%rbp), %eax # return overflow;
38 leave
39 ret
40 .size sumInts, .-sumInts
41 .ident "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"
42 .section .note.GNU-stack,"",@progbits
Listing 11.2: Accessing arguments in the sumInts function from Listing 11.1 (gcc assembly lan-
guage).
As we go through this description, it is very easy to confuse the frame pointer (rbp register)
and the stack pointer (rsp register). They each are used to access different areas of the
stack.
• The frame pointer (rbp register) remains unchanged. It is used to access the area
of the stack that belongs to the current function, including local variables and argu-
ments passed into the current function.
• The stack pointer (rsp register) can be changed. It is used to create a new stack frame
for a function about to be called, including storing the return address and passing
arguments beyond the first six.
After saving the caller’s frame pointer and establishing its own frame pointer, this function
stores the argument values in the local variable area:
5 sumInts:
6 pushq %rbp
7 movq %rsp, %rbp
8 movl %edi, -20(%rbp) # save a
9 movl %esi, -24(%rbp) # save b
10 movq %rdx, -32(%rbp) # save pointer to sum
11 movl $0, -4(%rbp) # overflow = 0;
The arguments are in the following registers (see Table 8.2, page 157):
• a is in edi.
• b is in esi.
• The pointer to sum is in rdx.
Storing them in the local variable area frees up the registers so they can be used in this function.
Although this is not very efficient, the compiler does not need to work very hard to optimize
register usage within the function. The only local variable, overflow, is initialized on line 11.
The observant reader will note that no memory has been allocated on the stack for local
variables or saving the arguments. The ABI [25] defines the 128 bytes beyond the stack pointer
— that is, the 128 bytes at addresses lower than the one in the rsp register — as a red zone.
The operating system is not allowed to use this area, so the function can use it for temporary
storage of values that do not need to be saved when another function is called. In particular, leaf
functions can store local variables in this area without moving the stack pointer because they
do not call other functions.
Notice that both the argument save area and the local variable area are aligned on 16-byte
address boundaries. Figure 11.1 provides a pictorial view of where the three arguments and the
local variable are in the red zone.
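Address      Contents          Area
(rbp)-128    (lowest address in the Red Zone)
...
(rbp)-32     sum: address      Argument Save Area
(rbp)-24     b: value          Argument Save Area
(rbp)-20     a: value          Argument Save Area
(rbp)-16     ?                 Local Variable Area
(rbp)-12     ?                 Local Variable Area
(rbp)-8      ?                 Local Variable Area
(rbp)-4      overflow: ?       Local Variable Area
(rbp)        Caller’s rbp      (rsp and rbp both point here)
(rbp)+8      Return Address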
Figure 11.1: Arguments and local variables in the stack frame, sumInts function. The two input
values and the address for the output are passed in registers, then stored in the
Argument Save Area by the called function. Since this is a leaf function, the Red
Zone is used for this function’s stack frame.
As you know, some functions take a variable number of arguments. In these functions, the
ABI [25] specifies the relative offsets of the register save area. The offsets are shown in Table
11.1.
Register Offset
rdi 0
rsi 8
rdx 16
rcx 24
r8 32
r9 40
xmm0 48
xmm1 64
... ...
xmm15 288
Table 11.1: Argument register save area in stack frame. These relative offsets should be used
in functions with a variable number of arguments.
One of the problems with the C version of sumInts is that it requires a separate check for
overflow:
16 sumInts:
17 if (((a > 0) && (b > 0) && (*sum < 0)) ||
18 ((a < 0) && (b < 0) && (*sum > 0)))
19 {
20 overflow = 1;
21 }
Writing the function in assembly language allows us to directly check the overflow flag, as shown
in Listing 11.3.
1 # sumInts.s
2 # Adds two 32-bit integers. Returns 0 if no overflow
3 # else returns 1
Listing 11.3: Accessing arguments in the sumInts function from Listing 11.1 (programmer as-
sembly language)
The code to perform the addition and overflow check is much simpler.
17 movl $0, %eax # assume no overflow
18 addl %edi, %esi # add values
19 cmovo overflow, %eax # overflow occurred
20 movl %esi, (%rdx) # output sum
The body of the function begins by assuming there will not be overflow, so 0 is stored in eax,
ready to be the return value. The value of the first argument is added to the second, because
the programmer realizes that the values in the argument registers do not need to be saved. If
this addition produces overflow, the cmovo instruction changes the return value to 1. Finally, in
either case the sum is stored at the memory location whose address was passed to the function
as the third argument.
9 int main(void)
10 {
11 int total;
12 int a = 1;
13 int b = 2;
14 int c = 3;
15 int d = 4;
16 int e = 5;
17 int f = 6;
18 int g = 7;
19 int h = 8;
20 int i = 9;
21
1 /*
2 * sumNine1.h
3 * Computes sum of nine integers.
4 * Bob Plantz - 13 June 2009
5 */
6 #ifndef SUMNINE_H
7 #define SUMNINE_H
8 int sumNine(int one, int two, int three, int four, int five,
9 int six, int seven, int eight, int nine);
10 #endif
1 /*
2 * sumNine1.c
3 * Computes sum of nine integers.
4 * Bob Plantz - 13 June 2009
5 */
6 #include <stdio.h>
7 #include "sumNine1.h"
8
9 int sumNine(int one, int two, int three, int four, int five,
10 int six, int seven, int eight, int nine)
11 {
12 int x;
13
Listing 11.4: Passing more than six arguments to a function (C). (There are three files here.)
The assembly language generated by gcc from the program in Listing 11.4 is shown in List-
ing 11.5, with comments added to explain parts of the code.
1 .file "nineInts1.c"
2 .section .rodata
3 .LC0:
4 .string "The sum is %i\n"
5 .text
6 .globl main
7 .type main, @function
8 main:
9 pushq %rbp
10 movq %rsp, %rbp
11 subq $80, %rsp
12 movl $1, -8(%rbp)
13 movl $2, -12(%rbp)
14 movl $3, -16(%rbp)
15 movl $4, -20(%rbp)
16 movl $5, -24(%rbp)
17 movl $6, -28(%rbp)
18 movl $7, -32(%rbp)
19 movl $8, -36(%rbp)
20 movl $9, -40(%rbp)
21 movl -28(%rbp), %edx # load f into temp. reg.
22 movl -24(%rbp), %ecx # load e into temp. reg.
23 movl -20(%rbp), %esi # load d into temp. reg.
24 movl -16(%rbp), %edi # load c into temp. reg.
25 movl -12(%rbp), %r10d # load b into temp. reg.
26 movl -8(%rbp), %r11d # load a into temp. reg.
27 movl -40(%rbp), %eax # load i into temp. reg.
28 movl %eax, 16(%rsp) # put on stack, 9th arg.
29 movl -36(%rbp), %eax # load h into temp. reg.
30 movl %eax, 8(%rsp) # put on stack, 8th arg.
31 movl -32(%rbp), %eax # load g into temp. reg.
32 movl %eax, (%rsp) # put on stack, 7th arg.
33 movl %edx, %r9d # 6th arg. from temp. reg.
34 movl %ecx, %r8d # 5th arg. from temp. reg.
35 movl %esi, %ecx # 4th arg. from temp. reg.
36 movl %edi, %edx # 3rd arg. from temp. reg.
37 movl %r10d, %esi # 2nd arg. from temp. reg.
38 movl %r11d, %edi # 1st arg. from temp. reg.
39 call sumNine
40 movl %eax, -4(%rbp)
41 movl -4(%rbp), %esi
42 movl $.LC0, %edi
43 movl $0, %eax
44 call printf
45 movl $0, %eax
46 leave
47 ret
48 .size main, .-main
49 .ident "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"
50 .section .note.GNU-stack,"",@progbits
1 .file "sumNine1.c"
2 .section .rodata
3 .LC0:
4 .string "sumNine done."
5 .text
6 .globl sumNine
7 .type sumNine, @function
8 sumNine:
9 pushq %rbp
10 movq %rsp, %rbp
11 subq $48, %rsp
12 movl %edi, -20(%rbp) # save one
13 movl %esi, -24(%rbp) # save two
14 movl %edx, -28(%rbp) # save three
15 movl %ecx, -32(%rbp) # save four
16 movl %r8d, -36(%rbp) # save five
17 movl %r9d, -40(%rbp) # save six
18 movl -24(%rbp), %edx # load two
19 movl -20(%rbp), %eax # load one, subtotal
20 addl %edx, %eax # add two
21 addl -28(%rbp), %eax # add three
22 addl -32(%rbp), %eax # add four
23 addl -36(%rbp), %eax # add five
24 addl -40(%rbp), %eax # add six
25 addl 16(%rbp), %eax # add seven
26 addl 24(%rbp), %eax # add eight
27 addl 32(%rbp), %eax # add nine
28 movl %eax, -4(%rbp) # x <- total
29 movl $.LC0, %edi
30 call puts
31 movl -4(%rbp), %eax
32 leave
33 ret
34 .size sumNine, .-sumNine
35 .ident "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"
36 .section .note.GNU-stack,"",@progbits
Listing 11.5: Passing more than six arguments to a function (gcc assembly language). (There
are two files here.)
Before main calls sumNine the values of the seventh, eighth, and ninth arguments, g – i, are
moved to their appropriate locations on the call stack. Enough space was allocated at the begin-
ning of the function to allow for these arguments. They are moved into their correct locations
on lines 27 – 32:
27 movl -40(%rbp), %eax # load i into temp. reg.
28 movl %eax, 16(%rsp) # put on stack, 9th arg.
29 movl -36(%rbp), %eax # load h into temp. reg.
30 movl %eax, 8(%rsp) # put on stack, 8th arg.
31 movl -32(%rbp), %eax # load g into temp. reg.
32 movl %eax, (%rsp) # put on stack, 7th arg.
The stack pointer, rsp, is used as the reference point for storing the arguments on the stack
here because the main function is starting a new stack frame for the function it is about to call,
sumNine. Then the first six arguments, a – f, are moved to the appropriate registers:
33 movl %edx, %r9d # 6th arg. from temp. reg.
34 movl %ecx, %r8d # 5th arg. from temp. reg.
35 movl %esi, %ecx # 4th arg. from temp. reg.
36 movl %edi, %edx # 3rd arg. from temp. reg.
37 movl %r10d, %esi # 2nd arg. from temp. reg.
38 movl %r11d, %edi # 1st arg. from temp. reg.
When program control is transferred to the sumNine function, the partial stack frame appears
as shown in Figure 11.2. Even though each argument is only four bytes (int), each is passed
in an 8-byte portion of stack memory. Compare this with passing arguments in registers; only
one data item is passed per register even if the data item does not take up the entire eight
bytes in the register. The return address is at the top of the stack, immediately followed by the
three arguments (beyond the six passed in registers). Notice that each argument is in the same
position on the stack as it would have been if it had been pushed onto the stack just before the
call instruction. Since the address in the stack pointer (rsp) was 16-byte aligned before the call
to this function, and the call instruction pushed the 8-byte return address onto the stack, the
address in rsp is now 8-byte aligned, but no longer 16-byte aligned.
          Address             Contents
                              ????
   rsp -> (rsp)               Return Address
          seven = (rsp)+8     7               Stack Arguments
          eight = (rsp)+16    8               Stack Arguments
          nine  = (rsp)+24    9               Stack Arguments
Figure 11.2: Arguments 7 – 9 are passed on the stack to the sumNine function. State of the stack
when control is first transferred to this function.
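The alignment argument above can be checked with ordinary integer arithmetic. The following is a minimal sketch in C, not part of the original program; the helper names isAligned and afterPush, and any address passed to them, are assumptions chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if addr is a multiple of boundary, 0 otherwise. */
static int isAligned(uint64_t addr, uint64_t boundary)
{
    assert(boundary != 0);
    return addr % boundary == 0;
}

/* Models the effect on rsp of pushing one 8-byte item, which is
 * what call does with the return address and pushq does with rbp. */
static uint64_t afterPush(uint64_t rsp)
{
    return rsp - 8;
}
```

For a hypothetical rsp that is a multiple of 16 before the call, afterPush gives an address that is a multiple of 8 but not of 16. A second push (the pushq %rbp in the prologue) brings rsp back to a multiple of 16.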
The prologue of sumNine completes the stack frame. Then the function saves the register
arguments in the register save area of the stack frame:
9 pushq %rbp
10 movq %rsp, %rbp
11 subq $48, %rsp
12 movl %edi, -20(%rbp) # save one
13 movl %esi, -24(%rbp) # save two
14 movl %edx, -28(%rbp) # save three
15 movl %ecx, -32(%rbp) # save four
16 movl %r8d, -36(%rbp) # save five
17 movl %r9d, -40(%rbp) # save six
The state of the stack frame at this point is shown in Figure 11.3.
You may question why the compiler did not simply use the red zone. The sumNine function is
not a leaf function. It calls another function, which may require use of the call stack. So space
must be explicitly allocated on the call stack for local variables and the register argument save
areas.
By the way, the compiler has replaced the call to printf in this function with a call to puts:
29 movl $.LC0, %edi
30 call puts
Since the only thing to be written to the screen is a text string, the puts function is equivalent.
After the register arguments are safely stored in the argument save area, they can be easily
summed and the total saved in the local variable:
18 movl -24(%rbp), %edx # load two
19 movl -20(%rbp), %eax # load one, subtotal
20 addl %edx, %eax # add two
21 addl -28(%rbp), %eax # add three
22 addl -32(%rbp), %eax # add four
23 addl -36(%rbp), %eax # add five
24 addl -40(%rbp), %eax # add six
25 addl 16(%rbp), %eax # add seven
26 addl 24(%rbp), %eax # add eight
27 addl 32(%rbp), %eax # add nine
28 movl %eax, -4(%rbp) # x <- total
11.2. MORE THAN SIX ARGUMENTS, 64-BIT MODE 247
          Address              Contents          Area
   rsp -> (rbp)-48
          (rbp)-44
          six   = (rbp)-40     6                 Argument Save Area
          five  = (rbp)-36     5                 Argument Save Area
          four  = (rbp)-32     4                 Argument Save Area
          three = (rbp)-28     3                 Argument Save Area
          two   = (rbp)-24     2                 Argument Save Area
          one   = (rbp)-20     1                 Argument Save Area
          (rbp)-16                               Local Variable Area
          (rbp)-12                               Local Variable Area
          (rbp)-8                                Local Variable Area
          x     = (rbp)-4                        Local Variable Area
   rbp -> (rbp)                Caller’s rbp
          (rbp)+8              Return Address
          seven = (rbp)+16     7                 Stack Arguments
          eight = (rbp)+24     8                 Stack Arguments
          nine  = (rbp)+32     9                 Stack Arguments
Figure 11.3: Arguments and local variables in the stack frame, sumNine function. The first six
arguments are passed in registers but saved in the stack frame. Arguments beyond
six are passed in the portion of the stack frame that is created by the calling
function.
Notice that the seventh, eighth, and ninth arguments are accessed by positive offsets from the
frame pointer, rbp. They were stored in the stack frame by the calling function. The called
function “owns” the entire stack frame so it does not need to make additional copies of these
arguments.
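The relationship between where the caller stores a stack argument and where the called function finds it can be written as two small formulas. The following C sketch is only an illustration of the offsets seen in Listing 11.5; the function names callerRspOffset and calleeRbpOffset are hypothetical, and the sketch assumes integer arguments that each occupy one 8-byte slot.

```c
#include <assert.h>

/* Offset from rsp, in the calling function, where integer argument
 * number n (n >= 7) is stored just before the call instruction. */
static int callerRspOffset(int n)
{
    assert(n >= 7);
    return 8 * (n - 7);
}

/* Offset from rbp, in the called function, of the same argument once
 * the prologue (pushq %rbp; movq %rsp, %rbp) has run. The extra 16
 * bytes hold the return address and the caller's saved rbp. */
static int calleeRbpOffset(int n)
{
    return callerRspOffset(n) + 16;
}
```

With n = 7, 8, 9 these give 0, 8, 16 relative to rsp in main and 16, 24, 32 relative to rbp in sumNine, which are the offsets used in the listing.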
It is important to realize that once the stack frame has been completed within a function,
that area of the call stack cannot be treated as a stack. That is, it cannot be accessed through
pushes and pops. It must be treated as a record. (You will learn more about records in Section
13.2, page 296.)
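One way to picture the stack frame as a record is to describe it with a C struct. The sketch below is an assumption made for illustration, not something the compiler produces: the struct name sumNineFrame and the helper rbpOffset are hypothetical, int is assumed to be four bytes, and the field order simply follows the layout of sumNine's 48-byte area from its lowest address upward.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record describing the 48 bytes that sumNine allocates
 * below rbp. The lowest address, (rbp)-48, comes first. */
struct sumNineFrame {
    int unusedLow[2];   /* (rbp)-48, (rbp)-44 */
    int six;            /* (rbp)-40 */
    int five;           /* (rbp)-36 */
    int four;           /* (rbp)-32 */
    int three;          /* (rbp)-28 */
    int two;            /* (rbp)-24 */
    int one;            /* (rbp)-20 */
    int unusedMid[3];   /* (rbp)-16, (rbp)-12, (rbp)-8 */
    int x;              /* (rbp)-4 */
};

/* Converts a byte offset within the record to an offset from rbp. */
static long rbpOffset(size_t fieldOffset)
{
    assert(fieldOffset < sizeof(struct sumNineFrame));
    return (long)fieldOffset - (long)sizeof(struct sumNineFrame);
}
```

Each field is reached by a fixed offset from one base address, never by pushing or popping, which is the sense in which the completed frame is a record rather than a stack.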
If we were to recompile these functions with higher levels of optimization, many of these
assembly language operations would be removed (see Exercise 11-2). But the point here is to
examine the mechanisms that can be used to work with arguments and to write easily read
code, so we study the unoptimized code.
A version of this program written in assembly language is shown in Listing 11.6.
1 # nineInts2.s
2 # Demonstrate how integral arguments are passed in 64-bit mode.
3 # Bob Plantz - 13 June 2009
4
5 # Stack frame
6 # passing arguments on stack (rsp)
7 # need 3x8 = 24 -> 32 bytes
8 .equ seventh,0
9 .equ eighth,8
10 .equ ninth,16
11 # local vars (rbp)
12 # need 10x4 = 40 -> 48 bytes
13 .equ i,-4
14 .equ h,-8
15 .equ g,-12
16 .equ f,-16
17 .equ e,-20
18 .equ d,-24
19 .equ c,-28
20 .equ b,-32
21 .equ a,-36
22 .equ total,-40
23 .equ localSize,-80
24 # Read only data
25 .section .rodata
26 format:
27 .string "The sum is %i\n"
28 # Code
29 .text
30 .globl main
31 .type main, @function
32 main:
33 pushq %rbp # save caller’s base pointer
34 movq %rsp, %rbp # establish ours
35 addq $localSize, %rsp # space for local variables
36 # + argument passing
37 movl $1, a(%rbp) # initialize local variables
38 movl $2, b(%rbp) # etc...
39 movl $3, c(%rbp)
40 movl $4, d(%rbp)
41 movl $5, e(%rbp)
42 movl $6, f(%rbp)
43 movl $7, g(%rbp)
44 movl $8, h(%rbp)
45 movl $9, i(%rbp)
46
1 # sumNine2.s
2 # Sums nine integer arguments and returns the total.
3 # Bob Plantz - 13 June 2009
4
5 # Stack frame
6 # arguments already in stack frame
7 .equ seven,16
8 .equ eight,24
9 .equ nine,32
10 # local variables
11 .equ total,-4
12 .equ localSize,-16
13 # Read only data
14 .section .rodata
15 doneMsg:
16 .string "sumNine done"
17 # Code
18 .text
19 .globl sumNine
20 .type sumNine, @function
21 sumNine:
22 pushq %rbp # save caller’s base pointer
23 movq %rsp, %rbp # set our base pointer
24 addq $localSize, %rsp # for local variables
25
Listing 11.6: Passing more than six arguments to a function (programmer assembly language).
(There are two files here.)
The assembly language programmer realizes that all nine integers can be summed in the sumNine
function before it calls another function. In addition, none of the values will be needed after this
However, the edi register will be needed for passing an argument to puts, so the total is
saved in a local variable in the stack frame:
34 movl %edi, total(%rbp) # save total
The overall pattern of a stack frame is shown in Figure 11.4. The rbp register serves as the
frame pointer to the stack frame. Once the frame pointer address has been established in a
function, its value must never be changed. The return address is always located +8 bytes offset
from the frame pointer. Arguments to the function are positive offsets from the frame pointer,
and local variables are negative offsets from the frame pointer.
Note: Do not change the rbp register once it has been set up in a function.
          Memory Available For Use As A Stack By This Function
   rsp -> Local Variables And Saved Register Contents
          (rbp)-8
   rbp -> Caller’s rbp
          (rbp)+8   Return Address
          Arguments Passed In Stack Frame
It is essential that you follow the register usage and argument passing disciplines precisely.
Any deviation can cause errors that are very difficult to debug.
1. In the calling function:
(a) Assume that the values in the rax, rcx, rdx, rsi, rdi and r8 – r11 registers will be
changed by the called function.
(b) The first six arguments are passed in the rdi, rsi, rdx, rcx, r8, and r9 registers in
left-to-right order.
(c) Arguments beyond six are stored on the stack as though they had been pushed onto
the stack in right-to-left order.
(d) Use the call instruction to invoke the function you wish to call.
11.3. INTERFACE BETWEEN FUNCTIONS, 32-BIT MODE 251
(d) Local variables are accessed by negative offsets from the frame pointer, rbp.
4. When leaving the called function:
(a) Place the return value, if any, in eax.
(b) Restore the values in the rbx, rbp, rsp, and r12 – r15 registers from the register
save area in the stack frame.
(c) Delete the local variable space and register save area by copying rbp to rsp.
(d) Restore the caller’s frame pointer by popping rbp off the stack save area.
(e) Return to calling function with ret.
The best way to design a stack frame for a function is to make a drawing on paper following
the pattern in Figure 11.3. Show all the local variables and arguments to the function. To be
safe, assume that all the register-passed arguments will be saved in the function. Compute
and write down all the offset values on your drawing. When writing the source code for your
function, use the .equ directive to give meaningful names to each of the numerical offsets. If you
do this planning before writing the executable code, you can simply use the name(%rbp) syntax
to access the value stored at name.
1 .file "sumNine1.c"
2 .section .rodata
3 .LC0:
4 .string "sumNine done."
5 .text
6 .globl sumNine
7 .type sumNine, @function
8 sumNine:
9 pushl %ebp
10 movl %esp, %ebp
11 subl $24, %esp
12 movl 12(%ebp), %edx # load two
13 movl 8(%ebp), %eax # load one, subtotal
14 addl %edx, %eax # add two
15 addl 16(%ebp), %eax # add three
16 addl 20(%ebp), %eax # add four
17 addl 24(%ebp), %eax # add five
18 addl 28(%ebp), %eax # add six
19 addl 32(%ebp), %eax # add seven
20 addl 36(%ebp), %eax # add eight
21 addl 40(%ebp), %eax # add nine
22 movl %eax, -4(%ebp) # x <- total
23 movl $.LC0, (%esp)
24 call puts
25 movl -4(%ebp), %eax # return x;
26 leave
27 ret
28 .size sumNine, .-sumNine
29 .ident "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"
30 .section .note.GNU-stack,"",@progbits
Listing 11.7: Passing more than six arguments to a function (gcc assembly language, 32-bit).
(There are two files here.)
The argument passing sequence can be seen on lines 25 – 42 in the main function. Rather than
pushing each argument onto the stack, the compiler has used the technique of allocating space
on the stack for the arguments, then storing each argument directly in the appropriate location.
The result is the same as if they had been pushed onto the stack, but the direct storage technique
is more efficient.
The state of the call stack just before calling the sumNine function is shown in Figure 11.5.
Comparing this with the 64-bit version in Figure 11.3, we see that the local variables are treated
in essentially the same way. But the 32-bit version differs in the way it passes arguments:
• All the arguments are passed on the call stack, none in registers.
• Arguments are passed in 4-byte blocks.
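The two points above can be summarized as offset formulas. The following C sketch is an illustration only; the function names callerEspOffset32 and calleeEbpOffset32 are hypothetical, and the sketch assumes 4-byte integer arguments as in this example.

```c
#include <assert.h>

/* Offset from esp, in the calling function, where argument number n
 * (n >= 1) is stored just before the call instruction. */
static int callerEspOffset32(int n)
{
    assert(n >= 1);
    return 4 * (n - 1);
}

/* Offset from ebp, in the called function, of the same argument after
 * the prologue (pushl %ebp; movl %esp, %ebp). The extra 8 bytes hold
 * the 4-byte return address and the caller's 4-byte saved ebp. */
static int calleeEbpOffset32(int n)
{
    return callerEspOffset32(n) + 8;
}
```

For n = 1 and n = 9 the second function gives 8 and 40, matching the 8(%ebp) and 40(%ebp) operands in the 32-bit sumNine listing.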
          Address              Contents        Area
   esp -> arg1 = (esp)+0       1               Arguments (beginning of called function’s stack frame)
          arg2 = (esp)+4       2
          arg3 = (esp)+8       3
          arg4 = (esp)+12      4
          arg5 = (esp)+16      5
          arg6 = (esp)+20      6
          arg7 = (esp)+24      7
          arg8 = (esp)+28      8
          arg9 = (esp)+32      9
                               ????
                               ????
                               ????
          a = (ebp)-40         1               Local variables (belong to this function’s stack frame)
          b = (ebp)-36         2
          c = (ebp)-32         3
          d = (ebp)-28         4
          e = (ebp)-24         5
          f = (ebp)-20         6
          g = (ebp)-16         7
          h = (ebp)-12         8
          i = (ebp)-8          9
                               (ecx)
   ebp ->                      Caller’s ebp
Figure 11.5: Calling function’s stack frame, 32-bit mode. Local variables are accessed relative
to the frame pointer (ebp register). In this example, they are all 4-byte values.
Arguments are accessed relative to the stack pointer (esp register). Arguments are
passed in 4-byte blocks.
11.4. INSTRUCTIONS INTRODUCED THUS FAR 255
11.4.1 Instructions
data movement:
opcode    source             destination    action                    see page:
cmovcc    %reg/mem           %reg           conditional move          230
movs      $imm/%reg          %reg/mem       move                      141
movsss    $imm/%reg          %reg/mem       move, sign extend         216
movzss    $imm/%reg          %reg/mem       move, zero extend         217
popw                         %reg/mem       pop from stack            163
pushw     $imm/%reg/mem                     push onto stack           163
s = b, w, l, q; w = l, q; cc = condition codes

arithmetic/logic:
opcode    source             destination    action                    see page:
adds      $imm/%reg          %reg/mem       add                       189
adds      mem                %reg           add                       189
cmps      $imm/%reg          %reg/mem       compare                   209
cmps      mem                %reg           compare                   209
decs                         %reg/mem       decrement                 220
incs                         %reg/mem       increment                 220
leaw      mem                %reg           load effective address    167
subs      $imm/%reg          %reg/mem       subtract                  190
subs      mem                %reg           subtract                  190
tests     $imm/%reg          %reg/mem       test bits                 210
tests     mem                %reg           test bits                 210
s = b, w, l, q; w = l, q
11.5 Exercises
11-1 (§11.2) Enter the program in Listing 11.6. Single-step through the program with gdb and
record the changes in the rsp and rip registers and the changes in the stack on paper. Use
drawings similar to Figure 11.3.
Note: Each of the two functions should be in its own source file. You can single-step into
the subfunction with gdb at the call instruction in main, then single-step back into main at
the ret instruction in sumNine.
11-2 (§11.2) Enter the C program in Listing 11.4. Using the “-S” compiler option, compile it with
differing levels of optimization, i.e., “-O1, -O2, -O3,” and discuss the assembly language that
is generated. Is the optimized code easier or more difficult to read?
11-3 (§11.2, §10.1) Write the function, writeStr, in assembly language. The function takes one
argument, a char *, which is a pointer to a C-style text string. It displays the text string
on the screen. It returns the number of characters displayed.
Demonstrate that your function works correctly by writing a main function that calls
writeStr to display “Hello world” on the screen.
Note that the main function will not do anything with the character count that is returned
by writeStr.
11-4 (§11.2, §10.1) Write the function, readLn, in assembly language. The function takes one
argument, a char *, which is a pointer to a char array for storing a text string. It reads
characters from the keyboard and stores them in the array as a C-style text string. It
does not store the ’\n’ character. It returns the number of characters, excluding the NUL
character, that were stored in the array.
Demonstrate that your function works correctly by writing a main function that prompts
the user to enter a text string, then echoes the user’s input.
When testing your program, be careful not to enter more characters than the allocated
space. Explain what would occur if you did enter too many characters.
Note that the main function will not do anything with the character count that is returned
by readLn.
We saw in Section 3.5 (page 43) that input read from the keyboard and output written on the
screen is in the ASCII code and that integers are stored in the binary number system. So if
a program reads user input as, say, 123 (decimal), that input is read as the characters ’1’, ’2’, and
’3’, but the value used in the program is represented by the bit pattern 0000007b (hexadecimal).1 In this
chapter, we return to the conversion algorithms between these two storage codes and look at the
assembly language that is involved.
performs an and operation between each of the respective 32 bits in the eax register with the 32
bits in the edx register, leaving the result in the edx register. The instruction
1 Some programs, notably those that do not perform many arithmetic operations, maintain the numbers in the char-
acter code. This requires more complex algorithms for performing arithmetic.
12.1. LOGICAL OPERATORS 259
performs an and operation between each of the respective 8 bits in the dh register with the 8 bits
in the ah register, leaving the result in the ah register.
The addressing modes available for the arithmetic operators, add and sub, are also available
for the logical operators. For example, if eax contains the bit pattern 0x89abcdef, the instruction
andb $0xee, %ah
would change eax to contain 0x89abccef, because ah holds bits 15 – 8 of eax and 0xcd and 0xee = 0xcc. If we follow this with the instruction
orl $0x11111111, %eax
eax would then contain 0x99bbddff.
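The effect of these two instructions on the bit pattern can be checked with the C bit-wise operators. This is a minimal sketch rather than code from the book; the function names andAh and orEax are hypothetical and simply model the two instructions on a 32-bit value.

```c
#include <assert.h>
#include <stdint.h>

/* Models andb $mask, %ah: only bits 15 - 8 of eax can change. */
static uint32_t andAh(uint32_t eax, uint8_t mask)
{
    uint32_t ah = (eax >> 8) & 0xffu;      /* extract ah */
    ah &= mask;                            /* the and operation */
    return (eax & 0xffff00ffu) | (ah << 8);
}

/* Models orl $imm, %eax: all 32 bits take part in the operation. */
static uint32_t orEax(uint32_t eax, uint32_t imm)
{
    return eax | imm;
}
```

Starting from 0x89abcdef, andAh with the mask 0xee gives 0x89abccef, and orEax of that result with 0x11111111 gives 0x99bbddff.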
8 #include "writeStr.h"
9 #include "readLn.h"
10 #include "toUpper.h"
11 #include "toLower.h"
12 #define MAX 50
13 int main()
14 {
15 char stringOrig[MAX];
16 char stringWork[MAX];
17
31 writeStr("Original: ");
32 writeStr(stringOrig);
33 writeStr("\n");
34
35 return 0;
36 }
1 /*
2 * toUpper.h
3 * Converts letters in a C string to upper case.
260 CHAPTER 12. BIT OPERATIONS; MULTIPLICATION AND DIVISION
7 #ifndef TOUPPER_H
8 #define TOUPPER_H
9 int toUpper(char *, char *);
10 #endif
1 /*
2 * toUpper.c
3 * Converts alphabetic letters in a C string to upper case.
4 * Bob Plantz - 14 June 2009
5 */
6
7 #include "toUpper.h"
8 #define UPMASK 0xdf
9
1 /*
2 * toLower.h
3 * Converts letters in a C string to lower case.
4 * Bob Plantz - 14 June 2009
5 */
6
7 #ifndef TOLOWER_H
8 #define TOLOWER_H
9 int toLower(char *, char *);
10 #endif
1 /*
2 * toLower.c
3 * Converts letters in a C string to lower case.
4 * Bob Plantz - 14 June 2009
5 */
6
7 #include "toLower.h"
8 #define LOWMASK 0x20
9
16 srcPtr++;
17 destPtr++;
18 count++;
19 }
20 *destPtr = '\0'; // terminate string
21 return count;
22 }
Listing 12.1: Convert letters to upper/lower case (C). The functions writeStr and readLn are
not shown here; see Exercises 11-3 and 11-4 for the assembly language versions.
(There are three files here.)
The program assumes that the user enters all alphabetic characters without making mistakes.
Of course, the conversions could be accomplished with addition and subtraction, but in this
application the bit-wise logical operators are more natural.
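Because only fragments of the conversion loops appear in Listing 12.1, the following C sketch shows one way the masking could be done. It is a reconstruction under stated assumptions, not the author's code: the names toUpperSketch and toLowerSketch are hypothetical, and the loop bodies are inferred from the UPMASK and LOWMASK definitions and the visible pointer and count updates.

```c
#include <assert.h>

#define UPMASK  0xdf    /* and with this clears bit 5 */
#define LOWMASK 0x20    /* or with this sets bit 5 */

/* Copies src to dest with bit 5 of each character cleared.
 * Returns the number of characters copied, excluding the NUL. */
static int toUpperSketch(const char *srcPtr, char *destPtr)
{
    int count = 0;
    assert(srcPtr != 0 && destPtr != 0);
    while (*srcPtr != '\0') {
        *destPtr = (char)(*srcPtr & UPMASK);
        srcPtr++;
        destPtr++;
        count++;
    }
    *destPtr = '\0';    /* terminate string */
    return count;
}

/* Same loop, but bit 5 of each character is set instead. */
static int toLowerSketch(const char *srcPtr, char *destPtr)
{
    int count = 0;
    assert(srcPtr != 0 && destPtr != 0);
    while (*srcPtr != '\0') {
        *destPtr = (char)(*srcPtr | LOWMASK);
        srcPtr++;
        destPtr++;
        count++;
    }
    *destPtr = '\0';    /* terminate string */
    return count;
}
```

The masks give the right answer whether or not a letter is already in the desired case. Subtracting or adding 0x20 would work only if the current case of each letter were known first, which is one reason the bit-wise operators fit this application.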
In Listing 12.2 we show only the gcc-generated assembly language for the main and toUpper
functions.
1 .file "upperLower.c"
2 .section .rodata
3 .align 8
4 .LC0:
5 .string "Enter some alphabetic characters: "
6 .LC1:
7 .string "All upper: "
8 .LC2:
9 .string "\n"
10 .LC3:
11 .string "All lower: "
12 .LC4:
13 .string "Original: "
14 .text
15 .globl main
16 .type main, @function
17 main:
18 pushq %rbp
19 movq %rsp, %rbp
20 addq $-128, %rsp
21 movq %fs:40, %rax # load guard value
22 movq %rax, -8(%rbp) # store at end of stack
23 xorl %eax, %eax # clear rax
24 movl $.LC0, %edi
25 call writeStr
26 leaq -64(%rbp), %rdi
27 movl $50, %esi
28 call readLn
29 movl $.LC1, %edi
30 call writeStr
31 leaq -128(%rbp), %rsi
32 leaq -64(%rbp), %rdi
33 call toUpper
34 leaq -128(%rbp), %rdi
35 call writeStr
36 movl $.LC2, %edi
37 call writeStr
38 movl $.LC3, %edi
39 call writeStr
40 leaq -128(%rbp), %rsi
1 .file "toUpper.c"
2 .text
3 .globl toUpper
4 .type toUpper, @function
5 toUpper:
6 pushq %rbp
7 movq %rsp, %rbp
8 movq %rdi, -24(%rbp) # save srcPtr
9 movq %rsi, -32(%rbp) # save destPtr
10 movl $0, -4(%rbp)
11 jmp .L2
12 .L3:
13 movq -24(%rbp), %rax # srcPtr
14 movzbl (%rax), %edx # load char there
15 movl $-33, %eax # load 0xffffffdf
16 andl %eax, %edx # make upper case
17 movq -32(%rbp), %rax # destPtr
18 movb %dl, (%rax) # store char there
19 addq $1, -24(%rbp) # srcPtr++;
20 addq $1, -32(%rbp) # destPtr++;
21 addl $1, -4(%rbp) # count++;
22 .L2:
23 movq -24(%rbp), %rax
24 movzbl (%rax), %eax
25 testb %al, %al # NUL character?
26 jne .L3 # no, keep going
27 movq -32(%rbp), %rax # yes, load destPtr
28 movb $0, (%rax) # and store NUL there
29 movl -4(%rbp), %eax # return count;
30 leave
31 ret
32 .size toUpper, .-toUpper
33 .ident "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"
34 .section .note.GNU-stack,"",@progbits
Listing 12.2: Convert letters to upper/lower case (gcc assembly language). Only two of the func-
tions in Listing 12.1 are shown. (There are two files here.)
The toLower function is similar to toUpper, and the writeStr and readLn functions were
covered in the exercises in Chapter 11.
Most of the code in Listing 12.2 should be familiar from previous chapters. Note that the C
code specifies char arrays in the main function that are 50 elements long (lines 15 and 16). But
the compiler generates assembly language that allocates 64 bytes for each array:
18 main:
19 pushq %rbp
20 movq %rsp, %rbp
21 addq $-128, %rsp
and:
31 leaq -128(%rbp), %rsi
32 leaq -64(%rbp), %rdi
33 call toUpper
Recall that this 16-byte address alignment is specified by the ABI [25].
The code sequence on lines 21 – 23 in main:
21 movq %fs:40, %rax # load guard value
22 movq %rax, -8(%rbp) # store at end of stack
23 xorl %eax, %eax # clear rax
is new to you. This code sequence stores a value supplied by the operating system near the
end of the stack. The purpose is described in the gcc man page entry for the -fstack-protector
option:
Emit extra code to check for buffer overflows, such as stack smashing attacks.
This is done by adding a guard variable to functions with vulnerable objects.
This includes functions that call alloca, and functions with buffers larger than
8 bytes. The guards are initialized when a function is entered and then checked
when the function exits. If a guard check fails, an error message is printed
and the program exits.
The value stored there is checked at the end of the function, on lines 54 – 58:
54 movq -8(%rbp), %rdx
55 xorq %fs:40, %rdx
56 je .L3
57 call __stack_chk_fail # check for stack overflow
58 .L3:
If the value has been overwritten, the __stack_chk_fail function is called, which notifies the
user about the problem.
Your version of gcc may be compiled without this option as the default. It can be turned off
with the -fno-stack-protector option. Since the assembly language we are writing in this book
is not “industrial strength,” we will not include this stack protection code.
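The idea behind the guard value can be modeled in plain C. The sketch below is only an analogy, assuming a hypothetical struct guardedBuffer that stands in for a stack frame; it does not reproduce what gcc emits, and the names GUARD_VALUE, guardInit, unsafeFill, and guardIntact are invented for illustration.

```c
#include <assert.h>
#include <string.h>

#define GUARD_VALUE 0x5ca1ab1eUL    /* arbitrary value for this sketch */

/* Hypothetical stand-in for a stack frame: a small buffer with a
 * guard value placed immediately after it in memory. */
struct guardedBuffer {
    char buffer[8];
    unsigned long guard;
};

/* Corresponds to storing the guard when the function is entered. */
static void guardInit(struct guardedBuffer *frame)
{
    memset(frame->buffer, 0, sizeof frame->buffer);
    frame->guard = GUARD_VALUE;
}

/* Writes n bytes starting at the buffer without checking the buffer
 * size, as a careless input routine might. */
static void unsafeFill(struct guardedBuffer *frame, size_t n)
{
    assert(n <= sizeof *frame);     /* keep the sketch inside the record */
    memset((char *)frame, 'A', n);
}

/* Corresponds to the check made just before the function returns. */
static int guardIntact(const struct guardedBuffer *frame)
{
    return frame->guard == GUARD_VALUE;
}
```

Writing eight bytes leaves the guard untouched. Writing past the end of the buffer changes the guard, and the final comparison detects it.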
For the toUpper function, the compiler-generated assembly language first loads the address
stored in the srcPtr variable into a register so it can dereference the pointer.
13 movq -24(%rbp), %rax
It then moves the byte at that address into another register. It uses the movzbl instruction to
zero out the remaining 24 bits of the edx register. (Recall that changing the low-order 32 bits of
the rdx register also zeros out the high-order 32 bits.)
14 movzbl (%rax), %edx
Next it loads the bit pattern ffffffdf (= −33 in decimal) into the eax register and performs the bit-wise
and operation, leaving the result in the edx register. This and operation leaves all the bits in
the edx register as they were, except that the sixth bit from the right (bit 5, which has the value 0x20)
is set to zero. In the ASCII code this bit determines whether a letter is upper or lower case.
15 movl $-33, %eax
16 andl %eax, %edx
Regardless of whether the letter was upper or lower case, it is now upper case. The letter is
stored in the low-order eight bits of the edx register, the dl register. So the program loads
the address stored in the destPtr variable into a register so it can dereference it and store the
character there.
17 movq -32(%rbp), %rax
18 movb %dl, (%rax)
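The compiler's choice of −33 as the mask may look odd next to the 0xdf in the C source. The following C sketch, with hypothetical function names maskFromMinus33 and clearBit5, illustrates that the two are the same in the low-order byte and models the movzbl, andl, and movb sequence on a single character.

```c
#include <assert.h>
#include <stdint.h>

/* The 32-bit pattern that movl $-33, %eax places in eax. */
static uint32_t maskFromMinus33(void)
{
    return (uint32_t)-33;                  /* two's complement of 33 */
}

/* Models movzbl (zero extend the byte), andl with the mask, and then
 * storing only the low-order byte, dl. */
static char clearBit5(char c)
{
    uint32_t edx = (unsigned char)c;       /* zero-extended byte */
    edx &= maskFromMinus33();              /* clears bit 5 only */
    return (char)(edx & 0xffu);            /* the dl portion */
}
```

Since the upper 24 bits of edx are zero after movzbl and only dl is stored, the extra one bits in ffffffdf have no effect on the result.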
We will now consider the version of this program written in assembly language (Listing 12.3).
1 # upperLower.s
2 # Converts alphabetic characters to all upper case
3 # and all lower case.
4 # Bob Plantz - 14 June 2009
5
6 # Constant
7 .equ MAX,50
8 # Local variable names
9 .equ stringOrig,-64 # original char array
10 .equ stringWork,-128 # working char array
11 .equ localSize,-128
12 # Read only data
13 .section .rodata
14 prompt:
15 .string "Enter some alphabetic characters: "
16 upMsg:
17 .string "All upper: "
18 lowMsg:
19 .string "All lower: "
20 origMsg:
21 .string "Original: "
22 endl:
23 .string "\n"
24 # Code
25 .text
26 .globl main
27 .type main, @function
28 main:
29 pushq %rbp # save base pointer
30 movq %rsp, %rbp # base pointer = current top of stack
31 addq $localSize, %rsp # allocate local var. space
32
42 call toLower
43 movq $lowMsg, %rdi
44 call writeStr
45 leaq stringWork(%rbp), %rdi # show modified string
46 call writeStr
47 movq $endl, %rdi
48 call writeStr
49
1 # toUpper.s
2 # Converts alpha characters to upper case.
3 # Bob Plantz - 14 June 2009
4
5 # Calling sequence:
6 # rdi <- address of string to be converted
7 # rsi <- address to store result string
8 # call toUpper
9 # returns number of characters written
10 # If rdi and rsi have the same address, original string
11 # is overwritten.
12
13 # Useful constant
14 .equ UPMASK,0xdf
15 # Stack frame, showing local variables and arguments
16 .equ destPtr,-32
17 .equ srcPtr,-24
18 .equ count,-4
19 .equ localSize,-32
20 # Code
21 .text
22 .globl toUpper
23 .type toUpper, @function
24 toUpper:
25 pushq %rbp # save frame pointer
26 movq %rsp, %rbp # new frame pointer
27 addq $localSize, %rsp # local vars. and arg.
28
1 # toLower.s
2 # Converts alpha characters to lower case.
3 # Bob Plantz - 14 June 2009
4
5 # Calling sequence:
6 # rdi <- address of string to be converted
7 # rsi <- address to store result string
8 # call toLower
9 # returns number of characters written
10 # If rdi and rsi have the same address, original string
11 # is overwritten.
12
13 # Useful constant
14 .equ LOWMASK,0x20
15 # Stack frame, showing local variables and arguments
16 .equ destPtr,-32
17 .equ srcPtr,-24
18 .equ count,-4
19 .equ localSize,-32
20 # Code
21 .text
22 .globl toLower
23 .type toLower, @function
24 toLower:
25 pushq %rbp # save frame pointer
26 movq %rsp, %rbp # new frame pointer
27 addq $localSize, %rsp # local vars. and arg.
28
32 lowLoop:
33 movq srcPtr(%rbp), %rax # source pointer
34 movb (%rax), %al # get current char
35 cmpb $0, %al # at end yet?
36 je done # yes, all done
37
Listing 12.3: Convert letters to upper/lower case (programmer assembly language). See Exercises
11-3 and 11-4 for the functions writeStr and readLn. (There are three files here.)
Again, we will describe only the toUpper function. Writing directly in assembly language, we
also need to get the address in srcPtr so we can dereference it. But in copying the character
stored there, we simply ignore the remaining 56 bits of the rax register. Notice that the movb
instruction first uses the full 64-bit address in the rax register to fetch the byte stored there, and
it then can write over the low-order 8 bits of the same register. (This, of course, “destroys” the
address.)
33 movq srcPtr(%rbp), %rax # source pointer
34 movb (%rax), %al # get current char
Since we are ignoring the high-order 56 bits of the rax register, we must be consistent when
operating on the data in the low-order 8 bits. So we use the andb instruction to operate only on
the al portion of the rax register.
38 andb $UPMASK, %al # in range, convert
Storing the final result is the same, except we are using different registers.
39 movq destPtr(%rbp), %r8 # destination pointer
40 movb %al, (%r8) # store character
Both ways of implementing this algorithm are correct, and there is probably no significant effi-
ciency difference. However, comparing the two shows the importance of maintaining consistency
in data sizes. You do not need to zero out unused portions of registers, but you should also never
assume that they are zero.
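The difference between the two approaches can be modeled on a 64-bit value in C. This is a minimal sketch with hypothetical function names (loadByteIntoAl, loadByteZeroExtended, andAl); the address used in the usage note is an arbitrary assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Models movb (mem), %al: only the low-order 8 bits of rax change,
 * so the other 56 bits keep whatever was there before. */
static uint64_t loadByteIntoAl(uint64_t rax, uint8_t byte)
{
    return (rax & ~(uint64_t)0xff) | byte;
}

/* Models movzbl (mem), %eax: the byte is zero-extended, and writing
 * eax also zeros the high-order 32 bits of rax. */
static uint64_t loadByteZeroExtended(uint8_t byte)
{
    return (uint64_t)byte;
}

/* Models andb $mask, %al: again only the low-order 8 bits change. */
static uint64_t andAl(uint64_t rax, uint8_t mask)
{
    uint64_t al = rax & 0xff & mask;
    return (rax & ~(uint64_t)0xff) | al;
}
```

If rax held the hypothetical address 0x00007ffc12345678 before the byte 'a' was loaded into al, the upper 56 bits still hold part of that address afterward. The character is correct only as long as every later instruction works on al alone.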
There are two instructions for shifting bits to the right — shift right and shift arithmetic
right:
The source operand can be either an immediate value, or the value can be located in the cl
register. If it is an immediate value, it can be up to 63 (decimal).
The destination operand can be either a memory location or a register. Any of the addressing
modes that we have covered can be used to specify a memory location.
The action of the shr instruction is to shift all the bits in the destination operand to the right
by the number of bit positions specified by the source operand. The “vacated” bit positions at
the high-order end of the destination operand are filled with zeros. The last bit to be shifted out
of the low-order bit position is copied into the carry flag (CF).
For example, if the eax register contained the bit pattern aabb 2233, then the instruction
shrw $1, %ax
would produce
aabb 1119
and the CF would be one. With the same initial conditions, the instruction
shrl $4, %eax
would produce
0aab b223
and the CF would be zero.
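The behavior of shr, including the carry flag, can be modeled in C with unsigned integers. The following is a sketch for checking examples by hand, not a description of how the processor is built; the function names shr32 and shr16InEax are hypothetical, and the shift counts are restricted to the ranges noted in the comments.

```c
#include <assert.h>
#include <stdint.h>

/* Models shrl $count, %eax for 1 <= count <= 31. The last bit
 * shifted out of the low-order end is stored through cf. */
static uint32_t shr32(uint32_t value, unsigned count, int *cf)
{
    assert(count >= 1 && count <= 31);
    *cf = (int)((value >> (count - 1)) & 1u);
    return value >> count;                 /* zeros enter from the left */
}

/* Models shrw $count, %ax for 1 <= count <= 15. Only the low-order
 * 16 bits of eax change. */
static uint32_t shr16InEax(uint32_t eax, unsigned count, int *cf)
{
    uint32_t ax = eax & 0xffffu;
    assert(count >= 1 && count <= 15);
    *cf = (int)((ax >> (count - 1)) & 1u);
    return (eax & 0xffff0000u) | (ax >> count);
}
```

Starting from aabb 2233, shr16InEax with a count of 1 gives aabb 1119 with a carry of one, and shr32 with a count of 4 gives 0aab b223 with a carry of zero.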
would produce
and the CF would be one. With the same initial conditions, the instruction
sarl $4, %eax
would produce
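An arithmetic right shift differs from shr only in what enters at the high-order end: copies of the sign bit rather than zeros. The C sketch below models this portably with unsigned arithmetic. The function name sar32 is hypothetical, and the usage note assumes that eax again starts as aabb 2233, which is an assumption about the initial conditions of this example.

```c
#include <assert.h>
#include <stdint.h>

/* Models sarl $count, %eax for 1 <= count <= 31. The vacated
 * high-order bits are filled with copies of the original sign bit,
 * and the last bit shifted out is stored through cf. */
static uint32_t sar32(uint32_t value, unsigned count, int *cf)
{
    uint32_t result;
    assert(count >= 1 && count <= 31);
    *cf = (int)((value >> (count - 1)) & 1u);
    result = value >> count;               /* logical shift first */
    if (value & 0x80000000u)               /* negative: fill with ones */
        result |= ~(0xffffffffu >> count);
    return result;
}
```

Under that assumption, sar32 with a count of 4 gives faab b223 with a carry of zero, whereas the logical shift of the same value gives 0aab b223.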
The source operand can be either an immediate value, or the value can be located in the cl
register. If it is an immediate value, it can be up to 31 (decimal).
The destination operand can be either a memory location or a register. Any of the addressing
modes that we have covered can be used to specify a memory location.
The action of both the shl and sal instructions is to shift all the bits in the destination
operand to the left by the number of bit positions specified by the source operand. In fact,
these are really two different assembly language mnemonics for the same machine code. The
“vacated” bit positions at the low-order end of the destination operand are filled with zeros. The
last bit to be shifted out of the highest-order bit position is copied into the carry flag (CF). For
example, if the eax register contained the bit pattern bbaa 2233, then the instruction
shlw $1, %ax
would produce
bbaa 4466
and the CF would be zero. With the same initial conditions, the instruction
shll $4, %eax
would produce
baa2 2330
and the CF would be one.
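The left shifts can be modeled in C in the same way as the right shifts. This sketch is for checking examples by hand; the function names shl32 and shl16InEax are hypothetical, and the shift counts are restricted to the ranges noted in the comments.

```c
#include <assert.h>
#include <stdint.h>

/* Models shll $count, %eax for 1 <= count <= 31. The last bit
 * shifted out of the high-order end is stored through cf. */
static uint32_t shl32(uint32_t value, unsigned count, int *cf)
{
    assert(count >= 1 && count <= 31);
    *cf = (int)((value >> (32 - count)) & 1u);
    return value << count;                 /* zeros enter from the right */
}

/* Models shlw $count, %ax for 1 <= count <= 15. Only the low-order
 * 16 bits of eax change. */
static uint32_t shl16InEax(uint32_t eax, unsigned count, int *cf)
{
    uint32_t ax = eax & 0xffffu;
    assert(count >= 1 && count <= 15);
    *cf = (int)((ax >> (16 - count)) & 1u);
    return (eax & 0xffff0000u) | ((ax << count) & 0xffffu);
}
```

Starting from bbaa 2233, shl16InEax with a count of 1 gives bbaa 4466 with a carry of zero, and shl32 with a count of 4 gives baa2 2330 with a carry of one.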
20 toLower(theString, theString);
21 theInt = hexToInt(theString);
22 printf("%lx = %li\n", theInt, theInt);
23 return 0;
24 }
1 /*
2 * hexToInt.h
3 * Converts hex character string to int.
4 * Bob Plantz - 8 April 2008
5 */
6
7 #ifndef HEXTOINT_H
8 #define HEXTOINT_H
9 long int hexToInt(char *);
10 #endif
1 /*
2 * hexToInt.c
3 * Converts hex character string to int.
4 * Assumes a - f in lower case.
5 * Bob Plantz - 14 June 2009
6 */
7
8 #include "hexToInt.h"
9 #define NUMERAL 0x30
10 #define ALPHA 0x57
11
17 current = *stringPtr;
18 while (current != '\0')
19 {
20 accumulator = accumulator << 4;
21 if (current <= '9') // only works for 0-9,a-f
22 current -= NUMERAL;
23 else
24 current -= ALPHA;
25 accumulator += (long int)current;
26 stringPtr++;
27 current = *stringPtr;
28 }
29 return accumulator;
30 }
Listing 12.4: Shifting bits (C). (There are three files here.)
Notice that “<<” (on line 20 in the hexToInt function) is the left shift operator and “>>” is the right
shift operator in C/C++. In C++ these operators are overloaded to provide file output and input.
The code in the main function is familiar. The compiler-generated assembly language for
hexToInt is shown in Listing 12.5 with comments added.
1 .file "hexToInt.c"
2 .text
3 .globl hexToInt
4 .type hexToInt, @function
5 hexToInt:
6 pushq %rbp
7 movq %rsp, %rbp
8 movq %rdi, -24(%rbp)
9 movq $0, -16(%rbp)
10 movq -24(%rbp), %rax
11 movzbl (%rax), %eax
12 movb %al, -1(%rbp)
13 jmp .L2 # jump to bottom of loop
14 .L5:
15 salq $4, -16(%rbp) # accumulator = accumulator << 4;
16 cmpb $57, -1(%rbp) # if (current > '9')
17 jg .L3 # jump to .L3 ("else" part)
18 movzbl -1(%rbp), %eax # "then" part
19 subl $48, %eax # convert numeral char to int
20 movb %al, -1(%rbp) # and update current
21 jmp .L4 # jump around the "else" part
22 .L3:
23 movzbl -1(%rbp), %eax # "else" part
24 subl $87, %eax # convert letter char to int
25 movb %al, -1(%rbp) # and update current
26 .L4:
27 movsbq -1(%rbp),%rax # type-cast char to a long int
28 addq %rax, -16(%rbp) # add it to accumulator
29 addq $1, -24(%rbp) # stringPtr++;
30 movq -24(%rbp), %rax
31 movzbl (%rax), %eax
32 movb %al, -1(%rbp) # current = *stringPtr;
33 .L2:
34 cmpb $0, -1(%rbp) # while (current != '\0')
35 jne .L5 # go to top of loop
36 movq -16(%rbp), %rax # 64-bit return value
37 leave
38 ret
39 .size hexToInt, .-hexToInt
40 .ident "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"
41 .section .note.GNU-stack,"",@progbits
As usual, gcc has converted the while loop in hexToInt to a do-while loop, which is entered
at the bottom. Most of this code has been covered previously. The instruction
15 salq $4, -16(%rbp) # accumulator = accumulator << 4;
shifts the 64 bits that make up the value of the variable accumulator four bits to the left. Make
sure that you understand that the four high-order bits in this group of 64 are lost. That is, the
shift does not carry on to other memory areas beyond these 64 bits. As stated above, the last bit
to get shifted out of these 64 bits is copied to the CF.
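The effect of the shift on the 64 bits and on the carry flag can be modeled in C. This is only a minimal sketch under stated assumptions: the function name shl64 and the carry_out parameter are hypothetical (they do not appear in the listings), and the count is assumed to be in the range 1 to 63.

```c
#include <stdint.h>

/* Model of a 64-bit left shift (salq/shlq) for 1 <= count <= 63.
 * Returns the shifted value; *carry_out receives the last bit
 * shifted out of bit 63, which is the bit the CPU copies into CF.
 * The name shl64 is hypothetical, chosen for this sketch. */
uint64_t shl64(uint64_t value, unsigned count, int *carry_out)
{
    /* The last bit to leave the register started at position 64 - count. */
    *carry_out = (int)((value >> (64 - count)) & 1u);
    /* Bits shifted past bit 63 are discarded; zeros enter on the right. */
    return value << count;
}
```

Shifting 0xf000000000000001 left by four bits with this model gives 0x10 and a carry of one: the four high-order bits are gone, and only the last of them is remembered.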
The conversions from characters to integers
18 movzbl -1(%rbp), %eax # "then" part
19 subl $48, %eax # convert numeral char to int
20 movb %al, -1(%rbp) # and update current
and
23 movzbl -1(%rbp), %eax # "else" part
24 subl $87, %eax # convert letter char to int
25 movb %al, -1(%rbp) # and update current
start by moving a one-byte character into an int-sized register with the high-order 24 bits ze-
roed. The actual conversion consists of subtracting off the “character part” as an integer arith-
metic operation. Then the result, which is guaranteed to fit within a byte, is stored back in the
single byte allocated for the original character.
Actually, we can easily see that the result of this conversion operation is a four-bit value in
the range 0000₂ – 1111₂. The four-bit left shift of the variable accumulator has left space for
inserting these four bits. The bit insertion operation consists of first type casting the four-bit
integer to a 64-bit integer as we load it from the variable:
27 movsbq -1(%rbp),%rax # type-cast char to a long int
We also note that although the standard return value is 32 bits in the eax register, declaring
a long int (64-bit) return value causes the compiler to use the entire rax register:
36 movq -16(%rbp), %rax # 64-bit return value
Listing 12.6 shows a version of the hexToInt function written in assembly language.
1 # hexToInt_a.s
2 # Converts hex characters to a 64-bit int.
3 # Bob Plantz - 14 June 2009
4
5 # Calling sequence:
6 # rdi <- address of hex string to be converted
7 # call hexToInt
8 # returns 64-bit int represented by the hex string
9
10 # Useful constants
11 .equ NUMERAL,0x30
12 .equ ALPHA,0x57
13 .equ HEXBITS,4
14 .equ TYPEMASK,0xf
15 # Stack frame, showing local variables and arguments
16 .equ accumulator,-16
17 .equ current,-1
18 .equ localSize,-32
19 # Code
20 .text
21 .globl hexToInt
22 .type hexToInt, @function
23 hexToInt:
24 pushq %rbp # save frame pointer
25 movq %rsp, %rbp # new frame pointer
26 addq $localSize, %rsp # local vars. and arg.
27
12.3 Multiplication
The hexToInt function discussed in Section 12.2 shows how to convert a string of hexadecimal
characters into the integer they represent. That function uses the fact that each hexadecimal
character represents four bits. So as the characters are read from left to right, the bits in the
accumulator are shifted four places to the left in order to make space for the next four-bit value.
The character is converted to the four bits it represents and added to the accumulator.
Although the four-bit left shift seems natural for hexadecimal, it is equivalent to multiply-
ing the value in the accumulator by sixteen. This follows from the positional notation used to
write numbers. Adding another hexadecimal digit to the right of an existing number effectively
multiplies that existing number by sixteen. A little thought shows that this algorithm, shown
in Algorithm 12.1, works in any number base.
Algorithm 12.1: Character to integer conversion.
1 accumulator ⇐ 0;
2 while more characters do
3 accumulator ⇐ base × accumulator;
4 tempInt ⇐ integer equivalent of the character;
5 accumulator ⇐ accumulator + tempInt;
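Algorithm 12.1 can be sketched in C for any base from 2 to 36. This is an illustration only, not one of the book's listings: the function name text_to_uint is hypothetical, and the code assumes that every character in the string is a valid digit for the given base.

```c
#include <ctype.h>

/* Algorithm 12.1 for any base from 2 to 36.
 * Assumes every character in text is a valid digit for the base.
 * The name text_to_uint is hypothetical, chosen for this sketch. */
unsigned long text_to_uint(const char *text, unsigned base)
{
    unsigned long accumulator = 0;

    while (*text != '\0') {
        unsigned char c = (unsigned char)*text;
        unsigned digit;

        if (isdigit(c))
            digit = c - '0';                          /* numerals 0-9 */
        else
            digit = (unsigned)tolower(c) - 'a' + 10;  /* letters a-z */

        accumulator = base * accumulator + digit;     /* steps 3 to 5 */
        text++;
    }
    return accumulator;
}
```

With base 16 the multiplication does the same job as the four-bit left shift in hexToInt; with base 10 it cannot be replaced by a single shift.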
Of course, you probably want to write programs that allow users to work with decimal numbers.
So we need to know how to convert a string of decimal characters to the integer they
represent. The characters that represent decimal numbers are in the range 30₁₆ – 39₁₆. Table
12.1 shows the 32-bit int that corresponds to each numeric character.

Numeral        int
(ASCII code)   (Binary number system)
0011 0000      0000 0000 0000 0000 0000 0000 0000 0000
0011 0001      0000 0000 0000 0000 0000 0000 0000 0001
0011 0010      0000 0000 0000 0000 0000 0000 0000 0010
0011 0011      0000 0000 0000 0000 0000 0000 0000 0011
0011 0100      0000 0000 0000 0000 0000 0000 0000 0100
0011 0101      0000 0000 0000 0000 0000 0000 0000 0101
0011 0110      0000 0000 0000 0000 0000 0000 0000 0110
0011 0111      0000 0000 0000 0000 0000 0000 0000 0111
0011 1000      0000 0000 0000 0000 0000 0000 0000 1000
0011 1001      0000 0000 0000 0000 0000 0000 0000 1001
Table 12.1: Bit patterns (in binary) of the ASCII numerals and the corresponding 32-bit ints.

For a string of characters that represents a decimal integer, Algorithm 12.1 can be specialized
to give Algorithm 12.2. (Recall that “·” is the bit-wise and operator.)
Algorithm 12.2: Decimal character to integer conversion.
1 accumulator ⇐ 0;
2 while more characters do
3 accumulator ⇐ 10 × accumulator;
4 tempInt ⇐ 0xf · character;
5 accumulator ⇐ accumulator + tempInt;
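A direct C rendering of Algorithm 12.2 might look like the following. It is a sketch under stated assumptions: the name decimal_to_uint is hypothetical, the string is assumed to hold only the characters '0' through '9', and the value is assumed to fit in an unsigned int.

```c
/* Algorithm 12.2: decimal text to unsigned int.
 * Assumes text holds only the characters '0' through '9'.
 * The name decimal_to_uint is hypothetical, chosen for this sketch. */
unsigned int decimal_to_uint(const char *text)
{
    unsigned int accumulator = 0;

    while (*text != '\0') {
        accumulator = 10 * accumulator;              /* make room for the next digit */
        accumulator += (unsigned int)(*text & 0xf);  /* low four bits are the digit */
        text++;
    }
    return accumulator;
}
```

The mask works because, as Table 12.1 shows, the low-order four bits of each ASCII numeral are exactly the binary value of the digit.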
Shifting N bits to the left multiplies a number by 2^N, so it can only be used to multiply by
powers of two. Algorithm 12.2 multiplies the accumulator by 10, which cannot be accomplished
with only shifts. Thus, we need to use the multiplication instruction for decimal conversions.
The multiplication instruction is somewhat more complicated than addition. The main problem
is that the product can, in general, occupy the number of digits in the multiplier plus the
number of digits in the multiplicand. This is easily seen by computing 99 × 99 = 9801 (in decimal).
Thus in general, multiplying two n-bit values can produce a product that is 2n bits wide.
The unsigned multiplication instruction is:
muls source
Intel® syntax: mul source
In the x86-64 architecture, the destination operand contains the multiplicand and must be in the
al, ax, eax, or rax register, depending on the size of the operand, for the unsigned multiplication
instruction, mul. This register is not specified as an operand. The instruction specifies the source
operand, which contains the multiplier and must be the same size. It can be located in another
general-purpose register or in memory. If the numbers are eight bits (hence, one number is in
al), the high-order portion of the result will be in the ah register, and the low-order portion of the
result will be in the al register. For sixteen and thirty-two bit numbers, the low-order portion of
the product will be stored in a portion of the rax register and the high-order will be stored in a
portion of the rdx register as shown in Table 12.2.
(Margin note: mul uses rax behind the scenes.)
(Margin note: mul will probably change rdx behind the scenes.)
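The way a 32-bit multiplication spreads its product over two registers can be modeled in C with a 64-bit intermediate value. This is a sketch only; the struct product32 and the function mul32 are hypothetical names that stand in for the edx:eax register pair.

```c
#include <stdint.h>

/* Model of mull: unsigned 32-bit x 32-bit giving a 64-bit product.
 * The struct and its field names are illustrative only. */
struct product32 {
    uint32_t high;   /* what mull leaves in edx */
    uint32_t low;    /* what mull leaves in eax */
};

struct product32 mul32(uint32_t multiplicand, uint32_t multiplier)
{
    /* Widen first so that no bits of the product are lost. */
    uint64_t wide = (uint64_t)multiplicand * multiplier;
    struct product32 result;

    result.high = (uint32_t)(wide >> 32);   /* high-order 32 bits */
    result.low  = (uint32_t)wide;           /* low-order 32 bits */
    return result;
}
```

Multiplying 0xffffffff by itself in this model gives high = 0xfffffffe and low = 0x00000001, which shows why the high-order register cannot be ignored in general.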
For example, let’s see how the computation 7 × 24 = 168 looks in 8-bit, 16-bit, and 32-bit
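values. First, note that:
• byteDay → allocate one byte of memory and set the bit pattern of the byte to 0x18,
• wordDay → allocate two bytes of memory and set the bit pattern of those two bytes to
0x0018,
and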
• longDay → allocate four bytes of memory and set the bit pattern of those four bytes to
0x00000018.
First, consider 8-bit multiplication. If eax contains the bit pattern 0x??????07, then
mulb byteDay
changes eax such that it contains 0x????00a8. Notice that only the al portion of the A register
can be used for the operand, but the result will occupy the entire ax portion of the register even
though the result would fit into only the al portion. That is, the instruction will produce a 16-bit
result, and anything stored in the ah portion will be lost.
Next, consider 16-bit multiplication. If eax contains 0x????0007, then
mulw wordDay
changes eax to contain 0x????00a8 and edx to contain 0x????0000. Two points are important in
this example:
• the ah portion of the A register must be set to zero before executing the mulw instruction so
that the ax portion contains the proper value, and
• the dx portion of the D register is used, even though the result fits within the 16 bits of
the ax register.
Finally, 32-bit multiplication. If eax contains 0x00000007, then
mull longDay
changes eax to contain 0x000000a8 and edx to contain 0x00000000. This example shows the
entire eax register must be used for the operand before mull is executed, and the entire edx
register is used for the high-order portion of the result, even though it is not needed. That is,
the instruction will produce a 64-bit result, and anything stored in the edx register will be lost.
These examples show that the rax and rdx registers are used without ever explicitly appear-
ing in the instruction. You must be very careful not to write over a required value that is stored
in one of these registers. Using the multiplication instruction requires some careful planning.
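The 8-bit case shows most clearly how a multiply overwrites part of a register that never appears in the instruction. The following C model is a sketch, with the hypothetical name mulb_model; it takes the whole 32-bit pattern of eax so that the untouched high-order bits are visible.

```c
#include <stdint.h>

/* Model of mulb: al * source -> ax; the upper 16 bits of eax are untouched.
 * The name mulb_model is hypothetical, chosen for this sketch. */
uint32_t mulb_model(uint32_t eax, uint8_t source)
{
    /* Only al takes part as the multiplicand; 255 * 255 still fits in 16 bits. */
    uint16_t product = (uint16_t)((eax & 0xffu) * source);

    /* The product replaces all of ax, so the old contents of ah are lost. */
    return (eax & 0xffff0000u) | product;
}
```

Starting with 0x12345607 and multiplying by 24 gives 0x123400a8: the 0x56 that was in ah has been replaced by the high-order byte of the product.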
There is also a signed multiply instruction, which has three forms:
imuls source
imuls source, destination
imuls immediate, source, destination
Intel® syntax:
imul source
imul destination, source
imul destination, source, immediate
In the one-operand format the signed multiply instruction uses the rdx:rax register combination
in the same way as the mul instruction.
In its two-operand format the destination must be a register. The source can be a register,
an immediate value, or a memory location. The source and destination are multiplied, and the
result is stored in the destination register. Unfortunately, if the result is too large to fit into the
destination register, it is simply truncated. In this case, both the CF and OF flags are set to 1. If
the result was able to fit into the destination register, both flags are set to 0.
In its three-operand format the destination must be a register. The source can be a register
or a memory location. The source is multiplied by the immediate value and the result is stored
in the destination register. As in the two-operand form, if the result is too large to fit into the
destination register, it is simply truncated. In this case, both the CF and OF flags are set to 1. If
the result was able to fit into the destination register, both flags are set to 0.
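The truncation and the flag setting of the two- and three-operand forms can be modeled in C by computing the exact product in 64 bits and then checking whether it fits. This is a sketch under stated assumptions: the name imul32_model is hypothetical, and the single overflow output stands for both CF and OF, which this instruction always sets to the same value.

```c
#include <stdint.h>

/* Model of two-operand imull. Returns the low 32 bits of the product
 * as a bit pattern; *overflow is 1 when CF and OF would both be set.
 * The name imul32_model is hypothetical, chosen for this sketch. */
uint32_t imul32_model(int32_t destination, int32_t source, int *overflow)
{
    int64_t wide = (int64_t)destination * source;   /* exact product */

    /* The flags are set when the exact product does not fit in 32 bits. */
    *overflow = (wide < INT32_MIN || wide > INT32_MAX);

    return (uint32_t)wide;                          /* truncate to 32 bits */
}
```

For example, 65536 × 65536 truncates to zero with the overflow indication set, while 3 × 4 gives 12 with it clear.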
The difference between signed and unsigned multiplication can be illustrated with the fol-
lowing multiplication of two 16-bit values. Given the declaration:
.data
mOne: .word -1
we will multiply the two 16-bit values in the memory location mOne and the register ax. Notice
that if we consider them to be signed integers, both values represent -1, and we would expect
the result to be +1 (= 00000001₁₆). However, if we consider them to be unsigned integers, they
both represent 65535₁₀, and we would expect the result to be 4294836225₁₀ (= fffe0001₁₆).
Indeed, starting with the initial conditions above, the instruction:
mulw mOne
yields:
We see that the register combination dx:ax = fffe:0001. And with the same initial conditions,
the instruction
imulw mOne
yields:
With signed multiplication we get dx:ax = 0000:0001. Both of these operations multiplied 16-bit
values to provide a 32-bit result. They each used the sixteen low-order bits of the rax and rdx
registers for the result. Notice that the upper 48 bits of these registers were not changed and
that neither “ax” nor “dx” appeared in either instruction.
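The same comparison can be made in C, where the interpretation of the bit pattern is chosen by the type rather than by the instruction. The function names below are hypothetical; each returns the 32-bit pattern that would be found in dx:ax.

```c
#include <stdint.h>

/* Interpret a 16-bit pattern as a two's complement signed value. */
static int32_t as_signed16(uint16_t pattern)
{
    return (pattern & 0x8000u) ? (int32_t)pattern - 65536 : (int32_t)pattern;
}

/* mulw: treat both 16-bit patterns as unsigned; result is dx:ax. */
uint32_t mulw_model(uint16_t ax, uint16_t source)
{
    return (uint32_t)ax * (uint32_t)source;
}

/* imulw: treat the same 16-bit patterns as signed; result is dx:ax. */
uint32_t imulw_model(uint16_t ax, uint16_t source)
{
    int32_t product = as_signed16(ax) * as_signed16(source);
    return (uint32_t)product;
}
```

With 0xffff in both operands, mulw_model returns 0xfffe0001 and imulw_model returns 0x00000001, matching the two register combinations above.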
Multiplication is used on line 19 in the decToUInt function shown in Listing 12.7.
1 /*
2 * decToUInt.c
3 * Converts decimal character string to int.
4 * Bob Plantz - 15 June 2009
5 */
6
7 #include "decToUInt.h"
8 #define NUMERALMASK 0xf
9
15
16 current = *stringPtr;
17 while (current != '\0')
18 {
19 accumulator = accumulator * base;
20 current = current & NUMERALMASK;
21 accumulator += (int)current;
22 stringPtr++;
23 current = *stringPtr;
24 }
25 return accumulator;
26 }
As we can see on line 17 in Listing 12.8 the compiler has chosen to use the imull instruction
for multiplication.
1 .file "decToUInt.c"
2 .text
3 .globl decToUInt
4 .type decToUInt, @function
5 decToUInt:
6 pushq %rbp
7 movq %rsp, %rbp
8 movq %rdi, -24(%rbp)
9 movl $0, -12(%rbp)
10 movl $10, -8(%rbp)
11 movq -24(%rbp), %rax
12 movzbl (%rax), %eax
13 movb %al, -1(%rbp)
14 jmp .L2
15 .L3:
16 movl -12(%rbp), %eax # destination must
17 imull -8(%rbp), %eax # be in eax
18 movl %eax, -12(%rbp) # register
19 andb $15, -1(%rbp)
20 movzbl -1(%rbp), %eax
21 addl %eax, -12(%rbp)
22 addq $1, -24(%rbp)
23 movq -24(%rbp), %rax
24 movzbl (%rax), %eax
25 movb %al, -1(%rbp)
26 .L2:
27 cmpb $0, -1(%rbp)
28 jne .L3
29 movl -12(%rbp), %eax
30 leave
31 ret
32 .size decToUInt, .-decToUInt
33 .ident "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"
34 .section .note.GNU-stack,"",@progbits
Listing 12.8: Convert decimal text string to int (gcc assembly language).
Recall that the destination must be a register. So the value to be multiplied must be loaded from
its memory location into a register, multiplied, then stored back into memory:
16 movl -12(%rbp), %eax # destination must
17 imull -8(%rbp), %eax # be in eax
18 movl %eax, -12(%rbp) # register
It may appear that the compiler has made an error here. Since both the multiplier and the
multiplicand are 32-bit values, the product can be 64 bits wide. However, the compiler has
chosen code that assumes the product will be no wider than 32 bits. This can lead to arithmetic
errors when multiplying large integers, but according to the C programming language standard
[10] this is acceptable:
A computation involving unsigned operands can never overflow, because a result that
cannot be represented by the resulting unsigned integer type is reduced modulo the
number that is one greater than the largest value that can be represented by the
resulting type.
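The rule quoted above can be checked with a short C function that computes the exact product and then reduces it explicitly. This is a sketch; the name mul_wrap32 is hypothetical, and uint32_t is assumed to be the same width as the unsigned int in the listings.

```c
#include <stdint.h>

/* 32-bit unsigned multiply: the product is reduced modulo 2^32,
 * which matches keeping only eax after imull.
 * The name mul_wrap32 is hypothetical, chosen for this sketch. */
uint32_t mul_wrap32(uint32_t a, uint32_t b)
{
    uint64_t exact = (uint64_t)a * b;           /* full 64-bit product */
    return (uint32_t)(exact % 4294967296ULL);   /* reduce modulo 2^32 */
}
```

For small values such as 7 × 24 the reduction changes nothing, but 65536 × 65536 is reduced to zero. The decToUInt function therefore returns a wrong value, without any error indication, when the text string represents a number that is too large.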
5 # Calling sequence:
6 # rdi <- address of decimal string to be converted
7 # call decToUInt
8 # returns 64-bit int represented by the decimal string
9
10 # Useful constants
11 .equ NUMERALMASK,0xf
12 .equ DECIMAL,10
13 # Stack frame, showing local variables and arguments
14 .equ accumulator,-8
15 .equ current,-1
16 .equ localSize,-16
17
18 .text
19 .globl decToUInt
20 .type decToUInt, @function
21 decToUInt:
22 pushq %rbp # save base pointer
23 movq %rsp, %rbp # new base pointer
24 addq $localSize, %rsp # local vars. and arg.
25
Listing 12.9: Convert decimal text string to int (programmer assembly language).
And since this is a leaf function, the register used to pass the address of the text string (rdi) is
simply used as the pointer variable rather than allocate a register save area in the stack frame:
27 loop:
28 movb (%rdi), %sil # load character
29 cmpb $0, %sil # at end yet?
30 je done # yes, all done
31
This is safe because no other functions are called within this loop. Of course, the programmer
must be careful that the pointer variable (rdi) is not changed unintentionally.
12.4 Division
Division poses a different problem. In general, the quotient will not be larger than the dividend
(except when attempting to divide by zero). Division is also complicated by the existence of a
remainder. The divide instruction starts with a dividend that is twice as wide as the divisor.
Both the quotient and remainder are the same width as the divisor. The unsigned division
instruction is:
divs source
Intel® syntax: div source
The source operand specifies the divisor. It can be either a register or a memory location. Table
12.3 shows how to set up the registers with the dividend and where the quotient and remainder
will be located after the unsigned division instruction, div, is executed. Notice that the quotient
is the C “/” operation, and the remainder is the “%” operation.
For example, let’s see how the computation 93 ÷ 19 = 4 with remainder 17 looks in 8-bit,
16-bit, and 32-bit values. First, note that:
byteDivisor:
.byte 19
wordDivisor:
.word 19
longDivisor:
.long 19
• byteDivisor → allocate one byte of memory and set the bit pattern of the byte to 0x13.
• wordDivisor → allocate two bytes of memory and set the bit pattern of those two bytes to
0x0013.
• longDivisor → allocate four bytes of memory and set the bit pattern of those four bytes to
0x00000013.
First, consider 8-bit division. If eax contains the bit pattern 0x????005d, then
divb byteDivisor
changes eax such that it contains 0x????1104. Notice that ah had to be set to 0 before executing
divb even though the dividend fits into one byte. That’s because the divb instruction starts with
the ah:al pair as a 16-bit number. We also see that after executing the instruction, ax contains
what appears to be a much larger number as a result of the division. Of course, we no longer
consider ax, but al (the quotient) and ah (the remainder) as two separate numbers.
Next, consider 16-bit division. If eax contains 0x????005d and edx 0x????0000, then
divw wordDivisor
changes eax to contain 0x????0004 and edx to contain 0x????0011. You may wonder why the divw
instruction does not start with the 32-bit dividend in eax. This is for backward compatibility —
Intel processors prior to the 80386 had only 16-bit registers.
Finally, 32-bit division. If eax contains 0x0000005d and edx 0x00000000, then
divl longDivisor
changes eax to contain 0x00000004 and edx to contain 0x00000011. Again, we see that the entire
edx register must be filled with zeros before executing the divl instruction, even though the
dividend fits within two bytes.
One of the more common errors with division occurs when performing repeated division of
a number. Since the first division places the remainder in the area occupied by the high-order
portion of the dividend, you must remember to set that area to zero before dividing again.
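Both the 8-bit example and the repeated-division error can be seen in a small C model. It is a sketch with a hypothetical name, divb_model, and it assumes that the divisor is not zero and that the quotient fits in eight bits (the real instruction raises a divide error otherwise).

```c
#include <stdint.h>

/* Model of divb: ax / source -> al (quotient), ah (remainder).
 * Assumes source != 0 and that the quotient fits in 8 bits.
 * The name divb_model is hypothetical, chosen for this sketch. */
uint32_t divb_model(uint32_t eax, uint8_t source)
{
    uint16_t dividend  = (uint16_t)(eax & 0xffffu);      /* ah:al as one number */
    uint8_t  quotient  = (uint8_t)(dividend / source);
    uint8_t  remainder = (uint8_t)(dividend % source);

    /* The upper 16 bits of eax are unchanged; ah and al are both replaced. */
    return (eax & 0xffff0000u) | ((uint32_t)remainder << 8) | quotient;
}
```

Dividing 0x005d (93) by 19 leaves 0x1104 in ax. If that result is divided by 19 again without first setting ah to zero, the dividend is 0x1104 (4356), not 4, and the model returns a quotient of 0xe5 with a remainder of 5.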
The signed division instruction is:
idivs source
Intel® syntax: idiv source
Unlike the signed multiply instruction, signed divide only has one form, which is the same as
unsigned divide. That is, the divisor is in the source operand, and the dividend is set up in the
rax and rdx registers as shown in Table 12.3.
We can see the difference between signed and unsigned division by dividing a 32-bit value
by a 16-bit value. Given the declaration:
.data
mOne: .word -1
we load the 32-bit dividend, +1, into the dx:ax register pair
movw $0, %dx
movw $1, %ax
Now, the unsigned value in the mOne variable is ffff₁₆ = 65535₁₀. When we divide 1 by 65535 we
expect to get 0 with a remainder of 1. Indeed, unsigned division:
divw mOne
yields:
rdx 0x7fffb6d00001 140736260472833
rax 0x2ae8f4310000 47180017631232
The quotient (in ax) is 0 and the remainder (in dx) is 1. With the same initial values in dx:ax,
signed division:
idivw mOne
yields:
rdx 0x7fffb6d00000 140736260472832
rax 0x2ae8f431ffff 47180017696767
The quotient is in ax and is ffff₁₆ (= −1), while the remainder (in dx) is 0.
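In C the choice between the two instructions is made by the operand types. The following sketch uses hypothetical names (divresult16, divw_model, idivw_model) and assumes a nonzero divisor and a quotient that fits in 16 bits.

```c
#include <stdint.h>

/* Names in this sketch are hypothetical. */
struct divresult16 {
    uint16_t quotient;    /* bit pattern left in ax */
    uint16_t remainder;   /* bit pattern left in dx */
};

/* divw: 32-bit unsigned dividend (dx:ax) by a 16-bit unsigned divisor. */
struct divresult16 divw_model(uint32_t dividend, uint16_t divisor)
{
    struct divresult16 r;
    r.quotient  = (uint16_t)(dividend / divisor);
    r.remainder = (uint16_t)(dividend % divisor);
    return r;
}

/* idivw: the same bit patterns interpreted as signed values. */
struct divresult16 idivw_model(int32_t dividend, int16_t divisor)
{
    struct divresult16 r;
    r.quotient  = (uint16_t)(dividend / divisor);
    r.remainder = (uint16_t)(dividend % divisor);
    return r;
}
```

Calling divw_model(1, 0xffff) gives quotient 0 and remainder 1, while idivw_model(1, -1) gives quotient 0xffff and remainder 0, in agreement with the register contents shown above.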
The “/” operation is used on line 34 in the intToUDec function shown in Listing 12.10, and the
“%” operation is used on line 35.
1 /*
2 * intToUDec.c
3 *
4 * Converts an int to corresponding unsigned text
5 * string representation.
6 *
7 * input:
8 * 32-bit int
9 * pointer to at least 11-char array
10 * output:
11 * null-terminated string in array
12 *
13 * Bob Plantz - 15 June 2009
14 */
15
16 #include "intToUDec.h"
17 #define TOASCII 0x30
18
The assembly language generated by gcc is shown in Listing 12.11 (with comments added).
1 .file "intToUDec.c"
2 .text
3 .globl intToUDec
4 .type intToUDec, @function
5 intToUDec:
6 pushq %rbp
7 movq %rsp, %rbp
8 movq %rdi, -40(%rbp)
9 movl %esi, -44(%rbp)
Listing 12.11: Convert unsigned int to decimal text string (gcc assembly language).
5 # Calling sequence
6 # esi <- value of the int
7 # rdi <- address of place to store string
8 # call intToUDec
9 # Useful constant
10 .equ asciiNumeral,0x30
11 # Stack frame
12 .equ reverseArray,-12
13 .equ localSize,-16
14 # Read only data
15 .section .rodata
16 ten: .long 10
17 # Code
18 .text
19 .globl intToUDec
20 .type intToUDec, @function
21 intToUDec:
22 pushq %rbp # save caller’s base ptr
23 movq %rsp, %rbp # our stack frame
24 addq $localSize, %rsp # local char array
25
45 copyLup:
46 decq %r8 # decrement pointer
47 movb (%r8), %dl # get char
48 movb %dl, (%rdi) # store it
49 incq %rdi # increment storage pointer
50 cmpb $0, %dl # NUL character?
51 jne copyLup # no, keep copying
52
Listing 12.12: Convert unsigned int to decimal text string (programmer assembly language).
On line 38 the high-order 32 bits of the dividend (edx register) are set to 0.
38 movl $0, %edx # yes, high-order = 0
39 divl ten # divide by ten
40 orb $asciiNumeral, %dl # convert to ascii
41 movb %dl, (%r8) # store character
The division on line 39 leaves “x / base” in the eax register for the next execution of the loop
body. It also places “x % base” in the edx register. We know that this value is in the range 0 – 9
and thus fits entirely within the dl portion of the register. Lines 40 and 41 show how the value
is converted to its ASCII equivalent and stored in the local char array.
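The overall technique (divide by ten repeatedly, store the remainders in a local array, then copy them out in reverse order) can be sketched in C as follows. This is not the book's intToUDec listing: the name uint_to_decimal is hypothetical, and the caller's array is assumed to hold at least 11 chars (ten digits plus the NUL).

```c
/* Sketch of unsigned int to decimal text.
 * Assumes out points to at least 11 chars (10 digits plus NUL).
 * The name uint_to_decimal is hypothetical, chosen for this sketch. */
void uint_to_decimal(char *out, unsigned int x)
{
    char reversed[11];
    char *p = reversed;

    *p++ = '\0';                          /* terminator will be copied last */
    do {
        *p++ = (char)('0' | (x % 10));    /* remainder gives the next digit */
        x = x / 10;                       /* quotient feeds the next pass */
    } while (x > 0);

    do {                                  /* copy back in reverse order */
        p--;
        *out++ = *p;
    } while (*p != '\0');
}
```

The or operation with '0' (0x30) plays the same role as the orb instruction on line 40, and the copy loop mirrors copyLup: it stops after the NUL character has been stored.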
As in the decToUInt function (Listing 12.9), since this is a leaf function, the register used
to pass the address of the text string (rdi) is simply used as the pointer variable rather than
allocate a register save area in the stack frame. Similarly, the eax register is used as the local
“x” variable.
negs source
Intel® syntax: neg source
neg performs a two’s complement operation on the value in the operand, which can be either a
memory location or a register. Any of the addressing modes that we have covered can be used to
specify a memory location.
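The two's complement operation itself is easy to state in C: invert every bit and add one. The sketch below uses the hypothetical name neg32 and works on the 32-bit pattern, as negl does.

```c
#include <stdint.h>

/* Two's complement negation of a 32-bit pattern, as negl does:
 * invert every bit, then add one.
 * The name neg32 is hypothetical, chosen for this sketch. */
uint32_t neg32(uint32_t value)
{
    return ~value + 1u;
}
```

Note that negating 0x80000000, the most negative 32-bit value, gives back the same bit pattern, since +2147483648 cannot be represented in 32-bit two's complement.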
12.6 Instructions Introduced Thus Far
12.6.1 Instructions
data movement:
opcode source destination action see page:
cmovcc %reg/mem %reg conditional move 230
movs $imm/%reg %reg/mem move 141
movsss $imm/%reg %reg/mem move, sign extend 216
movzss $imm/%reg %reg/mem move, zero extend 217
popw %reg/mem pop from stack 163
pushw $imm/%reg/mem push onto stack 163
s = b, w, l, q; w = l, q; cc = condition codes
arithmetic/logic:
opcode source destination action see page:
adds $imm/%reg %reg/mem add 189
adds mem %reg add 189
ands $imm/%reg %reg/mem bit-wise and 258
ands mem %reg bit-wise and 258
cmps $imm/%reg %reg/mem compare 209
cmps mem %reg compare 209
decs %reg/mem decrement 220
divs %reg/mem unsigned divide 280
idivs %reg/mem signed divide 282
imuls %reg/mem signed multiply 276
incs %reg/mem increment 220
leaw mem %reg load effective address 167
subs $imm/%reg %reg/mem subtract 190
muls %reg/mem unsigned multiply 275
negs %reg/mem negate 286
ors $imm/%reg %reg/mem bit-wise inclusive or 258
ors mem %reg bit-wise inclusive or 258
sals $imm/%cl %reg/mem shift arithmetic left 269
sars $imm/%cl %reg/mem shift arithmetic right 268
shls $imm/%cl %reg/mem shift left 269
shrs $imm/%cl %reg/mem shift right 268
subs mem %reg subtract 190
tests $imm/%reg %reg/mem test bits 210
tests mem %reg test bits 210
xors $imm/%reg %reg/mem bit-wise exclusive or 258
xors mem %reg bit-wise exclusive or 258
s = b, w, l, q; w = l, q
12.7 Exercises
12-1 (§12.2) Write a program in assembly language that
Your program should use the writeStr function from Exercise 11-3 to display the user
prompt. And it should use the readStr function from Exercise 11-4 or 11-6 to read the
user’s input.
Your program should read the user’s input into the local char array, then perform the
conversion using the stored characters. Do not do the conversion as the characters are
entered by the user.
Your program does not need to check for user errors. You can assume that the user will
enter only ones and zeros. And you can assume that the user will not enter more than 32
bits. (Be careful when you test your program.)
12-2 (§12.2) Write a program in assembly language that allows the user to enter a decimal
integer then displays it in binary.
Your program should convert the decimal integer into the corresponding C-style text string
of ones and zeros, then use the writeStr function from Exercise 11-3 to display the text
string.
This program will require some careful planning in order to get the bits to print in the
correct order.
12-3 (§12.3) Write a function, mul16, in assembly language that takes two 16-bit integers as
arguments and returns the 32-bit product of the arguments. Write a main driver function
to test mul16. You may use printf and scanf in the main function for the user interface.
Hint: Notice that most of the numbers in this problem are 16-bit unsigned integers. Read
the man pages for printf and scanf. In particular, the “h” length modifier is used to indicate
a short (16-bit) int.
12-4 (§12.4) Write a function, div32, in assembly language that implements the C / operation.
The function takes two 32-bit integers as arguments and returns the 32-bit quotient of the
first argument divided by the second. Write a main driver function to test div32. You may
use printf and scanf in the main function for the user interface.
12-5 (§12.4) Write a function, mod32, in assembly language that implements the C % operation.
The function takes two 32-bit integers as arguments and returns the 32-bit remainder of the
first argument divided by the second. Write a main driver function to test mod32. You may
use printf and scanf in the main function for the user interface.
12-6 (§12.4) Write a function in assembly language, decimal2uint, that takes two arguments: a
pointer to a char, and a pointer to an unsigned int.
The function assumes that the first argument points to a C-style text string that contains
only numeric characters representing an unsigned decimal integer. It computes the binary
value of the integer and stores the result at the location where the second argument points.
It returns zero.
Write a program that demonstrates the correctness of decimal2uint. Your program will
allocate a char array, call readStr (from Exercise 11-4 or 11-5) to get a decimal integer
from the user, and call decimal2uint to convert the text string to binary format. Then it
adds an integer to the user’s input integer and uses printf to display the result.
Hint: Start with the program from Exercise 12-1. Rewrite it so that the conversion from
the text string to the binary number is performed by a function. Then modify the function
so that it performs a decimal conversion instead of binary.
12-7 (§12.4) Write a function in assembly language, uint2dec, that takes two arguments: a
pointer to a char, and an unsigned int.
The function assumes that the first argument points to a char array that is large enough to
hold a text string that represents the largest possible 32-bit unsigned integer in decimal.
It computes the characters that represent the integer (the second argument) and stores
this representation as a C-style text string where the first argument points. It returns
zero.
Write an assembly language program that demonstrates the correctness of uint2dec. Your
program will allocate one char array, call readStr (from Exercise 11-4 or 11-6) to get a
decimal integer from the user, and call decimal2uint (from Exercise 12-6) to convert the
text string to binary format. It should add a constant integer to this converted value. Then
it calls uint2dec to convert the sum to its corresponding text string, storing the string in
the char array.
Hint: Start with the program from Exercise 10-6. Rewrite it so that the conversion from
the binary number to the text string is performed by a function. Then modify the function
so that it performs a decimal conversion instead of binary.
12-8 (§12.3) Modify the program in Exercise 12-7 so that it deals with signed ints. Hint: Write
the function decimal2sint, which will call decimal2uint, and write the function sint2dec,
which will call uint2dec.
Chapter 13
Data Structures
An essential part of programming is determining how to organize the data. Homogeneous data
is often grouped in an array, and heterogeneous data in a struct. In this chapter, we study how
both these data structures are implemented.
13.1 Arrays
An array in C/C++ consists of one or more elements, all of the same type, arranged contiguously
in memory. To access an element in an array you need to specify two address-related items:
you can store an integer, say 123, in the ith element with the statement
array[i] = 123;
In this example the beginning of the array is specified by using the name, and the number of
the element is specified by the [...] syntax, as illustrated by the program in Listing 13.1.
1 /*
2 * arrayElement.c
3 * Stores a value in one element of an array.
4 * Bob Plantz - 15 June 2009
5 */
6
7 #include <stdio.h>
8
9 int main(void)
10 {
11 int myArray[50];
12 int i = 25;
13
14 myArray[i] = 123;
15 printf("The value is %i\n", myArray[i]);
16
17 return 0;
18 }
We would expect this program to allocate 4 × 50 = 200 bytes for myArray, plus 4 bytes for i
in the local variable area. Indeed, the gcc-generated assembly language in Listing 13.2 shows
that this total (204) has been increased to the next multiple of sixteen, and 208 bytes have been
allocated in the stack frame.
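The rounding of the local variable area can be expressed as a one-line computation. The function name round_up_16 is hypothetical; it is only a sketch of the arithmetic, not of how gcc is implemented.

```c
/* Round a byte count up to the next multiple of sixteen.
 * The name round_up_16 is hypothetical, chosen for this sketch. */
unsigned long round_up_16(unsigned long bytes)
{
    /* Adding 15 and clearing the low four bits rounds up. */
    return (bytes + 15ul) & ~15ul;
}
```

For this program, round_up_16(204) gives 208, the value used in the subq instruction on line 11 of Listing 13.2.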
1 .file "arrayElement.c"
2 .section .rodata
3 .LC0:
4 .string "The value is %i\n"
5 .text
6 .globl main
7 .type main, @function
8 main:
9 pushq %rbp
10 movq %rsp, %rbp
11 subq $208, %rsp # myArray and i
12 movl $25, -4(%rbp) # i = 25;
13 movl -4(%rbp), %eax # load i
14 cltq # convert to 64-bit
15 movl $123, -208(%rbp,%rax,4) # myArray[i] = 123;
16 movl -4(%rbp), %eax # load i
17 cltq # convert to 64-bit
18 movl -208(%rbp,%rax,4), %esi # esi <- myArray[i]
19 movl $.LC0, %edi
20 movl $0, %eax
21 call printf
22 movl $0, %eax
23 leave
24 ret
25 .size main, .-main
26 .ident "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"
27 .section .note.GNU-stack,"",@progbits
Listing 13.2: Storing a value in one element of an array (gcc assembly language).
Next, we see that the number of the element that is being accessed, 25, is stored in the variable
i. Then it is loaded into eax and converted from 32-bit to 64-bit. The instruction
15 movl $123, -208(%rbp,%rax,4) # myArray[i] = 123;
uses an addressing mode for the destination that is new to you, indexed. The syntax in the GNU
assembly language is
offset(base_register, index_register, scale)
indexed: The data value is located in memory. The address of the memory location is the sum
of the value in the base register plus the scale factor times the value in the index register,
plus the offset.
syntax: place parentheses around the comma-separated list (base_register,
index_register, scale) and preface it with the offset.
example: -16(%rdx, %rax, 4)
Intel® syntax: [rdx + rax*4 - 16]
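The address computation performed by the indexed addressing mode can be written out in C. This is a sketch of the arithmetic only; the function name indexed_address and its parameter names are hypothetical.

```c
#include <stdint.h>

/* Effective address for offset(base_register, index_register, scale).
 * The name indexed_address is hypothetical, chosen for this sketch. */
uint64_t indexed_address(uint64_t base, uint64_t index, unsigned scale, int64_t offset)
{
    /* base + scale * index + offset, computed modulo 2^64 as the CPU does */
    return base + index * scale + (uint64_t)offset;
}
```

If rbp held 1000, the operand -208(%rbp,%rax,4) with 25 in rax would refer to address 1000 + 25 × 4 − 208 = 892.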
9 int main(void)
10 {
11 int intArray[10];
12 int index;
13
14 index = 0;
15 while (index < 10)
16 {
17 intArray[index] = 0;
18 index++;
19 }
20 index = 0;
21 while (index < 10)
22 {
23 printf("intArray[%i] = %i\n", index, intArray[index]);
24 index++;
25 }
26 return 0;
27 }
The gcc compiler generated the assembly language shown in Listing 13.4 for this array clear-
ing program.
1 .file "clearArray1.c"
2 .section .rodata
3 .LC0:
4 .string "intArray[%i] = %i\n"
5 .text
6 .globl main
7 .type main, @function
8 main:
9 pushq %rbp
10 movq %rsp, %rbp
11 subq $48, %rsp
12 movl $0, -4(%rbp) # index = 0;
13 jmp .L2
14 .L3:
15 movl -4(%rbp), %eax # load current index value
16 cltq # convert to 64 bits
17 movl $0, -48(%rbp,%rax,4) # intArray[index] = 0;
18 addl $1, -4(%rbp) # index++;
19 .L2:
20 cmpl $9, -4(%rbp)
21 jle .L3
22 movl $0, -4(%rbp) # index = 0;
23 jmp .L4
24 .L5:
25 movl -4(%rbp), %eax # load current index value
26 cltq # convert to 64 bits
27 movl -48(%rbp,%rax,4), %edx # load array element
28 movl -4(%rbp), %esi # load current index value
29 movl $.LC0, %edi
30 movl $0, %eax # no floats
31 call printf
32 addl $1, -4(%rbp) # index++;
33 .L4:
34 cmpl $9, -4(%rbp)
35 jle .L5
36 movl $0, %eax
37 leave
38 ret
39 .size main, .-main
40 .ident "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"
41 .section .note.GNU-stack,"",@progbits
The indexed addressing on line 17, -48(%rbp,%rax,4), shows
that the address of the first element of this array is 0 ∗ 4 − 48 = −48 from the address in rbp, and
the address of the last element is 9 ∗ 4 − 48 = −12 from the address in rbp. Because this function
calls printf, the array cannot be left in the red zone, so line 11 allocates 48 bytes in the stack
frame for the array and index.
Indexing through the array is accomplished by loading the current value of the index variable
into the rax register. Although this example simply increments the index through the array,
you can see that the value used to index the array element is independent of maintaining the
beginning address of the array.
The third value in the parentheses, 4, allows you to use the actual element number as the
array index. This is clearly more convenient — hence, less error prone — than having to compute
the number of bytes from the beginning of the array using the index value.
If we did not have this addressing mode, we would have to do something like:
# clear the array
.L3:
movl -4(%rbp), %eax
cltq
salq $2, %rax # multiply index by 4
leaq -48(%rbp), %rsi # address of array start
addq %rax, %rsi # address of current element
movl $0, (%rsi) # store zero there
addl $1, -4(%rbp) # index ++
.L2:
cmpl $9, -4(%rbp)
jle .L3
Although this is logically correct, it requires three more instructions in the loop and uses an
additional register. RISC architectures (e.g., PowerPC, MIPS, Itanium) typically do not have the
indexed addressing mode, hence this algorithm must be used.
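The same explicit address computation can be written in C, which may make the steps easier to follow. This is a minimal sketch; the function name clearByByteOffset is an assumption, and it relies on int being four bytes only in the sense that sizeof(int) supplies the multiplier.

```c
#include <stddef.h>

/* Clear an int array by computing each element's address by hand,
 * mirroring the salq/leaq/addq sequence: byte offset = index * sizeof(int).
 * clearByByteOffset is a hypothetical name for this sketch. */
void clearByByteOffset(int *array, size_t nElements)
{
    char *start = (char *)array;                  /* address of array start */
    for (size_t index = 0; index < nElements; index++) {
        size_t byteOffset = index * sizeof(int);  /* multiply index by size */
        int *element = (int *)(start + byteOffset); /* address of element   */
        *element = 0;                             /* store zero there       */
    }
}
```

Writing array[index] = 0 instead lets the compiler do this scaling, just as the indexed addressing mode lets the CPU do it.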
1 # clearArray2.s
2 # Allocates an int array, stores zero in each element,
3 # and prints results.
4 # Bob Plantz - 16 June 2009
5
6 # Stack frame
7 .equ intArray,-40 # space for 10 ints in the array
8 .equ rbxSave,-48 # preserve registers
9 .equ r12Save,-56
10 .equ localSize,-64
11
12 # Constant data
13 .section .rodata
14 format: .string "intArray[%i] = %i\n"
15
16 # The program
17 .text
18 .globl main
19 .type main, @function
20 main:
21 pushq %rbp # save caller base pointer
22 movq %rsp, %rbp # set our base pointer
23 addq $localSize, %rsp # local variables
24 movq %rbx, rbxSave(%rbp) # save regs.
25 movq %r12, r12Save(%rbp)
26
This version uses a do-while loop instead of a while loop entered at the bottom. The index and
the address of the beginning of the array are maintained in registers.
Using a register for the index value presents a potential problem. Recall that some registers
are guaranteed to be preserved by a function (Table 6.4, page 121). We have used r12 for the
print do-while loop in this program because it calls another function — printf. The current
value of index must be copied to the correct argument register for the call to printf:
41 movl %r12d, %esi # get index value
Although the operating system probably does not depend on registers being saved, we have
done so in this program:
24 movq %rbx, rbxSave(%rbp) # save regs.
25 movq %r12, r12Save(%rbp)
and:
49 movq rbxSave(%rbp), %rbx # restore regs.
50 movq r12Save(%rbp), %r12
just to be safe.
1 /*
2 * structField1.c
3 * Allocates two structs and assigns a value to each field
4 * in each struct.
5 * Bob Plantz - 16 June 2009
6 */
7
8 #include <stdio.h>
9
10 struct theTag
11 {
12 char aByte;
13 int anInt;
14 char anotherByte;
15 };
16
17 int main(void)
18 {
19 struct theTag x;
20 struct theTag y;
21
22 x.aByte = 'a';
23 x.anInt = 123;
24 x.anotherByte = 'b';
25 y.aByte = '1';
26 y.anInt = 456;
27 y.anotherByte = '2';
28
The name of the struct variable is specified first, followed by a dot (.), followed by the field
name. The field names and their individual data types are declared between the {. . . } pair of the
struct declaration.
The amount of memory required by a struct variable is at least the sum of the amount
of memory required by each of its fields. Thus in the above program, the minimum amount of
memory required is:
aByte: 1 byte
anInt: 4 bytes
anotherByte: 1 byte
total = 6 bytes
If we were to allocate these six bytes of memory without some thought, the first char variable
could occupy the first byte, the int variable the next four bytes, and the second char variable
the following byte. That is, relative to the address of the beginning of the struct,
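• aByte would be stored in byte number 0,
• anInt would be stored in byte numbers 1 through 4, and
• anotherByte would be stored in byte number 5.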
298 CHAPTER 13. DATA STRUCTURES
However, the ABI [25] specifies that a struct takes on the alignment of its “most strictly
aligned component,” and each field is placed at an offset that satisfies its own alignment. In
this example the int element must be aligned on a 4-byte boundary, so three padding bytes
follow aByte. The size of the struct is also rounded up to a multiple of four, so anotherByte is
followed by padding as well. Also, as explained in Section 8.4 (page 173), we should allocate
memory in multiples of sixteen for local variables (see Exercise 13-4). These requirements
suggest that each struct variable will be allocated on the stack as shown in Figure 13.1. Thus we
see that each of the struct variables in Listing 13.6 requires that we allocate sixteen bytes in
the stack frame.
Figure 13.1: Memory allocation for the variables x and y from the C program in Listing 13.6.
Shaded areas are padding bytes used to properly align the address of each variable;
no data is stored in them.
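The layout the compiler actually chooses can be inspected from C with sizeof and the offsetof macro from stddef.h. The sketch below is one way to do this; the helper names paddingBytes and anIntOffset are assumptions made for the illustration.

```c
#include <stddef.h>

struct theTag
{
    char aByte;
    int anInt;
    char anotherByte;
};

/* Number of padding bytes the compiler inserted into struct theTag:
 * total size minus the sum of the field sizes.
 * paddingBytes is a hypothetical helper for this sketch. */
size_t paddingBytes(void)
{
    size_t fieldBytes = sizeof(char) + sizeof(int) + sizeof(char);
    return sizeof(struct theTag) - fieldBytes;
}

/* Offset of the anInt field from the beginning of the struct. */
size_t anIntOffset(void)
{
    return offsetof(struct theTag, anInt);
}
```

With gcc on x86-64 Linux one would expect anIntOffset() to be 4, matching the -16 and -12 offsets used for x.aByte and x.anInt in the compiler-generated code, but the exact values depend on the compiler and ABI.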
The next issue is how to access each of the fields in these two structs. You learned in Section 9.1
(page 183) that assignment in C is implemented with the mov instruction. So in this program
assignment at the assembly language level is implemented as:
movb $'a', address_of_aByte_field_in_x
movl $123, address_of_anInt_field_in_x
movb $'b', address_of_anotherByte_field_in_x
The base register plus offset addressing mode (page 165) provides a convenient way to access
each field in a struct. Simply load the address of the struct variable into a register, then use
the offset of the field. We can see how the compiler has implemented this in Listing 13.7.
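The idea of a base address plus a field offset can be mimicked in C. This is a minimal sketch of the technique, assuming the struct theTag definition from Listing 13.6; the function name storeAnInt is hypothetical.

```c
#include <stddef.h>
#include <string.h>

struct theTag
{
    char aByte;
    int anInt;
    char anotherByte;
};

/* Store a value in the anInt field using only the struct's address and
 * the field's offset, as base register plus offset addressing does.
 * storeAnInt is a hypothetical helper for this sketch. */
void storeAnInt(void *structAddress, int value)
{
    char *base = (char *)structAddress;            /* the "base register" */
    memcpy(base + offsetof(struct theTag, anInt),  /* base plus offset    */
           &value, sizeof value);
}
```

In ordinary C one would simply write x.anInt = value; the compiler then supplies the offset, which is what the numbers in the assembly language represent.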
1 .file "structField1.c"
2 .section .rodata
3 .align 8
4 .LC0:
5 .string "x: %c, %i, %c and y: %c, %i, %c\n"
6 .text
7 .globl main
8 .type main, @function
9 main:
10 pushq %rbp
11 movq %rsp, %rbp
12 subq $48, %rsp
13 movb $97, -16(%rbp)
14 movl $123, -12(%rbp)
15 movb $98, -8(%rbp)
16 movb $49, -32(%rbp)
17 movl $456, -28(%rbp)
18 movb $50, -24(%rbp)
19 movzbl -24(%rbp), %eax
13.2. STRUCTS (RECORDS) 299
20 movsbl %al,%edx
21 movl -12(%rbp), %ecx
22 movzbl -32(%rbp), %eax
23 movsbl %al,%esi
24 movzbl -8(%rbp), %eax
25 movsbl %al,%edi
26 movl -12(%rbp), %r10d
27 movzbl -16(%rbp), %eax
28 movsbl %al,%eax
29 movl %edx, (%rsp)
30 movl %ecx, %r9d
31 movl %esi, %r8d
32 movl %edi, %ecx
33 movl %r10d, %edx
34 movl %eax, %esi
35 movl $.LC0, %edi
36 movl $0, %eax
37 call printf
38 movl $0, %eax
39 leave
40 ret
41 .size main, .-main
42 .ident "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"
43 .section .note.GNU-stack,"",@progbits
The compiler allocated 48 bytes in the stack frame. Thirty-two are for the two struct variables.
The additional sixteen are needed for passing the seventh argument to the printf function (line
29) on the stack while keeping the stack pointer aligned on a 16-byte boundary. Rather than load
the address of each struct into a register, the compiler has computed the total offset to each of
the fields in each of the structs.
As usual, we equate symbolic names to these numbers when writing in assembly language.
We have tried to make our assembly language version (Listing 13.8) a little more readable than
the version generated by gcc.
1 # structField2.s
2 # Allocates two structs and assigns a value to each field
3 # in each struct.
4 # Bob Plantz - 18 June 2009
5
23 main:
24 pushq %rbp # save frame pointer
25 movq %rsp, %rbp # our frame pointer
26 addq $localSize, %rsp # local variables
27 andq $-16, %rsp # align stack pointer
28
41 # print values
42 movq $formatString, %rdi
43 leaq x(%rbp), %rax # the x struct
44 movb aByte(%rax), %sil
45 movl anInt(%rax), %edx
46 movb anotherByte(%rax), %cl
47 leaq y(%rbp), %rax # the y struct
48 movb aByte(%rax), %r8b
49 movl anInt(%rax), %r9d
50 movb anotherByte(%rax), %al
51 movb %al, (%rsp) # pass on stack
52 movl $0, %eax # no floating point
53 call printf
54
The version written in assembly language loads the address of a struct variable into a register
before accessing the fields. This allows the use of the symbolic names for each field:
and:
This technique would be necessary with, say, an array of structs or a function that takes an
address of a struct as an argument.
13.3. STRUCTS AS FUNCTION ARGUMENTS 301
While C++ supports pass by reference for output parameters, C does not. In C, a pass by ref-
erence is simulated by passing a pointer to the variable to the function. Then the function can
change the variable, thus effecting an output. At the assembly language level, pass by reference
is implemented in C++ by passing a pointer, so these two rules can be restated:
Some languages, e.g., Ada, also support passing an “update.” In this case the function
replaces the original value with a new value that depends upon the original value. Passing an
argument for update is also implemented by passing its address.
There is an important exception to the rule of passing a copy for inputs. When the amount of
data is large, making a copy of it is inefficient. So we organize it into a single entity and pass
the address of that entity.
The most common example of this is an array. In fact, it is so common that in C arrays are
automatically passed by address. Thus, in C/C++
void f(int a, int b[]);
----
int x;
int y[100];
----
f(x, y);
----
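The effect of passing the array by address can be seen with a small experiment. The text gives only the prototype of f, so the body below is an assumption made for illustration.

```c
/* a is a copy of the caller's value; b is the address of the caller's
 * array, so a store through b changes the caller's data.
 * The body of f is hypothetical; the text gives only its prototype. */
void f(int a, int b[])
{
    b[0] = a;     /* stores into the caller's array        */
    a = a + 1;    /* changes only this function's own copy */
}
```

After the call f(x, y), the caller's y[0] holds the value of x, while x itself is unchanged.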
7 #include <stdio.h>
8 #include "loadStruct1.h" // includes struct theTag def.
9
10 int main(void)
11 {
12 struct theTag x;
13 struct theTag y;
14
22 return 0;
23 }
1 /*
2 * loadStruct1.h
3 * Assigns values to the fields of a struct.
4 *
5 * precondition
6 * aStruct is the address of a theTag struct
7 * postcondition
8 * firstChar is stored in the aByte field of aStruct
9 * aNumber is stored in the anInt field of aStruct
10 * secondChar is stored in the anotherByte field of aStruct
11 * Bob Plantz - 16 June 2009
12 */
13
14 #ifndef LOADSTRUCT_H
15 #define LOADSTRUCT_H
16
17 struct theTag {
18 char aByte;
19 int anInt;
20 char anotherByte;
21 };
22
1 /*
2 * loadStruct1.c
3 * Assigns values to the fields of a struct.
4 *
5 * precondition
6 * aStruct is the address of a theTag struct
7 * postcondition
8 * firstChar is stored in the aByte field of aStruct
9 * aNumber is stored in the anInt field of aStruct
10 * secondChar is stored in the anotherByte field of aStruct
11 * Bob Plantz - 16 June 2009
12 */
13
20 aStruct->anInt = aNumber;
21 aStruct->anotherByte = secondChar;
22 }
Listing 13.9: Passing struct variables (C). (There are three files here.)
In Listing 13.10 we examine the compiler-generated assembly language for the loadStruct
function.
1 .file "loadStruct1.c"
2 .text
3 .globl loadStruct
4 .type loadStruct, @function
5 loadStruct:
6 pushq %rbp
7 movq %rsp, %rbp
8 movq %rdi, -8(%rbp) # save address of struct
9 movl %edx, -16(%rbp) # save aNumber
10 movb %sil, -12(%rbp) # save firstChar
11 movb %cl, -20(%rbp) # save secondChar
12 movq -8(%rbp), %rdx # load struct address
13 movzbl -12(%rbp), %eax # load firstChar
14 movb %al, (%rdx) # aStruct->aByte = firstChar;
15 movq -8(%rbp), %rdx # load struct address
16 movl -16(%rbp), %eax # load aNumber
17 movl %eax, 4(%rdx) # aStruct->anInt = aNumber;
18 movq -8(%rbp), %rdx # load struct address
19 movzbl -20(%rbp), %eax # load secondChar
20 movb %al, 8(%rdx) # aStruct->anotherByte = secondChar;
21 leave
22 ret
23 .size loadStruct, .-loadStruct
24 .ident "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"
25 .section .note.GNU-stack,"",@progbits
Listing 13.10: Passing struct variables (gcc assembly language). Only the loadStruct function
is shown.
The type declaration in the function signature, struct theTag* aStruct, together with the struct
definition in the header file, loadStruct1.h, tells the compiler what offsets to use for the struct
fields on lines 14, 17, and 20.
We have already covered all the assembly language instructions and addressing modes needed
to express the program in Listing 13.9 in assembly language. However, the .include assembler
directive will make things much easier. The syntax is
.include "filename"
which causes the assembler to insert everything in the file named “filename” into the source
at the point of the .include directive. This assembler directive is essentially the same as
the #include directive in C/C++. The assembly language version of our structPass program
is shown in Listing 13.11.
Pay particular attention to the header file, loadStruct2.h, which defines the offset to each field
within the struct and provides the overall size of the struct. This header file must be .included
in any file that uses the struct.
Note that specifying the overall size of the struct makes it easier to allocate space for it. For
example, we use
8 .equ y,x-structSize # Space for y struct
9 .equ x,-structSize # Space for x struct
in Listing 13.11 to compute the offsets to the x and y variables in the stack frame.
1
2 # loadStruct2.h
3 # The struct definition
4 # Bob Plantz - 16 June 2009
5
6 # struct definition
7 .equ aByte,0
8 .equ anInt,4
9 .equ anotherByte,8
10 .equ structSize,16
1 # structPass2.s
2 # Demonstrates passing structs as arguments in assembly language
3 # Bob Plantz - 16 June 2009
4
5 .include "loadStruct2.h"
6
7 # Stack frame
8 .equ y,x-structSize # Space for y struct
9 .equ x,-structSize # Space for x struct
10 .equ round16,-16 # 0xfffffffffffffff0
11 .equ passArgs,-16 # Space for passing args
12 # Read only data
13 .section .rodata
14 formatString:
15 .string "x: %c, %i, %c and y: %c, %i, %c\n"
16 # Code
17 .text
18 .globl main
19 .type main, @function
20 main:
21 pushq %rbp # save frame pointer
22 movq %rsp, %rbp # our frame pointer
23 addq $y, %rsp # local variables
24 andq $round16, %rsp # round down to 16-byte boundary
25 addq $passArgs, %rsp # for passing 7th arg
26
39 # print values
40 movq $formatString, %rdi
41 leaq x(%rbp), %rax # the x struct
42 movb aByte(%rax), %sil
43 movl anInt(%rax), %edx
44 movb anotherByte(%rax), %cl
1 # loadStruct2.s
2 # Stores values in struct fields
3 # Calling sequence:
4 # rdi <- address of the struct
5 # sil <- first character to be stored
6 # edx <- integer to be stored
7 # cl <- second character to be stored
8 # call loadStruct
9 # Bob Plantz - 16 June 2009
10
11 .include "loadStruct2.h"
12
13 .text
14 .globl loadStruct
15 .type loadStruct, @function
16 loadStruct:
17 pushq %rbp # save caller’s frame pointer
18 movq %rsp, %rbp # our frame pointer
19
Listing 13.11: Passing struct variables (programmer assembly language). (There are three files
here.)
The stack pointer adjustments on lines 23 through 25 of structPass2.s
deserve some discussion here. After allocating space for the two structs on the stack, there is
no way to know if the stack pointer is aligned on a 16-byte boundary. If it is not, the lowest-
order four bits will be non-zero. The andq instruction sets these bits to zero, thus rounding the
address down to the next lower 16-byte address boundary. Notice that this works because the
stack grows toward numerically lower addresses.
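The effect of the andq instruction can be checked in C. The following is a minimal sketch of the bit manipulation only; the function name alignDown16 is an assumption.

```c
#include <stdint.h>

/* Round an address down to the next lower 16-byte boundary by clearing
 * its four low-order bits, as andq $-16, %rsp does.
 * alignDown16 is a hypothetical helper for this sketch. */
uint64_t alignDown16(uint64_t address)
{
    return address & (uint64_t)-16;   /* -16 is 0xfffffffffffffff0 */
}
```

For example, alignDown16(0x7ffc1238) gives 0x7ffc1230, and an address that is already a multiple of sixteen is returned unchanged.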
The prologue and epilogue in loadStruct are not really needed in this simple function. But
it is good to get in the habit of coding them into all your functions. They have a negligible
effect on execution time, and they help establish a structure for the function if it is ever changed.
1 /*
2 * incFraction.cc
3 * Gets a fraction from user and increments by one
4 * Bob Plantz - 18 June 2009
5 */
6
7 #include "fraction.h"
8 #include "writeStr.h"
9
10 int main(void)
11 {
12 // char array is used because writeStr takes
13 // a pointer to a C-style string.
14 char newline[] = "\n";
15 fraction x;
16
17 x.get();
18 x.add(1);
19 x.display();
20 writeStr(newline);
21 return 0;
22 }
1 /*
2 * fraction.h
3 * simple fraction class
4 * Bob Plantz - 18 June 2009
5 */
6
7 #ifndef FRACTION_H
8 #define FRACTION_H
9
10 class fraction
11 {
12 public:
13 fraction(); // default constructor
14 void get(); // gets user’s values
15 void display(); // displays fraction
16 void add(int); // adds integer
17 private:
13.4. STRUCTS AS C++ OBJECTS 307
22 #endif
1 /*
2 * fraction.cc
3 * simple fraction class
4 * Bob Plantz - 18 June 2009
5 */
6
7 #include "writeStr.h"
8 #include "getUint.h"
9 #include "putUint.h"
10 #include "fraction.h"
11
12 fraction::fraction()
13 {
14 num = 0;
15 den = 1;
16 }
17
18 void fraction::get()
19 {
20 // char arrays are used because writeStr takes
21 // a pointer to a C-style string.
22 char numMsg[] = "Enter numerator: ";
23 char denMsg[] = "Enter denominator: ";
24
25 writeStr(numMsg);
26 num = getUint();
27
28 writeStr(denMsg);
29 den = getUint();
30 }
31
32 void fraction::display()
33 {
34 // char array is used because writeStr takes
35 // a pointer to a C-style string.
36 char over[] = "/";
37
38 putUint(num);
39 writeStr(over);
40 putUint(den);
41 }
42
Listing 13.12: Add 1 to user’s fraction (C++). The C functions getUint, putUint, and writeStr
are not shown here. (There are three files here.)
Let us consider the main function and see how arguments are passed on the stack. Recall
calls the constructor function. As we said above, the address of the object (actually a struct) is
passed to the constructor function, even though it is not explicitly stated in the object declaration
statement. When program flow is passed to the constructor, the address of the x object is placed
in the rdi register. The same thing occurs when the other member functions are called. The
add member function takes an explicit argument, which is actually the second argument to the
function, so it is passed in the esi register (the low-order 32 bits of rsi).
Before showing how the program of Listing 13.12 could be implemented in assembly lan-
guage, we look at a C implementation, since we already know a lot about the transition from C
to assembly language.
In C, the member data would be explicitly implemented as a struct. Implementing the mem-
ber functions in C may seem very straightforward, but there is an important issue to consider
— how do the member functions gain access to the data members? Since the data members are
organized as a struct, passing its address as an argument to the member functions will allow
each of them access to the data members. This is effectively what C++ does. The address of the
“object” (actually, a struct) is passed as an implicit argument to each member function.
Another issue arises if you think about the possible names of member functions. Different
C++ classes can have the same member function names, but functions in C do not belong to
any class, so each must have a unique name. (Actually, “free” functions in C++ must also have
unique names.) The C++ compiler takes care of this by adding the class name to the member
function name. This is called name mangling. (There is no standard for how this is actually
done, so each compiler may do it differently.) We do our own “name mangling” for the C version
of the program as shown in Listing 13.13.
1 /*
2 * createFraction.c
3 * creates a fraction and gets user’s values
4 * Bob Plantz - 16 June 2009
5 */
6
7 #include "fraction.h"
8 #include "fractionGet.h"
9 #include "fractionAdd.h"
10 #include "fractionDisplay.h"
11 #include "writeStr.h"
12
13 int main(void)
14 {
15 struct fraction x;
16
17 fraction(&x); // "constructor"
18 fractionGet(&x);
19 fractionAdd(&x, 1);
20 fractionDisplay(&x);
21 writeStr("\n");
22 return 0;
23 }
1 /*
2 * fraction.h
3 * A fraction "constructor" in C
4 * Bob Plantz - 16 June 2009
5 */
6
7 #ifndef FRACTION_H
8 #define FRACTION_H
9
10 struct fraction
11 {
12 int num;
13 int den;
14 };
15
18 #endif
1 /*
2 * fraction.c
3 * A fraction "constructor" in C
4 * Bob Plantz - 16 June 2009
5 */
6
7 #include "fraction.h"
8
1 /*
2 * fractionGet.h
3 * Gets numerator and denominator from user.
4 * Bob Plantz - 16 June 2009
5 */
6
7 #ifndef FRACTION_GET_H
8 #define FRACTION_GET_H
9
10 #include "fraction.h"
11
14 #endif
1 /*
2 * fractionGet.c
3 * Gets user values for a fraction
4 * Bob Plantz - 16 June 2009
5 */
6
7 #include "writeStr.h"
8 #include "getUint.h"
9 #include "fractionGet.h"
10
17 this->den = getUint();
18 }
1 /*
2 * fractionAdd.h
3 * adds an integer to the fraction
4 * Bob Plantz - 16 June 2009
5 */
6
7 #ifndef FRACTION_ADD_H
8 #define FRACTION_ADD_H
9 #include "fraction.h"
10
13 #endif
1 /*
2 * fractionAdd.c
3 * adds an integer to the fraction
4 * Bob Plantz - 16 June 2009
5 */
6
8 #include "fractionAdd.h"
9
1 /*
2 * fractionDisplay.h
3 * Displays a fraction in num/den format
4 * Bob Plantz - 16 June 2009
5 */
6
7 #ifndef FRACTION_DISPLAY_H
8 #define FRACTION_DISPLAY_H
9
10 #include "fraction.h"
11
14 #endif
1 /*
2 * fractionDisplay.c
3 * Displays a fraction in num/den format
4 * Bob Plantz - 16 June 2009
5 * precondition
6 * this points to fraction, both num and den within 0 - 9
7 * postcondition
8 * num/den displayed on the screen
9 */
10
11 #include "writeStr.h"
12 #include "putUint.h"
13 #include "fractionDisplay.h"
14
Listing 13.13: Add 1 to user’s fraction (C). (There are nine files here.)
Notice the use of the this pointer in the C equivalents of the “member” functions. Its place
in the parameter list coincides with the “implicit” argument to C++ member functions — that
is, the address of the object. The this pointer is implicitly available for use within C++ member
functions. Its use depends upon the specific algorithm. Listing 13.13 should give you a good
idea of how C++ implements objects.
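As a compact illustration of the explicit this pointer, here is a sketch of what the C stand-in for fraction::add might look like. The parameter name theValue is an assumption; the arithmetic follows the fractionAdd assembly language in Listing 13.14, which multiplies the integer by the denominator and adds the product to the numerator.

```c
struct fraction
{
    int num;
    int den;
};

/* C stand-in for the C++ member function fraction::add(int).
 * The object's address arrives as the explicit first parameter.
 * The parameter name theValue is hypothetical. */
void fractionAdd(struct fraction *this, int theValue)
{
    this->num += theValue * this->den;   /* num/den + theValue */
}
```

Note that this is an ordinary identifier in C, so this sketch compiles as C but not as C++, where this is a keyword.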
From the C version in Listing 13.13 it is straightforward to move to the assembly language
version in Listing 13.14.
1 # incFraction.s
2 # adds one to a fraction
3 # Bob Plantz - 18 June 2009
4
38
1 # fraction.h
2 # simple fraction class
3 # Bob Plantz - 18 June 2009
4
5 # struct definition
6 .equ num,0 # numerator
7 .equ den,4 # denominator
8 .equ fractionSize,8 # total size needed for struct
1 # fraction.s
2 # constructs a fraction to be 0/1
3 # Bob Plantz - 18 June 2009
4 # Calling sequence:
5 # rdi <- address of object
6 # call fraction
7 # Include object data definition
8 .include "fraction.h"
9 # Read only data
10 .section .rodata
11 zero: .long 0
12 one: .long 1
13 # Code
14 .text
15 .globl fraction
16 .type fraction, @function
17 fraction:
18 pushq %rbp # save frame pointer
19 movq %rsp, %rbp # our frame pointer
20
1 # fractionGet.s
2 # gets user values for a fraction
3 # Bob Plantz - 18 June 2009
4 # Calling sequence:
5 # rdi <- address of object
6 # call fractionGet
7 # Include object data definition
8 .include "fraction.h"
9 # local register save area
10 .equ this,-8 # pointer to object
11 .equ localSize,-16
12 # Read only data
13 .section .rodata
14 numPrompt:
15 .string "Enter numerator: "
16 denPrompt:
17 .string "Enter denominator: "
18 # Code
19 .text
20 .globl fractionGet
21 .type fractionGet, @function
22 fractionGet:
23 pushq %rbp # save frame pointer
24 movq %rsp, %rbp # our frame pointer
25 addq $localSize, %rsp
26 movq %rdi, this(%rbp) # save this pointer
27
1 # fractionDisplay.s
2 # Displays a fraction in num/den format
3 # Bob Plantz - 18 June 2009
4 # Calling sequence:
5 # rdi <- address of object
6 # call fractionDisplay
7 # Include object data definition
8 .include "fraction.h"
9 # local register save area
10 .equ this,-8 # pointer to object
11 .equ localSize,-16
12 # Read only data
13 .section .rodata
14 slash:
15 .string " / "
16 # Code
17 .text
18 .globl fractionDisplay
19 .type fractionDisplay, @function
20 fractionDisplay:
21 pushq %rbp # save frame pointer
22 movq %rsp, %rbp # our frame pointer
23 addq $localSize, %rsp # local vars
24 movq %rdi, this(%rbp) # save this pointer
25
28
1 # fractionAdd.s
2 # adds input value to a fraction
3 # Bob Plantz - 18 June 2009
4 # Calling sequence:
5 # esi <- int to be added
6 # rdi <- address of object
7 # call fractionAdd
8 # Include object data definition
9 .include "fraction.h"
10 # local register save area
11 .equ this,-8 # pointer to object
12 # Code
13 .text
14 .globl fractionAdd
15 .type fractionAdd, @function
16 fractionAdd:
17 pushq %rbp # save frame pointer
18 movq %rsp, %rbp # our frame pointer
19 movq %rdi, this(%rsp) # save this pointer in
20 # red zone
21 movl %esi, %eax # int to be added
22 mull den(%rdi) # times denominator
23 movq this(%rsp), %rdi # restore this pointer
24 addl %eax, num(%rdi) # add to numerator
25
Listing 13.14: Add 1 to user’s fraction (programmer assembly language). (There are six files
here; note that the assembly language header file, fraction.h, is different from
the C++ version.)
The “header” file, fraction.h, contains offsets for the fields of the struct that defines the state
variables for a fraction object. Notice on line 8 that it includes a symbolic name for the size of
the object.
8 .equ fractionSize,8 # total size needed for struct
We then do:
20 # Stack frame
13.5. INSTRUCTIONS INTRODUCED THUS FAR 315
technique to allocate space on the stack and make sure the stack pointer is on a 16-byte bound-
ary.
In the fraction constructor function we see the use of the field names that are defined in the
header file to access the state variables of the object:
The fractionAdd function is a leaf function. So we use the red zone for saving the this
pointer:
16 fractionAdd:
17 pushq %rbp # save frame pointer
18 movq %rsp, %rbp # our frame pointer
19 movq %rdi, this(%rsp) # save this pointer in
20 # red zone
This summary shows the assembly language instructions introduced thus far in the book. The
page number where the instruction is explained in more detail, which may be in a subsequent
chapter, is also given. This book provides only an introduction to the usage of each instruction.
You need to consult the manuals ([2] – [6], [14] – [18]) in order to learn all the possible uses of
the instructions.
13.5.1 Instructions
data movement:
opcode   source          destination   action              see page:
cmovcc   %reg/mem        %reg          conditional move    230
movs     $imm/%reg       %reg/mem      move                141
movsss   %reg/mem        %reg          move, sign extend   216
movzss   %reg/mem        %reg          move, zero extend   217
popw                     %reg/mem      pop from stack      163
pushw    $imm/%reg/mem                 push onto stack     163
s = b, w, l, q; w = l, q; cc = condition codes
arithmetic/logic:
opcode   source      destination   action                   see page:
adds     $imm/%reg   %reg/mem      add                      189
adds     mem         %reg          add                      189
ands     $imm/%reg   %reg/mem      bit-wise and             258
ands     mem         %reg          bit-wise and             258
cmps     $imm/%reg   %reg/mem      compare                  209
cmps     mem         %reg          compare                  209
decs                 %reg/mem      decrement                220
divs     %reg/mem                  unsigned divide          280
idivs    %reg/mem                  signed divide            282
imuls    %reg/mem                  signed multiply          276
incs                 %reg/mem      increment                220
leaw     mem         %reg          load effective address   167
muls     %reg/mem                  unsigned multiply        275
negs                 %reg/mem      negate                   286
ors      $imm/%reg   %reg/mem      bit-wise inclusive or    258
ors      mem         %reg          bit-wise inclusive or    258
sals     $imm/%cl    %reg/mem      shift arithmetic left    269
sars     $imm/%cl    %reg/mem      shift arithmetic right   268
shls     $imm/%cl    %reg/mem      shift left               269
shrs     $imm/%cl    %reg/mem      shift right              268
subs     $imm/%reg   %reg/mem      subtract                 190
subs     mem         %reg          subtract                 190
tests    $imm/%reg   %reg/mem      test bits                210
tests    mem         %reg          test bits                210
xors     $imm/%reg   %reg/mem      bit-wise exclusive or    258
xors     mem         %reg          bit-wise exclusive or    258
s = b, w, l, q; w = l, q
13.6 Exercises
13-1 (§13.1) Write a program in assembly language that allocates a twenty-five element integer
array and stores the index value in each element. That is, the first element will be assigned
zero, the second element one, etc. Use four-byte integers.
After the array has been completely initialized, display the contents of the array in a
column.
13-2 (§13.1) Write a program in assembly language that allocates a ten element integer array
and prompts the user to enter an integer to be stored in each element of the array. Use
four-byte integers.
After the user’s values have been stored in the array, compute the sum of the integers. (Do
not accumulate the sum as the numbers are entered.) Display the sum.
13-3 (§13.1) Write a program in assembly language that allocates a ten element integer array
and prompts the user to enter an integer to be stored in each element of the array. Use
four-byte integers.
After the user’s values have been stored in the array, compute the average of the integers.
(Do not accumulate the sum as the numbers are entered.) Display the average.
13-4 (§13.2) Modify the program in Listing 13.6 so that it displays the total number of bytes
allocated for the struct. Hint: use the C sizeof operator.
13-5 (§13.2) Modify the program of Exercise 13-4 such that it also displays the offset of each
field within a struct. Hint: use the C & operator to get addresses.
13-6 (§13.2) Enter the program in Listing 13.8 and make sure that you understand how it
works.
13-7 (§13.3) Enter the three files from Listing 13.11 and get the program to work.
a) Create a makefile to assemble and link the files into a program.
b) Using the debugger, gdb, set breakpoints in the main function at each call to loadStruct
and at the instruction immediately following each call.
c) Use the debugger to observe the values that are stored in the aByte, anInt, and
anotherByte fields each time loadStruct is called. Hint: Note the address in rdi
just before executing each function call.
13-8 (§13.3) Modify the program in Listing 13.8 such that it has separate functions for:
• getting the data from the user for a struct, and
• displaying the data in a struct.
Add two more struct variables to the program. Your program will then call the first func-
tion three times, once for each variable. Then it calls the second function three times, also
once for each variable.
13-9 (§13.3) Write a program in assembly language that allocates three variables of the type:
struct item {
char name[50];
int cost;
};
That is, each variable will have two fields, one for the name of the item, and one for its
cost.
The user will be prompted to enter the name and cost of each item, and the user’s input
will be stored in the respective variables.
After the data for all three items is entered, the program will list the name and cost of
each of the three items and then display the total cost for all three items.
13-10 (§13.4) Implement the program in Listing 13.14 such that it allows the user to enter an
integer value to be added to the fraction.
13-11 (§13.4) Modify the program in Exercise 13-10 such that it allows the user to enter a
fractional value, then adds the two fractions.
13-12 (§13.4) Write a program that allows the user to maintain an address book. Each entry
into the address book should allow the user to enter
• 48 characters for the name
• 80 characters for the street address
• 24 characters for the city
• 2 characters for the state (abbreviation)
• 5 characters for the zip code
The user should be able to display the entries.
13-13 (§13.4) Modify the program in Exercise 13-12 so that the user can sort the address book
entries on any of the five fields.
13-14 (§13.4) Write a program that allows the user to set up and maintain two bank accounts.
Each account should have a unique account name. The user should be able to credit or
debit the account.
13-15 (§13.4) Modify the program in Exercise 13-14 so that it requires a PIN in order to
access each of the bank accounts.
Chapter 14
Fractional Numbers
So far in this book we have used only integers for numerical values. In this chapter you will see
two methods for storing fractional values — fixed point and floating point. Storing numbers in
fixed point format requires that the programmer keep track of the location of the binary point1
within the bits allocated for storage. In the floating point format, the number is essentially
written in scientific format and both the significand2 and exponent are stored.
because
$d_{-1} \times 2^{-1} = 1 \times 0.5$
$d_{-2} \times 2^{-2} = 0 \times 0.25$
$d_{-3} \times 2^{-3} = 1 \times 0.125$
$d_{-4} \times 2^{-4} = 1 \times 0.0625$
and thus
$0.1011_2 = 0.5_{10} + 0.125_{10} + 0.0625_{10} = 0.6875_{10}$
See Exercise 14-1 for an algorithm to convert decimal fractions to binary. We assume that
you can convert the integral part and that Equation 14.1 is sufficient for converting from binary
to decimal.
Although any integer can be represented as a sum of powers of two, an exact representation
of fractional values in binary is limited to sums of inverse powers of two. For example, consider
an 8-bit representation of the fractional value 0.9. From
$0.11100110_2 = 0.89843750_{10}$
$0.11100111_2 = 0.90234375_{10}$
1 The binary point is equivalent to the decimal point when a number is stored in binary. In particular, it separates
the integral and fractional parts.
2 This is often called the “mantissa,” which means the fractional part of a logarithm.
In fact,
$0.9_{10} = 0.11100\overline{1100}_2$
We note here that two’s complement also works correctly for storing negative fractional val-
ues. You are asked to show this in Exercise 14-2.
8 int main(void)
9 {
10 int x, y, fraction_part, sum;
11
26 sum = x + y;
27 printf("Their sum is %d and %d/16 inches\n",
28 (sum >> 4), (sum & 0xf));
29
30 return 0;
31 }
Listing 14.1: Fixed point addition. The high-order 28 bits are used for the integral part, the
low-order 4 for the fractional part.
The numbers are input to the nearest 1/16th inch, so the programmer has allocated four bits for
the fractional part. This leaves 28 bits for the integral part. After the integral part is read, the
stored number must be shifted four bit positions to the left to put it in the high-order 28 bits.
Then the fractional part (in number of sixteenths) is added into the low-order four bits with
a simple bit-wise or operation. Printing the answer also requires some bit shifting and some
masking to filter out the fractional part.
This is clearly a contrived example. A program using floats would work just as well and
be somewhat easier to write. However, the program in Listing 14.1 uses integer instructions,
which execute faster than floating point. The hardware issues have become less significant in
recent times. Modern CPUs use various parallelization schemes such that a mix of floating point
and integer instructions may actually execute faster than only integer instructions. Fixed point
arithmetic is often used in embedded applications where the CPU is small and may not have
floating point capabilities.
Notice that the number is normalized such that only one digit appears to the left of the decimal
point. The exponent of 10 is adjusted accordingly.
If we agree that each number is normalized and that we are working in base 10, then each
floating point number is completely specified by three items:
1. The significand.
2. The exponent.
3. The sign.
That is, in the above examples
• 1024, 3, and + represent $1.024 \times 10^{3}$ (The “+” is understood.)
• 89372, -5, and - represent $-8.9372 \times 10^{-5}$
The advantage of using a floating point format is that, for a given number of digits, we can
represent a larger range of values. To illustrate this, consider a four-digit, unsigned decimal
system. The range of integers that could be represented is
0 ≤ N ≤ 9999
Now, let’s allocate two digits for the significand and two for the exponent. We will restrict
our scheme to unsigned numbers, but we will allow negative exponents. So we will need to use
one of the exponent digits to store the sign of the exponent. We will use 0 for positive and 1 for
negative. For example, $3.9 \times 10^{-4}$ would be stored
    +---+---+---+---+
    | 3 | 9 | 1 | 4 |
    +---+---+---+---+
    significand: 3 9    exponent sign: 1    exponent: 4
where each box holds one decimal digit. Some other examples are:
Our normalization scheme requires that there be a single non-zero digit to the left of the decimal
point. We should also allow the special case of 0.0:
A little thought shows that this scheme allows numbers in the range
That is, we have increased the range of possible values by a factor of $10^{14}$!
However, it is important to realize that in both storage schemes, the integer and the floating
point, we have exactly the same number of possible values: $10^{4}$.
Although floating point formats can provide a much greater range of numbers, the distance
between any two adjacent numbers depends upon the value of the exponent. Thus, floating point
is generally less accurate than an integer representation, especially for large numbers.
To see how this works, let’s look at a plot of numbers (using our current scheme) in the range
Notice that the larger numbers are further apart than the smaller ones. (See Exercise 14-7 after
you read Section 14.4.)
Let us pick some numbers from this range and perform some addition.
If we add these values, we get 0.91 + 0.93 = 1.84. Now we need to round off our “paper” addition
in order to fit this result into our current floating point scheme:
and adding these values, we get 0.94 + 0.93 = 1.87. Rounding off, we get:
So we see that starting with two values expressed to the nearest 1/100th, their sum is accurate
only to the nearest 1/10.
To compare this with fixed point arithmetic, we could use the same four digits to store 0.93
this way
It is clear that this storage scheme allows us to perform both additions (0.91 + 0.93 and 0.94 +
0.93) and store the results exactly.
These round off errors must be taken into account when performing floating point arithmetic.
In particular, the errors can occur in intermediate results when doing even moderately complex
computations, where they are very difficult to detect.
In the IEEE 754 4-byte format, one bit is used for the sign, eight for the exponent, and
twenty-three for the significand. The IEEE 754 8-byte format specifies one bit for the sign,
eleven for the exponent, and fifty-two for the significand.
In this section we describe the 4-byte format in order to save ourselves (hand) computation
effort. The goal is to get a feel for the limitations of floating point formats. The normalized form
of the number in binary is given by Equation 14.2.
(a)  bit 31: s  |  bits 30-23: e+127   |  bits 22-0: f
(b)  bit 63: s  |  bits 62-52: e+1023  |  bits 51-0: f
Figure 14.1: IEEE 754 bit patterns. (a) Float. (b) Double.
As in decimal, the exponent is adjusted such that there is only one non-zero digit to the left
of the binary point. In binary, though, this digit is always one. Since it is always one, it need not
be stored. Only the fractional part of the normalized value needs to be stored as the significand.
This adds one bit to the significance of the fractional part. The integer part (one) that is not
stored is sometimes called the hidden bit.
The sign bit, s, refers to the number. Another mechanism is used to represent the sign of
the exponent, e. Your first thought is probably to use two’s complement. However, the IEEE
format was developed in the 1970s, when floating point computations took a lot of CPU time.
Many algorithms depend upon only the comparison of two numbers, and the computer scientists
of the day realized that a format that allowed integer comparison instructions would result in
faster execution times. So they decided to store a biased exponent as an unsigned int. The
amount of the bias is one-half the range allocated for the exponent. In the case of an 8-bit
exponent, the bias amount is 127.
Example 14-a
$97.8125_{10} = 1100001.1101_2$
$= (-1)^0 \times 1100001.1101 \times 2^0$
$= (-1)^0 \times 1.1000011101 \times 2^6$
$s = 0$
$e + 127 = 6 + 127 = 133 = 10000101_2$
$f = 10000111010000000000000$
Finally, use Figure 14.1 to place the bit patterns. (Remember that the hidden bit is not stored;
it is understood to be there.)
Example 14-b
Using IEEE 754 32-bit format, what decimal number does the bit pattern $3e400000_{16}$ represent?
First, convert the hexadecimal to binary, using spaces suggested by Figure 14.1.
$s = 0$
$e + 127 = 01111100_2 = 124_{10}$
$e = -3_{10}$
$f = 10000000000000000000000$
Finally, plug these values into Equation 14.2. (Remember to add the hidden bit.)
Example 14-c
Using IEEE 754 32-bit format, what decimal number would the bit pattern $00000000_{16}$
represent? (The specification states that it is an exception to the format and is defined to represent
0.0. This example provides some motivation for this exception.)
$s = 0$
$e + 127 = 00000000_2$
$e = -127_{10}$
$f = 00000000000000000000000$
Finally, plug these values into Equation 14.2. (Remember to add the hidden bit.)
This last example illustrates a problem with the hidden bit — there is no way to represent
zero. To address this issue, the IEEE 754 standard has several special cases.
• Zero — all the bits in the exponent and significand are zero. Notice that this allows for
-0.0 and +0.0, although (-0.0 == +0.0) computes to true.
• Denormalized — all the bits in the exponent are zero. In this case there is no hidden bit.
Zero can be thought of as a special case of denormalized.
• Infinity — all the bits in the exponent are one, and all the bits in the significand are zero.
The sign bit allows for −∞ and +∞.
• NaN — all the bits in the exponent are one, and the significand is non-zero. This is used
when the results of an operation are undefined. For example, ±nonzero ÷ 0.0 yields infin-
ity, but ±0.0 ÷ ±0.0 yields NaN.
• SSE2 instructions operate on 32-bit or 64-bit values.5 Four 32-bit values or two 64-bit
values can be processed simultaneously.
All three floating point instruction sets include a wide variety of instructions to perform the
following operations:
• Move data from memory to a register, from a register to memory, and from a register to
another register.
• Convert data from integer to floating point, and from floating point to integer formats.
• Perform the usual add, subtract, multiply, and divide arithmetic operations. They also
provide square root instructions.
In addition, the x87 includes instructions for transcendental functions — sine, cosine, tangent,
and arc tangent, and logarithm functions.
We will not cover all the instructions in this book. The following subsections provide an
introduction to how each of the three sets of instructions is used. See the manuals [2] – [6] and
[14] – [18] for details.
5 Newer x86-64 processors have later versions of SSE, but SSE2 is part of the definition of x86-64, so it is the only
and the SSE compare instructions also affect the status flags in the rflags register. Thus the
regular conditional jump instructions (Section 10.1.2, page 211) are used to control program
flow based on floating-point computations.
The instruction mnemonics used by the gnu assembler are mostly the same as given in the
manuals, [2] – [6] and [14] – [18]. Since they are quite descriptive with respect to operand sizes,
a size letter is not appended to the mnemonic, except when one of the operands is in memory
and the size is ambiguous. Of course, the operand order used by the gnu assembler is still
reversed compared to the manufacturers’ manuals, and the register names are prefixed with
the “%” character.
A very important set of instructions provided for working with floating point values are those
to convert between integer and floating point formats. The scalar conversion SSE2 instructions
are shown in Table 14.2.
Data movement and arithmetic instructions must distinguish between scalar and vector op-
erations on values in the xmm registers. The low-order portion of the register is used for scalar
operations. Vector operations are performed on multiple data values packed into a single regis-
ter. See Table 14.3 for a sampling of SSE2 data movement and arithmetic instructions.
Table 14.2: SSE scalar floating point conversion instructions. Source and destination xmm reg-
isters must be different. The low-order portion of the xmm register is used. When
reading from or writing to memory, the “q” suffix is used to designate a 64-bit value.
Table 14.3: Some SSE floating point arithmetic and data movement instructions. Source and
destination xmm registers must be different. Scalar instructions use the low-order
portion of the xmm registers.
Notice that the code for the basic operation is followed by a “p” or “s” for “packed” or “scalar.”
This character is then followed by a “d” or “s” for “double” (64-bit) or “single” (32-bit) data item.
We will use the program in Listing 14.2 to illustrate a few floating point operations.
/*
 * frac2float.c
 * Converts fraction to floating point.
 * Bob Plantz - 18 June 2009
 */

#include <stdio.h>

int main(void)
{
    int x, y;
    double z;

    printf("Enter two integers: ");
    scanf("%i %i", &x, &y);
    z = (double)x / y;
    printf("%i / %i = %lf\n", x, y, z);

    return 0;
}
Compiling this program in 64-bit mode produced the assembly language in Listing 14.3.
1 .file "frac2float.c"
2 .section .rodata
3 .LC0:
4 .string "Enter two integers: "
5 .LC1:
6 .string "%i %i"
7 .LC2:
8 .string "%i / %i = %lf\n"
9 .text
10 .globl main
11 .type main, @function
12 main:
13 pushq %rbp
14 movq %rsp, %rbp
15 subq $16, %rsp
16 movl $.LC0, %edi
17 movl $0, %eax
18 call printf
19 leaq -8(%rbp), %rdx # address of y
20 leaq -4(%rbp), %rsi # address of x
21 movl $.LC1, %edi
22 movl $0, %eax # no xmm arguments
23 call scanf
24 movl -4(%rbp), %eax # load x
25 cvtsi2sd %eax, %xmm1 # convert x to double
26 movl -8(%rbp), %eax # load y
27 cvtsi2sd %eax, %xmm0 # convert y to double
28 movapd %xmm1, %xmm2 # move aligned packed double
29 divsd %xmm0, %xmm2 # z = (double)x / y;
30 movapd %xmm2, %xmm0 # move aligned packed double
31 movsd %xmm0, -16(%rbp) # store z
32 movl -8(%rbp), %edx # load y
Before the division is performed, both integers must be converted to floating point. This
takes place on lines 24 – 27:
24 movl -4(%rbp), %eax # load x
25 cvtsi2sd %eax, %xmm1 # convert x to double
26 movl -8(%rbp), %eax # load y
27 cvtsi2sd %eax, %xmm0 # convert y to double
The cvtsi2sd instruction on lines 25 and 27 converts a signed integer to a scalar double-precision
floating point value. The signed integer can be either 32 or 64 bits and can be located in a gen-
eral purpose register or in memory. The double-precision float will be stored in the low-order 64
bits of the specified xmmn register. The high-order 64 bits of the xmmn register are not changed.
The division on line 29 leaves the result in the low-order 64 bits of xmm2, which is then stored
in z:
29 divsd %xmm0, %xmm2 # z = (double)x / y;
30 movapd %xmm2, %xmm0 # move aligned packed double
31 movsd %xmm0, -16(%rbp) # store z
The movapd instruction moves the entire 128 bits, and the movsd instruction moves only the
low-order 64 bits.
Floating point arguments are passed in the registers xmm0, xmm1, . . . , xmm7 in left-to-right
order. So the value of z is loaded into the xmm0 register for passing to the printf function.
Because printf takes a variable number of arguments, the number of floating point values
passed to it in xmm registers must be stored in eax:
34 movsd -16(%rbp), %xmm0 # load z
35 movl $.LC2, %edi
36 movl $1, %eax # one xmm argument (in xmm0)
37 call printf
Example 14-d
Show how 97.8125 is stored in 80-bit extended IEEE 754 binary format.
$97.8125_{10} = 1100001.1101_2$
$= (-1)^0 \times 1100001.1101 \times 2^0$
$= (-1)^0 \times 1.1000011101 \times 2^6$
Compute s, e+16383.
$s = 0$
$e + 16383 = 6 + 16383 = 16389 = 100000000000101_2$
The 16-bit Floating Point Unit Status Word register shows the results of floating point oper-
ations. The meaning of each bit is shown in Table 14.4.
Figure 14.2 shows a pictorial representation of the floating point registers. The absolute
locations are named fpr0, fpr1,. . . ,fpr7 in this figure. The floating point registers are accessed
by program instructions as a stack with st(0) being the register at the top of the stack. It
“grows” from higher number registers to lower. The TOP field (bits 13 – 11) in the FPU Status
Word holds the (absolute) register number that is currently the top of the stack. If the stack is
full, i.e., fpr0 is the top of the stack, a push causes the TOP field to roll over, and the next item
goes into register fpr7. (The value that was in fpr7 is lost.)
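    stack name   absolute location
    st(5)        fpr0
    st(6)        fpr1
    st(7)        fpr2
    st(0)        fpr3   <- top of stack (TOP field of the status word = 3)
    st(1)        fpr4
    st(2)        fpr5
    st(3)        fpr6
    st(4)        fpr7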
Figure 14.2: x87 floating point register stack. The fpri represent the absolute locations. The
st(j) are the stack names, which are used by the instructions. In this example the
top of the stack is at fpr3, as shown in bits 13 – 11 of the x87 status register.
The instructions that read data from memory automatically push the value onto the top of
the register stack. Arithmetic instructions are provided that operate on the value(s) on the top
of the stack. For example, the faddp instruction adds the two values on the top of the stack and
leaves their sum on the top. The stack has one less value on it. The original two values are gone.
Many floating point instructions allow the programmer to access any of the floating point
registers, %st(i), where i = 0...7, relative to the top of the stack. Fortunately, the programmer
does not need to keep track of where the top is. When using this format, %st(i) refers to the
ith register from the top of the stack. For example, if fpr3 is the current top of the stack, the
instruction
fadd %st(2), %st(0)
will add the value in the fpr5 register to the value in the fpr3 register, leaving the result in the
fpr3 register.
Table 14.5 provides some examples of the floating point instruction set. Notice that the
instructions that deal only with the floating point register stack do not use the size suffix letter,
s. To avoid ambiguity the gnu assembler requires a single letter suffix on the floating point
instructions that access memory. The suffixes are:
’s’ for single precision – 32-bit
’l’ for long (or double) precision – 64-bit
’t’ for ten-byte – 80-bit
Most of the floating point instructions have several variants. See [2] – [6] and [14] – [18] for
details. In general,
• Data cannot be moved directly between the integer and floating point registers. Only data
stored in memory or another floating point register can be pushed onto the floating point
register stack.
• Many floating point instructions have a pop variant. The mnemonic includes a ‘p’ after the
basic mnemonic, immediately before the size character. For example,
fistl someplace(%ebp)
converts the 80-bit floating point number in st(0) to a 32-bit integer and stores it at the
specified memory location. Using the pop variant,
fistpl someplace(%ebp)
does the same thing but also pops one from the floating point register stack.
Table 14.5: A sampling of x87 floating point instructions. Size characters are: s = 32-bit, l =
64-bit, t = 80-bit.
Compiling the fraction conversion program of Listing 14.2 in 32-bit mode shows (Listing 14.4)
that the compiler uses the x87 floating-point instructions. This ensures backward compatibility
since the x86-32 architecture does not need to include SSE instructions.
1 .file "frac2float.c"
2 .section .rodata
3 .LC0:
4 .string "Enter two integers: "
5 .LC1:
6 .string "%i %i"
7 .LC2:
8 .string "%i / %i = %lf\n"
9 .text
10 .globl main
11 .type main, @function
12 main:
13 leal 4(%esp), %ecx
14 andl $-16, %esp
15 pushl -4(%ecx)
16 pushl %ebp
17 movl %esp, %ebp
18 pushl %ecx
19 subl $52, %esp
20 movl $.LC0, (%esp)
21 call printf
22 leal -16(%ebp), %eax # address of x
23 movl %eax, 8(%esp)
We add comments to lines 22 – 27 to show where the x and y variables are located in the stack
frame.
22 leal -12(%ebp), %eax # address of x
23 movl %eax, 8(%esp)
24 leal -8(%ebp), %eax # address of y
25 movl %eax, 4(%esp)
26 movl $.LC1, (%esp)
27 call scanf
Rather than actually push the arguments onto the stack, enough space was allocated on the
stack (line 19) to directly store the values in the location where they would be if they had been
pushed there. This is more efficient than pushing each argument.
Casting an int to a float requires a conversion in the storage format. This conversion is
done by the x87 FPU as an integer is pushed onto the floating point register stack using the
fildl instruction. This conversion can only be done to an integer that is stored in memory. The
compiler uses a location on the call stack to temporarily store each integer so it can be converted:
28 movl -8(%ebp), %edx # load y
29 movl -12(%ebp), %eax # load x
30 pushl %edx # put y into memory
31 fildl (%esp) # convert to 80-bit float
32 movl %eax, (%esp) # put x into memory
33 fildl (%esp) # convert to 80-bit float
34 leal 4(%esp), %esp # restore stack pointer
The fildl instructions on lines 31 and 33 each convert a 32-bit integer to an 80-bit float and
push the float onto the x87 register stack. At this point the floating-point equivalent of x is
at the top of the stack, and the floating-point equivalent of y is immediately below it. Then the
floating-point division instruction:
divides the number at st(0) (the (0) can be omitted) by the number at st(1) and pops the x87
register stack so that the result is now at the top of the stack.
Finally, the fstpl instruction is used to pop the value off the top of the x87 register stack and
store it in memory — at its proper location on the call stack. The “l” suffix indicates that 64 bits
of memory should be used for storing the floating-point value. So the 80-bit value on the top of
the x87 register stack is rounded to 64 bits as it is stored in memory. The other three arguments
are also stored on the call stack.
Note that the 32-bit version of printf does not receive arguments in registers, so eax is not used.
• Try to scale the data such that integer arithmetic can be used.
• All floating point computations are performed in 80-bit extended format. So there is no
processing speed improvement from using floats instead of doubles.
• Try to arrange the order of computations so that similarly sized numbers are added or
subtracted.
• Avoid complex arithmetic statements, which may obscure incorrect intermediate results.
• Choose test data that “stresses” your algorithm. For example, 0.00390625 can be stored
exactly in eight bits, but 0.1 has no exact binary equivalent.
This summary shows the assembly language instructions introduced thus far in the book. The
page number where the instruction is explained in more detail, which may be in a subsequent
chapter, is also given. This book provides only an introduction to the usage of each instruction.
You need to consult the manuals ([2] – [6], [14] – [18]) in order to learn all the possible uses of
the instructions.
14.7.1 Instructions
data movement:
opcode source destination action see page:
cmovcc %reg/mem %reg conditional move 230
movs $imm/%reg %reg/mem move 141
movsss $imm/%reg %reg/mem move, sign extend 216
movzss $imm/%reg %reg/mem move, zero extend 217
popw %reg/mem pop from stack 163
pushw $imm/%reg/mem push onto stack 163
s = b, w, l, q; w = l, q; cc = condition codes
arithmetic/logic:
opcode source destination action see page:
adds $imm/%reg %reg/mem add 189
adds mem %reg add 189
ands $imm/%reg %reg/mem bit-wise and 258
ands mem %reg bit-wise and 258
cmps $imm/%reg %reg/mem compare 209
cmps mem %reg compare 209
decs %reg/mem decrement 220
divs %reg/mem unsigned divide 280
idivs %reg/mem signed divide 282
imuls %reg/mem signed multiply 276
incs %reg/mem increment 220
leaw mem %reg load effective address 167
muls %reg/mem unsigned multiply 275
negs %reg/mem negate 286
ors $imm/%reg %reg/mem bit-wise inclusive or 258
ors mem %reg bit-wise inclusive or 258
sals $imm/%cl %reg/mem shift arithmetic left 269
sars $imm/%cl %reg/mem shift arithmetic right 268
shls $imm/%cl %reg/mem shift left 269
shrs $imm/%cl %reg/mem shift right 268
subs $imm/%reg %reg/mem subtract 190
subs mem %reg subtract 190
tests $imm/%reg %reg/mem test bits 210
tests mem %reg test bits 210
xors $imm/%reg %reg/mem bit-wise exclusive or 258
xors mem %reg bit-wise exclusive or 258
s = b, w, l, q; w = l, q
14.8 Exercises
14-1 (§14.1) Develop an algorithm for converting decimal fractions to binary. Hint: Multiply
both sides of Equation 14.1 by two.
14-2 (§14.1) Show that two’s complement works correctly for fractional values. What is the
decimal range of 8-bit, two’s complement fractional values? Hint: +0.5 does not exist, but
-0.5 does.
#include <stdio.h>

int main()
{
    float number;
    int counter = 20;

    number = 0.5;
    while ((number != 0.0) && (counter > 0))
    {
        printf("number = %.10f and counter = %i\n", number, counter);

        number -= 0.1;
        counter -= 1;
    }

    return 0;
}
Listing 14.5: Use float for Loop Control Variable?
Explain the behavior. What happens if you change the decrement of number from 0.1 to
0.0625? Explain.
14-4 (§14.3 §14.4) Copy the following program and run it:
1 /*
2 * exer14_3.c
3 * Are floats accurate?
4 * Bob Plantz - 18 June 2009
5 */
6
7 #include <stdio.h>
8
9 int main()
10 {
11 float fNumber = 2147483646.0;
12 int iNumber = 2147483646;
13
21 return 0;
22 }
Listing 14.6: Are floats accurate?
Explain the behavior. What is the maximum value of fNumber such that adding 1.0 to it
works?
14-5 (§14.4) Convert the following decimal numbers to 32-bit IEEE 754 format by hand:
a) 1.0 e) -3125.3125
b) -0.1 f) 0.33
c) 2005.0 g) -0.67
d) 0.00390625 h) 3.14
14-6 (§14.4) Convert the following 32-bit IEEE 754 bit patterns to decimal.
14-7 (§14.4) Show that half the floats (in 32-bit IEEE 754 format) are between -2.0 and +2.0.
7 #include <stdio.h>
8
9 int main()
10 {
11 int x;
12 double y, z;
13
20 return 0;
21 }
Thus far in this book, all programs have been executed under the Linux operating system. An
operating system (OS) can be viewed as a set of programs that provide services to application
programs. These services allow the application programs to use the hardware, but only under
the auspices of the OS.
Linux allows multiple programs to be executing concurrently, and each of the programs is
accessing the hardware resources of the computer. One of the jobs of the OS is to manage the
hardware resources in such a way that the programs do not interfere with one another. In this
chapter we introduce the CPU features that enable Linux to carry out this management task.
The read system call is a good example of a program using the services of the OS. It requests
input from the keyboard. The OS handles all input from the keyboard, so the read function
must first request keyboard input from the OS. One of the reasons this request must be funneled
through the OS is that other programs may also be requesting input from the keyboard, and the
OS needs to ensure that each program gets the keyboard input intended for it.
Once the request for input has been made, it would be very inefficient for the OS to wait until
a user strikes a key. So the OS allows another program to use the CPU, and the keyboard notifies
the OS when a key has been struck. To avoid losing a character, this notification interrupts the
CPU so that the OS can read the character from the keyboard.
Another example comes from something you probably did not intend to do. Unless you are a
perfect programmer, you have probably seen a “segmentation fault.” This can occur when your
program attempts to access memory that has not been allocated for your program. I have gotten
this error (yes, I still make programming mistakes!) when I have made a mistake using the
stack, or when I dereference a register that contains a bad address.
We can summarize these three types of events:
• a software interrupt can be used to request a service from the OS.
• most I/O devices can generate a hardware interrupt when they are ready to transfer data.
• certain conditions within the CPU (typically caused by our programming errors) generate
exceptions.
In response to any of these events, the CPU performs an operation that is very similar to the
call instruction. The value in the rip register is pushed onto the stack, and another address
is placed in the rip register. The net effect is that a function is called, just as in the call
instruction, but the address of the called function is specified in a different way, and additional
information is pushed onto the stack. Before describing the differences, we discuss what ought
to occur in order for the OS to deal with each of these events.
in order to avoid losing the keystroke, we would like to read the character immediately after the
cmpb instruction is executed but before the CPU starts working on the je instruction.
The function that reads the character from the keyboard is called an interrupt handler or
simply handler. Handlers are part of the OS. In Linux they can either be built into the kernel
or loaded as separate modules as needed.
The timing — between the two instructions — means that the CPU will acknowledge an
interrupt only between instruction execution cycles. Just before executing the je instruction
the rip register has the address of the instruction, and it is that address that gets pushed onto
the stack. That is, since calling a handler occurs automatically and does not involve fetching
an instruction, the current value of the rip pushed onto the stack is the correct return address
from the handler.
There is another important issue. It is almost certain that the rflags register will be changed
by the handler that gets called. When program control returns to the je instruction (which is
supposed to depend on the state of the rflags register as a result of executing the cmpb instruc-
tion), there is little chance that the program will do what the programmer intended. Thus we
conclude that in addition to saving the rip register,
• an interrupt causes the CPU to save the rflags register on the stack.
The next issue is the question of how the CPU knows the address of the appropriate handler
to call. In the call instruction, the address of the function to call is specified as an operand to
the instruction. For example,
call toUpperCase
Since the keyboard has no knowledge of the software, there must be some other mechanism for
specifying the address of the handler to call. The answer to this problem is that addresses of
interrupt handlers are stored in an Interrupt Descriptor Table (IDT). Each possible interrupt in
the system is associated with a unique entry in the IDT.
The IDT entries are data structures (128 bits in 64-bit mode, 64 bits in 32-bit mode)
called gate descriptors. In addition to the handler address, they contain information that the
CPU uses to help protect the integrity of the OS.
After it has completed execution of the current instruction, the following actions must occur
when there is an interrupt from a device external to the CPU:
• The address in the rip register must be saved so that the CPU can return to the current
program after it has handled the interrupting device.
• The address of the handler associated with this interrupt must be placed in the rip regis-
ter.
15.2 Exceptions
We next consider exceptions. These are typically the result of a number that the CPU cannot
deal with. Examples are
• division by zero
• an invalid instruction
• an invalid address
In a perfect world, the application software would include all the checks that would prevent the
occurrence of many of these errors. The reality is that no program is perfect, so some of these
errors will occur.
344 CHAPTER 15. INTERRUPTS AND EXCEPTIONS
When they do occur, it is the responsibility of the OS to take an appropriate action. The
currently executing instruction may have caused the exception to occur. So the CPU often reacts
to an exception in the midst of a normal instruction execution cycle. The actions that the CPU
must take in response to an exception are essentially the same as those for an interrupt:
• The address in the rip register must be saved. Depending on the nature of the excep-
tion, the handler may or may not return to the current program after it has handled the
exception.
• The address of the handler associated with this exception must be placed in the rip regis-
ter.
Not all exceptions are due to actual program errors. For example, when a program references
an address in another part of the program that has not yet been loaded into memory, it causes
a page fault exception. The OS must provide a handler that loads the appropriate part of the
program from the disk into memory, then continues with normal program execution.
to make system calls. The code corresponding to the desired action is loaded into eax and the
arguments are loaded into the proper registers before the system call is executed. The recom-
mended technique for making system calls is discussed in Section 15.6 on page 345.
15.4. CPU RESPONSE TO AN INTERRUPT OR EXCEPTION 345
4. Load the handler address from the gate descriptor into the rip register.
The CPU continues with a normal instruction processing cycle — fetch the instruction at the
address in rip, etc. Thus, control will transfer to the handler function.
Depending upon the nature of an exception and what actually caused it, CPU execution may
or may not be returned to the program that was executing when the exception occurred.
iret
that correctly pops everything off the stack into the rip and rflags registers and restores the
privilege level to where it was before the handler function was invoked. (The privilege level
information was also stored on the stack.)
syscall
Now the CPU has been switched to privilege level 0, and the OS has control and can enforce
orderly use of the hardware.
The program in Listing 15.1 illustrates the use of syscall to do system calls without using
the C libraries. See Exercise 15-1 for using syscall within the C runtime environment.
1 # myCat.s
2 # Writes a file to standard out
3 # Does not use C libraries
4 # Bob Plantz -- 18 June 2009
5
6 # Useful constants
7 .equ STDIN,0
8 .equ STDOUT,1
9 # from asm/unistd_64.h
10 .equ READ,0
11 .equ WRITE,1
12 .equ OPEN,2
13 .equ CLOSE,3
14 .equ EXIT,60
15 # from bits/fcntl.h
16 .equ O_RDONLY,0
17 .equ O_WRONLY,1
18 .equ O_RDWR,2
19 # Stack frame
20 .equ aLetter,-16
21 .equ fd, -8
22 .equ localSize,-16
23 .equ fileName,24
24 # Code
25 .text # switch to text segment
26 .globl __start
27 .type __start, @function
28 __start:
29 pushq %rbp # save caller’s frame pointer
30 movq %rsp, %rbp # establish our frame pointer
31 addq $localSize, %rsp # for local variable
32
45 writeLoop:
46 cmpl $0, %eax # any chars?
47 je allDone # no, must be end of file
48 movl $1, %edx # yes, 1 character
49 leaq aLetter(%rbp), %rsi # place to store character
50 movl $STDOUT, %edi # standard out
51 movl $WRITE, %eax
52 syscall # request kernel service
53
Listing 15.1: Using syscall to cat a file. Use “ld -e __start -o myCat myCat.o” after assem-
bling this file.
In Section 8.1 (page 154) we saw how to call the write system call function to write characters
to standard out (the screen). write and the other system call functions are simply C wrappers
that load the proper code in eax and the arguments into the appropriate registers.
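To see the wrapper idea from the C side, the following sketch requests the write service by number instead of calling write. It assumes a Linux system with glibc, which provides the general-purpose syscall() function in unistd.h and the SYS_write constant in sys/syscall.h. The function name raw_write is hypothetical.

```c
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>

/* Write `len` bytes from `buf` to file descriptor `fd` by passing the
 * system call code and arguments directly to the kernel.
 * raw_write is a made-up name used only for this sketch. */
long raw_write(int fd, const void *buf, unsigned long len) {
    /* syscall() places the code and the arguments in the registers
     * the kernel expects, then executes the system call instruction. */
    return syscall(SYS_write, fd, buf, len);
}
```

A call such as raw_write(1, "hello\n", 6) behaves like write(1, "hello\n", 6): it returns the number of bytes written, or -1 if the kernel reports an error.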
Table 15.1: Some system call codes for the syscall instruction.
Several system call codes are shown in Table 15.1. For additional system call codes see the
unistd_64.h file on your system. The arguments for each system call are given in the man page
for the corresponding C version. For example,
bob@bob-desktop:~$ man 2 write
sysret
15.7 Summary
We summarize the differences between a call instruction and an interrupt/exception. The sim-
ilarities are
• the address in the rip is pushed onto the stack, thus providing a way for the CPU to return
to the normal flow of the application program (if appropriate), and
• the address of the function to be called is placed in the rip.
The additional features of the interrupt/exception are
• the value in the rflags register is also pushed onto the stack,
• the address of the called function is stored in the IDT table instead of being specified by
the programmer, and
• the privilege level of the called function can be changed (and it usually is).
15.8. INSTRUCTIONS INTRODUCED THUS FAR 349
This summary shows the assembly language instructions introduced thus far in the book. The
page number where the instruction is explained in more detail, which may be in a subsequent
chapter, is also given. This book provides only an introduction to the usage of each instruction.
You need to consult the manuals ([2] – [6], [14] – [18]) in order to learn all the possible uses of
the instructions.
15.8.1 Instructions
data movement:
opcode   source           destination   action               see page:
cmovcc   %reg/mem         %reg          conditional move     230
movs     $imm/%reg        %reg/mem      move                 141
movsss   $imm/%reg        %reg/mem      move, sign extend    216
movzss   $imm/%reg        %reg/mem      move, zero extend    217
popw                      %reg/mem      pop from stack       163
pushw    $imm/%reg/mem                  push onto stack      163
s = b, w, l, q; w = l, q; cc = condition codes
arithmetic/logic:
opcode   source       destination   action                   see page:
adds     $imm/%reg    %reg/mem      add                      189
adds     mem          %reg          add                      189
ands     $imm/%reg    %reg/mem      bit-wise and             258
ands     mem          %reg          bit-wise and             258
cmps     $imm/%reg    %reg/mem      compare                  209
cmps     mem          %reg          compare                  209
decs                  %reg/mem      decrement                220
divs     %reg/mem                   unsigned divide          280
idivs    %reg/mem                   signed divide            282
imuls    %reg/mem                   signed multiply          276
incs                  %reg/mem      increment                220
leaw     mem          %reg          load effective address   167
muls     %reg/mem                   unsigned multiply        275
negs                  %reg/mem      negate                   286
ors      $imm/%reg    %reg/mem      bit-wise inclusive or    258
ors      mem          %reg          bit-wise inclusive or    258
sals     $imm/%cl     %reg/mem      shift arithmetic left    269
sars     $imm/%cl     %reg/mem      shift arithmetic right   268
shls     $imm/%cl     %reg/mem      shift left               269
shrs     $imm/%cl     %reg/mem      shift right              268
subs     $imm/%reg    %reg/mem      subtract                 190
subs     mem          %reg          subtract                 190
tests    $imm/%reg    %reg/mem      test bits                210
tests    mem          %reg          test bits                210
xors     $imm/%reg    %reg/mem      bit-wise exclusive or    258
xors     mem          %reg          bit-wise exclusive or    258
s = b, w, l, q; w = l, q
15.9 Exercises
15-1 (§15.6) Modify the program in Listing 15.1 so that it uses the C environment. That is, turn
it into a main function using the prototype int main(int argc, char **argv);. argc is the
number of space-delimited strings on the command line, including the command to execute
the program. argv is a pointer to an array of pointers to each of the command line strings.
Chapter 16
Input/Output
In this chapter we discuss the I/O subsystem. The I/O subsystem is the means by which the
CPU communicates with the outside world. By “outside world” we mean devices other than the
CPU and memory.
As you have learned, the CPU executes instructions, and memory provides a place to store
data and instructions. Most programs read data from one or more input devices, process the
data, then write the results to one or more output devices.
Typical input devices are keyboards and mice. Common output devices are display screens
and printers. Although most people do not think of them as such, magnetic disks, CD drives,
etc. are also considered I/O devices. It may be a little more obvious that a connection with the
internet is also seen as I/O. The reasons will become clearer in this chapter, where we discuss
how I/O devices are programmed.
Aside: As pointed out in Section 1.2 (page 4), the three-bus description given here shows
the logical interaction between the CPU and I/O. Most modern general purpose computers
employ several types of buses. The way in which the CPU connects to the various buses
is handled by hardware controllers. A programmer generally deals only with the logical
view.
Two types of RAM are commonly used in PCs.
• SRAM holds its values as long as power is on. Access times are very fast. It requires more
components to do this, so it is more expensive and larger.
• DRAM uses passive components that hold data values for only a fraction of a second.
Thus DRAM includes circuitry that automatically refreshes the data values before the
values are completely lost. It is much less expensive than SRAM, but also much slower.
Most of the memory in a PC is DRAM because it is much less expensive and smaller than SRAM.
Of course, each instruction must be fetched from memory, so slow memory access would limit
CPU speed. This problem is solved by using cache systems made from SRAM.
A cache is a small amount of fast memory placed between the CPU and main memory. When
the CPU needs to access a byte in main memory, that byte, together with several surrounding
bytes, is copied into the cache memory. There is a high probability that the surrounding bytes
will be accessed soon, and the CPU can work with the values in the much faster cache. This is
handled by the system hardware. See [28] and [31] for more details.
Modern CPUs include cache memory on the same chip, which can be accessed at CPU speeds.
Even small cache systems are very effective in speeding up memory access. For example, the
CPU in my desktop system (built in 2005) has 64 KB of Level 1 instruction cache, 64 KB of Level
1 data cache, and 512 KB of Level 2 cache (both instructions and data). In contrast, the main
memory in the system consists of 1 GB of DDR 400 memory.
The important point here is that memory is matched to the CPU by the hardware. Very
seldom is memory access speed a programming issue.
Aside: There are some cases where knowing how to manipulate memory caches can speed
up execution time. The x86 has instructions for working directly with cache. Optimizing
cache usage is an advanced topic beyond the scope of this book.
Modern computer systems employ both types of buses. A typical PC arrangement is shown in
Figure 16.1.
[Figure 16.1 diagram. Blocks: CPU, Memory Controller, Memory, Graphics Processor, I/O Controller. Buses shown: SATA, PATA, USB, audio, ethernet, PCI.]
Figure 16.1: Typical bus controllers in a modern PC. The Memory Controller is often called the
North Bridge; it provides synchronous communication with main memory and the
graphics interface. The I/O Controller is often called the South Bridge; it provides
asynchronous communication with the several types of buses that connect to I/O
devices.
speakers connected to them. Ultimately, the CPU must be able to communicate with I/O devices
in bit patterns at the speed of the device.
The hardware between the CPU and the actual I/O device consists of two subsystems — the
controller and the interface. The controller is the portion that works directly with the device.
For example, a keyboard controller detects which keys are pressed and converts this to a code. It
also detects whether a key is pressed or not. A disk controller moves the read/write head to the
requested track. It then detects the sector number and waits until the requested sector comes
into position. Some very simple devices do not need a controller.
The interface subsystem provides registers that the CPU can read from or write to. An I/O
device is programmed through the interface registers. In general, the following types of registers
are provided:
• Status — Provides information about the current state of the device, including the con-
troller.
• Control — Allows a program to send commands to the controller and to change its set-
tings.
It is common for one register to serve more than one function. For example, there may be one
register for transmitting and receiving, its functionality depending on whether the CPU writes
to or reads from the register. And it is common for an interface to have more than one register
of the same type, especially control registers.
16.5. I/O PORTS 355
Intel® syntax: in destination, source
The in instruction moves data from the I/O port specified by the source into the register specified
by the destination. The source operand can be either an immediate value, or a value in the dx
register. The destination must be al, ax, or eax, consistent with the operand size. For example,
the instruction
inb $4, %al
Intel® syntax: out destination, source
The out instruction moves data to the I/O port specified by the destination from the register
specified by the source. The destination operand can be either an immediate value, or a value in
the dx register. The source must be al, ax, or eax, consistent with the operand size. For example,
the instruction
outb %al, $6
handlers so that the hardware is utilized in an efficient manner. In Linux, a device handler may
either be compiled into the kernel or in a separate module that is loaded into memory only if
needed.
Thus, programming I/O devices generally means changing the operating system kernel. This
can be done, but it requires considerably more knowledge than is provided in this book. It is
possible to give user applications permission to directly access specific I/O devices, but this can
produce disastrous results, especially in a multi-user environment.
We will not do any direct I/O programming in this book, but we will look at the general
concepts. Listing 16.1 sketches the general algorithms in C. The code was abstracted from some
I/O routines that work with a Dual Universal Asynchronous Receiver/Transmitter (DUART) on
a single board computer. It is incomplete code and does not run on any known computer, but it
illustrates the basic concepts.
This example uses memory-mapped I/O. The program calls three functions:
• init_io — Initialize the I/O interface. This includes placing the hardware in an “all clear”
state and setting parameters such as speed, etc.
12 /* register offsets */
13 #define MR 0x01 /* mode register */
14 #define SR 0x03 /* status register */
15 #define CSR 0x03 /* clock select register */
16 #define CR 0x05 /* command register */
17 #define RR 0x07 /* receiver register */
18 #define TR 0x07 /* transmitter register */
19 #define ACR 0x09 /* auxiliary control register */
20 #define IMR 0x0B /* interrupt mask register */
21
22 /* status bits */
23 #define RxRDY 1 /* receiver ready */
24 #define TxRDY 4 /* transmitter ready */
25
26 /* commands */
27 #define RESETRECEIVER 0x20
28 #define RESETTRANSMIT 0x30
29 #define RESETERROR 0x40
30 #define RESETMODE 0x10
31 #define TIMER 0xF0
32 #define NOPARITY8BITS 0x13
33 #define STOPBIT2 0x0F
34 #define BAUD19200 0xC
35 #define BAUDRATE (BAUD19200+(BAUD19200<<4))
39 void init_io();
40 unsigned char charin();
41 void charout( unsigned char c );
42
43 int main() {
44 unsigned char aCharacter;
45
46 init_io();
47 aCharacter = charin();
48 charout(aCharacter);
49
50 return 0;
51 }
52
53 void init_io() {
54 unsigned char* port = (unsigned char*) 0xff000;
55
73 do
74 {
75 status = *(port+SR);
76 } while ((status & RxRDY) == 0);
77 character = *(port+RR);
78 return character;
79 }
80
Listing 16.1: Sketch of basic I/O functions using memory-mapped I/O — C version.
Lines 12 – 37 define symbolic names for values that are used to program the device. Notice
that some names have the same value. For example, on lines 17 and 18 the receiver register (RR)
and transmitter register (TR) are actually the same register. The CPU receives when it reads
from this register and transmits when it writes to it. A similar situation is seen on lines 14 and
15. Reading from register 0x03 provides status information, and the clock selection commands
are written to the same register. This illustrates an important point — I/O interface registers
are not simply data storage places like CPU registers. It would probably be more accurate to call
them “interface ports,” but “registers” is the commonly used terminology.
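The idea that one address can reach different hardware depending on the direction of the access can be sketched with a small software model. This is not how the DUART is implemented; struct sim_duart and the functions sim_read and sim_write are hypothetical names that only imitate the behavior described above, using the offsets 0x03 and 0x07 from Listing 16.1.

```c
#include <stdint.h>

#define SR_CSR 0x03  /* read: status register,   write: clock select register */
#define RR_TR  0x07  /* read: receiver register, write: transmitter register  */

/* Hypothetical model of the hardware behind the interface. */
struct sim_duart {
    uint8_t status;        /* returned by a read of offset 0x03  */
    uint8_t clock_select;  /* set by a write to offset 0x03      */
    uint8_t rx_holding;    /* returned by a read of offset 0x07  */
    uint8_t tx_holding;    /* set by a write to offset 0x07      */
};

/* A read from an interface register. */
uint8_t sim_read(const struct sim_duart *d, unsigned offset) {
    switch (offset) {
    case SR_CSR: return d->status;
    case RR_TR:  return d->rx_holding;
    default:     return 0;
    }
}

/* A write to an interface register. */
void sim_write(struct sim_duart *d, unsigned offset, uint8_t value) {
    switch (offset) {
    case SR_CSR: d->clock_select = value; break;
    case RR_TR:  d->tx_holding   = value; break;
    default:     break;
    }
}
```

In this model, writing a byte to offset 0x07 and then reading offset 0x07 does not return the byte just written, which is exactly why these "registers" cannot be treated as storage.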
This example uses memory-mapped I/O, so simple assignment statements are used to access
the I/O interface registers. The memory addresses 0xff000 – 0xff020 are associated with I/O
registers for this device instead of physical memory. The base address of the device is assigned
to a pointer variable on line 54 in the init_io function. Then the commands to initialize the
device are written to the appropriate registers on lines 56 – 66. It is not important that you
completely understand what this function is doing, but the comments should give you a rough
idea.
Lines 56 – 59 assign four different values to the same location:
*(port+CR) = RESETRECEIVER; /* reset receiver */
*(port+CR) = RESETTRANSMIT; /* reset transmitter */
*(port+CR) = RESETERROR; /* clear any errors */
*(port+CR) = RESETMODE; /* make sure we’re using MR1 */
If these were assignments to an actual memory location or to a CPU register, only the final
statement would be required. But the Command Register is an I/O interface register. And as
described above, it really is not a storage register, even on the I/O interface. In fact, these are
four different commands that are sent to the Command Register “port” on the I/O interface.
The order in which commands are sent to the I/O interface may also be important. For
example, on this particular device, the sequence on lines 62 – 63
*(port+MR) = NOPARITY8BITS; /* no parity, 8 bits */
*(port+MR) = STOPBIT2; /* stop bit length 2.000 */
must be performed in this order. There are actually two Mode Registers, which are both accessed
through the same I/O interface register. The first time the register is accessed, it is connected
to Mode Register 1. This access causes the hardware to automatically switch to Mode Register
2 for all subsequent accesses. Now you can understand the reason for sending the “RESETMODE”
command to the Command Register on line 59. It’s important to ensure that the first access will
be to Mode Register 1.
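The switching behavior of the two Mode Registers can be imitated in software to show why the order of the writes, and the earlier RESETMODE command, both matter. The following is a sketch under the assumptions stated in the text; struct sim_mode, sim_write_mr, and sim_write_cr are hypothetical names, and only the RESETMODE command is modeled.

```c
#include <stdint.h>

#define RESETMODE 0x10

/* Hypothetical model of two mode registers behind one interface register. */
struct sim_mode {
    uint8_t mr[2];  /* mr[0] is Mode Register 1, mr[1] is Mode Register 2 */
    int     next;   /* which one the next access reaches: 0 or 1          */
};

/* A write to the mode register address (offset MR in Listing 16.1). */
void sim_write_mr(struct sim_mode *m, uint8_t value) {
    m->mr[m->next] = value;
    m->next = 1;    /* every later access goes to Mode Register 2 */
}

/* A write to the command register; only RESETMODE is modeled here. */
void sim_write_cr(struct sim_mode *m, uint8_t command) {
    if (command == RESETMODE)
        m->next = 0;  /* the next access goes to Mode Register 1 again */
}
```

If the model starts out pointing at Mode Register 2 and RESETMODE is not sent first, both writes land in Mode Register 2 and Mode Register 1 is never set.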
When compiling I/O functions, it is very important not to use optimization. If you do, the
compiler may try to coalesce the command values into one value. (See Exercise 16-1.)
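Standard C also provides the volatile qualifier for this situation. It is not used in Listing 16.1, so the following is an alternative sketch rather than the method used in this book. Declaring the pointer's target volatile tells the compiler that every access has an effect it cannot see, so each store must be performed, in order, even when optimization is turned on. The function name reset_device is hypothetical.

```c
#include <stdint.h>

#define CR            0x05
#define RESETRECEIVER 0x20
#define RESETTRANSMIT 0x30
#define RESETERROR    0x40
#define RESETMODE     0x10

/* Because `port` points to volatile data, the compiler must emit all
 * four stores in this order; it may not keep only the last one. */
void reset_device(volatile uint8_t *port) {
    *(port + CR) = RESETRECEIVER;  /* reset receiver            */
    *(port + CR) = RESETTRANSMIT;  /* reset transmitter         */
    *(port + CR) = RESETERROR;     /* clear any errors          */
    *(port + CR) = RESETMODE;      /* make sure we're using MR1 */
}
```

For experimenting on an ordinary computer, the function can be called with the address of a plain array standing in for the device; only the last value remains visible there, because an array really is just storage.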
The next function is charin(). Its job is to read a character from the DUART. In the lab
where this code was used, the DUART receiver was connected to a keyboard. The DUART must
wait until somebody presses a key on the keyboard, then convert the code for that key to an
eight-bit ASCII code representing the character. When the DUART has a character ready to be
read from its receiver register, it sets the “receiver ready” bit in its status register to one. The
do-while loop on lines 73 – 76 in charin shows how the code must wait for this event.
When the status indicates that a character is ready, line 77 shows how it is read from the
receiver register.
The charout() function writes a character to the transmitter. As you might expect, the
transmitter was connected to a computer monitor. Although it is clear that keyboard input is
very slow, writing on a monitor screen is also slow compared to CPU processing. Thus, we need a
similar do-while loop (lines 83 – 88) to wait until the monitor is ready to accept a new character.
Once the value provided by the status register shows it is ready, line 89 shows how the character
is written to the DUART’s transmitter register.
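The waiting technique used by both functions can be tried without any hardware by replacing the device with a software stand-in. In this sketch the stand-in (struct sim_tx and sim_read_status, both hypothetical) reports "not ready" a fixed number of times before setting the ready bit, and sim_charout polls until the bit is 1, as the text describes.

```c
#include <stdint.h>

#define TxRDY 4   /* transmitter ready bit in the status register */

/* Hypothetical stand-in for the device: the transmitter becomes ready
 * only after the status has been read `busy_polls` times. */
struct sim_tx {
    int     busy_polls;  /* remaining reads that report "busy" */
    uint8_t sent;        /* last byte accepted by the device   */
};

/* Imitates reading the status register. */
uint8_t sim_read_status(struct sim_tx *d) {
    if (d->busy_polls > 0) {
        d->busy_polls--;
        return 0;        /* TxRDY is 0: still busy */
    }
    return TxRDY;        /* TxRDY is 1: ready      */
}

/* Busy-wait until the device is ready, then send the character.
 * Returns the number of times the status had to be read. */
int sim_charout(struct sim_tx *d, uint8_t c) {
    int polls = 0;
    uint8_t status;
    do {
        status = sim_read_status(d);
        polls++;
    } while ((status & TxRDY) == 0);
    d->sent = c;
    return polls;
}
```

The returned count makes the cost of this approach visible: every poll is CPU time spent doing nothing but waiting for the slower device.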
Listing 16.2 shows the assembly language generated by the gcc compiler for the C program
in Listing 16.1. Some comments have been added to explain the general concepts.
1 .file "io_sketch_mm.c"
2 .text
3 .globl main
4 .type main, @function
5 main:
6 pushq %rbp
7 movq %rsp, %rbp
8 subq $16, %rsp
9 movl $0, %eax
10 call init_io
11 movl $0, %eax
12 call charin
13 movb %al, -1(%rbp)
14 movzbl -1(%rbp), %edi
15 call charout
16 movl $0, %eax
17 leave
18 ret
19 .size main, .-main
20 .globl init_io
21 .type init_io, @function
22 init_io:
23 pushq %rbp
24 movq %rsp, %rbp
25 movq $1044480, -8(%rbp) # initialize pointer variable to 0xff000
26 movq -8(%rbp), %rax # base address of DUART
27 addq $5, %rax # address of command register
28 movb $32, (%rax) # reset receiver
29 movq -8(%rbp), %rax
30 addq $5, %rax
31 movb $48, (%rax) # reset transmitter
32 movq -8(%rbp), %rax
33 addq $5, %rax
34 movb $64, (%rax) # reset error
35 movq -8(%rbp), %rax
36 addq $5, %rax
37 movb $16, (%rax) # reset mode
38 movq -8(%rbp), %rax # base address of DUART
39 addq $9, %rax # address of auxiliary control register
40 movb $-16, (%rax) # baud set, crystal rate
41 movq -8(%rbp), %rax
42 addq $1, %rax
43 movb $19, (%rax)
44 movq -8(%rbp), %rax
45 addq $1, %rax
46 movb $15, (%rax)
47 movq -8(%rbp), %rax
48 addq $3, %rax
49 movb $-52, (%rax)
50 movq -8(%rbp), %rax
51 addq $11, %rax
52 movb $0, (%rax)
53 movq -8(%rbp), %rax
54 addq $5, %rax
55 movb $5, (%rax)
56 leave
57 ret
58 .size init_io, .-init_io
59 .globl charin
60 .type charin, @function
61 charin:
62 pushq %rbp
63 movq %rsp, %rbp
64 movq $1044480, -16(%rbp) # initialize pointer variable to 0xff000
65 .L6:
66 movq -16(%rbp), %rax # base address of DUART
67 addq $3, %rax # address of status register
68 movzbl (%rax), %eax # read status
69 movb %al, -2(%rbp) # and save locally
70 movzbl -2(%rbp), %eax
71 andl $1, %eax # check receiver status
72 testb %al, %al # if bit is 0
73 je .L6 # recheck
74 movq -16(%rbp), %rax # receiver ready, get DUART address
75 addq $7, %rax # address of receiver register
76 movzbl (%rax), %eax # read input byte
77 movb %al, -1(%rbp) # store locally
78 movzbl -1(%rbp), %eax # return value
79 leave
80 ret
81 .size charin, .-charin
82 .globl charout
83 .type charout, @function
84 charout:
85 pushq %rbp
86 movq %rsp, %rbp
87 movb %dil, -20(%rbp)
88 movq $1044480, -16(%rbp) # initialize pointer variable to 0xff000
89 .L9:
90 movq -16(%rbp), %rax # base address of DUART
91 addq $3, %rax # address of status register
92 movzbl (%rax), %eax # read status
93 movb %al, -1(%rbp) # and save locally
94 movzbl -1(%rbp), %eax
95 andl $4, %eax # check transmitter status
96 testl %eax, %eax # if bit is 0
97 je .L9 # recheck
98 movq -16(%rbp), %rax # transmitter ready, get DUART address
99 leaq 7(%rax), %rdx # address of transmitter register
100 movzbl -20(%rbp), %eax # load byte to send
101 movb %al, (%rdx) # send it
102 leave
103 ret
104 .size charout, .-charout
105 .ident "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"
106 .section .note.GNU-stack,"",@progbits
Listing 16.2: Memory-mapped I/O in assembly language. Comments have been added to explain
the code.
The comments on lines 25 – 40 in the init_io function describe how values are written to the
appropriate memory addresses, which are mapped to I/O registers.
Lines 65 – 73 in the charin function make up a loop that waits until the receiver has a
character ready to be read. The readiness of the receiver is indicated by bit 0 in the status
register. The address of the receiver register is computed on lines 74 – 75, then the character is
read from that register on line 76. A similar loop is used on lines 89 – 97 in the charout function
to wait until the status register shows that the transmitter is ready for another character. When
it is ready, the address of the transmitter register is computed on lines 98 – 99, the byte to be
sent is loaded into the eax register on line 100, and it is written to the transmitter register on
line 101.
As we saw in Section 16.5, special instructions are required to access isolated I/O. The Linux
kernel source includes macros to use these instructions. The macros are defined in the file io.h.
Listing 16.3 illustrates the use of these macros to write the same program as in Listing 16.1 if
the DUART interface were connected to the isolated I/O system.
1 /*
2 * io_sketch_iso.c
3 * This code sketches the algorithms to initialize
4 * a DUART, read one character and echo it using
5 * isolated I/O.
6 * WARNING: This code does not run on any known
7 * device. It is meant to sketch some
8 * general I/O concepts only.
9 * Bob Plantz - 18 June 2009
10 */
11 #include <sys/io.h>
12
13 /* register offsets */
14 #define MR 0x01 /* mode register */
15 #define SR 0x03 /* status register */
16 #define CSR 0x03 /* clock select register */
17 #define CR 0x05 /* command register */
18 #define RR 0x07 /* receiver register */
19 #define TR 0x07 /* transmitter register */
20 #define ACR 0x09 /* auxiliary control register */
21 #define IMR 0x0B /* interrupt mask register */
22
23 /* status bits */
24 #define RxRDY 1 /* receiver ready */
25 #define TxRDY 4 /* transmitter ready */
26
27 /* commands */
28 #define RESETRECEIVER 0x20
29 #define RESETTRANSMIT 0x30
30 #define RESETERROR 0x40
31 #define RESETMODE 0x10
32 #define TIMER 0xF0
33 #define NOPARITY8BITS 0x13
34 #define STOPBIT2 0x0F
35 #define BAUD19200 0xC
36 #define BAUDRATE (BAUD19200+(BAUD19200<<4))
37 #define ENABLE 0x05
38 #define NOINTERRUPT 0x00
39 #define NOINTERRUPT 0x00
40
41 void init_io();
42 unsigned char charin();
43 void charout( unsigned char c );
44
45 int main() {
46 unsigned char aCharacter;
47
48 init_io();
49 aCharacter = charin();
50 charout(aCharacter);
51
52 return 0;
53 }
54
55 void init_io() {
56 outb(CR, RESETRECEIVER);
57 outb(CR, RESETTRANSMIT);
58 outb(CR, RESETERROR);
59 outb(CR, RESETMODE);
60 outb(ACR, TIMER);
61 outb(MR, NOPARITY8BITS);
62 outb(MR, STOPBIT2);
63 outb(CSR, BAUDRATE);
64 outb(IMR, NOINTERRUPT);
65 outb(CR, ENABLE);
66 }
67
71 do
72 {
73 status = inb(SR);
74 } while ((status & RxRDY) == 0);
75 character = inb(RR);
76 return character;
77 }
78
The use of the outb() macro can be seen in lines 56 – 65. And on line 73 we see the inb() macro
being used to read the status register.
The gcc compiler generates assembly language as shown in Listing 16.4.
1 .file "io_sketch_iso.c"
2 .text
3 .globl main
4 .type main, @function
5 main:
6 pushq %rbp
7 movq %rsp, %rbp
8 subq $16, %rsp
Listing 16.4: Isolated I/O in assembly language. Comments have been added to explain the code.
Looking at lines 58 – 73 and lines 95 – 110, we see that the outb() and inb() macros generate
functions. The actual outb instruction is used on line 68 and inb is used on line 103.
At the points where the macros are called in the C source code, the compiler generates calls to
the appropriate function. For example, the C sequence
56 outb(CR, RESETRECEIVER);
57 outb(CR, RESETTRANSMIT);
handling function is being executed. The programmer must decide whether the interrupt should
be allowed or not. In general, it cannot be ignored because this would cause the loss of I/O data.
On the other hand, spending too much time handling the second interrupt may cause the first
device to lose data.
16.9 Exercises
16-1 (§16.6) Enter the C program in Listing 16.1. Compile it to the assembly language stage
(use the -S option) with different levels of optimization. For example, -O1, -O2. Compare
the results with the non-optimized version in Listing 16.2.
16-2 (§16.6) Enter the C program in Listing 16.3. Compile it to the assembly language stage
(use the -S option) with different levels of optimization. For example, -O1, -O2. Compare
the results with the non-optimized version in Listing 16.4.
Appendix A
Reference Material
2. OR gate (page 56): inputs x, y; output x+y
   x  y  x+y
   0  0   0
   0  1   1
   1  0   1
   1  1   1
3. NOT gate (page 56): input x; output x′
   x  x′
   0  1
   1  0
4. NAND gate (page 77): inputs x, y; output (x · y)′
   x  y  (x · y)′
   0  0   1
   0  1   1
   1  0   1
   1  1   0
5. NOR gate (page 77): inputs x, y; output (x + y)′
   x  y  (x + y)′
   0  0   1
   0  1   0
   1  0   0
   1  1   0
(page 121)
This summary shows the assembly language instructions used in this book. The page number
where each instruction is explained in more detail is also given. This book provides only an
introduction to the usage of each instruction. You need to consult the manuals ([2] – [6], [14] –
[18]) in order to learn all the possible uses of the instructions.
data movement:
opcode   source           destination   action               see page:
cmovcc   %reg/mem         %reg          conditional move     230
movs     $imm/%reg        %reg/mem      move                 141
movsss   $imm/%reg        %reg/mem      move, sign extend    216
movzss   $imm/%reg        %reg/mem      move, zero extend    217
popw                      %reg/mem      pop from stack       163
pushw    $imm/%reg/mem                  push onto stack      163
s = b, w, l, q; w = l, q; cc = condition codes
arithmetic/logic:
opcode   source       destination   action                   see page:
adds     $imm/%reg    %reg/mem      add                      189
adds     mem          %reg          add                      189
ands     $imm/%reg    %reg/mem      bit-wise and             258
ands     mem          %reg          bit-wise and             258
cmps     $imm/%reg    %reg/mem      compare                  209
cmps     mem          %reg          compare                  209
decs                  %reg/mem      decrement                220
divs     %reg/mem                   unsigned divide          280
idivs    %reg/mem                   signed divide            282
imuls    %reg/mem                   signed multiply          276
incs                  %reg/mem      increment                220
leaw     mem          %reg          load effective address   167
muls     %reg/mem                   unsigned multiply        275
negs                  %reg/mem      negate                   286
ors      $imm/%reg    %reg/mem      bit-wise inclusive or    258
ors      mem          %reg          bit-wise inclusive or    258
sals     $imm/%cl     %reg/mem      shift arithmetic left    269
sars     $imm/%cl     %reg/mem      shift arithmetic right   268
shls     $imm/%cl     %reg/mem      shift left               269
shrs     $imm/%cl     %reg/mem      shift right              268
subs     $imm/%reg    %reg/mem      subtract                 190
subs     mem          %reg          subtract                 190
tests    $imm/%reg    %reg/mem      test bits                210
tests    mem          %reg          test bits                210
xors     $imm/%reg    %reg/mem      bit-wise exclusive or    258
xors     mem          %reg          bit-wise exclusive or    258
s = b, w, l, q; w = l, q
This discussion covers the fundamental concepts employed in a Makefile. My intent is to show
you how to write Makefiles that help you debug your programs. The problem with many discus-
sions of make is that they show how to use many of the “features” of the make program. Many
problems students have when debugging their programs are actually caused by errors in the
Makefile that cause make to use its default behavior.
For example, if you try to compile the program myProg with the command:
make myProg
make will look for its instructions in a file in the current directory named Makefile (or makefile).
Assuming Makefile exists, make searches the file for a target (defined later in this chapter) named
myProg. If there is no Makefile, make searches for a file named myProg.s, myProg.c or myProg.cc.
If either of the .s or .c source files exists, make issues the command
cc myProg.c -o myProg
Notice that the compiler is invoked with only the -o option. For example, you cannot use gdb
to debug the program because the -g option does not get used. This means that if one of the
entries in your Makefile is incorrect, the default behavior may cause make to compile a source
file without the debugging option. It is much easier to avoid these problems if you:
• keep your Makefiles very simple, and
• read what make writes on the screen very carefully when it executes a Makefile.
A Makefile consists of a series of entries. Each entry in a Makefile consists of:
1. One dependency line. The format of a dependency line is:
Prerequisites are names of files or other targets in this Makefile. Use spaces as separators
between the prerequisites, not tabs.
2. Zero or more Unix command lines. The format of a command line is:
(a) Each line must begin with a tab character (not a group of spaces). Be careful if you
write a Makefile with an editor on another platform; some editors automatically re-
place tabs with spaces.
There can also be (should be!) comment lines that begin with a #.
If you invoke make with no argument:
make
make clean
causes make to start with the clean entry. Since there are no prerequisites, the tree ends here,
and the target is always “out of date.” Hence, the command for this entry will always be exe-
cuted, but none of the other entries is executed.
IMPORTANT: The make program echoes each command as it is executed and displays
diagnostic error messages on the screen. You must read this display very
carefully in order to determine what has taken place.
Listing B.1: An example of a Makefile for an assembly language program with one source file.
The functions in real programs are distributed amongst many files. When changes are made,
it is clearly a waste of time to recompile all the functions. With a properly designed Makefile,
make only recompiles files that have been changed. (This is a motivation for placing each function
in its own file.) Listing B.2 illustrates a Makefile for a program where the main function and
one subfunction are written in C and one subfunction is written in assembly language. Notice
that header files have been created for both subfunctions to provide prototype statements for
the main function, which is written in C. The assembly language source file does not #include
its own header file because prototype statements do not apply to assembly language.
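The "has this file changed?" decision can be sketched in C. This is only a rough illustration of the idea, not how make is actually implemented: the function name is_out_of_date is my own, it looks at a single prerequisite, and it compares modification times to the nearest second using the POSIX stat function.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (not part of make):
 * returns  1 if the target should be rebuilt,
 *          0 if the target is at least as new as the prerequisite,
 *         -1 if the prerequisite itself cannot be found.
 */
int is_out_of_date(const char *target, const char *prerequisite)
{
    struct stat targetInfo;
    struct stat prereqInfo;

    assert(target != NULL && prerequisite != NULL);

    if (stat(prerequisite, &prereqInfo) != 0)
        return -1;              /* nothing to build from */

    if (stat(target, &targetInfo) != 0)
        return 1;               /* target does not exist yet */

    /* rebuild only when the prerequisite was modified after the target */
    return prereqInfo.st_mtime > targetInfo.st_mtime;
}
```

A target with no corresponding file, such as clean, always takes the "does not exist yet" path, which is why its commands run every time.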
14 sub2.o: sub2.s
15 as --gstabs sub2.s -o sub2.o
16
Listing B.2: An example of a Makefile for a program with both C and assembly language source
files.
As you can see in Listing B.2, there is quite a bit of repetition in a Makefile. Variables provide
a good way to reduce the chance of typing errors. Listing B.3 illustrates the use of variables to
simplify the Makefile from Listing B.2.
1 # Makefile for biggerProg
2 # Bob Plantz - 19 June 2009
3
11 biggerProg: $(objects)
12 gcc -o biggerProg $(objects)
13
20 sub2.o: sub2.s
21 as $(asmflags) -o sub2.o sub2.s
22
23 clean:
24 rm $(objects) biggerProg *~
Executing make with the Makefile in Listing B.3 shows that the two C source files are com-
piled, and the assembly source file is assembled with the proper flags:
bob$ make
(gdb) li
4 * 4 Jan. 06 - R. Plantz
5 */
6
7 #include <stdio.h>
8 #include "sub1.h"
9 #include "sub2.h"
10
11 int main()
12 {
13 printf("Starting in main, about to call sub1...\n");
(gdb) run
Starting program: /home/bob/progs/appendB/biggerProg/biggerProg
Starting in main, about to call sub1...
In sub1
Back in main, about to call sub2...
In sub2
Back in main.
Program ending.
12 biggerProg: $(objects)
15 clean:
16 rm $(objects) biggerProg *~
Listing B.4: Incomplete Makefile. Several entries are missing, so make invokes its default be-
havior.
executing make shows that the two C source files are compiled, and the assembly source file is
assembled using the make program’s own default behavior to give:
bob$ make
cc -c -o biggerProg.o biggerProg.c
cc -c -o sub1.o sub1.c
as -o sub2.o sub2.s
gcc -o biggerProg biggerProg.o sub1.o sub2.o
(Margin note: Read what make does. This is DEFAULT behavior.)
bob$
Note that make does not give any error messages even though our Makefile is incomplete. It
appears to have created the program correctly. However, when we try to use gdb we see:
bob$ gdb biggerProg
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu"...
(gdb) li
1 ../sysdeps/x86_64/elf/start.S: No such file or directory.
in ../sysdeps/x86_64/elf/start.S
(gdb) run
Starting program: /home/bob/progs/appendB/biggerProg/biggerProg
Starting in main, about to call sub1...
In sub1
Back in main, about to call sub2...
In sub2
Back in main.
Program ending.
Reading the make messages on the screen shows that it created the program without using
all our flags. The important lesson to note here is that an error-free execution of make is not
sufficient to guarantee your program was built as you intended. You need to read the
messages that make writes on the screen.
To learn more about using make see [30].
Appendix C
Using the gdb Debugger for Assembly Language
The program in Listing 10.5 uses a while loop to write “Hello World” on the screen one character
at a time. A common programming error is to create an “infinite” loop. It would be nice to have
a tool that allows us to stop such a program in the middle of the loop so we can observe the state
of registers and memory locations. That can help us to determine such things as whether the
loop control variable is being changed as we planned.
Fortunately, the gnu program development environment includes a debugger, gdb (see [29]),
that allows us to do just that. The gdb debugger allows you to load another program into memory
and use gdb commands to control the execution of the other program — the target program —
and to observe the states of its variables.
There is another, very important, reason for learning how to use gdb. This book describes
how registers and memory are controlled by computer instructions. The gdb program is a very
valuable learning tool, since it allows you to observe the behavior of each instruction, one step
at a time.
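For readers who find C easier to follow, here is a rough C analogue of the kind of loop this appendix examines. It is a sketch under my own assumptions, not the book's listing: the function name write_chars is made up, and it uses the POSIX write function to send one character at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical C analogue of a one-character-at-a-time output loop.
 * Returns the number of characters written.
 */
size_t write_chars(int fd, const char *text)
{
    size_t written = 0;

    assert(text != NULL);

    while (*text != '\0')           /* stop at the NUL character */
    {
        if (write(fd, text, 1) != 1)
            break;                  /* give up on a write error */
        text++;                     /* leave this out and the loop never ends */
        written++;
    }

    return written;
}
```

Forgetting the increment of the pointer is exactly the sort of "infinite" loop error that a debugger makes easy to see, since the pointer value would be the same at every break.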
gdb has a large number of commands, but the following are the most common ones that will
be used in this book:
• li lineNumber — lists ten lines of the source code, centered at the specified line number.
• run — begins execution of a program that has been loaded under control of gdb.
• n — execute current source code statement of a program that has been running; if it’s a
call to a function, the entire function is executed.
• s — execute current source code statement of a program that has been running; if it’s a
call to a function, step into the function.
• si — execute current (machine) instruction of a program that has been running; if it’s a
call to a function, step into the function.
• i r — info registers — displays the contents of the registers, except floating point and
vector.
Here is a screen shot of how I assembled, linked, and then used gdb to control the execution
of the program and observe its behavior. User input is boldface and the session is annotated in
italics.
bob@ubuntu:~$ as --gstabs -o helloWorld3.o helloWorld3.s
bob@ubuntu:~$ gcc -o helloworld3 helloWorld3.o
bob@ubuntu:~$ gdb helloworld3
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu"...
After assembling and linking the program, we start the gdb program and load the
helloworld3 program (the one we want to observe) into memory. This leaves me in gdb.
The target program is not running.
(gdb) li
1 # helloWorld3.s
2 # "hello world" program using the write() system call
3 # one character at a time.
4 # Bob Plantz - 12 June 2009
5
6 # Useful constants
7 .equ STDOUT,1
8 # Stack frame
9 .equ aString,-8
10 .equ localSize,-16
We are trying to observe the while loop. Providing an argument to the li command
causes it to list ten lines centered around the value of the argument. We still do not
see the entire loop. Pressing the Enter key tells gdb to repeat the immediately previous
command. The li command is smart enough to list the next ten lines (only eight in
this example since that takes us to the end of the source code in this file).
380 APPENDIX C. USING THE GDB DEBUGGER FOR ASSEMBLY LANGUAGE
(gdb) br 29
Breakpoint 1 at 0x400523: file helloWorld3.s, line 29.
(gdb) br 37
Breakpoint 1 at 0x400523: file helloWorld3.s, line 29.
From the listed source code, we can see that the decision to exit the loop is made on line
29 in the source code. The jump to the allDone label will occur if the cmpb instruction
on line 28 shows that the rsi register is pointing to a byte that contains zero — the
ASCII NUL character. I set a breakpoint at line 29 so we can see what rsi is pointing
to.
I also set a breakpoint at line 37, the target of the jump. This second breakpoint serves
as a sort of “safety net” in case I did not read the code correctly. If the program does
not reach the breakpoint within the loop, perhaps I can work backwards and figure
out my error from examining the registers and memory at this point.
(gdb) run
Starting program: /home/bob/progs/chap10//helloWorld3
The run command causes the target program, helloworld3, to execute until it reaches a
breakpoint. Control then returns to the gdb program.
IMPORTANT: The instruction at the breakpoint is not executed when the break oc-
curs. It will be the first instruction to be executed when we command gdb to resume
execution of the target program.
(gdb) i r
rax 0x7f12d857fac0 139718915783360
rbx 0x400560 4195680
rcx 0x0 0
rdx 0x7fffe079fbc8 140736959478728
rsi 0x40063c 4195900
rdi 0x1 1
rbp 0x7fffe079fae0 0x7fffe079fae0
rsp 0x7fffe079fad0 0x7fffe079fad0
r8 0x7f12d857e2e0 139718915777248
r9 0x7f12d8591ef0 139718915858160
r10 0x7fffe079f920 140736959478048
r11 0x7f12d822f380 139718912308096
r12 0x400420 4195360
r13 0x7fffe079fbb0 140736959478704
r14 0x0 0
r15 0x0 0
rip 0x400523 0x400523 <whileLoop+7>
eflags 0x206 [ PF IF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
fctrl 0x37f 895
fstat 0x0 0
ftag 0xffff 65535
fiseg 0x0 0
fioff 0x0 0
foseg 0x0 0
fooff 0x0 0
fop 0x0 0
mxcsr 0x1f80 [ IM DM ZM OM UM PM ]
(gdb) i r rsi
rsi 0x40063c 4195900
The i r command (notice the space between “i” and “r”) is used to display all the
registers. Following each register name, the first column shows the contents of the
register in hexadecimal, and the second column shows the same value in decimal.
Addresses are usually stated in hexadecimal, so the contents of registers that are
supposed to hold only addresses are not converted to decimal.
Since our primary interest is the rsi register, we can simplify the display by explicitly
specifying which register(s) to display.
(gdb) help x
Examine memory: x/FMT ADDRESS.
ADDRESS is an expression for the memory address to examine.
FMT is a repeat count followed by a format letter and a size letter.
Format letters are o(octal), x(hex), d(decimal), u(unsigned decimal),
t(binary), f(float), a(address), i(instruction), c(char) and s(string).
Size letters are b(byte), h(halfword), w(word), g(giant, 8 bytes).
The specified number of objects of the specified size are printed
according to the format.
Defaults for format and size letters are those previously used.
Default count is 1. Default address is following last thing printed
with this command or "print".
(gdb) x/10cb 0x40063c
0x40063c <theString>: 72 ’H’ 101 ’e’ 108 ’l’ 108 ’l’ 111 ’o’ 32 ’ ’ 119 ’w’ 111 ’o’
0x400644 <theString+8>: 114 ’r’ 108 ’l’
We should examine the byte that rsi is pointing to because that determines whether
this jump instruction transfers control or not. The help x command provides a very
brief reminder of the codes to use. The character display (c) shows two values for
each byte — first in decimal, then the equivalent ASCII letter. We can see that rsi is
pointing to the beginning of the text string. I chose to display ten characters to confirm
that this is the correct text string.
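The two-values-per-byte character display is easy to imitate in C. The sketch below is my own illustration and not gdb's code: the name format_byte is hypothetical, and unlike gdb it makes no attempt to escape non-printing characters.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: formats one byte as its decimal value followed
 * by the character in single quotes, for example 72 'H'.
 * Returns the number of characters the full text needs.
 */
int format_byte(unsigned char byte, char *out, size_t size)
{
    assert(out != NULL && size > 0);

    /* the same byte is shown twice: once as a number, once as a character */
    return snprintf(out, size, "%d '%c'", byte, byte);
}
```

Both values come from the same bit pattern in memory. Only the interpretation differs.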
(gdb) si
31 movl $1, %edx # one character
(gdb)
32 movl $STDOUT, %edi # standard out
(gdb)
33 call write # invoke write function
(gdb)
0x0000000000400408 in write@plt ()
We use the si command to single-step through a portion of the program. Recall that
simply pushing the Enter key repeats the immediately previous gdb command.
The last step in this sequence gave an odd result. It caused the program to execute
the call instruction, which took us into the write function. Since write is a library
function, gdb does not have access to its source code. Hence, it cannot display the
source code for us.
(gdb) cont
Continuing.
H
Breakpoint 1, whileLoop () at helloWorld3.s:29
29 je allDone # yes, all done
(gdb) i r rsi
rsi 0x40063d 4195901
(gdb) x/10cb 0x40063d
0x40063d <theString+1>: 101 ’e’ 108 ’l’ 108 ’l’ 111 ’o’ 32 ’ ’ 119 ’w’ 111 ’o’ 114 ’r’
0x400645 <theString+9>: 108 ’l’ 100 ’d’
Not wanting to single-step through the write function, I use the cont command. The
program displays the first letter of the string, “H”, on the screen, then loops back and
breaks again at line 29. I display register rsi and examine the memory it is pointing
to. We can see that the pointer variable, aString, is marching through the text string
one character at a time.
(gdb) cont
Continuing.
e
Breakpoint 1, whileLoop () at helloWorld3.s:29
29 je allDone # yes, all done
(gdb) clear 29
Deleted breakpoint 1
Continuing the program shows that it will break back into gdb each time through the
loop. We are reasonably confident that the loop is executing properly, so we remove the
breakpoint in the loop.
(gdb) cont
Continuing.
llo world.
With the breakpoint inside the loop removed, continuing the program displays the
remainder of the text. Then it breaks at the breakpoint we set outside the loop. Recall
that I set the breakpoint at line 37, but the program breaks at line 38. The reason is
that there is no instruction on line 37, just a label. The first instruction following the
label is on line 38.
I then look at the address in rsi. By examining two bytes previous to where it is cur-
rently pointing, we can easily see the last two characters that the program displayed
before reaching the NUL character. And it is the NUL character that caused the loop to
terminate.
(gdb) cont
Continuing.
Continuing the program, it completes normally. Notice that even though our target
program has completed, we are still in gdb. We need to use the q command to exit from
gdb.
Appendix D
Embedding Assembly Code in a C Function
The gcc C compiler has an extension to standard C that allows a programmer to write assembly
language instructions within a C function. Of course, you need to be very careful when doing
this because you do not know how the compiler has allocated memory and/or registers for the
variables. Yes, you can use the “-S” option to see what the compiler did, but if anybody makes one
change to the function, even compiling it with a different version of gcc, things almost certainly
will have changed.
The way to do this is covered in the info pages for gcc. In my version (4.1.2) I found it
by going to “C Extensions,” then “Extended Asm.” (No, it’s not obvious to me, either.) The
presentation here is a very brief introduction.
The overall format is a C statement of the form:
The output operands are destinations for the assembly_language_instruction, and the input
operands are sources. Each operand is of the form
"operand_constraint" (C_expression)
where the operand_constraint describes what type of register, memory location, etc. should be
used for the operand, and C_expression is a C expression, often just a variable name. If there is
more than one operand, they are separated by commas.
The assembly_language_instruction can refer to each operand numerically with the “%n”
syntax, starting with n = 0 for the first operand, 1 for the second, etc.
For example, let us consider a case where we wish to add two 32-bit integers. (Yes, there is a
C operation to do this, but it is generally better to start with simple examples.) The program is
shown in Listing D.1.
1 /*
2 * embedAsm1.c
3 * Very simple example of how to embed assembly language
4 * in a C function.
5 * Bob Plantz - 18 June 2009
6 */
7
8 #include <stdio.h>
9
10 int main()
11 {
12 int x, y;
13
14 printf("Enter an integer: ");
384 APPENDIX D. EMBEDDING ASSEMBLY CODE IN A C FUNCTION
15 scanf("%i", &x);
16 printf("Enter another integer: ");
17 scanf("%i", &y);
18 asm("addl %1, %0" : "=m" (x) : "r" (y));
19 printf("There sum is %i\n", x);
20
21 return 0;
22 }
There is only one output (destination), and its operand constraint is "=m". The ‘=’ sign is required
to show that it is an output. The ‘m’ character shows that this operand is located in memory.
Now, recall that the addl instruction requires that at least one of its operands be a register.
So we specify the input operand as a register with the "r" operand constraint. We have to do
this for the assembly language instruction even though the C code does not specify whether the
variable, y, is in memory or in a register.
The operand constraints are described in the info pages for gcc. In my version (4.1.2) I found
it by going to “C Extensions,” then “Constraints.” The documentation covers all the architectures
supported by gcc, so it is difficult to wade through.
Listing D.2 shows the assembly language actually generated by the compiler.
1 .file "embedAsm1.c"
2 .section .rodata
3 .LC0:
4 .string "Enter an integer: "
5 .LC1:
6 .string "%i"
7 .LC2:
8 .string "Enter another integer: "
9 .LC3:
10 .string "There sum is %i\n"
11 .text
12 .globl main
13 .type main, @function
14 main:
15 pushq %rbp
16 movq %rsp, %rbp
17 subq $16, %rsp
18 movl $.LC0, %edi
19 movl $0, %eax
20 call printf
21 leaq -4(%rbp), %rsi
22 movl $.LC1, %edi
23 movl $0, %eax
24 call scanf
25 movl $.LC2, %edi
26 movl $0, %eax
27 call printf
28 leaq -8(%rbp), %rsi
29 movl $.LC1, %edi
30 movl $0, %eax
31 call scanf
32 movl -8(%rbp), %eax
33 #APP
34 # 18 "embedAsm1.c" 1
35 addl %eax, -4(%rbp)
36 # 0 "" 2
37 #NO_APP
38 movl -4(%rbp), %esi
39 movl $.LC3, %edi
40 movl $0, %eax
41 call printf
42 movl $0, %eax
43 leave
44 ret
45 .size main, .-main
46 .ident "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"
47 .section .note.GNU-stack,"",@progbits
Listing D.2: Embedding an assembly language instruction in a C function (gcc assembly
language).
In fact, the compiler did allocate y in memory, at -8(%rbp). It had to do that because scanf needs
an address when reading a value from the keyboard.
The embedded assembly language is between the #APP and #NO_APP comments on lines 33
and 37, respectively.
32 movl -8(%rbp), %eax
33 #APP
34 # 18 "embedAsm1.c" 1
35 addl %eax, -4(%rbp)
36 # 0 "" 2
37 #NO_APP
The movl instruction on line 32 loads y into a register so that the addl instruction on line 35 can
add the value to a memory location (x). Of course, it would have had to do that even if we had
used a C statement for the addition instead of embedding an assembly language instruction.
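As a variation on Listing D.1, here is a sketch that wraps the same addl instruction in a function. It is my own example, not one of the book's listings. The name add_with_asm is hypothetical. I use the "+r" constraint, which tells gcc that the operand is both read and written, and I list "cc" as clobbered because addl changes the flags. The preprocessor test falls back to plain C on anything that is not an x86 target compiled with gcc or a compatible compiler.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical example: adds two ints with an embedded addl instruction.
 * Assumes the sum fits in an int.
 */
int add_with_asm(int x, int y)
{
    /* the caller is expected to avoid signed overflow */
    assert((long long)x + y <= INT_MAX && (long long)x + y >= INT_MIN);

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* %0 is x (read and written), %1 is y (read only) */
    __asm__("addl %1, %0" : "+r" (x) : "r" (y) : "cc");
#else
    x = x + y;      /* portable fallback for other compilers and CPUs */
#endif

    return x;
}
```

Because both operands are requested in registers here, the compiler is free to choose which registers to use. Compiling with the “-S” option shows what it chose.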
There may be situations where you need to use a specific register for a variable. Listing D.3
shows how to do this.
1 /*
2 * embedAsm2.c
3 * Shows two assembly language instructions embedded
4 * in a C function.
5 * Bob Plantz - 18 June 2009
6 */
7
8 #include <stdio.h>
9
10 int main()
11 {
12 int x, y;
13 register int z asm("edx");
14
22 return 0;
23 }
Listing D.3: Embedding more than one assembly language instruction in a C function and spec-
ifying a register (C).
shows how to request that the compiler use the edx register for the variable z.
We have decided to embed three assembly language instructions. Recall that each assembly
language statement is on a separate line. And on the next line, we tab to the place where the
operation code begins. In C, the newline character is ’\n’ and the tab character is ’\t’. So if you
read line 19 carefully, you will see that there are three lines of assembly language. The first one
is terminated by a ’\n’. The second instruction begins with a ’\t’ and is terminated by a ’\n’.
And the third begins with a ’\t’.
The assembly language results are shown in Listing D.4.
1 .file "embedAsm2.c"
2 .section .rodata
3 .LC0:
4 .string "Enter an integer: "
5 .LC1:
6 .string "%i"
7 .LC2:
8 .string "Enter another integer: "
9 .align 8
10 .LC3:
11 .string "Sixteen times there sum is %i\n"
12 .text
13 .globl main
14 .type main, @function
15 main:
16 pushq %rbp
17 movq %rsp, %rbp
18 subq $16, %rsp
19 movl $.LC0, %edi
20 movl $0, %eax
21 call printf
22 leaq -4(%rbp), %rsi
23 movl $.LC1, %edi
24 movl $0, %eax
25 call scanf
26 movl $.LC2, %edi
27 movl $0, %eax
28 call printf
29 leaq -8(%rbp), %rsi
30 movl $.LC1, %edi
31 movl $0, %eax
32 call scanf
33 #APP
34 # 19 "embedAsm2.c" 1
35 movl -4(%rbp), %edx
36 addl -8(%rbp), %edx
37 sall $4, %edx
38 # 0 "" 2
39 #NO_APP
40 movl %edx, %esi
41 movl $.LC3, %edi
42 movl $0, %eax
43 call printf
44 movl $0, %eax
45 leave
46 ret
47 .size main, .-main
48 .ident "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"
49 .section .note.GNU-stack,"",@progbits
Listing D.4: Embedding more than one assembly language instruction in a C function and spec-
ifying a register (gcc assembly language).
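The ’\n’ and ’\t’ separators can be tried out in a small function. The sketch below is my own and is not Listing D.3: the name sixteen_times_sum is hypothetical, it embeds two instructions rather than three, and it lets the compiler pick the registers instead of requesting edx. As before, it falls back to plain C when the target is not x86.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical example: computes 16 * (x + y) with two embedded
 * instructions. Assumes the result fits in an int.
 */
int sixteen_times_sum(int x, int y)
{
    int result = x;

    /* keep the sum small enough that shifting left by 4 cannot overflow */
    assert((long long)x + y <= INT_MAX / 16);
    assert((long long)x + y >= INT_MIN / 16);

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* adjacent string literals are joined, so this is one template
       holding two instructions separated by a newline and a tab */
    __asm__("addl %1, %0\n\t"
            "sall $4, %0"
            : "+r" (result)
            : "r" (y)
            : "cc");
#else
    result = (result + y) * 16;     /* portable fallback */
#endif

    return result;
}
```

Splitting the template across several string literals, one instruction per literal, keeps each ’\n’ and ’\t’ easy to see.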
This has been a very abbreviated introduction to embedding assembly language in C. Each
situation will be unique, and you will need to study the info pages for gcc in order to determine
what needs to be done. You can also expect the rules to change — hopefully become easier to
use — as gcc evolves.
Appendix E
Exercise Solutions
The solutions to most of the exercises in the book are in this Appendix. You should attempt to
work the exercise before looking at the solution. But don’t allow yourself to get bogged down. If
the solution does not come to you within a reasonable amount of time, peek at the solution for a
hint.
A word of warning: I have proofread these solutions many times. Each time has turned up
several errors. I am amazed at how difficult it is to make everything perfect. If you find an error,
please email me and I will try to correct the next printing.
When reading my programming solutions, be aware that my goal is to present simple,
easy-to-read code that illustrates the point. I have not tried to optimize for either size or
performance.
I am also aware that each of us has our own programming style. Yours probably differs from
mine. If you are working with an instructor, I encourage you to discuss programming style with
him or her. I probably will not change my style, but I support other people’s desire to use their
own style.
2 -1 a) 4567 c) fedc
b) 89ab d) 0250
2 -3 a) 32 d) 16
b) 48 e) 8
c) 4 f) 32
2 -4 a) 2 d) 3
b) 8 e) 5
c) 16 f) 2
2 -5 r = 10, n = 8, d7 = 2, d6 = 9, d5 = 4, d4 = 5, d3 = 8, d2 = 2, d1 = 5, d0 = 4.
r = 16, n = 8, d7 = 2, d6 = 9, d5 = 4, d4 = 5, d3 = 8, d2 = 2, d1 = 5, d0 = 4.
E.2. DATA STORAGE FORMATS 389
2 -6 a) 170 e) 128
b) 85 f) 99
c) 240 g) 123
d) 15 h) 255
2 -7 a) 43981 e) 32768
b) 4660 f) 1024
c) 65244 g) 65535
d) 2000 h) 12345
a) 160 e) 100
b) 80 f) 12
c) 255 g) 17
d) 137 h) 200
a) 40960 e) 34952
b) 65535 f) 400
c) 1024 g) 43981
d) 4369 h) 21845
2 -10 a) 64 e) ff
b) 7b f) 10
c) 0a g) 20
d) 58 h) 80
2 -12 Since there are 12 values, we need 4 bits. Any 4-bit code would work. For example,
code grade
0000 A
0001 A-
0010 B+
0011 B
0100 B-
0101 C+
0110 C
0111 C-
1000 D+
1001 D
1010 D-
1011 F
2 -13 The addressing in Figure 2.1 uses only four bits. This limits us to a 16-byte addressing
space. In order to increase our space to 17 bytes, we need another bit for the address. The
17th byte would be number 10000.
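The reasoning in 2-12 and 2-13 is the same: find the smallest number of bits whose patterns cover all the items. Here is a small C sketch of that calculation. The function name bits_needed is my own.

```c
#include <assert.h>

/*
 * Hypothetical helper: returns the smallest number of bits that
 * provides at least `count` distinct bit patterns.
 */
unsigned int bits_needed(unsigned int count)
{
    unsigned int bits = 0;
    unsigned long long patterns = 1;    /* 2 raised to the power `bits` */

    assert(count > 0);

    while (patterns < count)
    {
        patterns *= 2;      /* each added bit doubles the number of patterns */
        bits++;
    }

    return bits;
}
```

Twelve grades need 4 bits, 16 bytes can be addressed with 4 bits, and a 17th byte pushes the address to 5 bits.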
2 -17 The range of 32-bit unsigned ints is 0 – 4,294,967,295, so four bytes will be required.
If the storage area begins at byte number 0x2fffeb96, the number will also occupy bytes
number 0x2fffeb97, 0x2fffeb98, 0x2fffeb99.
10 #include <stdio.h>
11
12 int main(void)
13 {
14 int x;
15 unsigned int y;
16
17 while(1)
18 {
19 printf("Enter a decimal integer: ");
20 scanf("%i", &x);
21 if (x == 0) break;
22
33 return 0;
34 }
2 -28
1 /*
2 * stringInHex.c
3 * displays "Hello world" in hex.
4 *
5 * Bob Plantz - 19 June 2009
6 */
7
8 #include <stdio.h>
9
10 int main(void)
11 {
12 char *stringPtr = "Hello world.\n";
13
23 return 0;
24 }
2 -29 Keyboard input is line buffered by the operating system and is not available to the appli-
cation program until the user presses the enter key. This action places two characters in
the keyboard buffer – the character key pressed and the end of line character. (The “end of
line” character differs in different operating systems.)
The call to the read function gets one character from the keyboard buffer – the one cor-
responding to the key the user pressed. Since there is a breakpoint at the instruction
following the call to read, control returns to the debugger, gdb. But the end of line charac-
ter is still in the keyboard buffer, and the operating system dutifully provides it to gdb.
The net result is the same as if you had pushed the enter key immediately in response to
gdb’s prompt. This causes gdb to execute the previous command, which was the continue
command. So the program immediately loops back to its prompt.
Experiment with this. Try to enter more than one character before pressing the enter key.
It is all very consistent. You just have to think through exactly which keys you are pressing
when using the debugger to determine what your calls to read are doing.
2 -30
1 /*
2 * echoString1.c
3 * Echoes a string entered by user.
4 *
5 * Bob Plantz - 19 June 2009
6 */
7
8 #include <unistd.h>
9 #include <string.h>
10
11 int main(void)
12 {
13 char aString[200];
37 return 0;
38 }
2 -31
1 /*
2 * echoString2.c
3 * Echoes a string entered by user. Converts input
4 * to C-style string.
5 * Bob Plantz - 19 June 2009
6 */
7
8 #include <stdio.h>
9 #include <unistd.h>
10 #include <string.h>
11
12 int main(void)
13 {
14 char aString[200];
15 char *stringPtr = aString;
16
31 return 0;
32 }
2 -32
1 /*
2 * echoString3.c
3 * Echoes a string entered by user.
4 *
5 * Bob Plantz - 19 June 2009
6 */
7
8 #include "readLn.h"
9 #include "writeStr.h"
10
11 int main(void)
12 {
13 char aString[STRLEN]; // limited to 5 for testing readStr
14 // change to 200 for use
15 writeStr("Enter a text string: ");
16 readLn(aString, STRLEN);
17 writeStr("You entered:\n");
18 writeStr(aString);
19 writeStr("\n");
20
21 return 0;
22 }
1 /*
2 * writeStr.h
3 * Writes a line to standard out.
4 *
5 * input:
6 * pointer to C-style text string
7 * output:
8 * to screen
9 * returns number of chars written
10 *
11 * Bob Plantz - 19 June 2009
12 */
13
14 #ifndef WRITESTR_H
15 #define WRITESTR_H
16 int writeStr(char *);
17 #endif
1 /*
2 * writeStr.c
3 * Writes a line to standard out.
4 *
5 * input:
6 * pointer to C-style text string
7 * output:
8 * to screen
9 * returns number of chars written
10 *
11 * Bob Plantz - 19 June 2009
12 */
13
14 #include <unistd.h>
15 #include "writeStr.h"
16
28 return count;
29 }
1 /*
2 * readLn.h
3 * Reads a line from standard in.
4 * Drops newline character. Eliminates
5 * excess characters from input buffer.
6 *
7 * input:
8 * from keyboard
9 * output:
10 * null-terminated text string
11 * returns number of chars in text string
12 *
13 * Bob Plantz - 19 June 2009
14 */
15
16 #ifndef READLN_H
17 #define READLN_H
18 int readLn(char *, int);
19 #endif
1 /*
2 * readLn.c
3 * Reads a line from standard in.
4 * Drops newline character. Eliminates
5 * excess characters from input buffer.
6 *
7 * input:
8 * from keyboard
9 * output:
10 * null-terminated text string
11 * returns number of chars in text string
12 *
13 * Bob Plantz - 19 June 2009
14 */
15
16 #include <unistd.h>
17 #include "readLn.h"
18
35 return count;
36 }
In two’s complement, zero does not have a representation of opposite sign. (-0.0 does exist
in IEEE 754 floating point.) Also, $-2^{n-1}$ does not have a representation of opposite sign.
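Both cases can be checked for n = 8 with a few lines of C. This is a sketch of my own. The name twos_negate8 is hypothetical, and it works on the bit pattern as an unsigned value so that no signed overflow is involved.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns the 8-bit two's complement negation
 * of a bit pattern (complement all the bits, then add one).
 */
uint8_t twos_negate8(uint8_t bits)
{
    uint8_t result = (uint8_t)(~bits + 1u);

    /* a value plus its negation is always zero, modulo 2 to the 8th */
    assert((uint8_t)(bits + result) == 0);

    return result;
}
```

Negating 0x00 gives 0x00, and negating 0x80 (which is -128) gives 0x80 again, so neither value has a partner of opposite sign. Every other pattern does. For example, 0x01 becomes 0xff.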
3 -7 a) +85 e) -128
b) -86 f) +99
c) -16 g) +123
d) +15
3 -8 a) +4660 e) -32768
b) -4660 f) +1024
c) -292 g) -1
d) +2000 h) +30767
3 -9 a) 64 e) 7f
b) ff f) f0
c) f6 g) e0
d) 58 h) 80
3 -11 a) ff d) de
CF = 0 ⇒ unsigned right CF = 0 ⇒ unsigned right
OF = 0 ⇒ signed right OF = 1 ⇒ signed wrong
b) 45 e) 0e
CF = 1 ⇒ unsigned wrong CF = 1 ⇒ unsigned wrong
OF = 0 ⇒ signed right OF = 0 ⇒ signed right
c) fb f) 00
CF = 0 ⇒ unsigned right CF = 1 ⇒ unsigned wrong
OF = 0 ⇒ signed right OF = 1 ⇒ signed wrong
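The rules for CF and OF can be modeled in C for 8-bit addition. This is a sketch under my own assumptions: the names add8 and add8_result are made up, only addition is modeled, and the operands in the usage note are my own choices rather than the ones in the exercise.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record of an 8-bit addition and the two flags it sets. */
struct add8_result
{
    uint8_t sum;        /* low-order eight bits of the result */
    int carry;          /* CF: 1 means the unsigned result is wrong */
    int overflow;       /* OF: 1 means the signed result is wrong */
};

struct add8_result add8(uint8_t a, uint8_t b)
{
    struct add8_result r;
    unsigned int wide = (unsigned int)a + b;    /* keeps the ninth bit */

    assert(wide <= 0x1fe);      /* largest possible sum is 0xff + 0xff */

    r.sum = (uint8_t)wide;
    r.carry = wide > 0xff;      /* a carry out of the high-order bit */

    /* overflow: both operands have the same sign bit,
       and the sign bit of the sum differs from it */
    r.overflow = ((a ^ r.sum) & (b ^ r.sum) & 0x80) != 0;

    return r;
}
```

For example, 0xff + 0x01 gives 00 with CF = 1 and OF = 0, while 0x7f + 0x01 gives 80 with CF = 0 and OF = 1.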
3 -14
1 /*
2 * hexTimesTen.c
3 * Multiplies a hex number by 10.
4 * Bob Plantz - 19 June 2009
5 */
6
7 #include "readLn.h"
8 #include "writeStr.h"
9 #include "hex2int.h"
10 #include "int2hex.h"
11
12 int main(void)
13 {
14 char aString[9];
15 unsigned int x;
16
26 return 0;
27 }
1 /*
2 * hex2int.h
3 *
4 * Converts a hexadecimal text string to corresponding
5 * unsigned int format.
6 * Assumes text string is valid hex chars.
7 *
8 * input:
9 * pointer to null-terminated text string
10 * output:
11 * returns the unsigned int.
12 *
13 * Bob Plantz - 19 June 2009
14 */
15
16 #ifndef HEX2INT_H
17 #define HEX2INT_H
18
21 #endif
1 /*
2 * hex2int.c
3 *
4 * Converts a hexadecimal text string to corresponding
5 * unsigned int format.
6 * Assumes text string is valid hex chars.
7 *
8 * input:
9 * pointer to null-terminated text string
10 * output:
11 * returns the unsigned int.
12 *
13 * Bob Plantz - 19 June 2009
14 */
E.3. COMPUTER ARITHMETIC 399
15
16 #include "hex2int.h"
17
23 x = 0; // initialize result
24 while (*hexString != ’\0’) // end of string?
25 {
26 x = x << 4; // make room for next four bits
27 aChar = *hexString;
28 if (aChar <= ’9’)
29 x = x + (aChar & 0x0f);
30 else
31 {
32 aChar = aChar & 0x0f;
33 aChar = aChar + 9;
34 x = x + aChar;
35 }
36 hexString++;
37 }
38
39 return x;
40 }
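The digit conversion inside the loop above depends on how the hex characters are coded in ASCII. The low-order four bits of ’0’ through ’9’ are already the digit's value, while the low-order four bits of ’a’ through ’f’ (and ’A’ through ’F’) run from 1 through 6, so adding 9 gives 10 through 15. The sketch below isolates that step. The function name hex_digit_value is my own, and the code assumes ASCII.

```c
#include <assert.h>
#include <ctype.h>

/*
 * Hypothetical helper: converts one ASCII hex character to its value.
 * Assumes the character is a valid hex digit.
 */
unsigned int hex_digit_value(char digit)
{
    unsigned int lowNibble = (unsigned char)digit & 0x0f;

    assert(isxdigit((unsigned char)digit));

    if (digit <= '9')
        return lowNibble;       /* '0' (0x30) through '9' (0x39) */

    return lowNibble + 9;       /* 'a' is 0x61 and 'A' is 0x41: 1 + 9 = 10 */
}
```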
1 /*
2 * int2hex.h
3 *
4 * Converts an unsigned int to corresponding
5 * hex text string format.
6 * Assumes char array is big enough.
7 *
8 * input:
9 * unsigned int
10 * output:
11 * null-terminated text string
12 *
13 * Bob Plantz - 19 June 2009
14 */
15
16 #ifndef INT2HEX_H
17 #define INT2HEX_H
18
21 #endif
1 /*
2 * int2hex.c
3 *
4 * Converts an unsigned int to corresponding
5 * hex text string format.
6 * Assumes char array is big enough.
7 *
8 * input:
9 * unsigned int
10 * output:
11 * null-terminated text string
12 *
13 * Bob Plantz - 19 June 2009
14 */
15
16 #include "int2hex.h"
17
3 -15
1 /*
2 * binTimesTen.c
3 * Multiplies a binary number by 10.
4 *
5 * Bob Plantz - 19 June 2009
6 */
7
8 #include "readLn.h"
9 #include "writeStr.h"
10 #include "bin2int.h"
11 #include "int2bin.h"
12
13 int main(void)
14 {
15 char aString[33];
16 unsigned int x;
17
27 return 0;
28 }
1 /*
2 * bin2int.h
3 *
4 * bin2int.c
5 * Converts a binary text string to corresponding
6 * unsigned int format.
7 * Assumes text string contains valid binary chars.
8 *
9 * input:
10 * pointer to null-terminated text string
11 * output:
12 * returns the unsigned int.
13 *
14 * Bob Plantz - 19 June 2009
15 */
16
17 #ifndef BIN2INT_H
18 #define BIN2INT_H
19
22 #endif
1 /*
2 * bin2int.c
3 * Converts a binary text string to corresponding
4 * unsigned int format.
5 * Assumes text string contains valid binary chars.
6 *
7 * input:
8 * pointer to null-terminated text string
9 * output:
10 * returns the unsigned int.
11 *
12 * Bob Plantz - 19 June 2009
13 */
14
15 #include "bin2int.h"
16
22 x = 0; // initialize result
23 while (*binString != ’\0’) // end of string?
24 {
25 x = x << 1; // make room for next bit
26 aChar = *binString;
27 x |= (0x1 & aChar); // sift out the bit
28 binString++;
29 }
30
31 return x;
32 }
1 /*
2 * int2bin.h
3 *
4 * Converts an unsigned int to corresponding
5 * binary text string format.
6 * Assumes char array is big enough.
7 *
8 * input:
9 * unsigned int
10 * output:
11 * null-terminated text string
12 *
13 * Bob Plantz - 19 June 2009
14 */
15
16 #ifndef INT2BIN_H
17 #define INT2BIN_H
18
21 #endif
1 /*
2 * int2bin.c
3 *
4 * Converts an unsigned int to corresponding
5 * binary text string format.
6 * Assumes char array is big enough.
7 *
8 * input:
9 * unsigned int
10 * output:
11 * null-terminated text string
12 *
13 * Bob Plantz - 19 June 2009
14 */
15
16 #include "int2bin.h"
17
7 #include "readLn.h"
8 #include "writeStr.h"
9 #include "uDec2int.h"
10 #include "int2bin.h"
11
12 int main(void)
13 {
14 char aString[33];
15 unsigned int x;
16
26 return 0;
27 }
1 /*
2 * uDec2int.h
3 *
4 * Converts a decimal text string to corresponding
5 * unsigned int format.
6 * Assumes text string is valid decimal chars.
7 *
8 * input:
9 * pointer to null-terminated text string
10 * output:
11 * returns the unsigned int.
12 *
13 * Bob Plantz - 19 June 2009
14 */
15
16 #ifndef UDEC2INT_H
17 #define UDEC2INT_H
18
21 #endif
1 /*
2 * uDec2int.c
3 *
4 * Converts a decimal text string to corresponding
16 #include "uDec2int.h"
17
23 x = 0; // initialize result
24 while (*decString != ’\0’) // end of string?
25 {
26 x *= 10;
27 aChar = *decString;
28 x += (0xf & aChar);
29 decString++;
30 }
31
32 return x;
33 }
See above for int2bin. See Section E.2 for writeStr and readLn.
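The body of int2bin does not appear in the listings above, so the following is only a sketch of one way to write it. The signature and the choice to emit every bit of an unsigned int (most significant first, with leading zeros) are assumptions, guided by the 33-character array that main allocates.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Assumed signature: writes one '0' or '1' per bit of x, then a NUL. */
void int2bin(unsigned int x, char *binString)
{
    int i;
    const int nBits = (int)(sizeof x * CHAR_BIT);

    assert(binString != NULL);
    for (i = nBits - 1; i >= 0; i--) {      /* most significant bit first */
        *binString = (char)('0' + ((x >> i) & 0x1u));
        binString++;
    }
    *binString = '\0';                       /* terminate the C string */
}
```

The caller must supply an array of at least `sizeof(unsigned int) * CHAR_BIT + 1` characters, which is 33 when an unsigned int is 32 bits wide.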
3-17
1 /*
2 * sDecTimesTen.c
3 * Multiplies a signed decimal number by 10
4 * and shows result in binary.
5 * Bob Plantz - 21 June 2009
6 */
7
8 #include "readLn.h"
9 #include "writeStr.h"
10 #include "sDec2int.h"
11 #include "int2bin.h"
12
13 int main(void)
14 {
15 char aString[33];
16 int x;
17
25 writeStr("\n");
26
27 return 0;
28 }
1 /*
2 * sDec2int.h
3 *
4 * Converts a decimal text string to corresponding
5 * signed int format.
6 * Assumes text string is valid decimal chars.
7 *
8 * input:
9 * pointer to null-terminated text string
10 * output:
11 * returns the signed int.
12 *
13 * Bob Plantz - 19 June 2009
14 */
15
16 #ifndef SDEC2INT_H
17 #define SDEC2INT_H
18
21 #endif
1 /*
2 * sDec2int.c
3 *
4 * Converts a decimal text string to corresponding
5 * signed int format.
6 * Assumes text string is valid decimal chars.
7 *
8 * input:
9 * pointer to null-terminated text string
10 * output:
11 * returns the signed int.
12 *
13 * Bob Plantz - 19 June 2009
14 */
15
16 #include "uDec2int.h"
17 #include "sDec2int.h"
18
24 if (*decString == '-')
25 {
26 negative = 1;
27 decString++;
28 }
29 else
30 {
31 if (*decString == '+')
32 decString++;
33 }
34
35 x = uDec2int(decString);
36
37 if (negative)
38 x *= -1;
39
40 return x;
41 }
See above for int2bin and uDec2int. See Section E.2 for writeStr and readLn.
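Because the listings for uDec2int and sDec2int are shown without their function headers, a compact sketch of the pair may help. The signatures are assumptions; the loop bodies follow the fragments above.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed signature. Valid decimal characters only, as in the original. */
unsigned int uDec2int(const char *decString)
{
    unsigned int x = 0;

    assert(decString != NULL);
    while (*decString != '\0') {
        x *= 10;
        /* '0'..'9' are 0x30..0x39, so the low four bits are the digit */
        x += (0xfu & (unsigned char)*decString);
        decString++;
    }
    return x;
}

/* Assumed signature. Handles an optional leading '+' or '-'. */
int sDec2int(const char *decString)
{
    int negative = 0;
    int x;

    assert(decString != NULL);
    if (*decString == '-') {
        negative = 1;
        decString++;
    } else if (*decString == '+') {
        decString++;
    }
    x = (int)uDec2int(decString);   /* convert the magnitude */
    if (negative)
        x *= -1;
    return x;
}
```

With these definitions, `sDec2int("-123")` gives -123 and `sDec2int("+45")` gives 45. No range checking is done, so magnitudes that do not fit in an int are outside the scope of the sketch.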
x   1   x·1        x   0   x+0
0   1    0         0   0    0
1   1    1         1   0    1

x   0   x·0        x   1   x+1
0   0    0         0   1    1
1   0    0         1   1    1

x   x′  x·x′       x   x′  x+x′
0   1    0         0   1    1
1   0    0         1   0    1

x   x   x·x        x   x   x+x
0   0    0         0   0    0
1   1    1         1   1    1

x   y   z   x·(y+z)   x·y+x·z
0   0   0      0         0
0   0   1      0         0
0   1   0      0         0
0   1   1      0         0
1   0   0      0         0
1   0   1      1         1
1   1   0      1         1
1   1   1      1         1
E.4. LOGIC GATES 407
x   y   z   x+y·z   (x+y)·(x+z)
0   0   0     0          0
0   0   1     0          0
0   1   0     0          0
0   1   1     1          1
1   0   0     1          1
1   0   1     1          1
1   1   0     1          1
1   1   1     1          1

x   y = x′   y′
0     1      0
1     0      1
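The truth tables above can also be checked mechanically. The sketch below is not part of the original solutions; it loops over every input combination and compares both sides of each identity using C's bitwise operators on the values 0 and 1. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* True when x·(y+z) = x·y + x·z for all eight input combinations. */
bool and_distributes_over_or(void)
{
    for (unsigned x = 0; x <= 1; x++)
        for (unsigned y = 0; y <= 1; y++)
            for (unsigned z = 0; z <= 1; z++)
                if ((x & (y | z)) != ((x & y) | (x & z)))
                    return false;
    return true;
}

/* True when x + y·z = (x+y)·(x+z) for all eight input combinations. */
bool or_distributes_over_and(void)
{
    for (unsigned x = 0; x <= 1; x++)
        for (unsigned y = 0; y <= 1; y++)
            for (unsigned z = 0; z <= 1; z++)
                if ((x | (y & z)) != ((x | y) & (x | z)))
                    return false;
    return true;
}

/* True when the one-variable identities hold for x = 0 and x = 1. */
bool single_variable_identities(void)
{
    for (unsigned x = 0; x <= 1; x++) {
        unsigned nx = x ^ 1u;                 /* complement of x */
        if ((x & 1u) != x || (x | 0u) != x)   /* identity elements */
            return false;
        if ((x & 0u) != 0u || (x | 1u) != 1u) /* null elements */
            return false;
        if ((x & nx) != 0u || (x | nx) != 1u) /* complement */
            return false;
        if ((x & x) != x || (x | x) != x)     /* idempotence */
            return false;
        if ((nx ^ 1u) != x)                   /* involution */
            return false;
    }
    return true;
}
```

Each function returns true, which matches the equal columns in the tables.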
4-8 Minterms:
F(x, y, z)          xy
               00  01  11  10
      z   0    m0  m2  m6  m4
          1    m1  m3  m7  m5
4-9 Minterms:
F(x, y, z)          xz
               00  01  11  10
      y   0    m0  m1  m5  m4
          1    m2  m3  m7  m6
4-10 The prime numbers correspond to the minterms m2, m3, m5, and m7. The minterms m10,
m11, m12, m13, m14, and m15 cannot occur, so they are marked “don’t care” (×) on the Karnaugh map.
F(w, x, y, z)        yz
                00  01  11  10
          00    m0  m1   1   1
    wx    01    m4   1   1  m6
          11     ×   ×   ×   ×
          10    m8  m9   ×   ×
F(w, x, y, z) = x · z + x′ · y
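As a quick check of the minimized expression, the sketch below evaluates x·z + x′·y for each decimal digit. It assumes the usual minterm numbering, with w as the most significant bit and z as the least significant; the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Evaluates F = x·z + x'·y for a digit n in 0..9 (bits w x y z). */
bool prime_digit(unsigned n)
{
    unsigned x = (n >> 2) & 1u;
    unsigned y = (n >> 1) & 1u;
    unsigned z = n & 1u;

    assert(n <= 9u);            /* 10..15 are the don't-care inputs */
    /* w does not appear in the minimized expression */
    return ((x & z) | ((x ^ 1u) & y)) != 0u;
}
```

For n from 0 through 9 the function is true exactly for 2, 3, 5, and 7. The inputs 10 through 15 are excluded because the don't-care cells were used freely during minimization.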
x1 x0 y1 y0 F
0 0 0 0 0
0 0 0 1 1
0 0 1 0 1
0 0 1 1 1
0 1 0 0 0
0 1 0 1 0
0 1 1 0 1
0 1 1 1 1
1 0 0 0 0
1 0 0 1 0
1 0 1 0 0
1 0 1 1 1
1 1 0 0 0
1 1 0 1 0
1 1 1 0 0
1 1 1 1 0
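Reading x1 x0 and y1 y0 as two-bit unsigned numbers, F in this table is 1 exactly when x is less than y. The sum-of-products expression in the sketch below, F = x1′·y1 + x1′·x0′·y0 + x0′·y1·y0, is my own minimization of the table and is not taken from the original solution.

```c
#include <assert.h>
#include <stdbool.h>

/* F = x1'·y1 + x1'·x0'·y0 + x0'·y1·y0 (derived from the table above). */
bool less_than_2bit(unsigned x1, unsigned x0, unsigned y1, unsigned y0)
{
    unsigned nx1 = x1 ^ 1u;     /* x1' */
    unsigned nx0 = x0 ^ 1u;     /* x0' */

    assert(x1 <= 1u && x0 <= 1u && y1 <= 1u && y0 <= 1u);
    return ((nx1 & y1) | (nx1 & nx0 & y0) | (nx0 & y1 & y0)) != 0u;
}
```

Comparing the result with `(2*x1 + x0) < (2*y1 + y0)` for all sixteen rows reproduces the F column.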
[Figure not reproduced; surviving labels: x1, x0, y1, y0.]
                    Enable = 0                      Enable = 1
Current      Next                            Next
n1  n0       n1  n0   J1  K1  J0  K0         n1  n0   J1  K1  J0  K0
0   0        0   0    0   1   0   1          0   1    0   1   1   0
0   1        0   1    0   1   1   0          1   0    1   0   0   1
1   0        1   0    1   0   0   1          1   1    1   0   1   0
1   1        1   1    1   0   1   0          0   0    0   1   0   1
This leads to the following equations for the inputs to the JK flip-flops (using “E” for
“Enable”):
J0(E, n1, n0)       n1 n0            K0(E, n1, n0)       n1 n0
               00  01  11  10                       00  01  11  10
      E   0     0   1   1   0              E   0     1   0   0   1
          1     1   0   0   1                  1     0   1   1   0

J1(E, n1, n0)       n1 n0            K1(E, n1, n0)       n1 n0
               00  01  11  10                       00  01  11  10
      E   0     0   0   1   1              E   0     1   1   0   0
          1     0   1   0   1                  1     1   0   1   0

J0 = E′ · n0 + E · n0′
K0 = E′ · n0′ + E · n0
J1 = E′ · n1 + n1 · n0′ + E · n1′ · n0
K1 = E′ · n1′ + n1′ · n0′ + E · n1 · n0
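The equations can be tested against the state table by simulating one clock tick. The sketch below is an illustration rather than part of the original solution. It uses the JK characteristic equation Q+ = J·Q′ + K′·Q and takes K0 = E′·n0′ + E·n0, which is the form the K0 columns of the state table give.

```c
#include <assert.h>

/* JK flip-flop characteristic equation: Q+ = J·Q' + K'·Q */
static unsigned jk_next(unsigned j, unsigned k, unsigned q)
{
    return ((j & (q ^ 1u)) | ((k ^ 1u) & q)) & 1u;
}

/* state holds n1 n0 as a number 0..3; e is the Enable input (0 or 1). */
unsigned counter_next(unsigned e, unsigned state)
{
    unsigned n1 = (state >> 1) & 1u, n0 = state & 1u;
    unsigned ne = e ^ 1u, nn1 = n1 ^ 1u, nn0 = n0 ^ 1u;
    unsigned j0, k0, j1, k1;

    assert(e <= 1u && state <= 3u);
    j0 = (ne & n0) | (e & nn0);
    k0 = (ne & nn0) | (e & n0);
    j1 = (ne & n1) | (n1 & nn0) | (e & nn1 & n0);
    k1 = (ne & nn1) | (nn1 & nn0) | (e & n1 & n0);
    return (jk_next(j1, k1, n1) << 1) | jk_next(j0, k0, n0);
}
```

With Enable = 0 every state maps to itself, and with Enable = 1 the states follow 0, 1, 2, 3, 0, as in the table.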
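5-4 Four-bit up counter.
[Figure not reproduced: four T flip-flops, Q0 through Q3, with outputs n0 through n3.
Surviving labels: T, Q, CK, CLK, and a constant 1 at the T input of Q0.]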
8 #include <stdio.h>
9
10 int main(void)
11 {
12 unsigned char *ptr;
13 int x, i, bigEndian;
14
33 return 0;
34 }
6-6
1 /*
2 * endianReg.c
3 * Stores user int in memory then copies to register var.
4 * Use gdb to observe endianness.
5 * Bob Plantz - 22 June 2009
6 */
7
8 #include <stdio.h>
9
10 int main(void)
11 {
12 int x;
13 register int y;
14
18 y = x;
19 printf("You entered %i\n", y);
20
21 return 0;
22 }
When I ran this program with the input -1985229329, I got the results:
(gdb) print &x
$5 = (int *) 0x7ffff74f473c
(gdb) x/4xb 0x7ffff74f473c
0x7ffff74f473c: 0xef 0xcd 0xab 0x89
(gdb) i r rcx
rcx 0xffffffff89abcdef -1985229329
(gdb) print x
$6 = -1985229329
(gdb)
which shows that the value stored in rcx (used as the y variable) is in regular order, and the
value stored in memory (the x variable) is in little endian order.
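The same observation can be made without a debugger by copying the bytes of an integer out of memory in address order. This is a minimal sketch with hypothetical function names; it makes no assumption about which byte order the machine uses.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 when the least significant byte sits at the lowest address. */
int is_little_endian(void)
{
    uint32_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);      /* byte at the lowest address */
    return first == 1;
}

/* Copies the four bytes of x, in memory order, into bytes[0..3]. */
void bytes_in_memory(uint32_t x, unsigned char bytes[4])
{
    assert(bytes != NULL);
    memcpy(bytes, &x, 4);
}
```

On a little endian machine such as the x86-64, `bytes_in_memory(0x89abcdef, b)` fills b with 0xef, 0xcd, 0xab, 0x89, the order gdb displays above.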
5 .text
6 .globl f
7 .type f, @function
8 f:
9 pushq %rbp # save caller’s frame pointer
10 movq %rsp, %rbp # establish ours
11
7-2
1 # g.s
2 # Does nothing but return to caller.
3 # Bob Plantz - 22 June 2009
4
5 .text
6 .globl g
7 .type g, @function
8 g:
9 pushq %rbp # save caller’s frame pointer
10 movq %rsp, %rbp # establish ours
11
7-3
1 # h.s
2 # Does nothing but return 123 to caller.
3 # Bob Plantz - 22 June 2009
4
5 .text
6 .globl h
7 .type h, @function
8 h:
9 pushq %rbp # save caller’s frame pointer
10 movq %rsp, %rbp # establish ours
11
7-4
1 /*
2 * checkRetNos.c
3 * calls three assembly language functions and
4 * prints their return numbers.
5 *
6 * Bob Plantz - 22 June 2009
7 */
8
9 #include <stdio.h>
10 int one();
11 int two();
12 int three();
13
14 int main()
15 {
16 int x;
17
18 x = one();
19 printf("one returns %i, ", x);
20
21 x = two();
22 printf("two returns %i, and ", x);
23
24 x = three();
25 printf("three returns %i.\n", x);
26
27 return 0;
28 }
1 # one.s
2 # returns 1 to calling function.
3 # Bob Plantz - 22 June 2009
4
5 .text
6 .globl one
7 .type one, @function
8 one:
9 pushq %rbp # save caller’s base pointer
10 movq %rsp, %rbp # establish ours
11
1 # two.s
2 # returns 2 to calling function.
3 # Bob Plantz - 22 June 2009
4
5 .text
6 .globl two
7 .type two, @function
8 two:
9 pushq %rbp # save caller’s base pointer
1 # three.s
2 # returns 3 to calling function.
3 # Bob Plantz - 22 June 2009
4
5 .text
6 .globl three
7 .type three, @function
8 three:
9 pushq %rbp # save caller’s base pointer
10 movq %rsp, %rbp # establish ours
11
7-5
1 /*
2 * checkRetLtrs.c
3 * calls three assembly language functions and
4 * prints their return characters.
5 *
6 * Bob Plantz - 22 June 2009
7 */
8
9 #include <stdio.h>
10 char el();
11 char em();
12 char en();
13
14 int main()
15 {
16 char letter;
17
18 letter = el();
19 printf("el returns %c, ", letter);
20
21 letter = em();
22 printf("em returns %c, and ", letter);
23
24 letter = en();
25 printf("en returns %c.\n", letter);
26
27 return 0;
28 }
1 # el.s
E.7. PROGRAMMING IN ASSEMBLY LANGUAGE 415
1 # em.s
2 # returns M to calling function.
3 # Bob Plantz - 22 June 2009
4 .text
5 .globl em
6 .type em, @function
7 em:
8 pushq %rbp # save caller’s base pointer
9 movq %rsp, %rbp # establish ours
10
1 # en.s
2 # returns N to calling function.
3 # Bob Plantz - 22 June 2009
4 .text
5 .globl en
6 .type en, @function
7 en:
8 pushq %rbp # save caller’s base pointer
9 movq %rsp, %rbp # establish ours
10
7-6 The four characters are returned as a 4-byte word and then stored in memory by main.
They are then written to standard out one character at a time. Storage order in memory
is little endian, so the characters are displayed “backwards.”
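To make the characters come out in reading order on a little endian machine, the first character has to occupy the least significant byte of the word. The following sketch builds such a word with shifts; `pack_word` is a hypothetical helper, not something from the original solution.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Packs letters[0] into the low byte and letters[3] into the high byte. */
uint32_t pack_word(const char letters[4])
{
    assert(letters != NULL);
    return  (uint32_t)(unsigned char)letters[0]
         | ((uint32_t)(unsigned char)letters[1] << 8)
         | ((uint32_t)(unsigned char)letters[2] << 16)
         | ((uint32_t)(unsigned char)letters[3] << 24);
}
```

For example, `pack_word("abcd")` is 0x64636261. Stored in little endian memory, its bytes are 0x61, 0x62, 0x63, 0x64, so writing the four bytes in address order displays abcd.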
1 /*
2 * fourLetterWord.c
3 * calls a function to get a four letter word, then
4 * prints it.
5 *
9 #include <unistd.h>
10 #include "retWord.h"
11
12 int main()
13 {
14 int x;
15 char endl = '\n';
16
17 x = retWord();
18 write(STDOUT_FILENO, &x, 4);
19
22 return 0;
23 }
1 # retWord.s
2 # returns 4-letter word to calling function.
3 # Bob Plantz - 22 June 2009
4 .text
5 .globl retWord
6 .type retWord, @function
7 retWord:
8 pushq %rbp # save caller’s base pointer
9 movq %rsp, %rbp # establish ours
10
8 #include <stdio.h>
9
10 int theStack[500];
11 int *stackPointer = &theStack[0];
12
13 /*
14 * precondition:
15 * stackPointer points to data element at top of stack
16 * postcondition:
17 * address in stackPointer is incremented by four
E.8. PROGRAM DATA – INPUT, STORE, OUTPUT 417
26 /*
27 * precondition:
28 * stackPointer points to data element at top of stack
29 * postcondition:
30 * data element at top of stack is copied to *data_location
31 * address in stackPointer is decremented by four
32 */
33 void pop(int *data_location)
34 {
35 *data_location = *stackPointer;
36 stackPointer--;
37 }
38
39 int main(void)
40 {
41 int x = 12;
42 int y = 34;
43 int z = 56;
44 printf("Start with the stack pointer at %p", (void *)stackPointer);
45 printf(", and x = %i, y = %i, and z = %i\n", x, y, z);
46
47 push(x);
48 push(y);
49 push(z);
50 x = 100;
51 y = 200;
52 z = 300;
53 printf("Now the stack pointer is at %p", (void *)stackPointer);
54 printf(", and x = %i, y = %i, and z = %i\n", x, y, z);
55 pop(&z);
56 pop(&y);
57 pop(&x);
58
62 return 0;
63 }
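The push function is missing from the listing above. A sketch that is consistent with the pop shown there (read, then decrement) increments the pointer first and then stores. The bounds checks and the STACK_SIZE name are additions of mine, not part of the original.

```c
#include <assert.h>

#define STACK_SIZE 500          /* same size as theStack in the listing */

static int theStack[STACK_SIZE];
static int *stackPointer = &theStack[0];

/* Moves the stack pointer up one element, then stores dataValue there. */
void push(int dataValue)
{
    assert(stackPointer < &theStack[STACK_SIZE - 1]);   /* room left? */
    stackPointer++;
    *stackPointer = dataValue;
}

/* Copies the top element to *dataLocation, then moves the pointer down. */
void pop(int *dataLocation)
{
    assert(dataLocation != 0);
    assert(stackPointer > &theStack[0]);                /* anything there? */
    *dataLocation = *stackPointer;
    stackPointer--;
}
```

Pushing 12, 34, and 56 and then popping three times returns 56, 34, and 12, which is why main pops into z, y, and x in that order. Note that theStack[0] never holds data in this scheme.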
8-3 Use gdb to examine the values in the rbp and rsp registers just before the first and just
before the last instructions are executed.
8-4 This exercise shows that the text strings and local variables are stored in different areas
of memory.
8-6
1 # int2hex.s
2 # Prompts user to enter an integer, then displays its hex equivalent
3 # Bob Plantz - 22 June 2009
4
5 # Stack frame
6 .equ anInt,-4
7 .equ localSize,-16
8 # Read only data
9 .section .rodata
10 prompt:
11 .string "Enter an integer number: "
12 scanFormat:
13 .string "%i"
14 printFormat:
15 .string "%i = %x\n"
16 # Code
17 .text # switch to text segment
18 .globl main
19 .type main, @function
20 main:
21 pushq %rbp # save caller’s base pointer
22 movq %rsp, %rbp # establish our base pointer
23 addq $localSize, %rsp # for local variable
24
8-7
1 # assignSeveral.s
2 # Assigns values to four chars and four ints and prints them.
3 # Bob Plantz - 22 June 2009
4
5 # Stack frame
6 .equ a,-1
7 .equ b,-2
8 .equ c,-3
9 .equ d,-4
10 .equ w,-8
11 .equ x,-12
12 .equ y,-16
13 .equ z,-20
14 .equ arg7,0
15 .equ arg8,8
16 .equ arg9,16
E.9. COMPUTER OPERATIONS 419
17 .equ localSize,-48
18 # Read only data
19 .section .rodata
20 format:
21 .string "The values are %c, %i, %c, %i, %c, %i, %c, and %i\n"
22 # Code
23 .text
24 .globl main
25 .type main, @function
26 main:
27 pushq %rbp # save calling function’s base pointer
28 movq %rsp, %rbp # establish our base pointer
29 addq $localSize, %rsp # allocate memory for local variables
30
9-3 The assembly language program in Listing 9.6 uses esi for the y variable and edx for the z
variable. If there is overflow, the call to printf changes the contents of these registers, so
y and/or z are incorrect when the results are displayed.
1 # addAndSubtract3.s
2 # Gets two integers from user, then
3 # performs addition and subtraction
4 # Bob Plantz - 23 June 2009
5 # Stack frame
6 .equ w,-16
7 .equ x,-12
8 .equ y,-8
9 .equ z,-4
10 .equ localSize,-16
11 # Read only data
12 .section .rodata
13 prompt:
14 .string "Enter two integers: "
15 getData:
16 .string "%i %i"
17 display:
18 .string "sum = %i, difference = %i\n"
19 warning:
20 .string "Overflow has occurred.\n"
21 # Code
22 .text
23 .globl main
24 .type main, @function
25 main:
26 pushq %rbp # save caller’s base pointer
27 movq %rsp, %rbp # establish our base pointer
28 addq $localSize, %rsp # for local vars
29
40 ##############################################################
41 # These three instructions could replace the four that follow
42 # this sequence. They work because mov does not affect eflags.
43 # But changes in the code may introduce an instruction before
44 # the jno that does affect eflags, thus breaking the code.
45 # movl w(%rbp), %eax # load w
46 # addl x(%rbp), %eax # add x
47 # movl %eax, y(%rbp) # y = w + x
48 ##############################################################
49 movl w(%rbp), %eax # load w
50 movl %eax, y(%rbp) # y = w
51 movl x(%rbp), %eax # load x
52 addl %eax, y(%rbp) # y = w + x
53 jno nOver1 # skip warning if no OF
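The overflow flag that jno tests can be mirrored in C by comparing sign bits: signed addition overflows exactly when both operands have the same sign and the sum has the other sign. The helper below is a sketch with a made-up name. It does the addition in unsigned arithmetic, where wraparound is well defined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stores the wrapped sum in *sum and returns true if a + b overflowed. */
bool add_overflows(int32_t a, int32_t b, int32_t *sum)
{
    uint32_t wrapped = (uint32_t)a + (uint32_t)b;   /* modulo 2^32 */
    bool overflow;

    assert(sum != 0);
    /* sign bit set in both XORs: a and b agree in sign, sum differs */
    overflow = ((((uint32_t)a ^ wrapped) & ((uint32_t)b ^ wrapped))
                & 0x80000000u) != 0u;
    /* out-of-range conversion is implementation-defined; gcc and clang wrap */
    *sum = (int32_t)wrapped;
    return overflow;
}
```

For instance, adding 1 to INT32_MAX reports overflow, while 1 + 2 does not and leaves 3 in the sum.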
1 # Exercise_9-6.s
2 # This is not a program. It is a group of
3 # instructions to hand-assemble.
4 # Bob Plantz - 27 June 2009
5 .text
6 .globl main
7 main:
8 0000 55 pushq %rbp
9 0001 4889E5 movq %rsp, %rbp
10
11 0004 B9EFCDAB movl $0x89abcdef, %ecx # a)
11 89
12 0009 66B8CDAB movw $0xabcd, %ax # b)
13 000d B030 movb $0x30, %al # c)
14 000f B431 movb $0x31, %ah # d)
15 0011 4D89C7 movq %r8, %r15 # e)
16 0014 4588CA movb %r9b, %r10b # f)
17 0017 4589DC movl %r11d, %r12d # g)
18 001a 48BEF42C movq $0x7fffec9b2cf4, %rsi # h)
18 9BECFF7F
18 0000
19
20 0024 B8000000 movl $0, %eax
20 00
21 0029 4889EC movq %rbp, %rsp
22 002c 5D popq %rbp
23 002d C3 ret
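The pattern in lines a) and 20 of the listing (opcode B8 plus the register number, followed by the 32-bit immediate in little endian order) can be written as a small encoder. This sketch covers only the eight legacy 32-bit registers, which need no REX prefix; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* reg: 0=eax 1=ecx 2=edx 3=ebx 4=esp 5=ebp 6=esi 7=edi */
void encode_movl_imm32(unsigned reg, uint32_t imm, unsigned char out[5])
{
    assert(reg <= 7u);
    assert(out != 0);
    out[0] = (unsigned char)(0xB8u + reg);          /* opcode + register */
    out[1] = (unsigned char)(imm & 0xffu);          /* low byte first */
    out[2] = (unsigned char)((imm >> 8) & 0xffu);
    out[3] = (unsigned char)((imm >> 16) & 0xffu);
    out[4] = (unsigned char)((imm >> 24) & 0xffu);
}
```

Encoding `movl $0x89abcdef, %ecx` this way gives B9 EF CD AB 89, and `movl $0, %eax` gives B8 00 00 00 00, matching the assembler's output above.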
24
1 # Exercise_9-7.s
2 # This is not a program. It is a group of
3 # instructions to hand-assemble.
4 # Bob Plantz - 27 June 2009
5 .text
6 .globl main
7 main:
8 0000 55 pushq %rbp
9 0001 4889E5 movq %rsp, %rbp
10
11 0004 81C1EFCD addl $0x89abcdef, %ecx # a)
11 AB89
12 000a 6605CDAB addw $0xabcd, %ax # b)
13 000e 0430 addb $0x30, %al # c)
14 0010 80C431 addb $0x31, %ah # d)
15 0013 4D01E7 addq %r12, %r15 # e)
16 0016 664501C2 addw %r8w, %r10w # f)
17 001a 4400CE addb %r9b, %sil # g)
18 001d 01F7 addl %esi, %edi # h)
19
20 001f B8000000 movl $0, %eax
20 00
21 0024 4889EC movq %rbp, %rsp
22 0027 5D popq %rbp
23 0028 C3 ret
24
1 # Exercise_9-8.s
2 # This is not a program. It is an experiment
3 # to determine the machine code for pushl.
4 # Bob Plantz - 27 June 2009
5 .text
6 .globl main
7 main:
8 0000 55 pushq %rbp
9 0001 4889E5 movq %rsp, %rbp
10
11 0004 50 pushq %rax
12 0005 51 pushq %rcx
13 0006 52 pushq %rdx
14 0007 53 pushq %rbx
15 0008 56 pushq %rsi
16 0009 57 pushq %rdi
17 000a 4150 pushq %r8
18 000c 4151 pushq %r9
19 000e 4152 pushq %r10
20 0010 4153 pushq %r11
21 0012 4154 pushq %r12
22 0014 4155 pushq %r13
23 0016 4156 pushq %r14
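The machine code in this listing follows a simple rule: the opcode is 50 plus the low three bits of the register number, and registers r8 through r15 add the prefix byte 41. A sketch of that rule, with a hypothetical function name and the standard register numbering, is:

```c
#include <assert.h>

/* reg: 0=rax 1=rcx 2=rdx 3=rbx 4=rsp 5=rbp 6=rsi 7=rdi, 8..15 = r8..r15.
 * Writes the encoding of pushq into out and returns its length in bytes. */
int encode_pushq(unsigned reg, unsigned char out[2])
{
    assert(reg <= 15u);
    assert(out != 0);
    if (reg >= 8u) {
        out[0] = 0x41;                              /* REX prefix, B bit set */
        out[1] = (unsigned char)(0x50u + (reg & 7u));
        return 2;
    }
    out[0] = (unsigned char)(0x50u + reg);
    return 1;
}
```

For example, register 5 (rbp) encodes as 55 and register 14 (r14) as 41 56, as in the listing.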
1 # Exercise_9-10.s
2 # This is not a program. I used the machine code from the
3 # listing to create Exercise 9-9.
4 # Uses a drill and kill approach to learning
5 # how to disassemble machine code
6 # Bob Plantz - 27 June 2009
7 .text
8 .globl main
9 main:
10 0000 55 pushq %rbp
11 0001 4889E5 movq %rsp, %rbp
12
13 #a
14 0004 B0AB movb $0xab, %al
15 0006 B4CD movb $0xcd, %ah
16 0008 41B0EF movb $0xef, %r8b
17 000b 41B701 movb $0x01, %r15b
18
19 #b
20 000e 40B723 movb $0x23, %dil
21 0011 40B634 movb $0x34, %sil
22 0014 B256 movb $0x56, %dl
23 0016 B678 movb $0x78, %dh
24
25 #c
26 0018 B83412CD movl $0xabcd1234, %eax
26 AB
27 001d BBABCD12 movl $0x3412cdab, %ebx
27 34
28 0022 41B90000 movl $0x0, %r9d
28 0000
29 0028 41BE7B00 movl $0x7b, %r14d
29 0000
30
31 #d
32 002e 66B8CDAB movw $0xabcd, %ax
33 0032 66BBBACD movw $0xcdba, %bx
34 0036 66B93412 movw $0x1234, %cx
35 003a 66BA2143 movw $0x4321, %dx
36
37 #e
38 003e 88C4 movb %al, %ah
39 0040 88C8 movb %cl, %al
40 0042 8808 movb %cl, (%rax)
41 0044 88480A movb %cl, 10(%rax)
42 0047 8A08 movb (%rax), %cl
43 0049 8A480A movb 10(%rax), %cl
44
45 #f
46 004c 89C3 movl %eax, %ebx
47 004e 6689D8 movw %bx, %ax
48 0051 4889CA movq %rcx, %rdx
49 0054 4589C6 movl %r8d, %r14d
50
51 #g
52 0057 04AB addb $0xab, %al
53 0059 80C4CD addb $0xcd, %ah
the second byte in the jmp here1 instruction is 03, which is the number of bytes to the
here1 location.
Single-stepping through the program with gdb and examining the contents of rax, rip, and
pointer shows that jmp *%rax and jmp *pointer use the full address, not just an offset.
10-3 The program will probably crash. The write function returns the number of characters
written, and return values are placed in eax, so the address that was stored there is overwritten.
In general, it is safer to use variables in the stack frame if their values must remain the
same after another function is called.
10-4
1 # numerals.s
2 # Displays the numerals on screen
3 # Bob Plantz - 27 June 2009
4 # useful constant
5 .equ STDOUT,1
6 # stack frame
7 .equ theNumeral,-1
8 .equ localSize,-16
9 # read only data
10 .section .rodata
11 newline:
12 .byte '\n'
13 # code
14 .text
15 .globl main
16 .type main, @function
17 main:
18 pushq %rbp # save caller’s base pointer
19 movq %rsp, %rbp # establish ours
20 addq $localSize, %rsp # local vars.
21
33 allDone:
34 movl $1, %edx # do a newline for user
35 movl $newline, %esi
36 movl $STDOUT, %edi
37 call write
38
10-5
1 # alphaUpper.s
2 # Displays the upper case alphabet on screen
3 # Bob Plantz - 27 June 2009
4 # useful constant
5 .equ STDOUT,1
6 # stack frame
7 .equ theLetter,-1
8 .equ localSize,-16
9 # read only data
10 .section .rodata
11 newline:
12 .byte '\n'
13 # code
14 .text
15 .globl main
16 .type main, @function
17 main:
18 pushq %rbp # save caller’s base pointer
19 movq %rsp, %rbp # establish ours
20 addq $localSize, %rsp # local vars.
21
33 allDone:
34 movl $1, %edx # do a newline for user
35 movl $newline, %esi
36 movl $STDOUT, %edi
37 call write
38
10-6
1 # alphaLower.s
2 # Displays the lower case alphabet on screen
3 # Bob Plantz - 27 June 2009
4 # useful constant
5 .equ STDOUT,1
6 # stack frame
7 .equ theLetter,-1
8 .equ localSize,-16
9 # read only data
10 .section .rodata
11 newline:
12 .byte '\n'
13 # code
14 .text
15 .globl main
16 .type main, @function
17 main:
18 pushq %rbp # save caller’s base pointer
19 movq %rsp, %rbp # establish ours
20 addq $localSize, %rsp # local vars.
21
33 allDone:
34 movl $1, %edx # do a newline for user
35 movl $newline, %esi
36 movl $STDOUT, %edi
37 call write
38
10-7
1 /*
2 * whileLoop.c
3 * While loop multiplication.
4 *
5 * Bob Plantz - 27 June 2009
6 */
7
8 #include<stdio.h>
9
10 int main ()
11 {
12 int x, y, z;
13 int i;
14
With version 4.3.3 of gcc and no optimization (-O0), they both use the same assembly
language for the loop:
jmp .L2
.L3:
movl -4(%rbp), %eax
addl %eax, -12(%rbp)
addl $1, -16(%rbp)
.L2:
movl -8(%rbp), %eax
cmpl %eax, -16(%rbp)
jl .L3
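Read back into C, the loop above adds one variable to an accumulator while a counter is less than a limit. The sketch below shows a while form and a for form of such a loop. The variable names and the mapping of stack offsets to variables (x at -4, y at -8, z at -12, i at -16) are assumptions, since the C source is not shown in full.

```c
#include <assert.h>

/* while-loop version: adds x to z, y times */
int multiply_while(int x, int y)
{
    int z = 0;
    int i = 0;

    while (i < y) {     /* cmpl / jl .L3 */
        z += x;         /* addl %eax, -12(%rbp) */
        i++;            /* addl $1, -16(%rbp) */
    }
    return z;
}

/* for-loop version of the same computation */
int multiply_for(int x, int y)
{
    int z = 0;
    int i;

    for (i = 0; i < y; i++)
        z += x;
    return z;
}
```

Both functions return 12 for the arguments 3 and 4, and both test the condition before the first pass, so a count of zero skips the body.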
10-8 After the program executes, the system prompt is displayed twice because the newline
from the “return key” is still in the standard input buffer. This can be fixed by reading two characters.
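A slightly more general variation on that fix, not the one the solution uses, is to keep reading until the newline has been consumed, so extra characters typed before the return key are discarded too. The helper below is hypothetical and takes the file descriptor as a parameter so it can be tried on something other than the keyboard.

```c
#include <assert.h>
#include <unistd.h>

/* Returns the first character of the next input line, or -1 at end of
 * input or on error. The rest of the line, through '\n', is discarded. */
int read_response(int fd)
{
    char first;
    char rest;

    assert(fd >= 0);
    if (read(fd, &first, 1) != 1)
        return -1;
    rest = first;
    while (rest != '\n' && read(fd, &rest, 1) == 1)
        ;                       /* discard through end of line */
    return (unsigned char)first;
}
```

Calling `read_response(STDIN_FILENO)` once per prompt leaves nothing behind for the shell to read.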
1 /*
2 * yesNo1a.c
3 * Prompts user to enter a y/n response.
4 *
5 * Bob Plantz - 27 June 2009
6 */
7
8 #include <unistd.h>
9
12 int main(void)
13 {
14 register char *ptr;
15
26 if (*response == 'y')
27 {
28 ptr = "Changes saved.\n";
29 while (*ptr != '\0')
30 {
31 write(STDOUT_FILENO, ptr, 1);
32 ptr++;
33 }
34 }
35 else
36 {
37 ptr = "Changes discarded.\n";
10-10
1 # others.s
2 # Displays all printable characters other than numerals
3 # and letters.
4 # Bob Plantz - 27 June 2009
5 # useful constants
6 .equ STDOUT,1
7 .equ SPACE,' ' # lowest printable character
8 .equ SQUIGGLE,'~' # highest printable character
9 # stack frame
10 .equ theChar,-1
11 .equ localSize,-16
12 # read only data
13 .section .rodata
14 newline:
15 .byte '\n'
16 # code
17 .text
18 .globl main
19 .type main, @function
20 main:
21 pushq %rbp # save caller’s base pointer
22 movq %rsp, %rbp # establish ours
23 addq $localSize, %rsp # local vars.
24
51 allDone:
52 movl $1, %edx # do a newline for user
53 movl $newline, %esi
54 movl $STDOUT, %edi
55 call write
56
10-11
1 # incChars.s
2 # Prompts user to enter a text string, then changes each
3 # character to the next higher one.
4 # Bob Plantz - 27 June 2009
5 # useful constants
6 .equ STDIN,0
7 .equ STDOUT,1
8 .equ SPACE,' ' # lowest printable character
9 .equ SQUIGGLE,'~' # highest printable character
10 # stack frame
11 .equ theString,-256
12 .equ localSize,-256
13 # read only data
14 .section .rodata
15 prompt:
16 .string "Enter a string of characters: "
17 msg:
18 .string "Incrementing each character: "
19 newline:
20 .byte '\n'
21 # code
22 .text
23 .globl main
24 .type main, @function
25 main:
26 pushq %rbp # save caller’s base pointer
27 movq %rsp, %rbp # establish ours
28 addq $localSize, %rsp # local vars.
29
40 getString:
41 leaq theString(%rbp), %rsi # place to put user input
54 incChars:
55 movb $0, (%rsi) # null character for C string
56 leaq theString(%rbp), %rsi # pointer to the string
57 incLoop:
58 cmpb $0, (%rsi) # end of string?
59 je doDisplay # yes, display the results
60 incb (%rsi) # change character
61 cmpb $SQUIGGLE, (%rsi) # did we go too far?
62 jbe okay # no
63 movb $SPACE, (%rsi) # yes, wrap to beginning
64 okay:
65 incq %rsi # next char
66 jmp incLoop # check at top of loop
67
68 doDisplay:
69 movl $msg, %esi # print message for user
70 dispLoop:
71 cmpb $0, (%esi) # end of string?
72 je showString # yes, show results
73 movl $1, %edx # no, one character
74 movl $STDOUT, %edi
75 call write
76 incl %esi # next char
77 jmp dispLoop # check at top of loop
78
79 showString:
80 leaq theString(%rbp), %rsi # pointer to the string
81 showLoop:
82 cmpb $0, (%rsi) # end of string?
83 je allDone # yes, finish up
84 movl $1, %edx # no, one character
85 movl $STDOUT, %edi
86 call write
87 incq %rsi # next char
88 jmp showLoop # check at top of loop
89
90 allDone:
91 movl $1, %edx # do a newline for user
92 movl $newline, %esi
93 movl $STDOUT, %edi
94 call write
95
10-12
1 # decChars.s
2 # Prompts user to enter a text string, then changes each
3 # character to the next lower one.
4 # Bob Plantz - 27 June 2009
5 # useful constants
6 .equ STDIN,0
7 .equ STDOUT,1
8 .equ SPACE,' ' # lowest printable character
9 .equ SQUIGGLE,'~' # highest printable character
10 # stack frame
11 .equ theString,-256
12 .equ localSize,-256
13 # read only data
14 .section .rodata
15 prompt:
16 .string "Enter a string of characters: "
17 msg:
18 .string "Decrementing each character: "
19 newline:
20 .byte '\n'
21 # code
22 .text
23 .globl main
24 .type main, @function
25 main:
26 pushq %rbp # save caller’s base pointer
27 movq %rsp, %rbp # establish ours
28 addq $localSize, %rsp # local vars.
29
40 getString:
41 leaq theString(%rbp), %rsi # place to put user input
42 movl $1, %edx # one character
43 movl $STDIN, %edi
44 call read
45 readLup:
46 cmpb $'\n', (%rsi) # end of input?
47 je decChars # yes, process the string
48 incq %rsi # next char
49 movl $1, %edx # one character
50 movl $STDIN, %edi
51 call read
52 jmp readLup # check at top of loop
53
54 decChars:
55 movb $0, (%rsi) # null character for C string
56 leaq theString(%rbp), %rsi # pointer to the string
57 decLoop:
58 cmpb $0, (%rsi) # end of string?
59 je doDisplay # yes, display the results
60 decb (%rsi) # change character
61 cmpb $SPACE, (%rsi) # did we go too far?
62 jae okay # no
63 movb $SQUIGGLE, (%rsi) # yes, wrap to end
64 okay:
65 incq %rsi # next char
66 jmp decLoop # check at top of loop
67
68 doDisplay:
69 movl $msg, %esi # print message for user
70 dispLoop:
71 cmpb $0, (%esi) # end of string?
72 je showString # yes, show results
73 movl $1, %edx # no, one character
74 movl $STDOUT, %edi
75 call write
76 incl %esi # next char
77 jmp dispLoop # check at top of loop
78
79 showString:
80 leaq theString(%rbp), %rsi # pointer to the string
81 showLoop:
82 cmpb $0, (%rsi) # end of string?
83 je allDone # yes, finish up
84 movl $1, %edx # no, one character
85 movl $STDOUT, %edi
86 call write
87 incq %rsi # next char
88 jmp showLoop # check at top of loop
89
90 allDone:
91 movl $1, %edx # do a newline for user
92 movl $newline, %esi
93 movl $STDOUT, %edi
94 call write
95
10-13
1 # echoN.s
2 # Prompts user to enter a single character.
3 # The character is echoed. If it is a numeral, say N,
4 # it is echoed N+1 times
5 # Bob Plantz - 27 June 2009
6 # useful constants
7 .equ STDIN,0
8 .equ STDOUT,1
9 # stack frame
E.10. PROGRAM FLOW CONSTRUCTS 435
10 .equ count,-8
11 .equ response,-4
12 .equ localSize,-16
13 # read only data
14 .section .rodata
15 instruct:
16 .ascii "A single numeral, N, is echoed N+1 times, other characters "
17 .asciz "are\nechoed once. 'q' ends program.\n\n"
18 prompt:
19 .string "Enter a single character: "
20 msg:
21 .string "You entered: "
22 bye:
23 .string "End of program.\n"
24 newline:
25 .byte '\n'
26 # code
27 .text
28 .globl main
29 .type main, @function
30 main:
31 pushq %rbp # save caller’s base pointer
32 movq %rsp, %rbp # establish ours
33 addq $localSize, %rsp # local vars
34
45 runLoop:
46 movl $prompt, %esi # prompt user
47 promptLup:
48 cmpb $0, (%esi) # end of string?
49 je getChar # yes, get user input
50 movl $1, %edx # no, one character
51 movl $STDOUT, %edi
52 call write
53 incl %esi # next char
54 jmp promptLup # check at top of loop
55
56 getChar:
57 leaq response(%rbp), %rsi # place to put user input
58 movl $2, %edx # include newline
59 movl $STDIN, %edi
60 call read
61
85 doChar:
86 movl $1, %edx # one character
87 leaq response(%rbp), %rsi # in this mem location
88 movl $STDOUT, %edi
89 call write
90
100 allDone:
101 movl $bye, %esi # ending message
102 doneLup:
103 cmpb $0, (%esi) # end of string?
104 je cleanUp # yes, clean up and return
105 movl $1, %edx # no, one character
106 movl $STDOUT, %edi
107 call write
108 incl %esi # next char
109 jmp doneLup # check at top of loop
110
111 cleanUp:
112 movl $0, %eax # return 0;
113 movq %rbp, %rsp # delete local vars.
114 popq %rbp # restore caller’s base pointer
115 ret # return to caller
5 hiworld:
6 .string "Hello, world!\n"
7
8 .text
9 .globl main
10
11 main:
12 pushq %rbp # save caller base pointer
13 movq %rsp, %rbp # establish our base pointer
14
1 # writeStr.s
2 # Writes a C-style text string to the standard output (screen).
3 # Bob Plantz - 27 June 2009
4
5 # Calling sequence:
6 # rdi <- address of string to be written
7 # call writeStr
8 # returns number of characters written
9
10 # Useful constant
11 .equ STDOUT,1
12 # Stack frame, showing local variables and arguments
13 .equ stringAddr,-16
14 .equ count,-4
15 .equ localSize,-16
16
17 .text
18 .globl writeStr
19 .type writeStr, @function
20 writeStr:
21 pushq %rbp # save base pointer
22 movq %rsp, %rbp # new base pointer
23 addq $localSize, %rsp # local vars. and arg.
24
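Since the body of writeStr.s is not reproduced above, here is a C sketch of the behavior its header comment describes: write a C-style string to standard output one character at a time and return the number of characters written. Stopping early when write fails is my own addition.

```c
#include <assert.h>
#include <unistd.h>

/* Writes the NUL-terminated string to standard output.
 * Returns the number of characters written. */
int writeStr(const char *stringAddr)
{
    int count = 0;

    assert(stringAddr != 0);
    while (*stringAddr != '\0') {               /* end of string? */
        if (write(STDOUT_FILENO, stringAddr, 1) != 1)
            break;                              /* give up on a write error */
        stringAddr++;                           /* next char */
        count++;
    }
    return count;
}
```

For example, `writeStr("Hello\n")` displays the text and returns 6.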
11-4
1 # echoString.s
2 # Prompts user to enter a string, then echoes it.
3 # Bob Plantz - 27 June 2009
4 # stack frame
5 .equ theString,-256
6 .equ localSize,-256
7 # read only data
8 .data
9 usrprmpt:
10 .string "Enter a text string:\n"
11 usrmsg:
12 .string "You entered:\n"
13 newline:
14 .string "\n"
15 # code
16 .text
17 .globl main
18 .type main, @function
19 main:
20 pushq %rbp # save caller base pointer
21 movq %rsp, %rbp # establish our base pointer
22 addq $localSize, %rsp # local vars.
23
1 # readLnSimple.s
2 # Reads a line (through the '\n' character) from standard input. Deletes
3 # the '\n' and creates a C-style text string.
4 # Bob Plantz - 27 June 2009
5
6 # Calling sequence:
E.11. WRITING YOUR OWN FUNCTIONS 439
11 # Useful constant
12 .equ STDIN,0
13 # Stack frame, showing local variables and arguments
14 .equ stringAddr,-16
15 .equ count,-4
16 .equ localSize,-16
17
18 .text
19 .globl readLn
20 .type readLn, @function
21 readLn:
22 pushq %rbp # save base pointer
23 movq %rsp, %rbp # new base pointer
24 addq $localSize, %rsp # local vars. and arg.
25
45 endOfString:
46 movq stringAddr(%rbp), %rax # current pointer
47 movb $0, (%rax) # mark end of string
48
11-5 Note: Some students will try to create a nested loop, the outer one being executed twice.
But the display messages are not nearly as nice, unless the student uses some “goto” statements.
In my opinion, two separate change case loops are better software engineering because
they allow maximum flexibility in the user messages. The user will generally complain
about what is seen on the screen, not the cleverness of the code.
1 # changeCase.s
2 # Prompts user to enter a string, echoes it, changes case of alpha
6 # Stack frame
7 .equ response,-256
8 .equ localSize,-256
9 .data
10 usrprmpt:
11 .string "Enter a text string:\n"
12 usrmsg:
13 .string "You entered:\n"
14 chngmsg:
15 .string "Changing the case gives:\n"
16 newline:
17 .string "\n"
18
19 .text
20 .globl main
21 .type main, @function
22 main:
23 pushq %rbp # save caller base pointer
24 movq %rsp, %rbp # establish our base pointer
25 addq $localSize, %rsp # local vars
26
61 showChange:
62 movl $chngmsg, %edi # tell user about it
63 call writeStr
64
89 showOrig:
90 movl $usrmsg, %edi # show original version
91 call writeStr
92
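The case-changing loops themselves are not visible in the excerpt above. One common way to change the case of an ASCII letter is to flip bit 0x20, which is the only bit that differs between an upper case letter and its lower case counterpart. The sketch below assumes that approach; whether the original uses it is not shown here.

```c
#include <assert.h>

/* Flips the case of each ASCII letter in place; other characters stay. */
void changeCase(char *s)
{
    assert(s != 0);
    for (; *s != '\0'; s++) {
        if ((*s >= 'A' && *s <= 'Z') || (*s >= 'a' && *s <= 'z'))
            *s ^= 0x20;         /* 'A' is 0x41, 'a' is 0x61 */
    }
}
```

Applying changeCase twice restores the original text, which fits the program's show, change, and show-original flow.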
11-6
1 # echoString2.s
2 # Prompts user to enter a string, then echoes it.
3 # Bob Plantz - 27 June 2009
4 # stack frame
5 .equ theString,-256
6 .equ localSize,-256
7 # Length of the array. Do not make this larger than 255.
8 # I have used a small number to test readLn for removing
9 # extra characters from the keyboard buffer.
10 .equ arrayLngth,4
11 # read only data
12 .data
13 usrprmpt:
14 .string "Enter a text string:\n"
15 usrmsg:
16 .string "You entered:\n"
17 newline:
18 .string "\n"
19 # code
20 .text
21 .globl main
22 main:
23 pushq %rbp # save caller base pointer
24 movq %rsp, %rbp # establish our base pointer
25 addq $localSize, %rsp # local vars.
26
1 # readLn.s
2 # Reads a line (through the '\n' character) from standard input. Deletes
3 # the '\n' and creates a C-style text string.
4 # Bob Plantz - 27 June 2009
5
6 # Calling sequence:
7 # rsi <- length of char array
8 # rdi <- address of place to store string
9 # call readLn
10 # returns number of characters read (not including NUL)
11
12 # Useful constant
13 .equ STDIN,0
14 # Stack frame, showing local variables and arguments
15 .equ maxLength,-24
16 .equ stringAddr,-16
17 .equ count,-4
18 .equ localSize,-32
19
20 .text
21 .globl readLn
22 .type readLn, @function
23 readLn:
24 pushq %rbp # save base pointer
25 movq %rsp, %rbp # new base pointer
26 addq $localSize, %rsp # local vars. and arg.
27
55 endOfString:
56 movq stringAddr(%rbp), %rax # current pointer
57 movb $0, (%rax) # mark end of string
58
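The interesting part of this version of readLn is what happens when the user types more characters than the array can hold. The C sketch below shows one way to meet the header comment's contract: store at most maxLength - 1 characters, discard the rest of the line, and return the number stored. It is a hypothetical variant that takes the file descriptor as a parameter, and the details of the original loop are not shown above.

```c
#include <assert.h>
#include <unistd.h>

/* Reads one line from fd into stringAddr (capacity maxLength bytes).
 * The '\n' is removed, extra characters are discarded, and the result
 * is NUL terminated. Returns the number of characters stored. */
int readLnFd(int fd, char *stringAddr, int maxLength)
{
    int count = 0;
    char aChar;

    assert(stringAddr != 0 && maxLength > 0);
    while (read(fd, &aChar, 1) == 1 && aChar != '\n') {
        if (count < maxLength - 1) {    /* room for this one plus NUL? */
            *stringAddr = aChar;
            stringAddr++;
            count++;
        }                               /* otherwise drop the character */
    }
    *stringAddr = '\0';                 /* mark end of string */
    return count;
}
```

With an array length of 4, as in echoString2.s, typing abcdef stores abc and leaves the keyboard buffer empty for the next read.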
6 # Stack frame
7 .equ theInt,-40
8 .equ buffer,-36
9 .equ localSize,-48
10 # Read only data
11 .section .rodata
12 prompt:
13 .asciz "Please enter an integer in binary: "
14 displayFmt:
15 .asciz "In decimal: %d\n"
16 # Code
17 .text
18 .globl main
19 .type main, @function
20 main:
21 pushq %rbp # save frame pointer
22 movq %rsp, %rbp # new frame pointer
23 addq $localSize, %rsp # local vars.
24
50 # display in decimal
51 movl theInt(%rbp), %esi # int to display
52 movl $displayFmt, %edi # format string
53 movl $0, %eax
54 call printf
E.12. BIT OPERATIONS; MULTIPLICATION AND DIVISION 445
55
12-2
1 # int2binary.s
2 # Converts decimal int to binary
3 # Bob Plantz - 27 June 2009
4
5 # Stack frame
6 .equ myInt,-44
7 .equ counter,-40
8 .equ buffer,-36
9 .equ localSize,-48
10 # Read only data
11 .section .rodata
12 prompt:
13 .string "Enter an integer: "
14 format:
15 .string "%i"
16 msg1:
17 .string "The stored number is "
18 msg2:
19 .string " in binary.\n"
20 # Code
21 .text
22 .globl main
23 .type main, @function
24 main:
25 pushq %rbp # save frame pointer
26 movq %rsp, %rbp # new frame pointer
27 addq $localSize, %rsp # local vars.
28
53 # display in binary
54 movl $msg1, %edi # nice message for user
55 call writeStr
56
12-3
1 # multiply.s
2 # Gets two 16-bit integers from user and computes their product.
3 # Bob Plantz - 27 June 2009
4
5 # Stack frame
6 .equ multiplier,-8
7 .equ multiplicand,-4
8 .equ localSize,-16
9 # Read only data
10 .section .rodata
11 prompt:
12 .string "Enter an integer (0 - 65535): "
13 printformat:
14 .string "%hu times %hu = %u\n"
15 scanformat:
16 .string "%hu"
17 # Code
18 .text
19 .globl main
20 .type main, @function
21 main:
22 pushq %rbp # save frame pointer
23 movq %rsp, %rbp # new frame pointer
24 addq $localSize, %rsp # local vars.
25
26 # prompt user
27 movl $prompt, %edi # message address
28 movl $0, %eax
29 call printf
30
36
1 # mul16.s
2 # Multiplies two 16-bit integers and returns 32-bit result
3 # Bob Plantz - 27 June 2009
4
5 # Calling sequence
6 # si <- multiplier
7 # di <- multiplicand
8 # call mul16
9 #Code
10 .text
11 .globl mul16
12 .type mul16, @function
13 mul16:
14 pushq %rbp # save frame pointer
15 movq %rsp, %rbp # new frame pointer
16
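For comparison, the same 16-bit by 16-bit multiplication in C needs one precaution: the operands should be widened to an unsigned 32-bit type before multiplying, because unsigned short values are otherwise promoted to int and a product such as 65535 × 65535 would overflow it. The sketch also shows how a high and a low 16-bit half, such as the dx:ax pair that the 16-bit mul instruction produces, combine into one 32-bit value. Function names other than mul16 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Multiplies two 16-bit unsigned integers and returns the 32-bit product. */
uint32_t mul16(uint16_t multiplicand, uint16_t multiplier)
{
    /* widen first so the multiplication is done in 32-bit unsigned */
    return (uint32_t)multiplicand * (uint32_t)multiplier;
}

/* Combines a high half and a low half into one 32-bit value. */
uint32_t combine_halves(uint16_t high, uint16_t low)
{
    return ((uint32_t)high << 16) | (uint32_t)low;
}
```

The largest case, `mul16(65535, 65535)`, is 4294836225, which is 0xFFFE0001, so the high half is 0xFFFE and the low half is 0x0001.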
12-4
1 # divide.s
2 # Gets two 32-bit integers from user and computes quotient
3 # of the first divided by the second.
4 # Bob Plantz - 27 June 2009
5
6 # Stack frame
7 .equ divisor,-8
8 .equ dividend,-4
9 .equ localSize,-16
10 # Read only data
11 .section .rodata
12 prompt:
13 .asciz "Enter an integer (0 - 4294967295): "
14 printformat:
15 .asciz "%u div %u = %u\n"
16 scanformat:
17 .asciz "%u"
18 # Code
19 .text
20 .globl main
21 .type main, @function
22 main:
23 pushq %rbp # save frame pointer
24 movq %rsp, %rbp # new frame pointer
25 addq $localSize, %rsp # local vars.
26
27 # prompt user
28 movl $prompt, %edi # message address
29 movl $0, %eax
30 call printf
31
1 # div32.s
2 # divides two 32-bit integers and returns 32-bit quotient
3 # Bob Plantz - 27 June 2009
4
5 # Calling sequence
6 # esi <- divisor
7 # edi <- dividend
8 # call div32
9 # Code
10 .text
11 .globl div32
12 .type div32, @function
13 div32:
14 pushq %rbp # save base pointer
15 movq %rsp, %rbp # new base pointer
16
12 -5
1 # modulo.s
2 # Gets two 32-bit integers from user and computes remainder
3 # of the first divided by the second.
4 # Bob Plantz - 27 June 2009
5
6 # Stack frame
7 .equ divisor,-8
8 .equ dividend,-4
9 .equ localSize,-16
10 # Read only data
11 .section .rodata
12 prompt:
13 .asciz "Enter an integer (0 - 4294967295): "
14 printformat:
15 .asciz "%u mod %u = %u\n"
16 scanformat:
17 .asciz "%u"
18 # Code
19 .text
20 .globl main
21 .type main, @function
22 main:
23 pushq %rbp # save frame pointer
450 APPENDIX E. EXERCISE SOLUTIONS
27 # prompt user
28 movl $prompt, %edi # message address
29 movl $0, %eax
30 call printf
31
1 # mod32.s
2 # divides two 32-bit integers and returns 32-bit remainder
3 # Bob Plantz - 27 June 2009
4
5 # Calling sequence
6 # esi <- divisor
7 # edi <- dividend
8 # call div32
9 # Code
10 .text
11 .globl div32
12 .type div32, @function
13 div32:
14 pushq %rbp # save base pointer
12 -6
1 # decimal2uint.s
2 # Prompts the user to enter an integer in decimal, then converts
3 # it to int format.
4 # Bob Plantz - 27 June 2009
5
6 # Constant
7 .equ buffSize,12
8
9 # Stack frame
10 .equ buffer,-16
11 .equ theInt,-4
12 .equ localSize,-16
13
21 # Code
22 .text
23 .globl main
24 .type main, @function
25 main:
26 pushq %rbp # save frame pointer
27 movq %rsp, %rbp # new frame pointer
28 addq $localSize, %rsp # local vars.
29
1 # dec2uInt.s
2 # Converts string of numerals to decimal unsigned int
3 # Bob Plantz - 13 June 2009
4
5 # Calling sequence
6 # rsi <- address of place to store the int
7 # rdi <- address of string
8 # call dec2uInt
9 # returns 0
10 # Code
11 .text
12 .globl dec2uInt
13 .type dec2uInt, @function
14 dec2uInt:
15 pushq %rbp # save caller frame ptr
16 movq %rsp, %rbp # our stack frame
17
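The conversion that dec2uInt performs can be sketched in C. This is a minimal sketch rather than the book's code: the name dec2uInt_sketch is hypothetical, the argument order follows the calling sequence listed above (address of the string first, then the address where the int is stored), and it assumes the string holds only the numerals '0' through '9' and that the value fits in a 32-bit unsigned int.

```c
/* Hypothetical C counterpart of dec2uInt (name and body are assumptions).
 * Converts a string of decimal numerals to an unsigned int, stores the
 * result at *result, and returns 0 as the calling sequence states. */
int dec2uInt_sketch(const char *str, unsigned int *result)
{
    unsigned int value = 0;

    while (*str != '\0') {
        /* move the digits seen so far one decimal place left,
           then add the numeric value of the new character */
        value = value * 10 + (unsigned int)(*str - '0');
        str++;
    }
    *result = value;
    return 0;
}
```

Calling dec2uInt_sketch("123", &n) leaves 123 in n. Multiplying the running total by ten for each new numeral is what lets the string be processed left to right in a single pass.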
6 # useful constant
7 theConstant = 12345
8
9 # Stack frame
10 .equ theInt,-16
11 .equ buffer,-12
12 .equ localSize,-16
13 # Read only data
14 .section .rodata
15 prompt:
16 .asciz "Please enter an integer in decimal: "
17 msg:
18 .asciz "The result is: "
19 endl:
20 .asciz "\n"
21 # Code
22 .text
23 .globl main
24 .type main, @function
25 main:
26 pushq %rbp # save frame pointer
27 movq %rsp, %rbp # new frame pointer
28 addq $localSize, %rsp # local vars.
29
1 # uInt2dec.s
5 # Calling sequence
6 # esi <- value of the int
7 # rdi <- address of string
8 # call uInt2dec
9 # returns zero
10
11 # Stack frame
12 .equ array,-12
13 .equ localSize,-16
14 # Read only data
15 .section .rodata
16 ten: .long 10
17 # Code
18 .text
19 .globl uInt2dec
20 .type uInt2dec, @function
21 uInt2dec:
22 pushq %rbp # save callers frame ptr
23 movq %rsp, %rbp # our stack frame
24 addq $localSize, %rsp # local vars.
25
39 copyLup:
40 cmpb $0, (%rcx) # NUL char?
41 je allDone # yes, copy it
42 movb (%rcx), %dl # get a char
43 movb %dl, (%rdi) # store it
44 incq %rdi # move pointers
45 decq %rcx
46 jmp copyLup # and check again
47
48 allDone:
49 movb (%rcx), %dl # get NUL char
50 movb %dl, (%rdi) # and store it
51 movl $0, %eax # return 0;
52
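The copyLup loop above walks one pointer backward through the local array while the destination pointer moves forward, because repeated division by ten produces the digits in reverse order. A minimal C sketch of that idea follows. The name uInt2dec_sketch and the digit-generation loop are assumptions (that part of the listing is not shown here); only the back-to-front copy mirrors the assembly that is shown.

```c
/* Hypothetical C counterpart of uInt2dec (name is an assumption).
 * Writes the decimal text for value into str, which must have room
 * for at least 11 bytes, and returns 0. */
int uInt2dec_sketch(char *str, unsigned int value)
{
    char digits[12];              /* local array, filled front to back */
    char *p = digits;

    *p = '\0';                    /* NUL goes in first and marks the end of the reverse copy */
    do {
        p++;
        *p = (char)('0' + value % 10);   /* remainder is the low-order digit */
        value /= 10;
    } while (value != 0);

    /* p now points at the high-order digit; copy back to front */
    while (*p != '\0') {
        *str++ = *p--;
    }
    *str = '\0';                  /* terminate the caller's string */
    return 0;
}
```

Storing the NUL character first means the copy loop needs no separate counter: it stops when the backward-moving pointer reaches the terminator.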
12 -8
1 # addConstant2.s
2 # Prompts the user to enter an integer in decimal, converts
3 # it to int format, adds a constant, then displays result.
4 # Bob Plantz - 28 June 2009
5
6 # useful constant
7 theConstant = -12345
8
9 # Stack frame
10 .equ theInt,-16
11 .equ buffer,-12
12 .equ localSize,-16
13 # Read only data
14 .section .rodata
15 prompt:
16 .asciz "Please enter an integer in decimal: "
17 msg:
18 .asciz "The result is: "
19 endl:
20 .asciz "\n"
21 # Code
22 .text
23 .globl main
24 .type main, @function
25 main:
26 pushq %rbp # save frame pointer
27 movq %rsp, %rbp # new frame pointer
28 addq $localSize, %rsp # local vars.
29
57 call writeStr
58
1 # dec2sInt.s
2 # Converts string of numerals to decimal int, signed version
3 # Bob Plantz - 13 June 2009
4
5 # Calling sequence
6 # rsi <- address of place to store the int
7 # rdi <- address of string
8 # call dec2sInt
9 # returns 0
10
11 # Stack frame
12 .equ negFlag,-4
13 .equ localSize,-16
14 # Code
15 .text
16 .globl dec2sInt
17 .type dec2sInt, @function
18 dec2sInt:
19 pushq %rbp # save caller frame ptr
20 movq %rsp, %rbp # our stack frame
21 addq $localSize, %rsp # space for local var
22
1 # sInt2dec.s
E.13. DATA STRUCTURES 457
5 # Calling sequence
6 # esi <- value of the int
7 # rdi <- address of string
8 # call sInt2dec
9 # returns zero
10 # Code
11 .section .rodata
12 ten: .long 10
13
14 .text
15 .globl sInt2dec
16 .type sInt2dec, @function
17 sInt2dec:
18 pushq %rbp # save callers frame ptr
19 movq %rsp, %rbp # our stack frame
20
See above for uInt2dec and dec2uInt. See Section E.11 for writeStr and readLn.
20 .text
21 .globl main
22 .type main, @function
23 main:
24 pushq %rbp # save caller frame pointer
25 movq %rsp, %rbp # set our frame pointer
26 addq $localSize, %rsp # local variables
27 andq $-16, %rsp # 16-byte alignment
28 movq %rbx, rbxSave(%rbp) # save reg for OS
29
44 display:
45 movq $0, index(%rbp) # restart at beginning
46 displayLup:
47 movq index(%rbp), %rax # get index value
48 cmpq $nInts, %rax # any more?
49 jae done # no, all done
50
60 done:
61 movl $0, %eax # return 0;
62 movq rbxSave(%rbp), %rbx # restore reg
63 movq %rbp, %rsp # remove local vars
64 popq %rbp # restore caller frame ptr
65 ret # back to OS
1 # putInt.s
2 # writes a signed int to standard out
3 # Bob Plantz - 28 June 2009
4
5 # Calling sequence
6 # edi <- value of the int
7 # call putInt
8
9 # Stack frame
10 .equ buffer,-12
11 .equ localSize,-16
12 # Code
13 .text
14 .globl putInt
15 .type putInt, @function
16 putInt:
17 pushq %rbp # save callers frame ptr
18 movq %rsp, %rbp # our stack frame
19 addq $localSize, %rsp # local vars.
20
See Section E.12 for sInt2dec. See Section E.11 for writeStr.
13 -2
1 # sumInts.s
2 # Prompts user for 10 integers, stores them in an array, then
3 # displays their sum.
4 # Bob Plantz - 28 June 2009
5
32
51 sum:
52 movq $0, index(%rbp) # restart at beginning
53 movl $0, total(%rbp) # init total
54 sumLup:
55 cmpl $nInts, index(%rbp) # all summed?
56 jae display # yes, display total
57
65 display:
66 movl $msg, %edi # tell user about it
67 call writeStr
68
1 # getInt.s
2 # reads an int from standard in
3 # Bob Plantz - 28 June 2009
4
5 # Calling sequence
6 # rdi <- pointer where to store the int
7 # call getInt
8 # returns 0
9
10 # Stack frame
11 .equ outPtr,-24
12 .equ buffer,-12
13 .equ localSize,-32
14 # Code
15 .text
16 .globl getInt
17 .type getInt, @function
18 getInt:
19 pushq %rbp # save callers frame ptr
20 movq %rsp, %rbp # our stack frame
21 addq $localSize, %rsp # local vars.
22
See above for putInt. See Section E.12 for dec2sInt. See Section E.11 for writeStr and
readLn.
13 -3
1 # averageInts.s
2 # Prompts user for 10 integers, stores them in an array, then
3 # displays their average.
4 # Bob Plantz - 29 June 2009
5
24 .globl main
25 .type main, @function
26 main:
27 pushq %rbp # save caller frame pointer
28 movq %rsp, %rbp # set our frame pointer
29 addq $localSize, %rsp # local variables
30 andq $-16, %rsp # 16-byte boundary
31 movq %rbx, rbxSave(%rbp) # save reg for OS
32
51 sum:
52 movq $0, index(%rbp) # restart at beginning
53 movl $0, total(%rbp) # init total
54 sumLup:
55 cmpl $nInts, index(%rbp) # all summed?
56 jae display # yes, display total
57
65 display:
66 movl $msg, %edi # tell user about it
67 call writeStr
68
69
80 call putInt
81
See above for putInt and getInt. See Section E.11 for writeStr and readLn.
13 -8
1 # structFields.s
2 # Stores user input values in three structs and echoes them
3 # Bob Plantz - 28 June 2009
4
5 .include "structDef.h"
6 # Stack frame
7 .equ buffer,z-12
8 .equ z,y-structSize
9 .equ y,x-structSize
10 .equ x,-structSize
11 .equ localSize,buffer
12 # Read only data
13 .section .rodata
14 userPrompt:
15 .string "Enter data for the three structs.\n"
16 echoMsg:
17 .string "You entered:\n"
18 endl:
19 .string "\n"
20 # Code
21 .text
22 .globl main
23 .type main, @function
24 main:
25 pushq %rbp # save frame pointer
26 movq %rsp, %rbp # our frame pointer
27 addq $localSize, %rsp # local variables
28 andq $-16, %rsp # stack alignment
29
44 call writeStr
45
1 # structDef.h
2 # Defines the struct field offsets.
3 # Bob Plantz - 28 June 2009
4
5 # struct definition
6 .equ aChar,0
7 .equ anInt,4
8 .equ structSize,8
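The three .equ values in structDef.h correspond to the layout a C compiler would typically choose for a struct holding a char followed by an int. The following sketch checks that correspondence. The tag name theTag and the field types are assumptions inferred from the field names, and the offsets hold on platforms such as x86-64 Linux where int is 4 bytes with 4-byte alignment.

```c
#include <stddef.h>

/* Assumed C declaration matching the offsets in structDef.h */
struct theTag {
    char aChar;   /* offset 0 */
    int  anInt;   /* offset 4: three padding bytes follow aChar */
};

/* Compile-time checks (C11) that the layout agrees with the .equ values */
_Static_assert(offsetof(struct theTag, aChar) == 0, "aChar offset");
_Static_assert(offsetof(struct theTag, anInt) == 4, "anInt offset");
_Static_assert(sizeof(struct theTag) == 8, "structSize");
```

In assembly language the programmer supplies these offsets by hand, which is why the definitions are kept in one included file shared by every function that touches the struct.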
1 # getData.s
2 # Gets user input values and stores them in a struct.
3 # Bob Plantz - 28 June 2009
4 # Calling sequence:
5 # rdi <- address of struct
6 # call getData
7
8 .include "structDef.h"
9 # Useful constant
10 .equ STDOUT,1
11 # Stack frame
12 .equ structPtr,-32
13 .equ buffer,-2
14 .equ localSize,-32
15 # Read only data
16 .section .rodata
17 charPrompt:
18 .string "Enter a single character: "
19 intPrompt:
20 .string "Enter an integer: "
21 # Code
22 .text
23 .globl getData
24 .type getData, @function
25 getData:
26 pushq %rbp # save frame pointer
27 movq %rsp, %rbp # our frame pointer
28 addq $localSize, %rsp # local var. and arg.
1 # putData.s
2 # Displays values stored in a struct.
3 # Bob Plantz - 28 June 2009
4 # Calling sequence:
5 # rdi <- address of struct
6 # call putData
7
8 .include "structDef.h"
9 # Useful constant
10 .equ STDOUT,1
11 # Stack frame
12 .equ structPtr,-16
13 .equ localSize,-16
14 # Read only data
15 .section .rodata
16 charMsg:
17 .string "The char is: "
18 intMsg:
19 .string "The int is: "
20 endl:
21 .string "\n"
22 # Code
23 .text
24 .globl putData
25 .type putData, @function
26 putData:
27 pushq %rbp # save frame pointer
28 movq %rsp, %rbp # our frame pointer
29 addq $localSize, %rsp # argument save area
30 movq %rdi, structPtr(%rbp) # save struct addr.
31
32
See above for putInt and getInt. See Section E.11 for writeStr and readLn.
13 -9
1 # totalCost.s
2 # Gets names and prices for three items and shows total cost
3 # Bob Plantz - 29 June 2009
4
5 .include "item.h"
6 # Stack frame
7 .equ third,second-itemSize
8 .equ second,first-itemSize
9 .equ first,-itemSize
10 .equ localSize,third
11 # Read only data
12 .section .rodata
13 endl: .string "\n"
14 totalMsg:
15 .string "Their total cost is $"
16 # Code
17 .text
18 .globl main
19 .type main, @function
20 main:
21 pushq %rbp # save frame pointer
22 movq %rsp, %rbp # our frame pointer
23 addq $localSize, %rsp # local variables
24 andq $-16, %rsp # 16-byte boundary
25
32 call getItem
33
34 # display them
35 leaq first(%rbp), %rdi # address of first struct
36 call displayItem # displays the values
37 leaq second(%rbp), %rdi # address of second struct
38 call displayItem
39 leaq third(%rbp), %rdi # address of third struct
40 call displayItem
41
1 # item.h
2 # Fields and size of an item struct
3 # Bob Plantz - 29 June 2009
4
5 .equ name,0
6 .equ cost,52
7 .equ itemSize,56
1 # displayItem.s
2 # displays an item
3 # Bob Plantz - 29 June 2009
4
5 # Calling sequence
6 # rdi <- address of item struct
7 # call displayItem
8 # returns void
9
10 .include "item.h"
11 # Stack frame
12 .equ structPtr,-16
13 .equ localSize,-16
14 # Read only data
15 .section .rodata
16 costMsg:
17 .string "Cost: $"
18 nameMsg:
1 # getItem.s
2 # prompts user to enter an item name and its cost
3 # Bob Plantz - 29 June 2009
4
5 # Calling sequence
6 # rdi <- address of item struct
7 # call getItem
8 # returns void
9
10 .include "item.h"
11 # Stack frame
12 .equ structPtr,-16
13 .equ localSize,-16
14 # Read only data
15 .section .rodata
16 costMsg:
17 .string "Enter cost: $"
18 nameMsg:
19 .string "Name: "
20 # Code
21 .text
22 .globl getItem
23 .type getItem, @function
24 getItem:
25 pushq %rbp # save caller’s frame ptr
26 movq %rsp, %rbp # our stack frame
27 addq $localSize, %rsp # local vars.
28 movq %rdi, structPtr(%rbp) # save arg.
29
See above for putInt and getInt. See Section E.11 for writeStr and readLn.
13 -10
1 # addInt2Frac.s
2 # creates a fraction and gets user values, then gets an
3 # integer from user and adds it to the fraction.
4 # Bob Plantz - 29 June 2009
5
6 .include "fraction.h"
7
8 # Stack frame
9 .equ anInt,x-4
10 .equ x,-fracSize
11 .equ localSize,anInt
12
28
13 -11
1 # addFrac2Frac.s
2 # creates two fractions and gets user values, then adds
3 # one to the other and displays the sum.
4 # Bob Plantz - 30 June 2009
5
6 .include "fraction.h"
7
8 # Stack frame
9 .equ y,x-fracSize
10 .equ x,-fracSize
11 .equ localSize,y
12 # Read only data
13 .section .rodata
14 prompt:
15 .string "Enter two fractions:\n"
16 msg:
17 .string "Their sum is:\n"
18 endl:
19 .string "\n"
20 # Code
21 .text
22 .globl main
23 .type main, @function
24 main:
25 pushq %rbp # save frame pointer
26 movq %rsp, %rbp # our frame pointer
27 addq $localSize, %rsp # local vars.
1 # fractionsAdd.s
2 # adds a fraction to this fraction
3 # Bob Plantz - 30 June 2009
4
5 # Calling sequence
6 # rsi <- address of object to add
7 # rdi <- address of this object
8 # call fractionsAdd
9 # returns void
10
11 .include "fraction.h"
12 # Stack frame
13 .equ localFraction,-fracSize
14 .equ localSize,localFraction
15 # Code
16 .text
17 .globl fractionsAdd
18 .type fractionsAdd, @function
19 fractionsAdd:
20 pushq %rbp # save frame pointer
21 movq %rsp, %rbp # our frame pointer
22 addq $localSize, %rsp # for object address
23 andq $-16, %rsp # align stack pointer
24 leaq localFraction(%rbp), %rcx # pointer to local fraction
25
27 mull den(%rdi)
28 movl %eax, den(%rcx) # new denominator
29
13 -12
1 # addressBook.s
2 # Allows up to MAX address cards to be stored.
3 # Bob Plantz - 30 June 2009
4
5 .include "cardDef.h"
6 # Set MAX for the maximum number of cards
7 .equ MAX,3
8 # Stack frame
9 .equ count,index-4
10 .equ index,cards-4
11 .equ cards,buffer-(MAX*cardSize)
12 .equ buffer,-32
13 .equ localSize,count
14 # Read only data
15 .section .rodata
16 prompt:
17 .string "Command (Add, Delete, Show, List, Quit): "
18 addMsg:
19 .string "Add new person.\n "
20 delMsg:
21 .string "Delete last person.\n"
22 showMsg:
23 .string "Your addresses:\n"
24 listMsg:
25 .string "Here is the entire array:\n"
26 fullMsg:
27 .string "Address book is full.\n"
28 emptyMsg:
29 .string "Address book is empty.\n"
30 endl:
31 .string "\n"
32 # Code
33 .text
34 .globl main
55 doProg:
56 movl $0, count(%rbp) # no people yet
57 runLoop:
58 movl $prompt, %edi # tell user what to do
59 call writeStr
60
87 chkDel:
88 cmpb $'d', buffer(%rbp) # check for delete command
89 jne chkShow
90 cmpl $0, count(%rbp) # anybody on the list?
101 chkShow:
102 cmpb $'s', buffer(%rbp) # check for show command
103 jne chkList
104 cmpl $0, count(%rbp) # anybody on the list?
105 ja doShow # yes, show them
106 movl $emptyMsg, %edi # no, tell user
107 call writeStr
108 jmp cont
109 doShow:
110 movl $showMsg, %edi # feedback for user
111 call writeStr
112
126 chkList:
127 cmpb $'l', buffer(%rbp) # check for list command
128 jne cont
129 movl $listMsg, %edi # feedback for user
130 call writeStr
131
1 # cardDef.h
2 # Defines the address card field offsets.
3 # Bob Plantz - 30 June 2009
4
5 # card definition
6 .equ name,0
7 .equ address,name+48
8 .equ city,address+80
9 .equ state,city+24
10 .equ zip,state+20
11 .equ cardSize,zip+6
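Each offset in cardDef.h is defined as the previous field's offset plus that field's length, so the values work out to 0, 48, 128, 152, 172, with a card size of 178 bytes. A C struct of character arrays produces the same layout, as the sketch below checks. The tag name card_sketch and the array types are assumptions; only the lengths come from the .equ expressions.

```c
#include <stddef.h>

/* Assumed C declaration with the field lengths used in cardDef.h */
struct card_sketch {
    char name[48];      /* offset 0 */
    char address[80];   /* offset 48 */
    char city[24];      /* offset 128 */
    char state[20];     /* offset 152 */
    char zip[6];        /* offset 172 */
};

/* Compile-time checks (C11) against the chained .equ values */
_Static_assert(offsetof(struct card_sketch, address) == 48, "address");
_Static_assert(offsetof(struct card_sketch, city) == 128, "city");
_Static_assert(offsetof(struct card_sketch, state) == 152, "state");
_Static_assert(offsetof(struct card_sketch, zip) == 172, "zip");
_Static_assert(sizeof(struct card_sketch) == 178, "cardSize");

/* Byte offset of element index in an array of cards, i.e. index * cardSize */
size_t cardOffset(size_t index)
{
    return index * sizeof(struct card_sketch);
}
```

Because every field is a char array, no padding is inserted, and an array of MAX cards occupies exactly MAX*cardSize bytes, which is the amount the addressBook stack frame sets aside.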
1 # card.s
2 # card object default constructor.
3 # Bob Plantz - 30 June 2009
4 # Calling sequence:
5 # rdi <- address of object
6 # call card
7 # returns void
8
9 .include "cardDef.h"
10 # Stack frame
11 .equ thisPtr,-16
12 .equ localSize,-16
13 # Read only data
14 .section .rodata
15 nameDefault:
16 .string "J. Doe"
17 addressDefault:
18 .string "123 Main St."
19 cityDefault:
20 .string "Middle Town"
21 stateDefault:
22 .string "Kansas"
23 zipDefault:
24 .string "12345"
25 # Code
26 .text
27 .globl card
28 .type card, @function
29 card:
30 pushq %rbp # save frame pointer
31 movq %rsp, %rbp # our frame pointer
32 addq $localSize, %rsp # for saving argument
33 movq %rdi, thisPtr(%rbp) # save it
34
41
1 # copyStr.s
2 # Copies a C-style text string.
3 #
4 # Calling sequence:
5 # rdx <- address of source
6 # rsi <- address of destination
7 # edi <- maximum length to copy (including NUL)
8 # call copyStr
9 # returns number of chars copied, not including NUL.
10 # assumes maximum length is at least 1.
11 # Bob Plantz - 30 June 2009
12
13 # Code
14 .text
15 .globl copyStr
16 .type copyStr, @function
17 copyStr:
18 pushq %rbp # save frame pointer
19 movq %rsp, %rbp # our frame pointer
20
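The contract described in the copyStr header comments can be sketched in C. The body of the assembly function is not shown here, so this is an assumption about its behavior, not a transcription: the name copyStr_sketch is hypothetical, and truncating the copy to leave room for the terminator is inferred from the comment that the maximum length includes the terminating character. The argument order follows the calling sequence (edi, rsi, rdx).

```c
/* Hypothetical C rendering of the copyStr contract (an assumption).
 * Copies at most maxLen - 1 characters from src to dest, always
 * terminates dest, and returns the number of characters copied,
 * not counting the terminator. Assumes maxLen >= 1. */
int copyStr_sketch(int maxLen, char *dest, const char *src)
{
    int count = 0;

    /* stop at the end of the source or when only the
       terminator's slot is left in the destination */
    while (count < maxLen - 1 && src[count] != '\0') {
        dest[count] = src[count];
        count++;
    }
    dest[count] = '\0';
    return count;
}
```

Passing the field length from cardDef.h as maxLen would keep a long user entry from running into the next field of the card.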
1 # cardGet.s
2 # Gets user input values and stores them in a card object.
3 #
4 # Calling sequence:
5 # rdi <- address of object
6 # call cardGet
7 # Bob Plantz - 30 June 2009
8
9 .include "cardDef.h"
10 # Stack frame
11 .equ thisPtr,-16
12 .equ localSize,-16
13 # Read only data
14 .section .rodata
15 Prompt:
16 .string "Enter the data\n"
17 namePrompt:
18 .string " name: "
19 addressPrompt:
20 .string " address: "
21 cityPrompt:
22 .string " city: "
23 statePrompt:
24 .string " state: "
25 zipPrompt:
26 .string " zip code: "
27 # Code
28 .text
29 .globl cardGet
30 .type cardGet, @function
31 cardGet:
32 pushq %rbp # save frame pointer
33 movq %rsp, %rbp # our frame pointer
34 addq $localSize, %rsp # local vars.
35 movq %rdi, thisPtr(%rbp) # address of object
36
1 # cardPut.s
2 # Displays a card object.
3 #
4 # Calling sequence:
5 # rdi <- address of object
6 # call cardPut
7 # Bob Plantz - 30 June 2009
8
9 .include "cardDef.h"
10 # Stack frame
11 .equ thisPtr,-16
12 .equ localSize,-16
13 # Read only data
14 .section .rodata
15 Msg:
16 .string "*** Address Card ***\n"
17 nameMsg:
18 .string " name: "
19 addressMsg:
20 .string " address: "
21 cityMsg:
22 .string " city: "
23 stateMsg:
24 .string " state: "
25 zipMsg:
26 .string " zip code: "
27 endl:
28 .string "\n"
29 # Code
30 .text
31 .globl cardPut
32 .type cardPut, @function
33 cardPut:
34 pushq %rbp # save frame pointer
35 movq %rsp, %rbp # our frame pointer
36 addq $localSize, %rsp # local vars.
37 movq %rdi, thisPtr(%rbp) # address of object
38
7 #include <stdio.h>
8
9 int main()
10 {
11 float number;
12 int counter = 10;
13
14 number = 0.5;
15 while ((number != 0.0) && (counter != 0))
16 {
17 printf("number = %f and counter = %i\n", number, counter);
18
23 return 0;
24 }
14 -2
1 /*
2 * floatRoundoff.c
3 * shows the effects of adding a small float to a large one.
4 * Bob Plantz - 1 July 2009
5 */
6
7 #include <stdio.h>
8
9 int main()
10 {
11 float fNumber = 2147483646.0;
12 int iNumber = 2147483646;
13
15 fNumber, iNumber);
16 fNumber += 1.0;
17 iNumber += 1;
18 printf("After adding 1 the float is %f and the integer is %i\n",
19 fNumber, iNumber);
20
21 return 0;
22 }
14 -5 The following program is provided for you to work with these conversions.
1 /*
2 * float2hex.c
3 * allows user to see bit pattern of a float
4 * Bob Plantz - 1 July 2009
5 */
6
7 #include <stdio.h>
8
9 int main()
10 {
11 float number;
12 unsigned int *ptr = (unsigned int *)&number;
13 char ans[50];
14
15 *ans = 'y';
16 while ((*ans == 'y') || (*ans == 'Y'))
17 {
18 printf("Enter a decimal number: ");
19 scanf("%f", &number);
20 printf("%f => %#0x\n", number, *ptr);
21
26 return 0;
27 }
a) 3f800000
b) bdcccccd
c) 44faa000
d) 3b800000
e) c5435500
f) 3ea8f5c3
g) 4048f5c3
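Some of these patterns can be checked without typing values in by hand. The sketch below is a variation on float2hex.c, not the book's program: the helper name floatBits is hypothetical, and it uses memcpy in place of the pointer cast. The decimal values in the note after the code were worked out from answers a, c, d, and e; that they are the values the exercise asks about is an assumption.

```c
#include <stdint.h>
#include <string.h>

/* Returns the 32-bit pattern that stores a float.
 * Assumes float is the 32-bit IEEE 754 single format. */
uint32_t floatBits(float value)
{
    uint32_t bits;

    /* copy the bytes; this avoids reading a float through an
       unsigned int pointer */
    memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

With this helper, floatBits(1.0f) is 0x3f800000, floatBits(2005.0f) is 0x44faa000, floatBits(0.00390625f) is 0x3b800000, and floatBits(-3125.3125f) is 0xc5435500. All four values are exactly representable, so no rounding is involved.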
14 -6 The following program is provided for you to work with these conversions.
1 /*
2 * hex2float.c
3 * converts hex pattern to float
4 * Bob Plantz - 1 July 2009
5 */
6
7 #include <stdio.h>
8
9 int main()
10 {
15 *ans = 'y';
16 while ((*ans == 'y') || (*ans == 'Y'))
17 {
18 printf("Enter a hex number: ");
19 scanf("%x", &number);
20 printf("%#0x => %f\n", number, *ptr);
21
26 return 0;
27 }
a) +2.0 e) 100.03125
b) -1.0 f) 1.2
c) +0.0625 g) 123.449997
d) -16.03125 h) -54.320999
14 -7 The bit pattern for +2.0 is 01000...0. Because IEEE 754 stores a biased exponent in the bits
just below the sign bit, the bit patterns of non-negative values increase as the values increase,
so all the floating point numbers in the range 0.0 – +2.0 are within the bit pattern
range 00000...0 – 01000...0. So half the positive floating point numbers are in the range
00000...0 – 00111...1, and the other half in the range 01000...0 – 01111...1.
The same argument applies to the negative floating point numbers.
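The argument can be checked numerically. The following is a minimal sketch with hypothetical helper names (bitsOf, fractionBelowTwo), assuming float is the 32-bit IEEE 754 single format. It reads the pattern for +2.0 and divides it by the number of patterns that have a sign bit of zero.

```c
#include <stdint.h>
#include <string.h>

/* Returns the 32-bit pattern that stores a float */
uint32_t bitsOf(float value)
{
    uint32_t bits;

    memcpy(&bits, &value, sizeof bits);
    return bits;
}

/* Fraction of the 2^31 patterns with sign bit 0 that lie below
 * the pattern for +2.0 */
double fractionBelowTwo(void)
{
    return (double)bitsOf(2.0f) / 2147483648.0;   /* 2^31 */
}
```

bitsOf(2.0f) is 0x40000000, which is 2^30, so fractionBelowTwo() returns 0.5. The upper half of the patterns also contains the encodings for infinity and NaN, so "half" is approximate when only finite numbers are counted.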
14 -8
1 .file "casting.c"
2 .section .rodata
3 .LC0:
4 .string "Enter an integer: "
5 .LC1:
6 .string "%i"
7 .LC3:
8 .string "%i + %lf = %lf\n"
9 .text
10 .globl main
11 .type main, @function
12 main:
13 pushq %rbp
14 movq %rsp, %rbp
15 subq $48, %rsp
16 movl $.LC0, %edi
17 movl $0, %eax
18 call printf
19 leaq -4(%rbp), %rsi
20 movl $.LC1, %edi
21 movl $0, %eax
22 call scanf
23 movabsq $4608218246714312622, %rax # y = 1.23;
24 movq %rax, -16(%rbp) # store y
25 movl -4(%rbp), %eax # load x
E.15. INTERRUPTS AND EXCEPTIONS 483
6 # Useful constants
7 .equ STDIN,0
8 .equ STDOUT,1
9 .equ theArg,8
10 # from asm/unistd_64.h
11 .equ READ,0
12 .equ WRITE,1
13 .equ OPEN,2
14 .equ CLOSE,3
15 .equ EXIT,60
16 # from bits/fcntl.h
17 .equ O_RDONLY,0
18 .equ O_WRONLY,1
19 .equ O_RDWR,2
20 # Stack frame
21 .equ aLetter,-16
22 .equ fd, -8
23 .equ localSize,-16
24 # Code
25 .text # switch to text segment
26 .globl main
27 .type main, @function
28 main:
29 pushq %rbp # save caller’s frame pointer
30 movq %rsp, %rbp # establish our frame pointer
31 addq $localSize, %rsp # for local variable
32
45 writeLoop:
46 cmpl $0, %eax # any chars?
47 je allDone # no, must be end of file
48 movl $1, %edx # yes, 1 character
49 leaq aLetter(%rbp), %rsi # place to store character
50 movl $STDOUT, %edi # standard out
51 movl $WRITE, %eax
52 syscall # request kernel service
53
[1] Peter Abel. IBM PC Assembly Language and Programming, Fifth Edition. Prentice-Hall,
2001
[4] AMD64 Architecture Programmer’s Manual, Volume 3: General-Purpose and System
Instructions Programming; http://developer.amd.com/devguides.jsp
[6] AMD64 Architecture Programmer’s Manual, Volume 5: 64-Bit Media and x87
Floating-Point Instructions; http://developer.amd.com/devguides.jsp
[7] Jonathan Bartlett. Programming from the Ground Up. Bartlett Publishing, 2004
[8] Barry B. Brey. The Intel Microprocessors, Fifth Edition. Prentice Hall, 2000
[9] Randal E. Bryant and David R. O’Hallaron. Computer Systems. Prentice Hall, 2003
[11] Richard C. Detmer. Introduction to 80x86 Assembly Language and Computer Architecture.
Jones and Bartlett Publishers, 2001
[12] Jeff Duntemann. Assembly Language Step-By-Step: Programming with DOS and Linux,
Second Edition. John Wiley & Sons, 2000
[14] IA-32 Intel® 64 and IA-32 Architecture Software Developer’s Manual, Volume 1: Basic
Architecture; http://www.intel.com/products/processor/manuals/index.htm
[15] IA-32 Intel® 64 and IA-32 Architecture Software Developer’s Manual, Volume 2A:
Instruction Set Reference A-M; http://www.intel.com/products/processor/manuals/index.htm
[16] IA-32 Intel® 64 and IA-32 Architecture Software Developer’s Manual, Volume 2B:
Instruction Set Reference N-Z; http://www.intel.com/products/processor/manuals/index.htm
[17] IA-32 Intel® 64 and IA-32 Architecture Software Developer’s Manual, Volume 3A: System
Programming Guide; http://www.intel.com/products/processor/manuals/index.htm
[18] IA-32 Intel® 64 and IA-32 Architecture Software Developer’s Manual, Volume 3B: System
Programming Guide; http://www.intel.com/products/processor/manuals/index.htm
486 BIBLIOGRAPHY
[19] Kip R. Irvine. Assembly Language for Intel-Based Computers, Fourth Edition. Prentice
Hall, 2003
[20] Bruce F. Katz. Digital Design: From Gates to Intelligent Machines. Da Vinci Engineering
Press, 2006
[21] John R. Levine. Linkers & Loaders. Elsevier Science & Technology Books, 1999
[22] Mike Loukides and Andy Oram. Programming with GNU Software. O’Reilly, 1997
[23] M. Morris Mano. Digital Design, Third Edition. Prentice Hall, 2002
[24] Alan B. Marcovitz. Introduction to Logic Design, Second Edition. McGraw-Hill, 2005
[25] Michael Matz, Jan Hubicka, Andreas Jaeger, and Mark Mitchell. System V Application
Binary Interface AMD64 Architecture Processor Supplement, Draft Version 0.99, December
7, 2007; http://www.x86-64.org/documentation.html
[26] Merriam-Webster’s Online Dictionary; http://m-w.com
[27] Bob Neveln. Linux Assembly Language Programming. Prentice Hall, 2000
[28] David A. Patterson and John L. Hennessy. Computer Organization and Design, Third
Edition. Morgan Kaufmann, 2005
[29] Richard M. Stallman, Roland Pesch, Stan Shebs, et al. Debugging with GDB. GNU Press,
2003
[30] Richard M. Stallman and Roland McGrath. GNU Make. GNU Press, 2002
[31] William Stallings. Computer Organization & Architecture: Designing for Performance,
Sixth Edition. Prentice Hall, 2002
[32] Bjarne Stroustrup. The Design and Evolution of C++. Addison-Wesley, 1994
[33] System V Application Binary Interface, Intel386™ Architecture Processor Support, Fourth
Edition, The SCO Group, 1997; http://www.sco.com/developers/devspecs/
[34] Andrew S. Tanenbaum. Structured Computer Organization, Fifth Edition. Prentice Hall,
2006
[35] John von Neumann. First Draft of a Report on the EDVAC. Moore School of Electrical
Engineering, University of Pennsylvania, 1945
Index
circuit
  combinational, 82
clock, 95
clock generator, 95
clock pulses, 95
COBOL, 50
comment field, 137
comment line, 136
compile, 133
compiler-generated label, 143
complement, 56
condition codes, 121
control characters, 20
Control Unit, 118
control unit, 6
convert
  binary to decimal, 9
  binary to signed decimal, 37
  hexadecimal to decimal, 9
  signed decimal-to-binary, 38
  unsigned decimal to binary, 9
CPU, 3, 116
  block diagram, 117
  overview, 116
current, 69
data
  storing in memory, 12
data types, 12
debugger, 16
decimal fractions, 319
decoder, 86
DeMorgan’s Law, 58
device handler, 355
division, 280
D latch, 99
do-while, 221
don’t care, 69
DRAM, 114
effective address, 167
electronics, 69
  AC, 70
  amp, 69
  ampere, 69
  battery, 70
  capacitance, 70
  capacitor, 72
  coulomb, 69
  DC, 70
  direct current, 70
  inductance, 70
  inductor, 73
  ohms, 70
  parallel, 71
  passive elements, 70
  power supply, 70
  resistance, 70
  resistor, 70
  series, 71
  time constant, 72
  transient, 70
  voltage, 69
  voltage level, 70
  volts, 69
  watt, 69
ELF, 137
ELF:section, 137
ELF:segment, 138
endian
  big, 19
  little, 19, 128
exception processing cycle, 345
Executable and Linking Format, 137
finite state machine, 94
fixed point, 320
Flags Register, 118
flip-flop
  D, 100
  JK, 102
  T, 102
floating point, 321
  errors, 321
  extended format, 330
  fpn registers, 330
  limitation, 324
  range, 322
  stack, 331
  x87, 326
fractional values, 319
FSM, 94
function
  called, 251
  calling, 250
  designing, 173
  epilogue, 140
  prologue, 140
  writing, 176
functions
  32-bit mode, 251
  64-bit mode, 242
gate
  AND, 55
  NAND, 77
  NOR, 77
  NOT, 56
  OR, 56
  XOR, 68
gate descriptor, 343
gdb, 16
  commands, 16, 126, 378
Gray code, 50
unistd.h, 22
variable
automatic, 168
static, 168
variable argument list, 241
variables